Journal of Computational and Applied Mathematics


Solving large-scale continuous-time algebraic Riccati equations by doubling

Tiexiang Li^a, Eric King-wah Chu^{b,∗}, Wen-Wei Lin^c, Peter Chang-Yi Weng^b

^a Department of Mathematics, Southeast University, Nanjing, 211189, People’s Republic of China

^b School of Mathematical Sciences, Building 28, Monash University, VIC 3800, Australia

^c Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan

Article info

Article history:

Received 21 July 2011

Received in revised form 22 May 2012

MSC: 15A24, 65F50, 93C05

Keywords: Continuous-time algebraic Riccati equation; Doubling algorithm; Krylov subspace; Large-scale problem

Abstract

We consider the solution of large-scale algebraic Riccati equations with numerically low-ranked solutions. For the discrete-time case, the structure-preserving doubling algorithm has been adapted, with the iterates for $A$ not explicitly computed but kept in the recursive form $A_k = A_{k-1}^2 - D_k^{(1)} S_k^{-1} [D_k^{(2)}]^\top$, with $D_k^{(1)}$ and $D_k^{(2)}$ being low-ranked and $S_k^{-1}$ being small in dimension. For the continuous-time case, the algebraic Riccati equation is first treated with the Cayley transform before doubling is applied. With $n$ being the dimension of the algebraic equations, the resulting algorithms have an efficient $O(n)$ computational complexity per iteration, need no inner iterations, and essentially converge quadratically. Some numerical results are presented. For instance, Example 3 in Section 5.2, of dimension $n = 20\,209$ with 204 million variables in the solution $X$, was solved using MATLAB on a MacBook Pro within 45 s to a machine accuracy of $O(10^{-16})$.

1. Large-scale algebraic Riccati equations

Let the system matrix $A$ be large and sparse, possibly with band structures. The discrete-time algebraic Riccati equation (DARE)

$$\mathcal{D}(X) \equiv -X + A^\top X (I + GX)^{-1} A + H = 0, \tag{1a}$$

and the continuous-time algebraic Riccati equation (CARE)

$$\mathcal{C}(X) \equiv A^\top X + XA - XGX + H = 0, \tag{1b}$$

with the low-ranked

$$G = B R^{-1} B^\top, \qquad H = C T^{-1} C^\top, \tag{1c}$$

where $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{n \times l}$ and $m, l \ll n$, arise often in linear–quadratic optimal control problems [1,2].

The solution of CAREs and DAREs has been an extremely active area of research; see, e.g., [3,1,2]. The usual solution methods, such as the Schur vector method, symplectic SR methods, the matrix sign function, the matrix disk function or the doubling method, have not made (full) use of the sparsity and structure in $A$, $G$ and $H$. Requiring in general $O(n^3)$ flops and workspace of size $O(n^2)$, these methods are obviously inappropriate for the large-scale problems we are interested in here.

∗ Corresponding author. Tel.: +61 3 99054480; fax: +61 3 99054403.

E-mail addresses: [email protected] (T. Li), [email protected] (E.K.-w. Chu), [email protected] (W.-W. Lin), [email protected] (P.C.-Y. Weng).

doi:10.1016/j.cam.2012.06.006

For control problems for parabolic PDEs and the balancing based model order reduction of large linear systems, large-scale CAREs and DAREs have to be solved [4–9]. As stated in [10,11], ‘‘the basic observation on which all methods for solving such kinds of matrix equations are based, is that often the (numerical) rank of the solution is very small compared to its actual dimension and therefore it allows for a good approximation via low rank solution factors’’. Importantly, without solving the corresponding algebraic Riccati equations, alternative solutions to the optimal control problem require the deflating subspaces of the corresponding Hamiltonian matrices or (generalized) symplectic pencils, which are prohibitively expensive to compute.

Benner, Fassbender and Saak have done much on large-scale algebraic Riccati equations; see [10–13] and the references therein. They built their methods on (inexact) Newton’s methods with inner iterations for the associated Lyapunov and Stein equations. We shall adapt the structure-preserving doubling algorithm (SDA) [14–16], making use of the sparsity in $A$ and the low-ranked structures in $G$ and $H$. For other applications of the SDA, see [17].

2. Structure-preserving doubling algorithm for DAREs

We shall abbreviate the discussion for DAREs; please consult [18] for details.

The structure-preserving doubling algorithm (SDA) [15], assuming $(I + GH)^{-1}$ exists, has the following form:

$$\begin{cases} G \leftarrow G + A (I + GH)^{-1} G A^\top, \\ H \leftarrow H + A^\top H (I + GH)^{-1} A, \\ A \leftarrow A (I + GH)^{-1} A. \end{cases} \tag{2}$$

We shall apply the Sherman–Morrison–Woodbury formula (SMWF) to $(I + GH)^{-1}$ and make use of the low-ranked forms of $G$ and $H$ in (1c).
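As a point of reference, the dense iteration (2) can be sketched on a small random DARE. This is only an illustrative sketch in NumPy, not the large-scale method of this paper: the function names `sda_dense` and `dare_residual`, the test matrices and the choice $R = I_m$, $T = I_l$ are all assumptions made for the example.

```python
import numpy as np

def sda_dense(A, G, H, iters=12):
    """Dense doubling iteration (2); returns the final (A_k, G_k, H_k)."""
    n = A.shape[0]
    I = np.eye(n)
    for _ in range(iters):
        # Solve (I + G H) W = [A, G] once and reuse it in all three updates.
        W = np.linalg.solve(I + G @ H, np.hstack([A, G]))
        WA, WG = W[:, :n], W[:, n:]
        # The right-hand side is evaluated with the old A, G, H before assignment.
        A, G, H = A @ WA, G + A @ WG @ A.T, H + A.T @ H @ WA
    return A, G, H

def dare_residual(X, A, G, H):
    """Residual D(X) of the DARE (1a)."""
    n = A.shape[0]
    return -X + A.T @ X @ np.linalg.solve(np.eye(n) + G @ X, A) + H

# Hypothetical small test problem (not from the paper).
rng = np.random.default_rng(0)
n, m, l = 8, 2, 2
A = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((n, l))
G, H = B @ B.T, C @ C.T            # assumes R = I_m and T = I_l

Ak, Gk, Hk = sda_dense(A, G, H)
rel_res = np.linalg.norm(dare_residual(Hk, A, G, H)) / (1 + np.linalg.norm(Hk))
print(f"relative DARE residual: {rel_res:.2e}, ||A_k|| = {np.linalg.norm(Ak):.2e}")
```

Every step here costs $O(n^3)$ flops and $O(n^2)$ storage, which is exactly what the low-rank reorganization of the iteration is meant to avoid.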

2.1. Large-scale SDA

At first glance, the iteration for $A$ in the SDA in (2) appears doomed, with $O(n^3)$ operations for the products of full matrices. However, with the low-rank form in (1c), we shall organize the SDA into the form (for $k = 1, 2, \ldots$):

$$A_k = A_{k-1}^2 - D_k^{(1)} S_k^{-1} [D_k^{(2)}]^\top, \qquad G_k = B_k R_k^{-1} B_k^\top, \qquad H_k = C_k T_k^{-1} C_k^\top. \tag{3}$$

The application of the SMWF on $(I_n + G_kH_k)^{-1}$ yields

$$\begin{aligned} A_{k+1} &= A_k (I_n + G_kH_k)^{-1} A_k \\ &= A_k \left[ I_n - G_kC_kT_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1} C_k^\top \right] A_k \\ &= A_k \left[ I_n - B_k \left( I_{m_k} + R_k^{-1}B_k^\top H_kB_k \right)^{-1} R_k^{-1}B_k^\top H_k \right] A_k, \end{aligned}$$

where $C_k$ and $B_k$ have respectively $l_k$ and $m_k$ columns. It will be obvious that it is more convenient to work with $S_k^{-1}$, $R_k^{-1}$ and $T_k^{-1}$, and we retain the inverse notation only for historical reasons, although there is no actual inversion involved.
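The two SMWF forms of $(I_n + G_kH_k)^{-1}$ used above can be checked numerically. The following is a minimal sketch with hypothetical random data (the sizes, the scaling and the diagonal $R^{-1}$, $T^{-1}$ are assumptions of the example); it compares the direct $n \times n$ inverse with the size-$l$ and size-$m$ formulae.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, l = 50, 3, 2
B = rng.standard_normal((n, m)) / np.sqrt(n)
C = rng.standard_normal((n, l)) / np.sqrt(n)
Rinv = np.diag([1.0, 2.0, 0.5])        # symmetric R^{-1} (assumed for the example)
Tinv = np.diag([0.5, 3.0])             # symmetric T^{-1} (assumed for the example)
G = B @ Rinv @ B.T
H = C @ Tinv @ C.T
I_n = np.eye(n)

# Direct n x n inverse, O(n^3).
direct = np.linalg.inv(I_n + G @ H)

# Size-l form: I - G C T^{-1} (I_l + C^T G C T^{-1})^{-1} C^T.
small_l = np.eye(l) + C.T @ G @ C @ Tinv
via_l = I_n - G @ C @ Tinv @ np.linalg.solve(small_l, C.T)

# Size-m form: I - B (I_m + R^{-1} B^T H B)^{-1} R^{-1} B^T H.
small_m = np.eye(m) + Rinv @ B.T @ H @ B
via_m = I_n - B @ np.linalg.solve(small_m, Rinv @ B.T @ H)

err_l = np.linalg.norm(direct - via_l)
err_m = np.linalg.norm(direct - via_m)
print(err_l, err_m)
```

Only an $l \times l$ or $m \times m$ linear system is solved in the two low-rank forms, which is what makes the choice between them a matter of the relative sizes $l_k$ and $m_k$.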

Consequently, with $C_k \in \mathbb{R}^{n \times l_k}$ and $B_k \in \mathbb{R}^{n \times m_k}$, we have

$$A_{k+1} = A_k^2 - D_{k+1}^{(1)} S_{k+1}^{-1} [D_{k+1}^{(2)}]^\top, \tag{4}$$

with the update of ‘‘size’’ $l_k$ defined by

$$D_{k+1}^{(1)} = A_kG_kC_k, \qquad D_{k+1}^{(2)} = A_k^\top C_k, \qquad S_{k+1}^{-1} = T_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1} \in \mathbb{R}^{l_k \times l_k}, \tag{5a}$$

or the update of ‘‘size’’ $m_k$ defined by

$$D_{k+1}^{(1)} = A_kB_k, \qquad D_{k+1}^{(2)} = A_k^\top H_kB_k, \qquad S_{k+1}^{-1} = \left( I_{m_k} + R_k^{-1}B_k^\top H_kB_k \right)^{-1} R_k^{-1} \in \mathbb{R}^{m_k \times m_k}, \tag{5b}$$

all involving $O(n^3)$ operations for a dense $A$. The operation counts will be reduced to $O(n)$ with the assumption that the maximum number of nonzero components in any row or column of $A$ is much less than $n$ (see Table 2 in Section 4.2). The trick is not to form $A_k$ explicitly. Note that we have to store all the $B_i$, $C_i$, $R_i^{-1}$ and $T_i^{-1}$ for $i = 0, 1, \ldots, k-1$ to facilitate the multiplication of low-ranked matrices by $A_k$ or $A_k^\top$.
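The idea of never forming $A_k$ can be illustrated by applying the recursion (4) to a block of vectors. The sketch below is an assumption-laden illustration: the helper `apply_Ak` and the layout of `factors` (a list of triples $(D_j^{(1)}, S_j^{-1}, D_j^{(2)})$) are hypothetical, and random factors stand in for the ones (5a) or (5b) would produce.

```python
import numpy as np

def apply_Ak(V, A, factors, transpose=False):
    """Return A_k @ V (or A_k.T @ V) without forming A_k.

    Uses A_j = A_{j-1}^2 - D1_j @ Sinv_j @ D2_j.T with A_0 = A, where
    factors = [(D1_1, Sinv_1, D2_1), ..., (D1_k, Sinv_k, D2_k)].
    """
    if not factors:
        return A.T @ V if transpose else A @ V
    D1, Sinv, D2 = factors[-1]
    rest = factors[:-1]
    # Two nested applications of A_{k-1} (or its transpose) give A_{k-1}^2 V.
    W = apply_Ak(apply_Ak(V, A, rest, transpose), A, rest, transpose)
    if transpose:
        return W - D2 @ (Sinv.T @ (D1.T @ V))
    return W - D1 @ (Sinv @ (D2.T @ V))

# Hypothetical data: a dense A is used only so that A_k can be formed for comparison.
rng = np.random.default_rng(2)
n, r = 30, 2
A = rng.standard_normal((n, n)) / np.sqrt(n)
factors = [(0.1 * rng.standard_normal((n, r)),
            rng.standard_normal((r, r)),
            0.1 * rng.standard_normal((n, r))) for _ in range(3)]

A_dense = A.copy()
for D1, Sinv, D2 in factors:
    A_dense = A_dense @ A_dense - D1 @ Sinv @ D2.T

V = rng.standard_normal((n, 4))
err = np.linalg.norm(apply_Ak(V, A, factors) - A_dense @ V) / np.linalg.norm(A_dense @ V)
err_t = (np.linalg.norm(apply_Ak(V, A, factors, transpose=True) - A_dense.T @ V)
         / np.linalg.norm(A_dense.T @ V))
print(err, err_t)
```

In this form only products of $A$ (or $A^\top$) with thin matrices and small dense products are needed; note that the number of multiplications by $A$ doubles with each level of the recursion.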


We may choose between (5a) and (5b) based on the sizes $l_k$ and $m_k$. Ignoring the small saving in the inversion of smaller matrices, the compression and truncation in the next section produce leaner $B_k$ and $C_k$, which makes the choice here irrelevant. However, this choice may be important when $G$ or $H$ is not low-ranked.

The induction proof of the general form of $A_k$ in (4)–(5b) can be completed by considering the initial $k = 1$ case, which is trivial.

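For $B_k$, $C_k$ and $R_k$, applying the SMWF to $(I + G_kH_k)^{-1}$ in the SDA, we have

$$\begin{aligned} G_{k+1} &= G_k + A_kG_kA_k^\top - A_kG_kC_kT_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1} C_k^\top G_kA_k^\top \\ &= G_k + A_kG_kA_k^\top - A_kB_k \left( I_{m_k} + R_k^{-1}B_k^\top H_kB_k \right)^{-1} R_k^{-1}B_k^\top H_kG_kA_k^\top, \end{aligned} \tag{6}$$

and

$$\begin{aligned} H_{k+1} &= H_k + A_k^\top H_kA_k - A_k^\top H_kG_kC_kT_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1} C_k^\top A_k \\ &= H_k + A_k^\top H_kA_k - A_k^\top H_kB_k \left( I_{m_k} + R_k^{-1}B_k^\top H_kB_k \right)^{-1} R_k^{-1}B_k^\top H_kA_k. \end{aligned} \tag{7}$$

These imply that

$$B_{k+1} = [B_k, A_kB_k], \qquad C_{k+1} = [C_k, A_k^\top C_k], \tag{8}$$

$$R_{k+1}^{-1} = R_k^{-1} \oplus \left[ R_k^{-1} - R_k^{-1}B_k^\top C_kT_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1} C_k^\top B_kR_k^{-1} \right] \tag{9a}$$

$$\phantom{R_{k+1}^{-1}} = R_k^{-1} \oplus \left[ R_k^{-1} - \left( I_{m_k} + R_k^{-1}B_k^\top H_kB_k \right)^{-1} R_k^{-1}B_k^\top H_kB_kR_k^{-1} \right], \tag{9b}$$

$$T_{k+1}^{-1} = T_k^{-1} \oplus \left[ T_k^{-1} - T_k^{-1}C_k^\top G_kC_kT_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1} \right] \tag{10a}$$

$$\phantom{T_{k+1}^{-1}} = T_k^{-1} \oplus \left[ T_k^{-1} - T_k^{-1}C_k^\top B_k \left( I_{m_k} + R_k^{-1}B_k^\top H_kB_k \right)^{-1} R_k^{-1}B_k^\top C_kT_k^{-1} \right], \tag{10b}$$

with the initial values

$$A_0 = A, \qquad B_0 = B, \qquad C_0 = C, \qquad R_0 = R, \qquad T_0 = T. \tag{11}$$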
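One step of the factor updates (8), (9a) and (10b) can be compared against the dense formulae (2). The sketch below is illustrative only: `lowrank_step` and `direct_sum` are hypothetical helper names, the data are random, and $A_k$ is kept dense purely so that the comparison can be made.

```python
import numpy as np

def direct_sum(X, Y):
    """Block-diagonal matrix X (+) Y."""
    Z = np.zeros((X.shape[0] + Y.shape[0], X.shape[1] + Y.shape[1]))
    Z[:X.shape[0], :X.shape[1]] = X
    Z[X.shape[0]:, X.shape[1]:] = Y
    return Z

def lowrank_step(Ak, Bk, Rinv, Ck, Tinv):
    """One step of (8), (9a) and (10b) on the factors of G_k and H_k."""
    m, l = Bk.shape[1], Ck.shape[1]
    BtC = Bk.T @ Ck                                # B_k^T C_k, m x l
    CGC = BtC.T @ Rinv @ BtC                       # C_k^T G_k C_k, l x l
    BHB = BtC @ Tinv @ BtC.T                       # B_k^T H_k B_k, m x m
    # (9a): R^{-1} - R^{-1} B^T C T^{-1} (I_l + C^T G C T^{-1})^{-1} C^T B R^{-1}
    Kl = np.linalg.solve(np.eye(l) + CGC @ Tinv, BtC.T @ Rinv)
    R_new = Rinv - Rinv @ BtC @ Tinv @ Kl
    # (10b): T^{-1} - T^{-1} C^T B (I_m + R^{-1} B^T H B)^{-1} R^{-1} B^T C T^{-1}
    Km = np.linalg.solve(np.eye(m) + Rinv @ BHB, Rinv @ BtC @ Tinv)
    T_new = Tinv - Tinv @ BtC.T @ Km
    B1 = np.hstack([Bk, Ak @ Bk])                  # (8)
    C1 = np.hstack([Ck, Ak.T @ Ck])
    return B1, direct_sum(Rinv, R_new), C1, direct_sum(Tinv, T_new)

# Hypothetical test data.
rng = np.random.default_rng(5)
n, m, l = 30, 2, 3
Ak = rng.standard_normal((n, n)) / np.sqrt(n)
Bk = rng.standard_normal((n, m)) / np.sqrt(n)
Ck = rng.standard_normal((n, l)) / np.sqrt(n)
Rinv = np.diag([1.0, 0.5])
Tinv = np.diag([2.0, 1.0, 0.5])

# Dense reference from (2).
G, H = Bk @ Rinv @ Bk.T, Ck @ Tinv @ Ck.T
W = np.linalg.solve(np.eye(n) + G @ H, np.hstack([Ak, G]))
G_next = G + Ak @ W[:, n:] @ Ak.T
H_next = H + Ak.T @ H @ W[:, :n]

B1, R1inv, C1, T1inv = lowrank_step(Ak, Bk, Rinv, Ck, Tinv)
err_G = np.linalg.norm(B1 @ R1inv @ B1.T - G_next)
err_H = np.linalg.norm(C1 @ T1inv @ C1.T - H_next)
print(err_G, err_H)
```

The only dense linear algebra in `lowrank_step` involves $m_k \times m_k$ and $l_k \times l_k$ matrices; in the large-scale setting the products `Ak @ Bk` and `Ak.T @ Ck` would be carried out recursively rather than with an explicit $A_k$.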

We have shown that the SDA can be organized into the form (3). The existence of $R_k^{-1}$, $T_k^{-1}$ and $(I_n + G_kH_k)^{-1}$ guarantees the same for the other inverses in (9a)–(10b). Note that $R_k^{-1}$, $S_k^{-1}$ and $T_k^{-1}$ are symmetric for all $k$. Again, the choice in (9a)–(10b) may be relevant when $G$ or $H$ is not low-ranked.

For well-behaved DAREs [14,15], we have $H_k = C_kT_k^{-1}C_k^\top \to X$ and $G_k = B_kR_k^{-1}B_k^\top \to Y$ (the solution of the dual DARE) as $k \to \infty$.

Note that $X$ and $Y$ have been observed to be numerically low-ranked. Under suitable assumptions [14,15], the convergence of the SDA implies $A_k = O(|\lambda|^{2^k}) \to 0$ for some $|\lambda| < 1$. Together with (8)–(10b), we see that $B_{k+1}$ and $C_{k+1}$ consist, respectively, of $B_k$ and $C_k$ augmented by the diminishing components $A_kB_k$ and $A_k^\top C_k$. This explains the observed low numerical ranks of $X$ and $Y$.

2.2. Compression and truncation of $B_k$ and $C_k$

Now we shall consider an important aspect of the SDA for large-scale DAREs (SDA_ls): the growth of $B_k$ and $C_k$. Obviously, as the SDA converges, increasingly smaller components are added to $B_k$ and $C_k$. As is apparent from (8), the growth in the sizes and ranks of these iterates is potentially exponential. Let the computational complexity of the SDA_ls be $O(n) = \alpha n + O(1)$. If the convergence is slow relative to the growth in $B_k$ and $C_k$, the algorithm will fail, with $\alpha$ growing exponentially (see Table 2 in Section 4.2). In such cases, $X$ is obviously no longer numerically low-ranked with respect to the given truncation tolerance (see $\tau_1, \tau_2$ in (10) and (11)). It will then be extremely challenging to approximate $X$ in $O(n)$ computational complexity to high accuracy, by any method. One possibility is to accept approximations to $X$ of lower accuracy with a higher truncation tolerance, thus lowering the corresponding numerical rank of $X$.

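To reduce the dimensions of $B_k$, $C_k$, $D_k^{(1)}$ and $D_k^{(2)}$, we shall compress their columns by orthogonalization. Consider the QR decompositions with column pivoting:

$$B_k = Q_{1k}M_{1k} + \tilde{Q}_{1k}\tilde{M}_{1k}, \qquad C_k = Q_{2k}M_{2k} + \tilde{Q}_{2k}\tilde{M}_{2k}, \qquad \text{with } \|\tilde{M}_{1k}\| \le \tau_1, \quad \|\tilde{M}_{2k}\| \le \tau_2,$$

where $\tau_i$ ($i = 1, 2$) are some small tolerances controlling the compression and truncation process, $m_k$ and $l_k$ are respectively the numbers of columns in $B_k$ and $C_k$, bounded from above by some corresponding $m_{\max}$ and $l_{\max}$,

$$r_{1k} = \operatorname{rank} B_k \le m_k \le m_{\max} \ll n, \qquad r_{2k} = \operatorname{rank} C_k \le l_k \le l_{\max} \ll n,$$

and, for $i = 1, 2$, $Q_{ik} \in \mathbb{R}^{n \times r_{ik}}$ have orthonormal columns and $M_{1k} \in \mathbb{R}^{r_{1k} \times m_k}$, $M_{2k} \in \mathbb{R}^{r_{2k} \times l_k}$ are full-ranked and upper triangular. We then have

$$B_kR_k^{-1}B_k^\top = Q_{1k} \left( M_{1k}R_k^{-1}M_{1k}^\top \right) Q_{1k}^\top + O(\tau_1), \tag{12}$$

$$C_kT_k^{-1}C_k^\top = Q_{2k} \left( M_{2k}T_k^{-1}M_{2k}^\top \right) Q_{2k}^\top + O(\tau_2), \tag{13}$$

and we should replace $B_k$ and $R_k^{-1}$ (or, $C_k$ and $T_k^{-1}$) respectively by the leaner $Q_{1k}$ and $M_{1k}R_k^{-1}M_{1k}^\top$ (or, $Q_{2k}$ and $M_{2k}T_k^{-1}M_{2k}^\top$).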
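The replacement of $(B_k, R_k^{-1})$ by a leaner pair can be sketched as follows. Note the substitution: to stay within NumPy, this sketch truncates with a thin SVD instead of the QR decomposition with column pivoting used in the text; the role of $Q_{1k}$, $M_{1k}$ and the tolerance $\tau_1$ is the same, but the factor $M$ is not triangular. The name `compress` and the test data are assumptions of the example.

```python
import numpy as np

def compress(B, Rinv, tol):
    """Return (Q, Rinv_c) with B @ Rinv @ B.T ~= Q @ Rinv_c @ Q.T.

    Q has orthonormal columns; singular values of B not exceeding tol are
    dropped (SVD-based stand-in for the pivoted QR truncation in the text).
    """
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    r = max(1, int(np.sum(s > tol)))     # numerical rank w.r.t. tol
    Q = U[:, :r]
    M = s[:r, None] * Vt[:r, :]          # r x (number of columns of B)
    return Q, M @ Rinv @ M.T

# Hypothetical B with 8 columns but numerical rank 3.
rng = np.random.default_rng(6)
n = 40
B = rng.standard_normal((n, 3)) @ rng.standard_normal((3, 8))
B += 1e-12 * rng.standard_normal((n, 8))
Rinv = np.diag(np.linspace(0.5, 2.0, 8))

Q, Rinv_c = compress(B, Rinv, tol=1e-8)
err = np.linalg.norm(B @ Rinv @ B.T - Q @ Rinv_c @ Q.T)
print(Q.shape, err)
```

After compression the stored factor has as many columns as the numerical rank, and the small matrix absorbs the discarded coordinate change, so later updates work with an $r \times r$ kernel rather than an $m_k \times m_k$ one.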

We may skip compressing and truncating $D_k^{(1)}$ and $D_k^{(2)}$ after compressing and truncating $B_k$ and $C_k$. As a result, we ignore the $O(\tau_i)$ terms and control the growth of $r_{ik}$ while sacrificing a hopefully negligible bit of accuracy.

Interestingly, we need only $R$, $T$ and $I + G_kH_k$ to be invertible (which implies the invertibility of $R_k$ and $T_k$ for all $k$), opening up the possibility of dealing with DAREs with indefinite $R$ and $T$ [19].

Eqs. (4) (used recursively but not explicitly), (5a) (or (5b)), (8), (9a) (or (9b)), (10a) (or (10b)), (12) and (13), together with the corresponding initial values in (11), constitute the SDA_ls.

2.3. SDA and Krylov subspaces

There is an interesting relationship between the SDA_ls and Krylov subspaces. Define the Krylov subspaces

$$\mathcal{K}_k(A, B) \equiv \begin{cases} \operatorname{span}\{B\} & (k = 0), \\ \operatorname{span}\{B, AB, A^2B, \ldots, A^{2^k - 1}B\} & (k > 0). \end{cases}$$

From (4) and (8), we can see that

$$B_0 = B \in \mathcal{K}_0(A, B), \qquad B_1 = [B, AB] \in \mathcal{K}_1(A, B)$$

and, for some low-ranked $F$,

$$B_2 = [B_1, A_1B_1] = \left[ B, AB, (A^2 - ABF^\top)[B, AB] \right] \in \mathcal{K}_2(A, B).$$

(We have abused notation, with $V \in \mathcal{K}_k(A, B)$ meaning $\operatorname{span}\{V\} \subseteq \mathcal{K}_k(A, B)$.) Similarly, it is easy to show that

$$B_k \in \mathcal{K}_k(A, B), \qquad C_k \in \mathcal{K}_k(A^\top, C).$$

In other words, the general SDA is closely related to approximating the solutions $X$ and $Y$ using Krylov subspaces, with additional components vanishing quadratically. However, for problems of small size $n$, $B_k$ and $C_k$ become full-ranked after a few iterations.
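The inclusion $B_2 \in \mathcal{K}_2(A, B)$ can be observed numerically on a small example. In this sketch (random data, $m = l = 1$ and $R = T = 1$ are assumptions of the example) the part of $B_2$ lying outside $\operatorname{span}\{B, AB, A^2B, A^3B\}$ is measured by orthogonal projection.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
A = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((n, 1))
G, H = B @ B.T, C @ C.T                 # assumes R = T = 1
I = np.eye(n)

A1 = A @ np.linalg.solve(I + G @ H, A)  # A_1 from (2)
B1 = np.hstack([B, A @ B])              # (8) with A_0 = A
B2 = np.hstack([B1, A1 @ B1])           # (8) with A_1

# Basis of K_2(A, B) = span{B, AB, A^2 B, A^3 B}.
K2 = np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(4)])
Q, _ = np.linalg.qr(K2)

# Relative size of the component of B_2 outside K_2(A, B).
outside = np.linalg.norm(B2 - Q @ (Q.T @ B2)) / np.linalg.norm(B2)
print(outside)
```

The residual is at round-off level, consistent with $A_1 = A^2 - ABF^\top$ mapping $[B, AB]$ into the span of $B, AB, A^2B, A^3B$.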

The Krylov subspaces $\mathcal{K}_k(A, B)$ play a vital part in the fast convergence of the SDA, which comes from two sources. Apart from the diminishing $A_k$ contributing in (2) in the updating of $G$ and $H$, the power of approximation of the corresponding Krylov subspaces also contributes, creating cancellations in $G_{k+1}$ and $H_{k+1}$ in (6) and (7). This phenomenon has been confirmed in some extreme examples, with some eigenvalue $\lambda$ of the symplectic matrix pencil associated with the DARE nearly on the unit circle [16]. The SDA requires significantly fewer iterations for convergence than the number predicted purely from $\lambda$.

2.4. Errors of SDA_ls

The SDA_ls can be interpreted as a Galerkin method, or analysed directly from (2). With

$$\delta_k \equiv \max\{ \|\delta G_k\|, \|\delta H_k\|, \|\delta A_k\| \},$$

where $\delta G_k$, $\delta H_k$ and $\delta A_k$ are respectively the truncation/round-off errors in $G_k$, $H_k$ and $A_k$, we can show

$$\delta_{k+1} \le (1 + c_k)\delta_k + O(\delta_k^2), \tag{14}$$

with $c_k \to 0$ as $k \to \infty$. A more detailed discussion can be found in [18, Section 2.5]. Essentially, we limit the rank of the approximation to $X$, trading off the accuracy in $X$ with the efficiency of the SDA_ls. Assume that the compression and truncation in (12) and (13) create errors of $O(\tau_i)$ ($i = 1, 2$) in $G_k$ and $H_k$, respectively. It is easy to see from (14) that errors of the same magnitude will propagate through to $A_{k+1}$, $G_{k+1}$ and $H_{k+1}$. The fact that $A_k \to 0$ implies $c_k \to 0$ and contributes towards diminishing these errors. From our numerical experience, the trade-off between the ranks of $G_k$ and $H_k$ and the accuracy of the approximate solutions to $X$ and $Y$ is the key to the success of our computation. If these ranks grow out of control, unnecessary and insignificantly small additions to the iterates overwhelm the computation in terms of flop counts and memory requirement. Limiting the ranks will obviously reduce the accuracy of the approximate solution. We found that we do not have to experiment much with the tolerances for the compression/truncation and convergence while trying to achieve a balance between accuracy and the feasibility/efficiency of the SDA.


Table 1

Krylov subspaces for solution $X$ and adjoint solution $Y$.

Equation                   $X$                                                      $Y$
DARE, Stein equation       $\mathcal{K}_k(A^\top, C)$                               $\mathcal{K}_k(A, B)$
CARE, Lyapunov equation    $\mathcal{K}_k(A_\gamma^{-\top}, A_\gamma^{-\top}C)$     $\mathcal{K}_k(A_\gamma^{-1}, A_\gamma^{-1}B)$

3. CAREs

One possible approach for large-scale CAREs is to transform them to DAREs using Cayley transforms.

3.1. SDA after Cayley transform

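From [14], the matrices $A$, $G$ and $H$ in the CARE (1b) are first treated with the Cayley transform:

$$A_0 = I + 2\gamma \left( A_\gamma + GA_\gamma^{-\top}H \right)^{-1}, \tag{15}$$

$$G_0 = 2\gamma A_\gamma^{-1}G \left( A_\gamma^\top + HA_\gamma^{-1}G \right)^{-1}, \tag{16}$$

$$H_0 = 2\gamma \left( A_\gamma^\top + HA_\gamma^{-1}G \right)^{-1} HA_\gamma^{-1}, \tag{17}$$

with $A_\gamma \equiv A - \gamma I$ and a suitable $\gamma > 0$ chosen to optimize the condition of various matrix inversions. A simple application of the SMWF implies

$$\left( A_\gamma + GA_\gamma^{-\top}H \right)^{-1} = A_\gamma^{-1} - A_\gamma^{-1}GA_\gamma^{-\top}C \cdot T^{-1} \left( I_l + C^\top A_\gamma^{-1}GA_\gamma^{-\top}CT^{-1} \right)^{-1} \cdot C^\top A_\gamma^{-1} \tag{18a}$$

$$\phantom{\left( A_\gamma + GA_\gamma^{-\top}H \right)^{-1}} = A_\gamma^{-1} - A_\gamma^{-1}B \cdot \left( I_m + R^{-1}B^\top A_\gamma^{-\top}HA_\gamma^{-1}B \right)^{-1} R^{-1} \cdot B^\top A_\gamma^{-\top}HA_\gamma^{-1}. \tag{18b}$$

It is not hard to see, with the above initial $A_0$, $G_0$ and $H_0$, that the SDA_ls still works, again with exactly the same forms and updating formulae for $A_k$, $B_k$, $C_k$, $D_k^{(1)}$, $D_k^{(2)}$ and the inverses of $R_k$, $S_k$ and $T_k$. One relevant difference for CAREs is that $A_0 \ne A$ but satisfies, from (15), (18a) and (18b),

$$A_0 = I_n + 2\gamma A_\gamma^{-1} - D_0^{(1)} S_0^{-1} [D_0^{(2)}]^\top \tag{19}$$

with

$$B_0 = A_\gamma^{-1}B, \qquad C_0 = A_\gamma^{-\top}C. \tag{20}$$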

The corresponding size-$l$ and size-$m$ updates have the forms, respectively,

$$D_0^{(2)} = C_0, \qquad D_0^{(1)} = A_\gamma^{-1}GC_0, \qquad S_0^{-1} = 2\gamma \left( I_l + T^{-1}C_0^\top GC_0 \right)^{-1} T^{-1}; \tag{21a}$$

$$D_0^{(1)} = B_0, \qquad D_0^{(2)} = A_\gamma^{-\top}HB_0, \qquad S_0^{-1} = 2\gamma \left( I_m + R^{-1}B_0^\top HB_0 \right)^{-1} R^{-1}. \tag{21b}$$

Note that all computations can be realized in $O(n)$ operations, assuming that the operations $A_\gamma^{-1}B$ and $A_\gamma^{-\top}C$ are achievable in $O(n)$ flops; see [20, Section 9.1] for a banded $A$.

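Similarly, we have

$$R_0^{-1} = 2\gamma \left[ R^{-1} - R^{-1}B^\top C_0 \cdot \left( I_l + T^{-1}C_0^\top GC_0 \right)^{-1} T^{-1} \cdot C_0^\top BR^{-1} \right] \tag{22a}$$

$$\phantom{R_0^{-1}} = 2\gamma \left[ R^{-1} - R^{-1}B_0^\top HB_0 \left( I_m + R^{-1}B_0^\top HB_0 \right)^{-1} R^{-1} \right], \tag{22b}$$

and

$$T_0^{-1} = 2\gamma \left[ T^{-1} - T^{-1} \left( I_l + C_0^\top GC_0T^{-1} \right)^{-1} C_0^\top GC_0T^{-1} \right] \tag{23a}$$

$$\phantom{T_0^{-1}} = 2\gamma \left[ T^{-1} - T^{-1}C^\top B_0 \cdot R^{-1} \left( I_m + B_0^\top HB_0R^{-1} \right)^{-1} \cdot B_0^\top CT^{-1} \right]. \tag{23b}$$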
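The low-rank initial values can be cross-checked against the dense Cayley transform (15)–(17). The sketch below uses hypothetical random data and dense solves with $A_\gamma$ (for a banded or sparse $A$ these would be sparse solves); it verifies that $B_0R_0^{-1}B_0^\top = G_0$ with (22a), $C_0T_0^{-1}C_0^\top = H_0$ with (23b), and that (19) with (21a) reproduces $A_0$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, l, gamma = 12, 2, 3, 1.5
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((n, l))
Rinv = np.diag([1.0, 0.5])
Tinv = np.diag([2.0, 1.0, 0.5])
G, H = B @ Rinv @ B.T, C @ Tinv @ C.T
I = np.eye(n)
Ag = A - gamma * I                                   # A_gamma

# Dense Cayley transform (15)-(17).
W = Ag.T + H @ np.linalg.solve(Ag, G)
A0 = I + 2 * gamma * np.linalg.inv(Ag + G @ np.linalg.solve(Ag.T, H))
G0 = 2 * gamma * np.linalg.solve(Ag, G) @ np.linalg.inv(W)
H0 = 2 * gamma * np.linalg.solve(W, H) @ np.linalg.inv(Ag)

# Low-rank factors (20).
B0 = np.linalg.solve(Ag, B)
C0 = np.linalg.solve(Ag.T, C)

# (22a): uses the l x l kernel (I_l + T^{-1} C0^T G C0)^{-1} T^{-1}.
core_l = np.linalg.solve(np.eye(l) + Tinv @ C0.T @ G @ C0, Tinv)
R0inv = 2 * gamma * (Rinv - Rinv @ B.T @ C0 @ core_l @ C0.T @ B @ Rinv)

# (23b): uses the m x m kernel R^{-1} (I_m + B0^T H B0 R^{-1})^{-1}.
core_m = Rinv @ np.linalg.inv(np.eye(m) + B0.T @ H @ B0 @ Rinv)
T0inv = 2 * gamma * (Tinv - Tinv @ C.T @ B0 @ core_m @ B0.T @ C @ Tinv)

# (21a) and (19).
D1 = np.linalg.solve(Ag, G @ C0)
D2 = C0
S0inv = 2 * gamma * core_l
A0_lr = I + 2 * gamma * np.linalg.inv(Ag) - D1 @ S0inv @ D2.T

err_G = np.linalg.norm(B0 @ R0inv @ B0.T - G0)
err_H = np.linalg.norm(C0 @ T0inv @ C0.T - H0)
err_A = np.linalg.norm(A0_lr - A0)
print(err_G, err_H, err_A)
```

The explicit `np.linalg.inv(Ag)` in `A0_lr` appears only to allow the dense comparison; in the large-scale algorithm $A_0$ is applied to vectors through solves with $A_\gamma$ and the low-rank correction.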

For CAREs, we have

$$B_k \in \mathcal{K}_k(A_\gamma^{-1}, A_\gamma^{-1}B), \qquad C_k \in \mathcal{K}_k(A_\gamma^{-\top}, A_\gamma^{-\top}C). \tag{24}$$

Note that the Krylov subspaces $\mathcal{K}_k(A^{\pm 1}, B)$ and $\mathcal{K}_k(A^{\pm\top}, C)$ have been used in the solution of CAREs and Lyapunov equations in [21–26], quite different from the subspaces associated with the SDA here. This difference may explain the superiority of our methods. From (24) and [18,27], we can see clearly the appropriate choices of Krylov subspaces for DAREs and CAREs, as well as the corresponding Stein and Lyapunov equations. A summary is contained in Table 1.


We summarize the algorithm below, with the particular choice of (4), (5a), (8), (9a), (10b), (12) and (13). We would like to emphasize that care has to be exercised in Algorithm 1 below, with the multiplications by $A_{k+1}$ and $A_{k+1}^\top$ carried out recursively using (4) and (5a) or (5b). Otherwise, computations cannot be carried out in $O(n)$ complexity. Similar care has to be taken in the computation of residuals (used in Algorithm 1 below) or differences of iterates (as an alternative convergence control), as discussed in Section 4.2 later.

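Algorithm 1 (SDA_ls)

Input: $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $R^{-1} = R^{-\top} \in \mathbb{R}^{m \times m}$, $C \in \mathbb{R}^{n \times l}$, $T^{-1} = T^{-\top} \in \mathbb{R}^{l \times l}$, shift $\gamma > 0$, positive tolerances $\tau_1$, $\tau_2$ and $\epsilon$, and $m_{\max}$, $l_{\max}$;

Output: $B_\epsilon \in \mathbb{R}^{n \times m_\epsilon}$, $R_\epsilon^{-1} = R_\epsilon^{-\top} \in \mathbb{R}^{m_\epsilon \times m_\epsilon}$, $C_\epsilon \in \mathbb{R}^{n \times l_\epsilon}$ and $T_\epsilon^{-1} = T_\epsilon^{-\top} \in \mathbb{R}^{l_\epsilon \times l_\epsilon}$, with $C_\epsilon T_\epsilon^{-1}C_\epsilon^\top$ and $B_\epsilon R_\epsilon^{-1}B_\epsilon^\top$ approximating, respectively, the solutions $X$ and $Y$ to the large-scale CARE (1b) and its adjoint;

Compute $A_\gamma = A - \gamma I$;
Set $k = 0$, $\tilde{r}_0 = 2\epsilon$;
    $B_0 = A_\gamma^{-1}B$, $C_0 = A_\gamma^{-\top}C$;
    $R_0^{-1} = 2\gamma \left[ R^{-1} - R^{-1}B^\top C_0 \cdot \left( I_l + T^{-1}C_0^\top GC_0 \right)^{-1} T^{-1} \cdot C_0^\top BR^{-1} \right]$,
    $T_0^{-1} = 2\gamma \left[ T^{-1} - T^{-1}C^\top B_0 \cdot R^{-1} \left( I_m + B_0^\top HB_0R^{-1} \right)^{-1} \cdot B_0^\top CT^{-1} \right]$;
    $D_0^{(2)} = C_0$, $D_0^{(1)} = A_\gamma^{-1}GC_0$, $S_0^{-1} = 2\gamma \left( I_l + T^{-1}C_0^\top GC_0 \right)^{-1} T^{-1}$,
    $A_0 = I_n + 2\gamma A_\gamma^{-1} - D_0^{(1)} S_0^{-1} [D_0^{(2)}]^\top$;
Compute $h = \|H_0\| = \|C_0T_0^{-1}C_0^\top\|$;
Do until convergence:
    If the relative residual $\tilde{r}_k = |d_k / (h_k + m_k + h)| < \epsilon$,
        Set $B_\epsilon = B_k$, $R_\epsilon^{-1} = R_k^{-1}$, $C_\epsilon = C_k$ and $T_\epsilon^{-1} = T_k^{-1}$;
        Exit
    End If
    Compute $B_{k+1} = [B_k, A_kB_k]$, $C_{k+1} = [C_k, A_k^\top C_k]$;
        $R_{k+1}^{-1} = R_k^{-1} \oplus \left[ R_k^{-1} - R_k^{-1}B_k^\top C_kT_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1} C_k^\top B_kR_k^{-1} \right]$,
        $T_{k+1}^{-1} = T_k^{-1} \oplus \left[ T_k^{-1} - T_k^{-1}C_k^\top B_k \left( I_{m_k} + R_k^{-1}B_k^\top H_kB_k \right)^{-1} R_k^{-1}B_k^\top C_kT_k^{-1} \right]$;
        with $A_{k+1} = A_k^2 - D_{k+1}^{(1)} S_{k+1}^{-1} [D_{k+1}^{(2)}]^\top$,
        $D_{k+1}^{(1)} = A_kG_kC_k$, $D_{k+1}^{(2)} = A_k^\top C_k$, $S_{k+1}^{-1} = T_k^{-1} \left( I_{l_k} + C_k^\top G_kC_kT_k^{-1} \right)^{-1}$;
    Compress $B_{k+1}$ and $C_{k+1}$, using the tolerances $\tau_1$ and $\tau_2$, and modify $R_{k+1}^{-1}$ and $T_{k+1}^{-1}$, as in (12) and (13);
    Compute $k \leftarrow k + 1$, $d_k = \|\mathcal{D}(H_k)\|$, $h_k = \|H_k\|$ and $m_k = \|M_k\|$, as in Section 4.2;
End Do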
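For testing an implementation of Algorithm 1 on small problems, a dense reference solver built from the Cayley transform (15)–(17) followed by the doubling iteration (2) may be useful. The sketch below is that dense reference, not the SDA_ls itself: it forms all matrices explicitly, uses a fixed number of iterations instead of the residual-based stopping test, and the function name `care_by_doubling` and the test data are assumptions of the example.

```python
import numpy as np

def care_by_doubling(A, G, H, gamma, iters=20):
    """Dense reference: Cayley transform (15)-(17), then the SDA (2).

    Returns (H_k, G_k), approximating the CARE solution X and its dual Y.
    """
    n = A.shape[0]
    I = np.eye(n)
    Ag = A - gamma * I
    W = Ag.T + H @ np.linalg.solve(Ag, G)
    Ak = I + 2 * gamma * np.linalg.inv(Ag + G @ np.linalg.solve(Ag.T, H))
    Gk = 2 * gamma * np.linalg.solve(Ag, G) @ np.linalg.inv(W)
    Hk = 2 * gamma * np.linalg.solve(W, H) @ np.linalg.inv(Ag)
    for _ in range(iters):
        Wk = np.linalg.solve(I + Gk @ Hk, np.hstack([Ak, Gk]))
        WA, WG = Wk[:, :n], Wk[:, n:]
        # Right-hand sides use the old iterates; assignment happens afterwards.
        Ak, Gk, Hk = Ak @ WA, Gk + Ak @ WG @ Ak.T, Hk + Ak.T @ Hk @ WA
    return Hk, Gk

# Hypothetical small CARE with R = I_m and T = I_l.
rng = np.random.default_rng(7)
n, m, l = 10, 2, 2
A = -2.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((n, l))
G, H = B @ B.T, C @ C.T

X, Y = care_by_doubling(A, G, H, gamma=2.0)
care_res = A.T @ X + X @ A - X @ G @ X + H          # residual of (1b)
rel_res = np.linalg.norm(care_res) / (1 + np.linalg.norm(X))
print(f"relative CARE residual: {rel_res:.2e}")
```

Comparing $C_\epsilon T_\epsilon^{-1}C_\epsilon^\top$ from a low-rank implementation with the `X` returned here gives a direct check of the truncation error for a given pair of tolerances.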

4. Computational issues

4.1. Residuals and convergence control

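For the difference of successive iterates

$$dG_k \equiv B_kR_k^{-1}B_k^\top - B_{k+1}R_{k+1}^{-1}B_{k+1}^\top = \tilde{B}_{k+1}\tilde{R}_{k+1}^{-1}\tilde{B}_{k+1}^\top,$$

we have

$$\tilde{B}_{k+1} \equiv [B_k, B_{k+1}], \qquad \tilde{R}_{k+1}^{-1} \equiv R_k^{-1} \oplus \left( -R_{k+1}^{-1} \right).$$

Similarly, with $dH_k \equiv C_kT_k^{-1}C_k^\top - C_{k+1}T_{k+1}^{-1}C_{k+1}^\top$, we have $dH_k = \tilde{C}_{k+1}\tilde{T}_{k+1}^{-1}\tilde{C}_{k+1}^\top$ with

$$\tilde{C}_{k+1} \equiv [C_k, C_{k+1}], \qquad \tilde{T}_{k+1}^{-1} \equiv T_k^{-1} \oplus \left( -T_{k+1}^{-1} \right).$$

Alternatively, (6) and (7) imply similar results.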
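Because $dG_k$ is available in factored form, its Frobenius norm can be evaluated without forming an $n \times n$ matrix. The following sketch shows one way to do this, via a thin QR factorization of the stacked factor; the helper names and the random data are assumptions of the example, and the paper's own procedure (Section 4.2) may differ in detail.

```python
import numpy as np

def direct_sum(X, Y):
    """Block-diagonal matrix X (+) Y."""
    Z = np.zeros((X.shape[0] + Y.shape[0], X.shape[1] + Y.shape[1]))
    Z[:X.shape[0], :X.shape[1]] = X
    Z[X.shape[0]:, X.shape[1]:] = Y
    return Z

def lowrank_sym_fro_norm(Bt, Rt_inv):
    """Frobenius norm of Bt @ Rt_inv @ Bt.T using only small matrices.

    With the thin QR factorization Bt = Q M and Q having orthonormal
    columns, ||Q (M Rt_inv M^T) Q^T||_F = ||M Rt_inv M^T||_F.
    """
    _, M = np.linalg.qr(Bt)
    return np.linalg.norm(M @ Rt_inv @ M.T)

# Hypothetical successive iterates (random stand-ins).
rng = np.random.default_rng(8)
n = 200
Bk, Bk1 = rng.standard_normal((n, 3)), rng.standard_normal((n, 6))
Rk = rng.standard_normal((3, 3)); Rk = Rk + Rk.T      # symmetric R_k^{-1}
Rk1 = rng.standard_normal((6, 6)); Rk1 = Rk1 + Rk1.T  # symmetric R_{k+1}^{-1}

B_tilde = np.hstack([Bk, Bk1])
R_tilde = direct_sum(Rk, -Rk1)

small = lowrank_sym_fro_norm(B_tilde, R_tilde)
dense = np.linalg.norm(Bk @ Rk @ Bk.T - Bk1 @ Rk1 @ Bk1.T)   # O(n^2) reference
print(small, dense)
```

The QR factorization costs $O(n)$ flops for a fixed number of columns, so this convergence measure stays within the overall complexity target.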