APPLIED TO THE SYLVESTER EQUATION

BERNHARD BECKERMANN

Abstract. In this paper we suggest a new formula for the residual of Galerkin projection onto rational Krylov spaces applied to a Sylvester equation, and establish a relation to three different underlying extremal problems for rational functions.

These extremal problems enable us to compare the size of the residual for the above method with that obtained by ADI. In addition, we deduce several new a priori error estimates for Galerkin projection onto rational Krylov spaces, both for the Sylvester and for the Lyapunov equation.

Key words. Sylvester equation, Lyapunov equation, Galerkin projection, rational Krylov spaces, ADI

AMS subject classifications. 15A24, 65F30, 30E10

1. Introduction. Given two square matrices $A\in\mathbb{C}^{M\times M}$, $B\in\mathbb{C}^{N\times N}$, not necessarily of the same size, with disjoint spectra $\Lambda(A)$, $\Lambda(B)$, the Sylvester equation

$$\mathcal{S}_{A,B}X := AX - XB = C \tag{1.1}$$

has a unique solution $X = \mathcal{S}_{A,B}^{-1}C \in \mathbb{C}^{M\times N}$ for all $C\in\mathbb{C}^{M\times N}$. One obtains for the special case $B = -A^*$ and $C = C^*$ the so-called Lyapunov equation. Here the star denotes complex conjugation and transposition.

Sylvester equations appear frequently in many areas of applied mathematics, see for instance the survey [10] and the references therein. For instance, Sylvester equations occur naturally in matrix eigendecompositions [22], control theory [14], model reduction [1, 2, 33], but also in the numerical solution of Riccati equations [18] and in image processing [12]. In many of these applications, $A, B$ are fairly large and sparse, and $C$ is of low rank $d \ll M, N$, so that direct methods for solving (1.1) are not suitable.

In this case we will use a full rank factorization of the right-hand side $C = ab^*$, $a\in\mathbb{C}^{M\times d}$, $b\in\mathbb{C}^{N\times d}$.

In order to simplify the presentation, we will deal in this paper with the case $d = 1$; the statements for the case $d > 1$ are similar, but since some tangential interpolation problems are involved we leave this for another paper.

The aim of this paper is to compare error estimates for two popular iterative methods for solving (1.1). The ADI iteration due to Peaceman and Rachford [27] has been adapted by Wachspress [35, 36] for solving (1.1), see also related work by Birkhoff and Varga [11]. Roughly speaking, this method requires parameters $z_{A,1},\dots,z_{A,n}\in\mathbb{C}\setminus\Lambda(A)$ and $z_{B,1},\dots,z_{B,n}\in\mathbb{C}\setminus\Lambda(B)$ and computes an approximation $X_n^{ADI}$ of rank $nd$ by solving shifted systems with coefficient matrices $(z_{A,j}I - A)$ and $(z_{B,j}I - B)$.

REVISED JULY 18, 2011

Laboratoire Painlevé UMR 8524 (ANO-EDP), UFR Mathématiques – M3, UST Lille, F-59655 Villeneuve d'Ascq CEDEX, France. E-mail: bbecker@math.univ-lille1.fr.

Two aspects make the ADI method particularly attractive. First, it is possible to implement the method through full rank decompositions of the iterates $X_n^{ADI}$, and thus it essentially remains to solve $2n$ shifted systems with $d$ right-hand sides, see [9] and the references therein. Another attractive aspect is that the ADI error $X - X_n^{ADI}$

and the ADI residual $\mathcal{S}_{A,B}(X - X_n^{ADI})$ can be described via very simple formulas, namely

$$X - X_n^{ADI} = R_{ADI}(A)^{-1}\, X\, R_{ADI}(B), \tag{1.2}$$

$$\mathcal{S}_{A,B}(X - X_n^{ADI}) = R_{ADI}(A)^{-1}\, a b^*\, R_{ADI}(B), \tag{1.3}$$

with the rational function

$$R_{ADI}(z) = \prod_{j=1}^{n} \frac{z - z_{A,j}}{z - z_{B,j}}.$$

It turns out that with a "nearly" optimal choice of parameters the ADI error is small even for modest values of $n$, but this is no longer true if the parameters are not well chosen, and then it makes sense to consider projection methods instead.
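The error and residual formulas (1.2)-(1.3) can be checked numerically. Below is a minimal NumPy sketch (the function name, test matrices, and shift values are our own illustrative choices, not from the paper): it runs the classical double-step ADI iteration started at $X = 0$ and compares the resulting residual with the right-hand side of (1.3).

```python
import numpy as np

def adi_sylvester(A, B, C, shifts_A, shifts_B):
    """Classical ADI double-step iteration for AX - XB = C, started at X = 0.
    shifts_A[j] plays the role of z_{A,j}, shifts_B[j] that of z_{B,j}."""
    M, N = C.shape
    IM, IN = np.eye(M), np.eye(N)
    X = np.zeros((M, N))
    for sA, sB in zip(shifts_A, shifts_B):
        # first half step: (A - sA*I) X_half = X (B - sA*I) + C
        Xh = np.linalg.solve(A - sA * IM, X @ (B - sA * IN) + C)
        # second half step: X_new (B - sB*I) = (A - sB*I) X_half - C
        X = np.linalg.solve((B - sB * IN).T, ((A - sB * IM) @ Xh - C).T).T
    return X

rng = np.random.default_rng(0)
M, N = 6, 5
A = 3.0 * np.eye(M) + 0.3 * rng.standard_normal((M, M))   # spectrum near +3
B = -3.0 * np.eye(N) + 0.3 * rng.standard_normal((N, N))  # spectrum near -3
a = rng.standard_normal((M, 1)); b = rng.standard_normal((N, 1))
C = a @ b.T                                               # rank-one right-hand side (d = 1)
zA, zB = [-2.5, -3.5], [2.5, 3.5]                         # shifts near the *other* spectrum

Xn = adi_sylvester(A, B, C, zA, zB)
res = C - (A @ Xn - Xn @ B)            # S_{A,B}(X - Xn), since S_{A,B}X = C

# right-hand side of (1.3): R_ADI(A)^{-1} a b^* R_ADI(B); factors of a matrix
# function commute, so the ordering below is harmless
RinvA, RB = np.eye(M), np.eye(N)
for sA, sB in zip(zA, zB):
    RinvA = RinvA @ np.linalg.solve(A - sA * np.eye(M), A - sB * np.eye(M))
    RB = RB @ np.linalg.solve(B - sB * np.eye(N), B - sA * np.eye(N))
formula = RinvA @ C @ RB
print(np.linalg.norm(res - formula))   # ≈ 0 up to rounding
```

The two expressions agree to rounding error, and the residual is much smaller than the initial one because the shifts sit close to the opposite spectra.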

Given matrices with orthonormal columns

$$U\in\mathbb{C}^{M\times m},\quad U^*U = I,\qquad V\in\mathbb{C}^{N\times n},\quad V^*V = I, \tag{1.4}$$

for the Galerkin approach one looks for an approximant of the form $X_{m,n}^G = UY_GV^*$ with $Y_G\in\mathbb{C}^{m\times n}$ by requiring that the residual satisfies the orthogonality conditions

$$U^*\,\mathcal{S}_{A,B}(X - X_{m,n}^G)\,V = 0.$$

Introduce the projected Rayleigh matrices

$$A_m := U^*AU \in\mathbb{C}^{m\times m},\qquad B_n := V^*BV \in\mathbb{C}^{n\times n}, \tag{1.5}$$

and suppose that $\Lambda(A_m)\cap\Lambda(B_n)$ is empty. Then it is not difficult to check the following formula for the Galerkin approximant

$$X_{m,n}^G = UY_GV^*,\qquad Y_G = \mathcal{S}_{A_m,B_n}^{-1}(a_m b_n^*), \tag{1.6}$$

where

$$a_m := U^*a,\qquad b_n := V^*b. \tag{1.7}$$

Here we suppose that $m, n \ll M, N$, making it possible to solve the Sylvester equation $\mathcal{S}_{A_m,B_n}Y = A_mY - YB_n = a_m b_n^*$ by, e.g., some direct method, and to compute the above quantities $A_m, U, a_m, B_n, V, b_n$ in reasonable time. One big advantage of such projection methods is that for the fields of values we have the inclusions $W(A_m)\subset W(A)$ and $W(B_n)\subset W(B)$.
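The projected quantities (1.5)-(1.7) and the Galerkin condition translate directly into a few lines of NumPy. The sketch below (our own illustrative setup; $U, V$ are orthonormalized polynomial Krylov bases here, which is one admissible choice) forms $A_m, B_n, a_m, b_n$, solves the small Sylvester equation through its Kronecker form, and verifies that the projected residual $U^*\mathcal{S}_{A,B}(X - X_{m,n}^G)V$ vanishes:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, m, n = 8, 7, 3, 3
A = 3.0 * np.eye(M) + 0.3 * rng.standard_normal((M, M))
B = -3.0 * np.eye(N) + 0.3 * rng.standard_normal((N, N))
a = rng.standard_normal((M, 1)); b = rng.standard_normal((N, 1))
C = a @ b.T

# orthonormal U, V as in (1.4); here from polynomial Krylov spaces
U, _ = np.linalg.qr(np.column_stack([np.linalg.matrix_power(A, k) @ a for k in range(m)]))
V, _ = np.linalg.qr(np.column_stack([np.linalg.matrix_power(B, k) @ b for k in range(n)]))

# projected Rayleigh matrices (1.5) and projected vectors (1.7); real data, so * = T
Am, Bn = U.T @ A @ U, V.T @ B @ V
am, bn = U.T @ a, V.T @ b

# small Sylvester equation Am Y - Y Bn = am bn^* via its Kronecker form, cf. (1.6)
K = np.kron(np.eye(n), Am) - np.kron(Bn.T, np.eye(m))
Y = np.linalg.solve(K, (am @ bn.T).flatten(order="F")).reshape((m, n), order="F")
XG = U @ Y @ V.T

# Galerkin condition: projected residual vanishes (S_{A,B}X = C for the exact X)
proj_res = U.T @ (C - (A @ XG - XG @ B)) @ V
print(np.linalg.norm(proj_res))   # ≈ 0
```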

For parameters $z_{A,1},\dots,z_{A,m}\in\overline{\mathbb{C}}\setminus\Lambda(A)$ and $z_{B,1},\dots,z_{B,n}\in\overline{\mathbb{C}}\setminus\Lambda(B)$, $\overline{\mathbb{C}}$ denoting the extended complex plane $\mathbb{C}\cup\{\infty\}$, we introduce the polynomials

$$Q_A(z) = \prod_{j=1,\; z_{A,j}\neq\infty}^{m} (z - z_{A,j}),\qquad Q_B(z) = \prod_{j=1,\; z_{B,j}\neq\infty}^{n} (z - z_{B,j}), \tag{1.8}$$

and denote by $\mathcal{P}_k$ the space of polynomials with complex coefficients of degree at most $k$. The rational Krylov spaces

$$\mathcal{K}_{A,m} = \{R_A(A)a : R_A\in\mathcal{P}_{m-1}/Q_A\},\qquad \mathcal{K}_{B,n} = \{R_B(B)b : R_B\in\mathcal{P}_{n-1}/Q_B\}, \tag{1.9}$$


built with the help of rational functions with fixed denominator play a particular role in the construction of the ADI method; indeed, in the case $m = n$ we find that

$$\mathrm{colspan}(X_n^{ADI})\subset\mathcal{K}_{A,n},\qquad \mathrm{colspan}((X_n^{ADI})^*)\subset\mathcal{K}_{B,n},$$

where, up to some degenerate cases, there holds equality in both inclusions. Hence in what follows we will always suppose that the columns of $U$, and of $V$, form an orthonormal basis of $\mathcal{K}_{A,m}$, and of $\mathcal{K}_{B,n}$, respectively, which makes it meaningful to compare the errors for ADI and for Galerkin projection onto rational Krylov spaces (or, shorter, rational Galerkin).
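A rational Krylov space with fixed denominator $Q_A$ can be built by first applying $Q_A(A)^{-1}$ to $a$ and then growing an ordinary Krylov space on the result, since $\mathcal{P}_{m-1}/Q_A$ applied to $a$ is spanned by $\{A^kQ_A(A)^{-1}a\}_{k<m}$. A minimal sketch (helper name and test data are ours; all poles are assumed finite):

```python
import numpy as np

def rational_krylov_basis(A, a, poles):
    """Orthonormal basis of K_{A,m} = {R_A(A)a : R_A in P_{m-1}/Q_A}, m = len(poles),
    all poles finite: orthonormalize {A^k Q_A(A)^{-1} a, k = 0..m-1}."""
    M = A.shape[0]
    w = a.reshape(-1, 1).astype(float)
    for z in poles:                      # w = Q_A(A)^{-1} a
        w = np.linalg.solve(A - z * np.eye(M), w)
    K = np.column_stack([np.linalg.matrix_power(A, k) @ w for k in range(len(poles))])
    Q, _ = np.linalg.qr(K)
    return Q

rng = np.random.default_rng(2)
M = 8
A = 3.0 * np.eye(M) + 0.3 * rng.standard_normal((M, M))
a = rng.standard_normal((M, 1))
poles = [-1.0, -2.0, -4.0]
U = rational_krylov_basis(A, a, poles)

# (A - z1*I)^{-1} a = R_A(A)a with R_A in P_2/Q_A, so it must lie in the span of U
v = np.linalg.solve(A - poles[0] * np.eye(M), a)
print(np.linalg.norm(v - U @ (U.T @ v)))   # ≈ 0
```

For larger $m$ this monomial construction becomes ill-conditioned; production codes build the basis incrementally with rational Arnoldi instead.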

The original idea of projecting onto polynomial Krylov spaces (all parameters $z_{A,j}, z_{B,j} = \infty$) probably goes back to Saad [29]; the case of rational Krylov spaces has been considered subsequently by many authors, see for instance [19, 20, 21] and the references therein. However, to our knowledge, until recently little was known about a priori error estimates in the spirit of the famous CG estimate in terms of the condition number. We found first results in this direction by Penzl [28], followed by Druskin and Simoncini [32] and by Kressner and Tobler [25]. The work of [32, 25] is based on an integral representation for the solution $X$ of the Sylvester equation (1.1) in terms of the exponential function. There exists another integral representation in terms of resolvents

$$X = \frac{1}{2\pi i}\int_\Gamma (zI - A)^{-1}\, a b^*\, (zI - B)^{-1}\, dz, \tag{1.10}$$

the curve $\Gamma$ encircling once the eigenvalues of $A$ but not those of $B$. This integral formula has been the starting point for similar a priori error estimates for so-called extended Krylov spaces (all parameters $z_{A,j}, z_{B,j}\in\{0,\infty\}$) in [24], see also [25]. The case of arbitrary rational Krylov spaces was considered by Druskin, Knizhnerman and Simoncini in [16] using elegant but deep tools from complex approximation theory like Takenaka-Malmquist orthogonal rational functions and Faber-Dzhrbashyan rational functions; we will summarize all these findings in §2.4 below. The results in [16] are more general but weaker than those of [32, 25, 24] in the sense that the authors only obtain $n$th root asymptotic upper bounds for the error, with an explicit convergence factor given in terms of the parameters $z_{A,j}$ and the fields of values of the matrices $A$ and $B$.

Tools from approximation theory have also been an important ingredient in [25, Theorem 4.3 and Eqn. (39)], where the authors suggest for the Lyapunov equation with selfadjoint positive definite $A = -B$ to consider a bivariate polynomial approximation problem.

Our approach is based on an orthogonal decomposition of the residual for rational Galerkin in Theorem 2.1 below, which to our knowledge is new, and which conceptually differs from the one given for instance in [17, Proposition 4.1], the latter being an immediate consequence of the representation of $AU - UA_m$ for rational Krylov projections. Each of the three terms in our new representation of the residual can be related to some extremal problem for (univariate) rational functions with prescribed poles. The starting point for this new residual formula is the observation that enforcing the orthogonality of the residual is reminiscent of the well-known FOM method for solving systems of linear equations. Thus we may use in (1.10) a well-known formula for the FOM error for shifted systems, given, e.g., in [5].

As a consequence, we are able to relate in Corollary 2.2 the residual norms of both ADI and the rational Galerkin method. We obtain in Theorem 2.3 an explicit a priori estimate for the rational Galerkin residual for couples $A, B$ having disjoint fields of values, and state an improved formula for $A, B$ being selfadjoint. The particular case of a Lyapunov equation is studied in Corollary 2.5, which enables us to compare in §2.4 the upper bounds proposed in this paper to existing work. In particular, our upper bounds describe a geometric convergence behavior, with the same convergence factor as that discussed in [16].

The rest of the paper is organized as follows. Section 2 contains further definitions and the statements of our main results, together with motivating remarks. The proofs for these statements can be found in Section 3. In particular, good candidates for our rational and matrix-valued extremal problems are constructed in Theorem 3.4, generalizing previous work [4] for polynomial matrix–valued extremal problems like matrix Chebyshev polynomials.

Notation. In what follows, $\|\cdot\|$ denotes the Euclidean vector norm and the spectral matrix norm, whereas by $\|\cdot\|_F$ we denote the Frobenius norm. The field of values of a square matrix $A$ is defined by $W(A) = \{y^*Ay : \|y\| = 1\}$, which is known to be always convex and compact. By stacking the columns of $X, C$ in a large column vector, one may rewrite (1.1) as an ordinary system of equations $\mathcal{M}x = c$ with $\mathcal{M} = I_N\otimes A - B^T\otimes I_M$. It is not difficult to show that $W(\mathcal{M}) = W(A) - W(B)$, see for instance [16, Proof of Theorem 4.2]. As a consequence, in the case of $W(A)\cap W(B)$ being empty, we may give simple estimates for the norm of our Sylvester operator and its inverse

$$\|\mathcal{S}_{A,B}\| = \sup_C \frac{\|\mathcal{S}_{A,B}C\|_F}{\|C\|_F} = \|\mathcal{M}\| \le 2\,\mathrm{diam}(W(A), W(B)), \tag{1.11}$$

$$\|\mathcal{S}_{A,B}^{-1}\| = \sup_C \frac{\|C\|_F}{\|\mathcal{S}_{A,B}C\|_F} = \|\mathcal{M}^{-1}\| \le \frac{1}{\mathrm{dist}(W(A), W(B))}, \tag{1.12}$$

where

$$\mathrm{diam}(W(A), W(B)) := \max_{x\in W(A),\; y\in W(B)} |x - y|. \tag{1.13}$$

2. Statement of results.

2.1. A new formula for the residual for rational Galerkin. In order to state our main results, we need to introduce some particular rational functions

$$R_A^G(z) = \frac{\det(zI - A_m)}{Q_A(z)} \in \mathcal{P}_m/Q_A,\qquad R_B^G(z) = \frac{\det(zI - B_n)}{Q_B(z)} \in \mathcal{P}_n/Q_B, \tag{2.1}$$

where $Q_A, Q_B$ are as in (1.8). From the theory of rational Krylov spaces it is known that

$$U^*R_A^G(A)a = 0,\qquad V^*R_B^G(B)b = 0, \tag{2.2}$$

in other words, $R_A^G(A)a\in\mathcal{K}_{A,m+1}$ together with the columns of $U$ form an orthogonal basis of $\mathcal{K}_{A,m+1}$, and, similarly, $R_B^G(B)b\in\mathcal{K}_{B,n+1}$ together with the columns of $V$ form an orthogonal basis¹ of $\mathcal{K}_{B,n+1}$.

Notice that, by construction, the residual $\rho = \mathcal{S}_{A,B}(X - X_{m,n}^G)$ has columns in $\mathcal{K}_{A,m+1}$, and its adjoint has columns in $\mathcal{K}_{B,n+1}$. The following result tells us how to represent the residual in these orthogonal bases.

¹We keep the same denominators in the rational Krylov spaces, or, equivalently, we take the new poles $z_{A,m+1} = z_{B,n+1} = \infty$.

Theorem 2.1. Let $d = 1$, and write shorter $\rho = \mathcal{S}_{A,B}(X - X_{m,n}^G)$ for the rational Galerkin residual. Then $\rho = \rho_{1,2} + \rho_{2,1} + \rho_{2,2}$, with

$$\rho_{1,2} = U\,\frac{1}{R_B^G}(A_m)\, a_m b^*\, R_B^G(B),\qquad \rho_{2,1} = R_A^G(A)\, a b_n^*\, \frac{1}{R_A^G}(B_n)\, V^*, \tag{2.3}$$

$$\rho_{2,2} = \frac{R_A^G(A)\, a b^*\, R_B^G(B)}{R_A^G(\infty)\, R_B^G(\infty)},$$

and in particular

$$\|\rho\|_F^2 = \|\rho_{1,2}\|_F^2 + \|\rho_{2,1}\|_F^2 + \|\rho_{2,2}\|_F^2. \tag{2.4}$$

If $W(A)\cap W(B)$ is empty, each term is the minimal value of some extremal problem:

$$\|\rho_{2,2}\|_F = \inf_{R_A\in\mathcal{P}_m/Q_A,\; R_B\in\mathcal{P}_n/Q_B} \left\| \frac{R_A(A)\, a b^*\, R_B(B)}{R_A(\infty)\, R_B(\infty)} \right\|_F = \|(I - UU^*)\, a b^*\, (I - VV^*)\|_F, \tag{2.5}$$

$$\|\rho_{1,2}\|_F = \min_{R_B\in\mathcal{P}_n/Q_B} \left( \left\| \frac{1}{R_B}(A_m)\, a_m b^*\, R_B(B) \right\|_F + c_0 \left\| \frac{1}{R_B}(A_m)\, a_m b_n^*\, R_B(B_n) \right\|_F \right), \tag{2.6}$$

$$\|\rho_{2,1}\|_F = \min_{R_A\in\mathcal{P}_m/Q_A} \left( \left\| R_A(A)\, a b_n^*\, \frac{1}{R_A}(B_n) \right\|_F + c_0 \left\| R_A(A_m)\, a_m b_n^*\, \frac{1}{R_A}(B_n) \right\|_F \right), \tag{2.7}$$

where $c_0 := 2\,\mathrm{diam}(W(A), W(B))/\mathrm{dist}(W(A), W(B))$. For each extremal problem (2.5), (2.6), and (2.7), the minimum is attained for $R_A = R_A^G$ and $R_B = R_B^G$.

From (2.5) we see that $\rho_{2,2} = 0$ provided that one of the poles $z_{A,j}$ or $z_{B,j}$ is chosen to be $\infty$ (as for instance in [24, 25, 32], where $z_{A,1} = z_{B,1} = \infty$), since then either $a\in\mathcal{K}_{A,m}$ or $b\in\mathcal{K}_{B,n}$.
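The orthogonal decomposition behind (2.4) can be observed numerically via the projectors $\Pi_A = UU^*$, $\Pi_B = VV^*$ introduced in (3.1) of the proofs section. The sketch below (our own illustrative data; polynomial Krylov bases as one admissible choice of $U, V$) computes the Galerkin residual $\rho$ directly and checks the Pythagoras identity:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, m, n = 9, 8, 3, 3
A = 3.0 * np.eye(M) + 0.3 * rng.standard_normal((M, M))
B = -3.0 * np.eye(N) + 0.3 * rng.standard_normal((N, N))
a = rng.standard_normal((M, 1)); b = rng.standard_normal((N, 1))
C = a @ b.T

U, _ = np.linalg.qr(np.column_stack([np.linalg.matrix_power(A, k) @ a for k in range(m)]))
V, _ = np.linalg.qr(np.column_stack([np.linalg.matrix_power(B, k) @ b for k in range(n)]))
Am, Bn, am, bn = U.T @ A @ U, V.T @ B @ V, U.T @ a, V.T @ b
K = np.kron(np.eye(n), Am) - np.kron(Bn.T, np.eye(m))
Y = np.linalg.solve(K, (am @ bn.T).flatten(order="F")).reshape((m, n), order="F")
XG = U @ Y @ V.T

rho = C - (A @ XG - XG @ B)               # S_{A,B}(X - XG), no need for X itself
PiA, PiB = U @ U.T, V @ V.T
r12 = PiA @ rho @ (np.eye(N) - PiB)
r21 = (np.eye(M) - PiA) @ rho @ PiB
r22 = (np.eye(M) - PiA) @ rho @ (np.eye(N) - PiB)

# the (1,1) block vanishes by the Galerkin condition, and the three remaining
# blocks are pairwise orthogonal in the Frobenius inner product
print(np.linalg.norm(PiA @ rho @ PiB))    # ≈ 0
lhs = np.linalg.norm(rho) ** 2
rhs = np.linalg.norm(r12) ** 2 + np.linalg.norm(r21) ** 2 + np.linalg.norm(r22) ** 2
print(abs(lhs - rhs))                     # ≈ 0, i.e. (2.4) holds
```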

2.2. Rational Galerkin versus ADI. By comparing the expression (1.3) with Theorem 2.1 it is not difficult to see that $X_n^{ADI} = X_{n,n}^G$ provided that the poles $z_{A,j}$ and $z_{B,j}$ coincide with the $n$th Ritz values of $B$ and $A$, respectively (i.e., the eigenvalues of $B_n$ and $A_n$), since then $R_{ADI} = R_B^G = 1/R_A^G$. This observation was already mentioned, e.g., in [16, Theorem 3.4]. However, it is quite difficult to choose such poles, because the Ritz values themselves depend on the poles.

Theorem 2.1 allows us to compare the size of the residual for the ADI method with that of rational Galerkin for arbitrary poles. A weaker result in this direction can be found in [16, Theorem 4.2] where the authors show that the rational Galerkin error is always smaller than a (classical) upper bound for the ADI error.

Corollary 2.2. Provided that $W(A)\cap W(B)$ is empty, we have for the residuals

$$\|\mathcal{S}_{A,B}(X - X_{n,n}^G)\|_F \le C\, \|\mathcal{S}_{A,B}(X - X_n^{ADI})\|_F,$$

with a constant $C \le 3 + 2c_0$, with $c_0$ from Theorem 2.1, independent of the parameters $z_{A,j}, z_{B,j}$.

We do not claim to have found the optimal value of the constant $C$. However, it becomes clear from Corollary 2.2 that, even for optimal poles, ADI cannot give much better results than rational Galerkin. Recall that for poorly chosen poles ADI is known to give much larger residuals [9].
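Corollary 2.2 can be illustrated numerically by running ADI and rational Galerkin with the *same* (deliberately mediocre) poles and comparing residual norms. In the sketch below (our own test problem) $A, B$ are diagonal, so $W(A) = [1,4]$ and $W(B) = [-4,-1]$, giving $c_0 = 2\cdot 8/2 = 8$ and the corollary's factor $C \le 3 + 2c_0 = 19$:

```python
import numpy as np

def rat_basis(T, v, poles):
    """Orthonormal basis of the rational Krylov space with finite poles."""
    w = v.copy()
    for z in poles:
        w = np.linalg.solve(T - z * np.eye(T.shape[0]), w)
    K = np.column_stack([np.linalg.matrix_power(T, k) @ w for k in range(len(poles))])
    return np.linalg.qr(K)[0]

def adi(A, B, C, zA, zB):
    """Double-step ADI for AX - XB = C, started at X = 0."""
    X = np.zeros(C.shape)
    for sA, sB in zip(zA, zB):
        Xh = np.linalg.solve(A - sA * np.eye(A.shape[0]),
                             X @ (B - sA * np.eye(B.shape[0])) + C)
        X = np.linalg.solve((B - sB * np.eye(B.shape[0])).T,
                            ((A - sB * np.eye(A.shape[0])) @ Xh - C).T).T
    return X

A = np.diag(np.linspace(1.0, 4.0, 12))     # W(A) = [1, 4]
B = np.diag(np.linspace(-4.0, -1.0, 10))   # W(B) = [-4, -1]
rng = np.random.default_rng(5)
a = rng.standard_normal((12, 1)); b = rng.standard_normal((10, 1))
C = a @ b.T
zA, zB = [-1.5, -3.0], [1.5, 3.0]          # shared poles, n = 2

Xadi = adi(A, B, C, zA, zB)
res_adi = np.linalg.norm(C - (A @ Xadi - Xadi @ B))

U, V = rat_basis(A, a, zA), rat_basis(B, b, zB)
Am, Bn, am, bn = U.T @ A @ U, V.T @ B @ V, U.T @ a, V.T @ b
K = np.kron(np.eye(2), Am) - np.kron(Bn.T, np.eye(2))
Y = np.linalg.solve(K, (am @ bn.T).flatten(order="F")).reshape((2, 2), order="F")
XG = U @ Y @ V.T
res_gal = np.linalg.norm(C - (A @ XG - XG @ B))

c0 = 2 * 8.0 / 2.0                         # 2*diam/dist for these intervals
print(res_gal, (3 + 2 * c0) * res_adi)     # res_gal <= 19 * res_adi
```

In practice the Galerkin residual is typically well below the ADI residual, not merely within the guaranteed factor.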


2.3. Explicit bounds for the residual of rational Galerkin. We now state some a priori upper bounds for the residual of the rational Galerkin method in terms of the fields of values of $A$ and $B$. In the case of selfadjoint $A$, recall that $W(A) = [\lambda_{\min}(A), \lambda_{\max}(A)]$, and that $\|f(A)\| \le \|f\|_{L^\infty(W(A))}$ for any function $f$ analytic on $W(A)$. Crouzeix [13] showed recently the deep result that there exists a universal constant $C_{\mathrm{Crouzeix}} \le 11.08$ (he conjectures that $C_{\mathrm{Crouzeix}} = 2$) such that $\|f(A)\| \le C_{\mathrm{Crouzeix}}\|f\|_{L^\infty(W(A))}$ for any matrix $A$ and any function $f$ analytic on $W(A)$. This makes the field of values quite attractive for the analysis of matrix functions, though in general field-of-values estimates may be pessimistic.

Theorem 2.1 tells us that for upper bounds of the rational Galerkin residual it is useful to have upper bounds for the matrix-valued rational extremal problem

$$E_m(A, Q_A, z) := \min_{P\in\mathcal{P}_m} \frac{\left\|\frac{P}{Q_A}(A)\right\|}{\left|\frac{P}{Q_A}(z)\right|},\qquad z\in\overline{\mathbb{C}}, \tag{2.8}$$

which for $z = \infty$ and $z_{A,j}\to\infty$ is related to a matrix Chebyshev extremal problem studied by several authors, see, e.g., [34] and the references therein. Indeed, according to (2.5), for $\rho_{2,2}$ we require $E_m(A, Q_A, \infty)$ and $E_n(B, Q_B, \infty)$, whereas, according to (2.7), for bounding $\rho_{2,1}$ (and similarly $\rho_{1,2}$) we can use the upper bound

$$\min_{P\in\mathcal{P}_m} \left\|\frac{P}{Q_A}(A)\right\| \left\|\frac{Q_A}{P}(B)\right\| \le C_{\mathrm{Crouzeix}} \max_{z\in W(B)} E_m(A, Q_A, z).$$

Notice that estimates of this kind are potentially not very sharp, though (up to the factor $C_{\mathrm{Crouzeix}}$) there exist sequences of normal matrices for which asymptotically equality is attained. Again using the Crouzeix estimate we may relate (2.8) with the quantity

$$E_m(E, Q_A, z) := \min_{P\in\mathcal{P}_m} \frac{\left\|\frac{P}{Q_A}\right\|_{L^\infty(E)}}{\left|\frac{P}{Q_A}(z)\right|},\qquad z\in\overline{\mathbb{C}},\quad E\subset\mathbb{C}, \tag{2.9}$$

here for $E = W(A)$. As we will see in Theorem 3.4 below, it is possible for convex $E$, and in particular for $E = W(A)$, to give an explicit upper bound both for $E_m(A, Q_A, z)$ and for $E_m(E, Q_A, z)$ which (knowing only $W(A)$) can be improved at most by a modest factor. Notice that lower bounds for $E_m(E, Q_A, z)$ and arbitrary $E$ are easily obtained by the rational version of the Bernstein-Walsh inequality due to Gonchar [23], see also [16, Lemma 4.4], and Theorem 3.4 below.

In order to describe the rate of convergence, we denote by $g_A(\cdot,\zeta)$ the Green function of $\mathbb{C}\setminus W(A)$ with pole at $\zeta\in\overline{\mathbb{C}}$, see [31], and define

$$u_{A,m}(z) = \exp\Big( -\sum_{j=1}^{m} g_A(z, z_{A,j}) \Big).$$

Recall from [31, §I.1.4] that Green functions satisfy $g_A(z,\zeta) > 0$ for $z, \zeta\notin W(A)$, and $g_A(z,\zeta) = 0$ else. If we disregard the particular case $z_{A,1},\dots,z_{A,m}\in W(A)$, we may conclude that $0 \le u_{A,m}(z) < 1$ for all $z\notin W(A)$, including $z = \infty$, and that $u_{A,m}(z_{A,j}) = 0$ if $z_{A,j}\notin W(A)$.

Similarly, we define

$$u_{B,n}(z) = \exp\Big( -\sum_{j=1}^{n} g_B(z, z_{B,j}) \Big).$$


We remark that in the case of selfadjoint matrices we may make the expressions for $u_{A,m}$ and $u_{B,n}$ more explicit: the Green function $g$ for a real interval $[\alpha,\beta]$ is known to be

$$g(z,\zeta) = \log\left| \frac{ \sqrt{\frac{z-\beta}{z-\alpha}}\,\sqrt{\frac{\zeta-\alpha}{\zeta-\beta}} + 1 }{ \sqrt{\frac{z-\beta}{z-\alpha}}\,\sqrt{\frac{\zeta-\alpha}{\zeta-\beta}} - 1 } \right|$$

with the principal branch of the square root, $\sqrt{1} = 1$, and thus

$$u_{A,m}(z) = \prod_{j=1}^{m} \left| \frac{ \sqrt{\frac{z-\lambda_{\max}(A)}{z-\lambda_{\min}(A)}}\,\sqrt{\frac{z_{A,j}-\lambda_{\min}(A)}{z_{A,j}-\lambda_{\max}(A)}} - 1 }{ \sqrt{\frac{z-\lambda_{\max}(A)}{z-\lambda_{\min}(A)}}\,\sqrt{\frac{z_{A,j}-\lambda_{\min}(A)}{z_{A,j}-\lambda_{\max}(A)}} + 1 } \right|. \tag{2.10}$$

We have the following result.

Theorem 2.3. Let $d = 1$ and $W(A)\cap W(B)$ be empty, and define²

$$\gamma_{A,B} := \max_{z\in W(B)} u_{A,m}(z),\qquad \gamma_{B,A} := \max_{z\in W(A)} u_{B,n}(z);$$

then we have for the rational Galerkin residual and for general $A, B$,

$$\frac{\|\mathcal{S}_{A,B}(X - X_{m,n}^G)\|_F}{\|a\|\,\|b\|} \le 4 c_1 \max\{\gamma_{A,B}, \gamma_{B,A}\} + c_2\left( \frac{2\gamma_{A,B}}{1-\gamma_{A,B}} + \frac{2\gamma_{B,A}}{1-\gamma_{B,A}} \right),$$

where

$$c_1 = \min_{\substack{j=1,\dots,m \\ k=1,\dots,n}} \frac{1}{\big(1-\exp(-g_A(z_{A,j},\infty))\big)\big(1-\exp(-g_B(z_{B,k},\infty))\big)},$$

measuring how far the "furthest" pole is from the fields of values, and $c_2 = (1 + c_0)\,C_{\mathrm{Crouzeix}}$ with $c_0$ from Theorem 2.1. For selfadjoint $A, B$ and $\{z_{A,1},\dots,z_{A,m}\}$, $\{z_{B,1},\dots,z_{B,n}\}$ closed under complex conjugation, we have the improvement

$$\frac{\|\mathcal{S}_{A,B}(X - X_{m,n}^G)\|_F}{\|a\|\,\|b\|} \le 4\max\{\gamma_{A,B}, \gamma_{B,A}\} + c_3(\gamma_{A,B} + \gamma_{B,A})$$

with

$$c_3 = 2\sqrt{ \frac{2\max\{|\lambda_{\min}(B)-\lambda_{\max}(A)|,\; |\lambda_{\min}(A)-\lambda_{\max}(B)|\}}{\min\{|\lambda_{\min}(B)-\lambda_{\max}(A)|,\; |\lambda_{\min}(A)-\lambda_{\max}(B)|\}} }.$$

Remark 2.4. We learn from the upper bounds of Theorem 2.3 that we should keep both $\gamma_{A,B}$ and $\gamma_{B,A}$ small. Optimizing $\gamma_{A,B}$ can be done by choosing the $z_{A,j}$, and optimizing $\gamma_{B,A}$ by choosing the $z_{B,j}$. We claim here without proof (being a consequence of [31, Theorem VIII.2.3]) that

$$\gamma_{A,B}^{1/m} \ge R(W(A), W(B))^{-1},\qquad \gamma_{B,A}^{1/n} \ge R(W(A), W(B))^{-1},$$

with $R(W(A), W(B))$ the Riemann modulus of the doubly connected domain $\mathbb{C}\setminus(W(A)\cup W(B))$, and that both inequalities are asymptotically sharp, since they can be attained by appropriate choices of the poles. □

²For simplifying notation we do not display the dependency on $m, n$, which in what follows are always supposed to be fixed. However, all appearing explicit constants $c_0, c_1, c_2$ do not depend on $m$ and $n$.

It is worthwhile to write down the bounds of Theorem 2.3 for the special case $B = -A^*$ of a Lyapunov equation and $m = n$. We will make some simplifying assumptions.

Corollary 2.5. Consider a Lyapunov equation ($B = -A^*$) with $d = 1$ and $m = n$ and $W(A)\cap(-W(A))$ being empty. We suppose in addition that $A$ has real entries, and that the poles are chosen such that the pairs $z_{A,j} = -z_{B,j}$ occur in conjugate pairs. Then, with $c_0, c_2$ as before,

$$\frac{\|\mathcal{S}_{A,B}(X - X_{n,n}^G)\|_F}{\|a\|\,\|b\|} \le (4 + 4c_2)\, \frac{\gamma_{A,B}}{(1-\sqrt{\gamma_{A,B}})^2}.$$

If in addition $A$ is selfadjoint (and hence positive or negative definite) then we have the improvement

$$\frac{\|\mathcal{S}_{A,B}(X - X_{n,n}^G)\|_F}{\|a\|\,\|b\|} \le \big(4 + 4\sqrt{2\,\mathrm{cond}(A)}\big)\, \gamma_{A,B}.$$

Proof. Since $A$ has real entries, we get that $W(A)$ is symmetric with respect to the real axis, and thus $W(B) = W(-A^*) = -W(A)$. Also, by our symmetry assumptions on the poles,

$$u_{A,n}(\overline{z}) = u_{A,n}(z) = u_{B,n}(-z),\qquad \gamma_{A,B} = \gamma_{B,A}.$$

Therefore, as in the proof of the first part of Theorem 2.3, we conclude that $u_{A,n}(\infty) = u_{B,n}(\infty) \le \sqrt{\gamma_{A,B}}$, and we may relate $\|\rho_{2,2}\|_F$ to $\gamma_{A,B}$ without introducing $c_1$. Observing that $(1-\sqrt{\gamma_{A,B}})^2 \le 1 - \gamma_{A,B}$, we arrive at the first claim. The second follows by observing that $c_3 = 2\sqrt{2\,\mathrm{cond}(A)}$. □

Notice that one may improve slightly all upper bounds in Theorem 2.3 and Corollary 2.5 (namely, drop each time the first of the two or three terms on the right) if one of the poles $z_{A,j}$ or $z_{B,j}$ is chosen to be $\infty$ (since then $\rho_{2,2} = 0$). Also, we should mention that each upper bound also implies an upper bound for the error $\|X - X_{m,n}^G\|_F$ by using (1.12). Finally, comparing with Theorem 3.4, it is possible to get upper bounds if one replaces $W(A)$ and $W(B)$ in the definitions of $u_{A,m}, u_{B,n}, \gamma_{A,B}, \gamma_{B,A}$ by larger convex sets.

2.4. Comparison with existing work. We have not seen elsewhere in the literature a priori upper bounds for rational Galerkin applied to a Sylvester equation as in Theorem 2.3. Let us therefore compare the findings of Corollary 2.5 for the Lyapunov equation with other results from the literature. In what follows we suppose that $A$ is real and $\alpha := \lambda_{\min}((A+A^*)/2) > 0$. Also, we denote by $\phi$ the Riemann map of $W(A)$.

We first have a look at polynomial Galerkin, that is, all $z_{A,j} = z_{B,j} = \infty$. Then $u_{A,n}(z) = 1/|\phi(z)|^n$. Since with $W(A)$ also its level lines (where $|\phi(z)|$ is constant) are convex, we see that the first level line hitting $-W(A)$ passes through $-\alpha$, and thus $\gamma_{A,B} = 1/|\phi(-\alpha)|^n$. Druskin and Simoncini [32, Proposition 3.1] and Kressner and Tobler [25, Corollary 4.4] consider the particular case $A = A^*$, where $\alpha = \lambda_{\min}(A)$, and we find using (2.10) that

$$\gamma_{A,B} = \left( \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \right)^n \tag{2.11}$$

with

$$\kappa = \frac{\lambda_{\max}(A) + \lambda_{\min}(A)}{2\lambda_{\min}(A)},$$

the same rate of convergence as that found by these authors. However, our constant in front is somewhat different. For not necessarily selfadjoint $A$, if $W(A)$ is included in some ellipse $E$ then the level lines are explicitly known and $\gamma_{A,B}$ can be computed. In this case we find the rates of convergence of [32, Corollary 4.5] and [25, Theorem 4.8], though it seems that our constants are somewhat better. In particular, compared to [24, Remark 3.3] we gain a factor $n$.

The case of extended Krylov spaces, that is, even $n$ with half of the poles $z_{A,j}$ being $0$ and the other half being $\infty$, has been discussed in [25, Lemma 6.1] and [24, Section 4] for selfadjoint $A$. The authors find the rate given in (2.11) with $\kappa = \sqrt{\mathrm{cond}(A)}$, in accordance with (2.10). For general $A$, we have that

$$\gamma_{A,B} = \max_{z\in -W(A)} \left| \sqrt{ \frac{1}{\phi(z)}\cdot\frac{\phi(z)-\phi(0)}{1-\overline{\phi(0)}\,\phi(z)} } \right|^n,$$

which at least for disks has been computed in [24, Proposition 5.1]. In the general case the authors show in [24, Theorem 3.2] that the error is bounded by $C\, n\, \gamma_{A,B}$ with a not explicitly given constant $C$ depending on all the data but not on $n$. We are able to drop this factor $n$.

Finally, the only result which we are aware of for general rational Galerkin is [16, Proposition 4.5]: for infinite dimensional matrices (operators) $A$ the authors show that the limsup of the $n$th root of the error is bounded above by the limsup of $[\gamma_{A,B}]^{1/n}$. They conclude in [16, Theorem 4.6 and Proposition 4.7] that if the poles are distributed asymptotically like the equilibrium measure of the condenser consisting of the plates $W(A)$ and $W(-A)$, then the $n$th root of the error behaves at worst like $R(W(A), W(-A))^{-1}$, where $R(W(A), W(-A))$ is the Riemann modulus of this condenser. This is exactly the classical upper bound for ADI with poles which are optimal for $W(A)$.

We recall the conformal invariance

$$R(W(A), W(-A)) = R(\phi(W(A)), \phi(W(-A))) = R(\mathbb{D}, \phi(W(-A))),$$

the latter being related to a classical, well-studied Zolotarev problem. In what follows we use some results from [5, Section 6.2]; we refer the reader to the references of that paper for more details. According to [5, Eqn. (6.9)] one has

$$\gamma_{A,B} \ge \frac{1}{R(W(A), W(-A))^n} \tag{2.12}$$

for any choice of the poles.

For the particular case of $W(A)\subset(0,\infty)$ a real interval, and hence also $\phi(W(-A))$ an (explicitly given) interval $\subset(-\infty,-1)$, one may give an explicit expression of $R(W(A), W(-A))$ in terms of the condition number of $A$, a formula which involves elliptic integrals. Moreover, it is known that there are poles achieving the lower bound up to a factor $2$: according to [5, Proof of Theorem 6.6], one should take as $\phi(z_{A,j})\in\phi(W(-A))$ the elliptic points of the two intervals $\phi(-W(A))$, $\phi(-W(A))^{-1}$, that is, the zeros of the explicitly known rational function solving the corresponding Zolotarev problem

$$Z_{n,n}(E, F) := \min_{P,Q\in\mathcal{P}_n} \left\|\frac{P}{Q}\right\|_{L^\infty(E)} \left\|\frac{Q}{P}\right\|_{L^\infty(F)},\qquad E = \phi(-W(A)) = F^{-1}. \tag{2.13}$$

It is known that this extremal rational function is a Blaschke product, and hence

$$\gamma_{A,B} = \sqrt{Z_{n,n}(E, F)} \le 2/R(W(A), W(-A))^n.$$

For not necessarily selfadjoint $A$, in general the shape of $\phi(W(-A))$ might be complicated, and hardly any results on the solution of (2.13) are known. However, "good" poles with small $\gamma_{A,B}$ are obtained by the same principle: choose as $\phi(z_{A,j})\in\phi(W(-A))$ the zeros of a Blaschke product $P/Q$ giving a small value in (2.13), the choice of poles suggested in [16, Theorem 4.6 and Proposition 4.7].

2.5. Linear versus superlinear convergence. For selfadjoint $A, B$ we have used in this paper the simple upper bound $\|r(A)\| \le \|r\|_{L^\infty(W(A))}$ for rational functions $r$. This sometimes severe overestimation might be acceptable for many applications. However, for instance for a cyclic choice of a few multiple poles, this leads to upper bounds describing a linear convergence behavior; that is, we consider a "worst case" eigenvalue distribution.

The authors in [3] work instead with discrete maximum norms on $\Lambda(A)$. For sequences of selfadjoint matrices $A = A(N)$ of order $N$, which for $N\to\infty$ have a joint eigenvalue distribution, they are able to quantify the superlinear convergence behavior of the ($n$th root of the) error after $n$ CG iterations applied to $A(N)$ for $n, N\to\infty$ such that $n/N$ tends to some $t > 0$.

Recently, the asymptotic behavior of rational Ritz values, and thus the $n$th root behavior of $|R_n^G(z)|/\|R_n^G(A)\|$, has been found in [7] in a similar setting, if one supposes in addition that also the sets of poles have some joint distribution (such as a cyclic repetition of a few multiple poles). A nice Buyarov-Rakhmanov type formula for this expression in terms of marginal condition numbers can be found in [8]: roughly speaking, instead of $u_{A,m}$ one takes a mean of the formula for $u_{A,m}$ over a family of decreasing sets instead of the fixed set $W(A)$. Inserting these formulas directly into (2.4) allows us to describe superlinear convergence for rational Galerkin applied to a Sylvester or a Lyapunov equation. However, in order to make such a formula more explicit, one has to solve some extremal problems in logarithmic potential theory, involving the asymptotic distribution of eigenvalues and of poles.

In such an (asymptotic) superlinear convergence theory, in order to find optimal poles for ADI or rational Galerkin one should solve (approximately) the discrete Zolotarev problem $Z_{n,n}(\Lambda(A), \Lambda(B))$ instead of the continuous $Z_{n,n}(W(A), W(B))$. The $n$th root asymptotics for the corresponding optimal rate of convergence can be found in [6].

3. Proofs.

3.1. Proof of Theorem 2.1. For a proof of Theorem 2.1, we need three auxiliary results, which we state each time for $\mathcal{K}_{A,m}$ and for $\mathcal{K}_{B,n}$ (however, the second statement follows from the first by taking adjoints). The first and the third property are classical facts for rational Krylov spaces.

The first property is usually referred to as the exactness property of rational Krylov spaces; a proof may be found in [5, Proof of Theorem 5.2] for $z_{A,1} = \infty$, or in [15, Lemma 3.1] for general poles and selfadjoint $A$. For the sake of completeness we add a proof.


Lemma 3.1. For any $R_A\in\mathcal{P}_{m-1}/Q_A$ we have that $R_A(A)a = U R_A(A_m) a_m$. Similarly, for any $R_B\in\mathcal{P}_{n-1}/Q_B$ we have that $b^* R_B(B) = b_n^* R_B(B_n) V^*$.

Proof. By linearity, it is sufficient to consider $R_A(z) = 1/(z-z_j)^\ell \in\mathcal{P}_{m-1}/Q_A$ for some integer $\ell\ge 1$ (and $R_A(z) = z^\ell\in\mathcal{P}_{m-1}/Q_A$ in case $\deg Q_A < m$). By definition of $\mathcal{K}_{A,m}$ and $U$, there exists a vector $c\in\mathbb{C}^m$ such that

$$R_A(A)a = (A - z_jI)^{-\ell} a = Uc.$$

In case $\ell = 1$ we multiply on the left by $U^*(A - z_jI)$, leading to $a_m = U^*a = U^*(A - z_jI)Uc = (A_m - z_jI)c$, or $c = (A_m - z_jI)^{-1} a_m$, as required. In case $\ell > 1$ we argue by recurrence on $\ell$, and obtain $(A_m - z_jI)c = U^*(A - z_jI)^{-\ell+1}a = (A_m - z_jI)^{-\ell+1} a_m$, again as required. The analysis for $R_A(z) = z^\ell$ for $\ell\ge 0$ is similar; we omit the details.

The second property is required for the second part of Theorem 2.1.

Lemma 3.2. For any $R_A\in\mathcal{P}_m/Q_A$ we have that

$$\Big(I - \frac{R_A(A)}{R_A(z)}\Big)(zI - A)^{-1}a = U\Big(I - \frac{R_A(A_m)}{R_A(z)}\Big)(zI - A_m)^{-1}a_m.$$

Similarly, for any $R_B\in\mathcal{P}_n/Q_B$,

$$b^*(zI - B)^{-1}\Big(I - \frac{R_B(B)}{R_B(z)}\Big) = b_n^*(zI - B_n)^{-1}\Big(I - \frac{R_B(B_n)}{R_B(z)}\Big)V^*.$$

Proof. It is sufficient to observe that, for fixed $z$, the function

$$x\mapsto \frac{1 - R_A(x)/R_A(z)}{z - x}$$

is an element of $\mathcal{P}_{m-1}/Q_A$, and to apply Lemma 3.1. □

Our third property is the classical representation of the FOM error for shifted systems in terms of the rational functions (2.1). It follows immediately from Lemma 3.2 by observing that $R_A^G(A_m)a_m = 0$ and $b_n^* R_B^G(B_n) = 0$.

Lemma 3.3. We have that

$$(zI - A)^{-1}a - U(zI - A_m)^{-1}a_m = \frac{R_A^G(A)}{R_A^G(z)}\,(zI - A)^{-1}a.$$

Similarly,

$$b^*(zI - B)^{-1} - b_n^*(zI - B_n)^{-1}V^* = b^*(zI - B)^{-1}\,\frac{R_B^G(B)}{R_B^G(z)}.$$
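Lemma 3.3 can likewise be checked numerically: by (2.1), $R_A^G$ is the characteristic polynomial of $A_m$ divided by $Q_A$, so both the matrix $R_A^G(A)$ and the scalar $R_A^G(z_0)$ are computable from the Ritz values. A sketch with our own test data:

```python
import numpy as np

rng = np.random.default_rng(8)
Mdim, m = 8, 3
A = 3.0 * np.eye(Mdim) + 0.3 * rng.standard_normal((Mdim, Mdim))
a = rng.standard_normal((Mdim, 1))
poles = [-1.0, -2.0, -4.0]
I = np.eye(Mdim)

w = a.copy()
for z in poles:                   # rational Krylov basis, as for Lemma 3.1
    w = np.linalg.solve(A - z * I, w)
K = np.column_stack([np.linalg.matrix_power(A, k) @ w for k in range(m)])
U = np.linalg.qr(K)[0]
Am, am = U.T @ A @ U, U.T @ a

theta = np.linalg.eigvals(Am)     # Ritz values = zeros of R_A^G
RG_A = np.eye(Mdim, dtype=complex)
for t in theta:
    RG_A = RG_A @ (A - t * I)
for z in poles:
    RG_A = RG_A @ np.linalg.inv(A - z * I)

z0 = 10.0                         # test point away from Lambda(A) and the poles
RG_z0 = np.prod(z0 - theta) / np.prod([z0 - z for z in poles])

lhs = np.linalg.solve(z0 * I - A, a) - U @ np.linalg.solve(z0 * np.eye(m) - Am, am)
rhs = (RG_A / RG_z0) @ np.linalg.solve(z0 * I - A, a)
print(np.linalg.norm(lhs - rhs))  # ≈ 0: the FOM error formula holds
```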

Proof of the first part of Theorem 2.1. Write shorter $x = (zI - A)^{-1}a$, $\widetilde{x} = (zI - A_m)^{-1}a_m$, $y^* = b^*(zI - B)^{-1}$, $\widetilde{y}^* = b_n^*(zI - B_n)^{-1}$; then, according to (1.6) and (1.10),

$$X - X_{m,n}^G = \frac{1}{2\pi i}\int_{\Gamma_A} \big( x y^* - U\widetilde{x}\,\widetilde{y}^*V^* \big)\, dz,$$

where the compact contour $\Gamma_A$ encircles once in mathematical positive orientation $\Lambda(A)$ and $\Lambda(A_m)$, but not $\Lambda(B)$ and $\Lambda(B_n)$. We also consider $\Gamma_B$, a compact contour encircling once in mathematical positive orientation $\Lambda(B)$ and $\Lambda(B_n)$, but not $\Lambda(A)$ and $\Lambda(A_m)$. Then we can write, using Lemma 3.3,

$$X - X_{m,n}^G = \frac{1}{2\pi i}\int_{\Gamma_A} \Big( (x - U\widetilde{x})\,y^* + x\,(y^* - \widetilde{y}^*V^*) - (x - U\widetilde{x})(y^* - \widetilde{y}^*V^*) \Big)\, dz$$

$$= \frac{1}{2\pi i}\int_{\Gamma_A} \Big( \frac{R_A^G(A)}{R_A^G(z)}\, x y^* + x y^*\, \frac{R_B^G(B)}{R_B^G(z)} - \frac{R_A^G(A)}{R_A^G(z)}\, x y^*\, \frac{R_B^G(B)}{R_B^G(z)} \Big)\, dz.$$

Let us integrate each term separately. For the first term, observing that the integrand is $O(z^{-2})$ for $z\to\infty$, we replace $\Gamma_A$ by $\Gamma_B$, which by the residue theorem gives a change of sign, and thus

$$\frac{1}{2\pi i}\int_{\Gamma_A} \frac{R_A^G(A)}{R_A^G(z)}\, A\, x y^*\, dz - \frac{1}{2\pi i}\int_{\Gamma_A} \frac{R_A^G(A)}{R_A^G(z)}\, x y^* B\, dz = I_{1,1} - I_{1,2},$$

$$I_{1,1} := \frac{1}{2\pi i}\int_{\Gamma_B} \frac{R_A^G(A)}{R_A^G(z)}\, a b^*(zI - B)^{-1}\, dz,\qquad I_{1,2} := \frac{1}{2\pi i}\int_{\Gamma_B} \frac{R_A^G(A)}{R_A^G(z)}\, (zI - A)^{-1} a b^*\, dz,$$

where, again by the residue theorem, $I_{1,2}$ vanishes.

Similarly, we obtain for the second term the expression $-I_{2,1} + I_{2,2}$, with

$$I_{2,1} := \frac{1}{2\pi i}\int_{\Gamma_A} a b^*(zI - B)^{-1}\,\frac{R_B^G(B)}{R_B^G(z)}\, dz,\qquad I_{2,2} := \frac{1}{2\pi i}\int_{\Gamma_A} (zI - A)^{-1} a b^*\,\frac{R_B^G(B)}{R_B^G(z)}\, dz,$$

and here the first integral $I_{2,1}$ vanishes by the residue theorem. Finally, for the third term we get $-I_{3,1} + I_{3,2}$, with

$$I_{3,1} := \frac{1}{2\pi i}\int_{\Gamma_A} \frac{R_A^G(A)}{R_A^G(z)}\, a b^*(zI - B)^{-1}\,\frac{R_B^G(B)}{R_B^G(z)}\, dz,$$

$$I_{3,2} := \frac{1}{2\pi i}\int_{\Gamma_A} \frac{R_A^G(A)}{R_A^G(z)}\, (zI - A)^{-1} a b^*\,\frac{R_B^G(B)}{R_B^G(z)}\, dz,$$

where by the residue theorem

$$I_{3,2} = \frac{R_A^G(A)}{R_A^G(\infty)}\, a b^*\, \frac{R_B^G(B)}{R_B^G(\infty)} - \frac{1}{2\pi i}\int_{\Gamma_B} \frac{R_A^G(A)}{R_A^G(z)}\, (zI - A)^{-1} a b^*\,\frac{R_B^G(B)}{R_B^G(z)}\, dz.$$

By putting the three terms together and applying Lemma 3.3 twice, we find that

$$\mathcal{S}_{A,B}(X - X_{m,n}^G) - \frac{R_A^G(A)}{R_A^G(\infty)}\, a b^*\, \frac{R_B^G(B)}{R_B^G(\infty)}$$

$$= \frac{1}{2\pi i}\int_{\Gamma_B} \frac{R_A^G(A)}{R_A^G(z)}\, a b^*(zI - B)^{-1}\Big(I - \frac{R_B^G(B)}{R_B^G(z)}\Big)\, dz + \frac{1}{2\pi i}\int_{\Gamma_B} \Big(I - \frac{R_A^G(A)}{R_A^G(z)}\Big)(zI - A)^{-1} a b^*\,\frac{R_B^G(B)}{R_B^G(z)}\, dz$$

$$= \frac{1}{2\pi i}\int_{\Gamma_B} \frac{R_A^G(A)}{R_A^G(z)}\, a b_n^*(zI - B_n)^{-1}V^*\, dz + \frac{1}{2\pi i}\int_{\Gamma_A} U(zI - A_m)^{-1} a_m b^*\,\frac{R_B^G(B)}{R_B^G(z)}\, dz$$

$$= R_A^G(A)\, a b_n^*\, \frac{1}{R_A^G}(B_n)\, V^* + U\, \frac{1}{R_B^G}(A_m)\, a_m b^*\, R_B^G(B),$$

as claimed in (2.3). Introducing the orthogonal projectors

$$\Pi_A = UU^*,\qquad \Pi_B = VV^*, \tag{3.1}$$

we know from (2.2) that $\Pi_A\,\rho\,(I - \Pi_B) = \rho_{1,2}$, $(I - \Pi_A)\,\rho\,\Pi_B = \rho_{2,1}$, and $(I - \Pi_A)\,\rho\,(I - \Pi_B) = \rho_{2,2}$, from which (2.4) follows. □

Proof of equation (2.5) of Theorem 2.1. For any $R_A\in\mathcal{P}_m/Q_A$ there exists $c_A\in\mathbb{C}$ such that

$$R_A - c_A R_A^G\in\mathcal{P}_{m-1}/Q_A,\qquad \frac{c_A}{R_A(\infty)} = \frac{1}{R_A^G(\infty)}.$$

Taking into account the definition of $U$, (2.2), and Lemma 3.1, we conclude that

$$(I - \Pi_A)\,\frac{R_A(A)}{R_A(\infty)}\,a = \frac{R_A^G(A)}{R_A^G(\infty)}\,a$$

and, similarly, for $R_B\in\mathcal{P}_n/Q_B$,

$$b^*\,\frac{R_B(B)}{R_B(\infty)}\,(I - \Pi_B) = b^*\,\frac{R_B^G(B)}{R_B^G(\infty)},$$

implying the first part of (2.5). For the second part one has to distinguish two cases: if $R_A^G(\infty) = \infty$ (at least one of the $z_{A,j}$ equals $\infty$) then $a\in\mathcal{K}_{A,m}$ and the property is trivially true. Otherwise, $\frac{R_A^G(z)}{R_A^G(\infty)} - 1\in\mathcal{P}_{m-1}/Q_A$, which together with Lemma 3.1 implies that

$$\frac{R_A^G(A)}{R_A^G(\infty)}\,a = (I - \Pi_A)\,\frac{R_A^G(A)}{R_A^G(\infty)}\,a = (I - \Pi_A)\,a.$$

By the same argument, $b^*\,\frac{R_B^G(B)}{R_B^G(\infty)} = b^*(I - \Pi_B)$, as claimed in (2.5). □

Proof of equations (2.6), (2.7) of Theorem 2.1. We only show (2.6); the proof of (2.7) is obtained by exchanging the roles of $A$ and $B$. Since $\rho_{1,2} = \Pi_A\,\rho\,(I - \Pi_B)$, we find that $\|\rho_{1,2}\|_F = \|U^*\rho_{1,2}\|_F$. Let us show that, for any $R_B\in\mathcal{P}_n/Q_B$,

$$\|U^*\rho_{1,2}\|_F \le \left\| \frac{1}{R_B}(A_m)\, a_m b^*\, R_B(B) \right\|_F + c_0 \left\| \frac{1}{R_B}(A_m)\, a_m b_n^*\, R_B(B_n) \right\|_F. \tag{3.2}$$

Since $R_B^G(B_n) = 0$ by (2.1), we see from (2.3) that there is equality in (3.2) for $R_B = R_B^G$, and hence (2.6) follows.

We have

$$U^*\rho_{1,2} = \frac{1}{R_B^G}(A_m)\, a_m b^*\, R_B^G(B) = \frac{1}{2\pi i}\int_{\Gamma_A} (zI - A_m)^{-1} a_m b^*\,\frac{R_B^G(B)}{R_B^G(z)}\, dz$$

$$= \frac{1}{2\pi i}\int_{\Gamma_A} (zI - A_m)^{-1} a_m \Big( b^*(zI - B)^{-1} - b_n^*(zI - B_n)^{-1}V^* \Big)(zI - B)\, dz = I_1 - I_2,$$

$$I_1 := \frac{1}{2\pi i}\int_{\Gamma_A} (zI - A_m)^{-1} a_m \Big( b^*(zI - B)^{-1}\frac{R_B(B)}{R_B(z)} \Big)(zI - B)\, dz,$$

$$I_2 := \frac{1}{2\pi i}\int_{\Gamma_A} (zI - A_m)^{-1} a_m \Big( b_n^*(zI - B_n)^{-1}\frac{R_B(B_n)}{R_B(z)}\, V^* \Big)(zI - B)\, dz,$$

where we have applied first Lemma 3.3 and then Lemma 3.2. Notice that if one of the roots of $R_B$ coincides with one of the eigenvalues of $A_m$, then we adopt the convention that $\frac{1}{R_B}(A_m)$ has norm $\infty$. Otherwise, we may deform $\Gamma_A$ such that it encircles $\Lambda(A_m)$ but not the roots of $R_B$. In this case, $I_1 = \frac{1}{R_B}(A_m)\, a_m b^*\, R_B(B)$, and it remains to show that

$$\|I_2\|_F \le c_0 \left\| \frac{1}{R_B}(A_m)\, a_m b_n^*\, R_B(B_n) \right\|_F. \tag{3.3}$$

By (1.10) and the Cauchy residue formula,

$$\mathcal{S}_{A_m,B}^{-1} I_2 = \frac{1}{2\pi i}\int_{\Gamma_B} (\zeta I - A_m)^{-1} \left[ \frac{1}{2\pi i}\int_{\Gamma_A} (zI - A_m)^{-1} a_m \Big( b_n^*(zI - B_n)^{-1}\frac{R_B(B_n)}{R_B(z)}\, V^* \Big)(zI - B)\, dz \right] (\zeta I - B)^{-1}\, d\zeta$$

$$= \frac{1}{2\pi i}\int_{\Gamma_B} (\zeta I - A_m)^{-1} \left[ \frac{1}{2\pi i}\int_{\Gamma_A} (zI - A_m)^{-1} a_m \Big( b_n^*(zI - B_n)^{-1}\frac{R_B(B_n)}{R_B(z)}\, V^* \Big)(z - \zeta)\, dz \right] (\zeta I - B)^{-1}\, d\zeta$$

$$= \frac{1}{2\pi i}\int_{\Gamma_B} \left[ \frac{1}{2\pi i}\int_{\Gamma_A} \big( (\zeta I - A_m)^{-1} - (zI - A_m)^{-1} \big) a_m \Big( b_n^*(zI - B_n)^{-1}\frac{R_B(B_n)}{R_B(z)}\, V^* \Big)\, dz \right] (\zeta I - B)^{-1}\, d\zeta$$

$$= -\frac{1}{2\pi i}\int_{\Gamma_A} (zI - A_m)^{-1} a_m b_n^*(zI - B_n)^{-1}\frac{R_B(B_n)}{R_B(z)}\, V^*\, dz.$$

As above we deduce that SAm,Bn

SA−1

m,BI2

=− 1 2πi

Z

ΓA

(zI−Am)−1ambnRB(Bn) RB(z) Vdz

=− 1 RB

(Am)ambnRB(Bn)V. Hence

kI2kF ≤ kSA−1

m,Bnk kSAm,Bk k 1 RB

(Am)ambnRB(Bn)kF, and our claim (3.3) follows from (1.11) and (1.12). 2
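The contour manipulation above rests on the first resolvent identity, (ζI − A)^{-1} − (zI − A)^{-1} = (z − ζ)(ζI − A)^{-1}(zI − A)^{-1}. A minimal numerical sanity check of this identity (random matrix, arbitrary sizes and sample points, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))      # random test matrix
I = np.eye(n)
z, zeta = 3.0 + 2.0j, -1.0 + 0.5j    # two points away from the spectrum

Rz = np.linalg.inv(z * I - A)        # resolvent (zI - A)^{-1}
Rzeta = np.linalg.inv(zeta * I - A)  # resolvent (zeta*I - A)^{-1}

# first resolvent identity: R(zeta) - R(z) = (z - zeta) * R(zeta) @ R(z)
lhs = Rzeta - Rz
rhs = (z - zeta) * Rzeta @ Rz
print(np.linalg.norm(lhs - rhs))     # close to machine precision
```

Since the two resolvents of the same matrix commute, the identity also explains why the double contour integral above collapses to a single integral over Γ_A.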

3.2. Proof of Corollary 2.2. Comparing (1.3) with Theorem 2.1 for m = n, we may choose R_{ADI} = R_B = 1/R_A in (2.5), (2.6), and (2.7). Then R_A(∞) R_B(∞) = 1, and thus by (2.5)

  ‖ρ_{2,2}‖_F ≤ ‖ R_A(A) a b* R_B(B) ‖_F = ‖ S_{A,B}(X − X_n^{ADI}) ‖_F.

For the upper bound for ‖ρ_{1,2}‖ given in (2.6), we first observe that R_A − (R_A(∞)/R_A^G(∞)) R_A^G ∈ P_{m−1}/Q_A, and hence by Lemma 3.1, (2.1), (2.2), and (3.1),

  Π_A R_A(A) a = U U* R_A(A) a = U R_A(A_n) a_n,

and similarly b* R_B(B) Π_B = b_n* R_B(B_n) V*. Hence by (2.6)

  ‖ρ_{1,2}‖_F ≤ ‖ R_A(A_n) a_n b* R_B(B) ‖_F + c_0 ‖ R_A(A_n) a_n b_n* R_B(B_n) ‖_F
             ≤ (1 + c_0) ‖ S_{A,B}(X − X_n^{ADI}) ‖_F.

The same conclusion is obtained for ‖ρ_{2,1}‖, and thus Corollary 2.2 holds. □
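The Sylvester residuals compared above can be reproduced on a toy problem. The following sketch (sizes, shifts, and seed are arbitrary choices, not from the paper) solves S_{A,B}X = AX − XB = ab* directly through the equivalent Kronecker system vec(AX − XB) = (I_N ⊗ A − Bᵀ ⊗ I_M) vec(X), and checks the residual:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, d = 6, 5, 1                                  # toy sizes (arbitrary choices)
A = rng.standard_normal((M, M)) + 5 * np.eye(M)    # spectra shifted apart so that
B = rng.standard_normal((N, N)) - 5 * np.eye(N)    # Lambda(A) and Lambda(B) are disjoint
a = rng.standard_normal((M, d))
b = rng.standard_normal((N, d))
C = a @ b.T                                        # rank-d right-hand side C = a b^*

# vec(AX - XB) = (I_N kron A - B^T kron I_M) vec(X), with column-stacked vec
K = np.kron(np.eye(N), A) - np.kron(B.T, np.eye(M))
X = np.linalg.solve(K, C.flatten(order="F")).reshape((M, N), order="F")

residual = np.linalg.norm(A @ X - X @ B - C)
print(residual)                                    # close to machine precision
```

The Kronecker formulation is only practical for such small M, N; for the large sparse problems discussed in the introduction one would instead use the Galerkin or ADI approximations whose residuals are analyzed here.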

3.3. Proof of Theorem 2.3. We start by giving a result related to the extremal quantities E_m(A, Q_A, z) and E_m(W(A), Q_A, z) defined in (2.8) and (2.9), respectively. In our proof of Theorem 3.4 we do not use the field-of-values estimates of Crouzeix [13], but merely generalize the techniques from [4].

Theorem 3.4. Let E ⊂ C be some convex compact set (not a single point), and A a square matrix with W(A) ⊂ E. Then for all R_A ∈ P_m/Q_A and for all z ∉ E,

  ‖R_A‖_{L∞(E)} / |R_A(z)| ≥ u(z) := exp( − Σ_{j=1}^m g(z, z_{A,j}) ),

with g(·, ζ) the Green function of C\E with pole at ζ. This inequality is sharp up to some modest constant in the sense that there exists R_A^# ∈ P_m/Q_A having m zeros in E with

  ‖R_A^#‖_{L∞(E)} ≤ 2,    ‖R_A^#(A)‖ ≤ 2,    ∀z ∉ E:  1/|R_A^#(z)| ≤ u(z)/(1 − u(z)).

Proof. The first inequality is known as the rational version of the Bernstein–Walsh inequality due to Gonchar [23]. For a proof, it is sufficient to notice that the function f(z) = log|R_A(z) u(z)| is subharmonic in C\E, and to apply the maximum principle for subharmonic functions, leading to f(z) ≤ ‖f‖_{L∞(∂E)} ≤ ‖ log|R_A| ‖_{L∞(E)}.
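In the model case where E is the closed unit disk (so that φ(z) = z) and Q_A = 1, we have u(z) = |z|^{−m}, and the first inequality reads |R_A(z)| ≤ |z|^m ‖R_A‖_{L∞(E)} for every polynomial R_A of degree at most m. A minimal numerical illustration (random polynomial and an arbitrary point z; a sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4
# random polynomial of degree m with complex coefficients
coeffs = rng.standard_normal(m + 1) + 1j * rng.standard_normal(m + 1)

theta = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
sup_E = np.abs(np.polyval(coeffs, np.exp(1j * theta))).max()  # ~ sup of |R_A| on |w| = 1

z = 2.0 + 1.0j                      # a point outside E = closed unit disk
lhs = np.abs(np.polyval(coeffs, z))
rhs = abs(z) ** m * sup_E           # 1/u(z) = |z|^m when all poles sit at infinity
print(lhs <= 1.001 * rhs)           # True: the Bernstein-Walsh bound holds
```

The grid maximum slightly underestimates the true supremum, hence the small tolerance factor.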

For showing the second part, we denote by φ the Riemann conformal map of C\E onto C\D, D the closed unit disk, with φ(∞) = ∞ and φ'(∞) > 0. Also, denote the inverse map by ψ = φ^{-1}. If all z_{A,j} ∈ E, then u(z) = 1 (all Green functions vanish), and the second statement is trivially true. Otherwise, we may choose R_A^# by prescribing as roots all z_{A,j} ∈ E. Since the corresponding terms in u vanish, we may suppose without loss of generality for our construction of R_A^# that all roots z_{A,j} of Q_A are outside E. Then it is known that one can write the function u in terms of Blaschke products:

  1/u(z) = |h(φ(z))|,    h(w) = Π_{j=1}^m ( 1 − \overline{φ(z_{A,j})} w ) / ( w − φ(z_{A,j}) ).

Notice that h is analytic in D, and meromorphic outside of D.

The Faber map F identifies functions analytic in D with functions analytic in E via the formula

  z ∈ Int(E):    F(p)(z) = ∫_{|w|=1} p(w) (w ψ'(w)) / (ψ(w) − z) dw/(2πi w);

in particular F(P_m) = P_m, and F(P_m/Q) = P_m/Q_A, with Q(w) = Π_{j=1}^m (w − φ(z_{A,j})), see for instance [5] and the references therein. In particular,

  R_A^#(z) := F(h)(z) + h(0) ∈ P_m/Q_A.

By deforming the path of integration towards ∞ we also see that

  ∫_{|w|=1} h(w) \overline{ (w ψ'(w)) / (ψ(w) − z) } dw/(2πi w)
    = \overline{ ∫_{|w|=1} (1/h(w)) (w ψ'(w)) / (ψ(w) − z) dw/(2πi w) } = \overline{1/h(∞)} = h(0),

since 1/h is analytic outside D. Thus

  R_A^#(z) = ∫_{|w|=1} h(w) Re( 2 (w ψ'(w)) / (ψ(w) − z) ) dw/(2πi w).    (3.4)

Since the real part is positive for all z ∈ Int(E), we obtain

  ‖R_A^#‖_{L∞(E)} ≤ ‖h‖_{L∞(D)} sup_{z ∈ Int(E)} ∫_{|w|=1} Re( 2 (w ψ'(w)) / (ψ(w) − z) ) dw/(2πi w) = 2.

As explained in more detail, e.g., in [5, Proof of Theorem 2.1], the above reasoning remains true after replacing the scalar argument z by the matrix A, leading to the similar conclusion ‖R_A^#(A)‖ ≤ 2. Finally, for |v| < 1, we have the Poisson kernel formula

  h(v) = ∫_{|w|=1} h(w) Re( (w + v)/(w − v) ) dw/(2πi w),    (3.5)

which can also be shown directly as in (3.4). The above convexity argument shows that

  Re( 2 (w ψ'(w)) / (ψ(w) − ψ(v)) − (w + v)/(w − v) )

is ≥ 0 for |v| = 1 = |w|, v ≠ w, and thus also for |v|, |w| ≥ 1 by the maximum principle. Combining the two integrals (3.4) and (3.5), we conclude that

  ‖R_A^# ∘ ψ − h‖_{L∞(D)} ≤ lim_{ε→0} sup_{|v|=1} ∫_{|w|=1+ε} Re( 2 (w ψ'(w)) / (ψ(w) − ψ(v)) − (w + v)/(w − v) ) dw/(2πi w) ≤ 1.

Finally, the maximum principle applied to the function v ↦ R_A^#(ψ(v)) − h(v), analytic in C\D, tells us that, for all z ∉ E,

  |R_A^#(z)| ≥ |h(φ(z))| − 1 = 1/u(z) − 1 > 0,

as required for the assertion of Theorem 3.4. □
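The Poisson kernel formula (3.5), which recovers a function analytic in the disk from its boundary values, is easy to verify numerically. The sketch below uses h(w) = e^w as a stand-in for the Blaschke product (any function analytic in a neighborhood of the closed disk works), and the trapezoidal rule on the circle, where dw/(2πi w) reduces to dθ/2π:

```python
import numpy as np

N = 2048
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
w = np.exp(1j * theta)                # points on |w| = 1
v = 0.3 + 0.2j                        # a point with |v| < 1

h = np.exp                            # stand-in for the analytic function h
poisson = np.real((w + v) / (w - v))  # Poisson kernel Re[(w+v)/(w-v)]
approx = np.sum(h(w) * poisson) / N   # discretization of (3.5)
print(abs(approx - h(v)))             # close to machine precision
```

The trapezoidal rule converges exponentially fast here because the integrand is analytic and periodic.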

We learn from Theorem 3.4 that, for convex compact E and for matrices A with W(A) ⊂ E and z ∉ E,

  1 ≤ E_m(E, Q_A, z)/u(z) ≤ 2/(1 − u(z)),    E_m(A, Q_A, z)/u(z) ≤ 2/(1 − u(z)),

where we may construct (diagonal) matrices A with E_m(A, Q_A, z) arbitrarily close to E_m(E, Q_A, z). In the particular case Q_A = 1, and thus z_{A,1} = ... = z_{A,m} = ∞, our assertion reduces to

  1 ≤ E_m(E, 1, z)/|φ(z)|^{−m} ≤ 2/(1 − |φ(z)|^{−m}),    E_m(A, 1, z)/|φ(z)|^{−m} ≤ 2/(1 − |φ(z)|^{−m}).
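The last reduction is immediate from the definition of u: with Q_A = 1, all shifts z_{A,j} sit at infinity, and the Green function with pole at ∞ has the standard representation g(z, ∞) = log|φ(z)|, so that

```latex
u(z) = \exp\Big(-\sum_{j=1}^{m} g(z, z_{A,j})\Big)
     = \exp\big(-m\, g(z,\infty)\big)
     = \exp\big(-m \log|\phi(z)|\big)
     = |\phi(z)|^{-m}.
```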
