
5 Proof Organization of Main Theorems

5.1 Symmetric Case

For shorthand, we write $\tau = \|E\|_\infty$ and $\kappa = \sqrt{d}\,\|EV\|_{\max}$. An obvious bound for $\kappa$ is $\kappa \le \sqrt{r\mu}\,\tau$ (by the Cauchy–Schwarz inequality). We will use these notations throughout this subsection.
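To see where this bound comes from, here is a short derivation. It assumes (our reading of the standard coherence normalization; the paper's definition of $\mu$ may differ by constants) that $\max_{1\le i\le d}\|V_{i\cdot}\|_2^2 \le \mu r/d$, and that $\|E\|_\infty = \max_i \sum_j |E_{ij}|$ denotes the matrix $\ell_\infty$-operator norm:
\[
\|EV\|_{\max} = \max_{i,j}\Big|\sum_{k=1}^d E_{ik}V_{kj}\Big|
\le \Big(\max_i \sum_{k=1}^d |E_{ik}|\Big)\,\max_{k,j}|V_{kj}|
\le \tau\sqrt{\mu r/d},
\]
so that $\kappa = \sqrt{d}\,\|EV\|_{\max} \le \sqrt{r\mu}\,\tau$. The Cauchy–Schwarz step is $|V_{kj}| = |\langle V_{k\cdot}, e_j\rangle| \le \|V_{k\cdot}\|_2 \le \sqrt{\mu r/d}$.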

Recall the spectral decomposition of $A$ in (8). Expressing $E$ in terms of the column vectors of $V$ and $\bar V$, which together form an orthonormal basis of $\mathbb{R}^d$, we write
\[
[V, \bar V]^\top E\,[V, \bar V] =: \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}. \tag{27}
\]
Note that $E_{12} = E_{21}^\top$ since $E$ is symmetric.

[Figure 5 here: six panels plotting error ratio against dimension $p$ (200–800): spectral norm error of $\Sigma_u$ (median; IQR), spectral norm error of $\Sigma^{-1}$ (median; IQR), and relative Frobenius norm error of $\Sigma$ (median; IQR).]

Figure 5: Error ratios of robust estimates against varying dimension. Blue lines represent errors of Method (2) over Method (1) under different norms; black lines, errors of Method (3) over Method (1); red lines, errors of Method (4) over Method (1). $(f_t^\top, u_t^\top)$ is generated element-wise by iid $t$-distributions with df $= 3$ (solid), $5$ (dashed), and $\infty$ (dotted). The median errors and their IQRs (interquartile ranges) over 100 simulations are reported.

Conceptually, the perturbation results in a rotation of $[V, \bar V]$, and we write a candidate orthonormal basis as follows:
\[
\check V := (V + \bar V Q)(I_r + Q^\top Q)^{-1/2}, \qquad \check{\bar V} := (\bar V - V Q^\top)(I_{d-r} + Q Q^\top)^{-1/2}, \tag{28}
\]
where $Q \in \mathbb{R}^{(d-r)\times r}$ is to be determined. It is straightforward to check that $[\check V, \check{\bar V}]$ is an orthogonal matrix. We will choose $Q$ in a way such that $[\check V, \check{\bar V}]^\top \tilde A\,[\check V, \check{\bar V}]$ is a block diagonal matrix, i.e., $\check{\bar V}^\top \tilde A \check V = 0$. Substituting (28) and simplifying the equation, we obtain
\[
Q(\Lambda_1 + E_{11}) - (\Lambda_2 + E_{22})Q = E_{21} - Q E_{12} Q. \tag{29}
\]
The approach of studying perturbation through a quadratic equation is known (see Stewart (1990), for example). Yet, to the best of our knowledge, existing results study perturbation under orthogonal-invariant norms (or unitarily invariant norms in the complex case), a family that includes a number of matrix operator norms as well as the Frobenius norm, but excludes the matrix max-norm. The advantages of orthogonal-invariant norms are pronounced: such a norm of a symmetric matrix depends only on its eigenvalues, regardless of its eigenvectors; moreover, with suitable normalization these norms are consistent in the sense that $\|AB\| \le \|A\| \cdot \|B\|$. See Stewart (1990) for a clear exposition.
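As a quick sanity check of the parametrization (28), the following numpy sketch (our illustration, not code from the paper) verifies that $[\check V, \check{\bar V}]$ is orthogonal for an arbitrary choice of $Q$:

```python
import numpy as np

def inv_sqrt_spd(M):
    # M^{-1/2} for a symmetric positive definite M, via eigendecomposition.
    w, U = np.linalg.eigh(M)
    return U @ np.diag(w ** -0.5) @ U.T

rng = np.random.default_rng(0)
d, r = 8, 3

# Any orthonormal basis [V, Vbar] of R^d.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
V, Vbar = U[:, :r], U[:, r:]

Q = rng.standard_normal((d - r, r))  # arbitrary here; chosen via (29) in the proof

V_check = (V + Vbar @ Q) @ inv_sqrt_spd(np.eye(r) + Q.T @ Q)
Vbar_check = (Vbar - V @ Q.T) @ inv_sqrt_spd(np.eye(d - r) + Q @ Q.T)

B = np.hstack([V_check, Vbar_check])
print(np.allclose(B.T @ B, np.eye(d)))  # True: [V_check, Vbar_check] is orthogonal
```

The cross terms cancel because $(V + \bar V Q)^\top(\bar V - V Q^\top) = Q^\top - Q^\top = 0$, which is why orthogonality holds for every $Q$.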

The max-norm, however, possesses neither of these important properties. An immediate issue is that it is not clear how to relate $Q$ to $\bar V Q$, which will appear in (29) after expanding $E$ according to (27), and which is the quantity we want to control. Our approach here is to study $\bar Q := \bar V Q$ directly, through a transformed quadratic equation obtained by left-multiplying (29) by $\bar V$.

Denote $H = \bar V E_{21}$, $\bar Q = \bar V Q$, $L_1 = \Lambda_1 + E_{11}$, and $L_2 = \bar V(\Lambda_2 + E_{22})\bar V^\top$. If we can find an appropriate matrix $\bar Q$ with $\bar Q = \bar V Q$ that satisfies the quadratic equation
\[
\bar Q L_1 - L_2 \bar Q = H - \bar Q H^\top \bar Q, \tag{30}
\]
then $Q$ also satisfies the quadratic equation (29). This is because left-multiplying both sides of (30) by $\bar V^\top$ yields (29), and thus any solution $\bar Q$ to (30) of the form $\bar Q = \bar V Q$ must yield a solution $Q$ to (29).
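Spelling this computation out: using $\bar V^\top \bar V = I_{d-r}$ and $E_{12} = E_{21}^\top$, we have
\[
\bar V^\top \bar Q = Q, \qquad \bar V^\top L_2 \bar Q = (\Lambda_2 + E_{22})Q, \qquad \bar V^\top H = E_{21}, \qquad H^\top \bar Q = E_{21}^\top \bar V^\top \bar V Q = E_{12} Q,
\]
so left-multiplying (30) by $\bar V^\top$ gives $Q L_1 - (\Lambda_2 + E_{22})Q = E_{21} - Q E_{12} Q$, which is exactly (29) since $L_1 = \Lambda_1 + E_{11}$.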

Once we have such a $Q$ (or, equivalently, $\bar Q$), then $[\check V, \check{\bar V}]^\top \tilde A\,[\check V, \check{\bar V}]$ is a block diagonal matrix, and the span of the column vectors of $\check V$ is a candidate for the span of the first $r$ eigenvectors of $\tilde A$, namely $\operatorname{span}\{\tilde v_1, \ldots, \tilde v_r\}$. We will verify that the two spaces are identical in Lemma 5.3. Before stating that lemma, we first provide bounds on $\|\bar Q\|_{\max}$ and $\|\check V - V\|_{\max}$.

Lemma 5.1. Suppose $|\lambda_r| - \varepsilon > 4r\mu(\tau + 2r\kappa)$. Then, there exists a matrix $Q \in \mathbb{R}^{(d-r)\times r}$ such that $\bar Q = \bar V Q \in \mathbb{R}^{d\times r}$ is a solution to the quadratic equation (30), and $\bar Q$ satisfies $\|\bar Q\|_{\max} \le \omega/\sqrt{d}$. Moreover, if $r\omega < 1/2$, the matrix $\check V$ defined in (28) satisfies
\[
\|\check V - V\|_{\max} \le 2\sqrt{\mu}\,\omega r/\sqrt{d}. \tag{31}
\]
Here, $\omega$ is defined as $\omega = 8(1 + r\mu)\kappa/(|\lambda_r| - \varepsilon)$.

The second claim of the lemma (i.e., the bound (31)) is relatively easy to prove once the first claim (i.e., the bound on $\|\bar Q\|_{\max}$) is established. To see this, note that we can rewrite $\check V$ as $\check V = (V + \bar Q)(I_r + Q^\top Q)^{-1/2}$, and $\|Q^\top Q\|_{\max}$ can be controlled by the trivial inequality $\|Q^\top Q\|_{\max} = \|\bar Q^\top \bar Q\|_{\max} \le d\|\bar Q\|_{\max}^2 \le \omega^2$, where the first equality uses $\bar V^\top \bar V = I_{d-r}$. To prove the first claim, we construct, through a recursion, a sequence of matrices that converges to a fixed point $\bar Q$, which is a solution to the quadratic equation (30). For all iterates we prove a uniform max-norm bound, which by continuity yields the bound on $\|\bar Q\|_{\max}$. To be specific, we initialize $\bar Q_0 = 0$, and given $\bar Q_t$, we solve the linear equation
\[
\bar Q L_1 - L_2 \bar Q = H - \bar Q_t H^\top \bar Q_t, \tag{32}
\]
whose solution we define as $\bar Q_{t+1}$. Under suitable conditions, the iterates $\bar Q_t$ converge to a limit $\bar Q$, which is a solution to (30). The next general lemma captures this idea. It follows from Stewart (1990) with minor adaptations.
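The recursion is easy to run numerically. The following sketch (our illustration, with made-up dimensions and noise level, not code from the paper) carries out the iteration (32), solving each linear step as a Sylvester equation with scipy, and checks that the limit satisfies (30):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
d, r = 30, 2

# Orthonormal basis with r dominant eigenvalues, plus a small symmetric perturbation E.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.concatenate([[20.0, 15.0], rng.uniform(-1.0, 1.0, d - r)])
E = 0.05 * rng.standard_normal((d, d))
E = (E + E.T) / 2

V, Vbar = U[:, :r], U[:, r:]
E11, E21, E22 = V.T @ E @ V, Vbar.T @ E @ V, Vbar.T @ E @ Vbar

H = Vbar @ E21                                  # d x r
L1 = np.diag(lam[:r]) + E11                     # r x r, Lambda_1 + E_11
L2 = Vbar @ (np.diag(lam[r:]) + E22) @ Vbar.T   # d x d, Vbar (Lambda_2 + E_22) Vbar^T

Qbar = np.zeros((d, r))                         # Qbar_0 = 0
for _ in range(50):
    # Linear step (32): Qbar L1 - L2 Qbar = H - Qbar_t H^T Qbar_t,
    # i.e. (-L2) X + X L1 = RHS, a Sylvester equation in X.
    Qbar = solve_sylvester(-L2, L1, H - Qbar @ H.T @ Qbar)

# Residual of the quadratic equation (30) at the limit (should be ~1e-13):
print(np.abs(Qbar @ L1 - L2 @ Qbar - (H - Qbar @ H.T @ Qbar)).max())
```

Each step of (32) is linear in the unknown $\bar Q$ because the quadratic term is frozen at the previous iterate; this is exactly the structure that the next lemma turns into a convergence guarantee.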

Lemma 5.2. Let $T$ be a bounded linear operator on a Banach space $\mathcal{B}$ equipped with a norm $\|\cdot\|$. Assume that $T$ has a bounded inverse, and define $\beta = \|T^{-1}\|^{-1}$. Let $\varphi: \mathcal{B} \to \mathcal{B}$ be a map that satisfies
\[
\|\varphi(x)\| \le \eta\|x\|^2, \quad\text{and}\quad \|\varphi(x) - \varphi(y)\| \le 2\eta\max\{\|x\|, \|y\|\}\,\|x - y\| \tag{33}
\]
for some $\eta \ge 0$. Suppose that $\mathcal{B}_0$ is a closed subspace of $\mathcal{B}$ such that $T^{-1}(\mathcal{B}_0) \subseteq \mathcal{B}_0$ and $\varphi(\mathcal{B}_0) \subseteq \mathcal{B}_0$, and suppose that $y \in \mathcal{B}_0$ satisfies $4\eta\|y\| < \beta^2$. Then, the sequence initialized with $x_0 = 0$ and iterated through
\[
x_{k+1} = T^{-1}\big(y + \varphi(x_k)\big), \qquad k \ge 0, \tag{34}
\]
converges to a solution $x_\star$ of $Tx = y + \varphi(x)$. Moreover, we have $x_\star \in \mathcal{B}_0$ and $\|x_\star\| \le 2\|y\|/\beta$.
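A one-dimensional toy instance (ours, purely illustrative) makes the mechanics concrete: take $\mathcal{B} = \mathcal{B}_0 = \mathbb{R}$, $Tx = \beta x$, and $\varphi(x) = -\eta x^2$, which satisfies (33) since $|\varphi(x) - \varphi(y)| = \eta|x + y||x - y|$.

```python
# Toy instance of Lemma 5.2 on B = R: T x = beta*x, phi(x) = -eta*x^2.
beta, eta, y = 2.0, 1.0, 0.5        # 4*eta*|y| = 2 < beta^2 = 4, as required
x = 0.0                             # x_0 = 0
for _ in range(60):
    x = (y - eta * x**2) / beta     # x_{k+1} = T^{-1}(y + phi(x_k)), cf. (34)
print(x)                            # ~0.2247, a root of beta*x + eta*x^2 = y
print(abs(x) <= 2 * abs(y) / beta)  # True, matching ||x_star|| <= 2||y||/beta
```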

To apply this lemma to the equation (30), we view $\mathcal{B}$ as the space of matrices $\mathbb{R}^{d\times r}$ equipped with the max-norm $\|\cdot\|_{\max}$, and $\mathcal{B}_0$ as the subspace of matrices of the form $\bar V Q$ with $Q \in \mathbb{R}^{(d-r)\times r}$. The linear operator $T$ is set to be $T(\bar Q) = \bar Q L_1 - L_2 \bar Q$, and the map $\varphi$ is set to be the quadratic function $\varphi(\bar Q) = -\bar Q H^\top \bar Q$ (so that $y = H$). Roughly speaking, under the assumptions of Lemma 5.2, the nonlinear effect caused by $\varphi$ is weak compared with the linear operator $T$. Therefore, it is crucial to show that $T$ is invertible, i.e., to give a good lower bound on $\|T^{-1}\|_{\max}^{-1} = \inf_{\|\bar Q\|_{\max}=1}\|T(\bar Q)\|_{\max}$. Since the max-norm is not orthogonal-invariant, a subtle issue arises when $A$ is not of exactly low rank; we discuss this at the end of the subsection.
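For the subspace conditions of Lemma 5.2 in this setting, note that $T$ maps $\mathcal{B}_0$ into itself: since $L_2\bar V = \bar V(\Lambda_2 + E_{22})$,
\[
T(\bar V Q) = \bar V Q L_1 - L_2 \bar V Q = \bar V\big(Q L_1 - (\Lambda_2 + E_{22})Q\big) \in \mathcal{B}_0,
\]
and similarly $\varphi(\bar V Q) = -\bar V\big(Q H^\top \bar V Q\big) \in \mathcal{B}_0$. As $\mathcal{B}_0$ is finite-dimensional, invertibility of $T$ then yields $T^{-1}(\mathcal{B}_0) \subseteq \mathcal{B}_0$.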

If there is no perturbation (i.e., $E = 0$), all the iterates $\bar Q_t$ are simply $0$, so $\check V$ is identical to $V$. If the perturbation is not too large, the next lemma shows that the column vectors of $\check V$ span the same space as $\operatorname{span}\{\tilde v_1, \ldots, \tilde v_r\}$. In other words, for a suitable orthogonal matrix $R$, the columns of $\check V R$ are $\tilde v_1, \ldots, \tilde v_r$.

Lemma 5.3. Suppose $|\lambda_r| - \varepsilon > \max\{3\tau,\, 64(1 + r\mu)r^{3/2}\mu^{1/2}\kappa\}$. Then, there exists an orthogonal matrix $R \in \mathbb{R}^{r\times r}$ such that the column vectors of $\check V R$ are $\tilde v_1, \ldots, \tilde v_r$.
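Numerically, the content of Lemma 5.3 can be checked directly: if $W$ collects the top-$r$ eigenvectors of $\tilde A = A + E$ and its span coincides with that of $\check V$, then (28) lets us recover $Q = (\bar V^\top W)(V^\top W)^{-1}$, which should solve (29). The sketch below (our illustration, with arbitrary test dimensions) confirms this:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 30, 2

U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.concatenate([[20.0, 15.0], rng.uniform(-1.0, 1.0, d - r)])
A = U @ np.diag(lam) @ U.T
E = 0.05 * rng.standard_normal((d, d))
E = (E + E.T) / 2

V, Vbar = U[:, :r], U[:, r:]
w, W = np.linalg.eigh(A + E)
W = W[:, np.argsort(-np.abs(w))[:r]]       # top-r eigenvectors of A + E

# Recover Q from the perturbed eigenspace: columns of W lie in span(V + Vbar Q).
Q = (Vbar.T @ W) @ np.linalg.inv(V.T @ W)

# Check that Q solves the quadratic equation (29):
E11, E12 = V.T @ E @ V, V.T @ E @ Vbar
E21, E22 = Vbar.T @ E @ V, Vbar.T @ E @ Vbar
Lam1, Lam2 = np.diag(lam[:r]), np.diag(lam[r:])
resid = Q @ (Lam1 + E11) - (Lam2 + E22) @ Q - (E21 - Q @ E12 @ Q)
print(np.abs(resid).max())                 # ~1e-14: (29) holds
```

Here $V^\top W$ is invertible because the perturbation is small, so the perturbed eigenspace stays close to the span of $V$.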

Proof of Theorem 2.1. It is easy to check that under the assumptions of Theorem 2.1, the conditions required in Lemma 5.1 and Lemma 5.3 are satisfied. Hence, the two lemmas imply Theorem 2.1.

To study the perturbation of individual eigenvectors, we assume, in addition to the condition on $|\lambda_r|$, that $\lambda_1, \ldots, \lambda_r$ satisfy a uniform gap condition (namely $\delta > \|E\|_2$). This additional assumption is necessary because, otherwise, the perturbation may change the relative order of the eigenvalues, and we may be unable to match eigenvectors by the order of eigenvalues. Suppose $R \in \mathbb{R}^{r\times r}$ is an orthogonal matrix such that the columns of $\check V R$ are eigenvectors of $\tilde A$. Now, under the assumptions of Theorem 2.1, the column vectors of $\tilde V$ and $\check V R$ are identical up to sign, so we can rewrite the difference $\tilde V - V$ as
\[
\tilde V - V = \check V(R - I_r) + (\check V - V). \tag{35}
\]
We already provided a bound on $\|\check V - V\|_{\max}$ in Lemma 5.1, and by the triangle inequality this also gives a bound on $\|\check V\|_{\max}$. Thus, if we can prove a bound on $\|R - I_r\|_{\max}$, it will finally lead to a bound on $\|\tilde V - V\|_{\max}$. To do so, we use the Davis–Kahan theorem to obtain a bound on $\langle \tilde v_i, v_i\rangle$ for all $i \in [r]$, which leads to a max-norm bound on $R - I_r$ (at the price of potentially inflating the bound by a factor of $r$). The details of the proof of Theorem 2.2 are given in the appendix.
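The factor of $r$ mentioned above comes from a generic entrywise bound:
\[
\|\check V(R - I_r)\|_{\max} = \max_{i,j}\Big|\sum_{k=1}^r \check V_{ik}(R - I_r)_{kj}\Big| \le r\,\|\check V\|_{\max}\,\|R - I_r\|_{\max},
\]
so the entrywise control of $R - I_r$ obtained from the Davis–Kahan theorem transfers to $\tilde V - V$ through (35).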

We remark that the conditions we assume on $|\lambda_r| - \varepsilon$ in Theorem 2.1 and Theorem 2.2 are only useful in cases where $|\lambda_r| > \|A - A_r\|$. Ideally, we would like results whose assumptions involve only $\lambda_r$ and $\lambda_{r+1}$, since the Davis–Kahan theorem only requires a gap between neighboring eigenvalues. Unfortunately, unlike orthogonal-invariant norms, which depend only on the eigenvalues of a matrix, the max-norm $\|\cdot\|_{\max}$ is not orthogonal-invariant, and thus it also depends on the eigenvectors of a matrix. For this reason, it is not clear whether we could obtain a lower bound on $\|T^{-1}\|_{\max}^{-1}$ using only the eigenvalues $\lambda_r$ and $\lambda_{r+1}$, which would allow us to apply Lemma 5.2. The analysis appears difficult without such a bound on $\|T^{-1}\|_{\max}^{-1}$: even in the analysis of linear equations, invertibility and condition numbers are indispensable.
