
5 Proof Organization of Main Theorems

5.1 Symmetric Case

For shorthand, we write $\tau = \|E\|_\infty$ and $\kappa = \sqrt{d}\,\|EV\|_{\max}$. An obvious bound for $\kappa$ is $\kappa \le \sqrt{r\mu}\,\tau$ (by the Cauchy–Schwarz inequality). We will use these notations throughout this subsection.
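To see where this bound comes from, here is a short derivation. It assumes (our reading of the standard coherence normalization; the paper's definition of $\mu$ may differ by constants) that $\max_{1\le i\le d}\|V_{i\cdot}\|_2^2 \le \mu r/d$, and that $\|E\|_\infty = \max_i \sum_j |E_{ij}|$ denotes the matrix $\ell_\infty$-operator norm:
\[
\|EV\|_{\max} = \max_{i,j}\Big|\sum_{k=1}^d E_{ik}V_{kj}\Big|
\le \Big(\max_i \sum_{k=1}^d |E_{ik}|\Big)\,\max_{k,j}|V_{kj}|
\le \tau\sqrt{\mu r/d},
\]
so that $\kappa = \sqrt{d}\,\|EV\|_{\max} \le \sqrt{r\mu}\,\tau$. The Cauchy–Schwarz step is $|V_{kj}| = |\langle V_{k\cdot}, e_j\rangle| \le \|V_{k\cdot}\|_2 \le \sqrt{\mu r/d}$.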

Recall the spectral decomposition of $A$ in (8). Expressing $E$ in terms of the column vectors of $V$ and $\bar V$, which together form an orthonormal basis of $\mathbb{R}^d$, we write
\[
[V, \bar V]^\top E\,[V, \bar V] =: \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}. \tag{27}
\]
Note that $E_{12} = E_{21}^\top$ since $E$ is symmetric.

[Figure 5 here: six panels plotting error ratio against dimension $p$ (200–800): spectral norm error of $\Sigma_u$ (median; IQR), spectral norm error of $\Sigma^{-1}$ (median; IQR), and relative Frobenius norm error of $\Sigma$ (median; IQR).]

Figure 5: Error ratios of robust estimates against varying dimension. Blue lines represent errors of Method (2) over Method (1) under different norms; black lines, errors of Method (3) over Method (1); red lines, errors of Method (4) over Method (1). $(f_t^\top, u_t^\top)$ is generated element-wise by iid $t$-distributions with df $= 3$ (solid), $5$ (dashed), and $\infty$ (dotted). The median errors and their IQRs (interquartile ranges) over 100 simulations are reported.

Conceptually, the perturbation results in a rotation of $[V, \bar V]$, and we write a candidate orthonormal basis as follows:
\[
\check V := (V + \bar V Q)(I_r + Q^\top Q)^{-1/2}, \qquad \check{\bar V} := (\bar V - V Q^\top)(I_{d-r} + Q Q^\top)^{-1/2}, \tag{28}
\]
where $Q \in \mathbb{R}^{(d-r)\times r}$ is to be determined. It is straightforward to check that $[\check V, \check{\bar V}]$ is an orthogonal matrix. We will choose $Q$ in a way such that $[\check V, \check{\bar V}]^\top \tilde A\,[\check V, \check{\bar V}]$ is a block diagonal matrix, i.e., $\check{\bar V}^\top \tilde A \check V = 0$. Substituting (28) and simplifying the equation, we obtain
\[
Q(\Lambda_1 + E_{11}) - (\Lambda_2 + E_{22})Q = E_{21} - Q E_{12} Q. \tag{29}
\]
The approach of studying perturbation through a quadratic equation is known (see Stewart (1990), for example). Yet, to the best of our knowledge, existing results study perturbation under orthogonal-invariant norms (or unitarily invariant norms in the complex case), a family that includes a number of matrix operator norms as well as the Frobenius norm, but excludes the matrix max-norm. The advantages of orthogonal-invariant norms are pronounced: such a norm of a symmetric matrix depends only on its eigenvalues, regardless of its eigenvectors; moreover, with suitable normalization these norms are consistent in the sense that $\|AB\| \le \|A\| \cdot \|B\|$. See Stewart (1990) for a clear exposition.
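As a quick sanity check of the parametrization (28), the following numpy sketch (our illustration, not code from the paper) verifies that $[\check V, \check{\bar V}]$ is orthogonal for an arbitrary choice of $Q$:

```python
import numpy as np

def inv_sqrt_spd(M):
    # M^{-1/2} for a symmetric positive definite M, via eigendecomposition.
    w, U = np.linalg.eigh(M)
    return U @ np.diag(w ** -0.5) @ U.T

rng = np.random.default_rng(0)
d, r = 8, 3

# Any orthonormal basis [V, Vbar] of R^d.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
V, Vbar = U[:, :r], U[:, r:]

Q = rng.standard_normal((d - r, r))  # arbitrary here; chosen via (29) in the proof

V_check = (V + Vbar @ Q) @ inv_sqrt_spd(np.eye(r) + Q.T @ Q)
Vbar_check = (Vbar - V @ Q.T) @ inv_sqrt_spd(np.eye(d - r) + Q @ Q.T)

B = np.hstack([V_check, Vbar_check])
print(np.allclose(B.T @ B, np.eye(d)))  # True: [V_check, Vbar_check] is orthogonal
```

The cross terms cancel because $(V + \bar V Q)^\top(\bar V - V Q^\top) = Q^\top - Q^\top = 0$, which is why orthogonality holds for every $Q$.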

The max-norm, however, possesses neither of these important properties. An immediate issue is that it is not clear how to relate $Q$ to $\bar V Q$, which will appear in (29) after expanding $E$ according to (27), and which is the quantity we want to control. Our approach here is to study $\bar Q := \bar V Q$ directly, through a transformed quadratic equation obtained by left-multiplying (29) by $\bar V$.

Denote $H = \bar V E_{21}$, $\bar Q = \bar V Q$, $L_1 = \Lambda_1 + E_{11}$, and $L_2 = \bar V(\Lambda_2 + E_{22})\bar V^\top$. If we can find an appropriate matrix $\bar Q$ with $\bar Q = \bar V Q$ that satisfies the quadratic equation
\[
\bar Q L_1 - L_2 \bar Q = H - \bar Q H^\top \bar Q, \tag{30}
\]
then $Q$ also satisfies the quadratic equation (29). This is because left-multiplying both sides of (30) by $\bar V^\top$ yields (29), and thus any solution $\bar Q$ to (30) of the form $\bar Q = \bar V Q$ must yield a solution $Q$ to (29).
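Spelling this computation out: using $\bar V^\top \bar V = I_{d-r}$ and $E_{12} = E_{21}^\top$, we have
\[
\bar V^\top \bar Q = Q, \qquad \bar V^\top L_2 \bar Q = (\Lambda_2 + E_{22})Q, \qquad \bar V^\top H = E_{21}, \qquad H^\top \bar Q = E_{21}^\top \bar V^\top \bar V Q = E_{12} Q,
\]
so left-multiplying (30) by $\bar V^\top$ gives $Q L_1 - (\Lambda_2 + E_{22})Q = E_{21} - Q E_{12} Q$, which is exactly (29) since $L_1 = \Lambda_1 + E_{11}$.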

Once we have such a $Q$ (or, equivalently, $\bar Q$), then $[\check V, \check{\bar V}]^\top \tilde A\,[\check V, \check{\bar V}]$ is a block diagonal matrix, and the span of the column vectors of $\check V$ is a candidate for the span of the first $r$ eigenvectors of $\tilde A$, namely $\operatorname{span}\{\tilde v_1, \ldots, \tilde v_r\}$. We will verify that the two spaces are identical in Lemma 5.3. Before stating that lemma, we first provide bounds on $\|\bar Q\|_{\max}$ and $\|\check V - V\|_{\max}$.

Lemma 5.1. Suppose $|\lambda_r| - \varepsilon > 4r\mu(\tau + 2r\kappa)$. Then, there exists a matrix $Q \in \mathbb{R}^{(d-r)\times r}$ such that $\bar Q = \bar V Q \in \mathbb{R}^{d\times r}$ is a solution to the quadratic equation (30), and $\bar Q$ satisfies $\|\bar Q\|_{\max} \le \omega/\sqrt{d}$. Moreover, if $r\omega < 1/2$, the matrix $\check V$ defined in (28) satisfies
\[
\|\check V - V\|_{\max} \le 2\sqrt{\mu}\,\omega r/\sqrt{d}. \tag{31}
\]
Here, $\omega$ is defined as $\omega = 8(1 + r\mu)\kappa/(|\lambda_r| - \varepsilon)$.

The second claim of the lemma (i.e., the bound (31)) is relatively easy to prove once the first claim (i.e., the bound on $\|\bar Q\|_{\max}$) is established. To see this, note that we can rewrite $\check V$ as $\check V = (V + \bar Q)(I_r + Q^\top Q)^{-1/2}$, and $\|Q^\top Q\|_{\max}$ can be controlled by the trivial inequality $\|Q^\top Q\|_{\max} = \|\bar Q^\top \bar Q\|_{\max} \le d\|\bar Q\|_{\max}^2 \le \omega^2$, where the first equality uses $\bar V^\top \bar V = I_{d-r}$. To prove the first claim, we construct, through a recursion, a sequence of matrices that converges to a fixed point $\bar Q$, which is a solution to the quadratic equation (30). For all iterates we prove a uniform max-norm bound, which by continuity yields the bound on $\|\bar Q\|_{\max}$. To be specific, we initialize $\bar Q_0 = 0$, and given $\bar Q_t$, we solve the linear equation
\[
\bar Q L_1 - L_2 \bar Q = H - \bar Q_t H^\top \bar Q_t, \tag{32}
\]
whose solution we define as $\bar Q_{t+1}$. Under suitable conditions, the iterates $\bar Q_t$ converge to a limit $\bar Q$, which is a solution to (30). The next general lemma captures this idea. It follows from Stewart (1990) with minor adaptations.
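The recursion is easy to run numerically. The following sketch (our illustration, with made-up dimensions and noise level, not code from the paper) carries out the iteration (32), solving each linear step as a Sylvester equation with scipy, and checks that the limit satisfies (30):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
d, r = 30, 2

# Orthonormal basis with r dominant eigenvalues, plus a small symmetric perturbation E.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.concatenate([[20.0, 15.0], rng.uniform(-1.0, 1.0, d - r)])
E = 0.05 * rng.standard_normal((d, d))
E = (E + E.T) / 2

V, Vbar = U[:, :r], U[:, r:]
E11, E21, E22 = V.T @ E @ V, Vbar.T @ E @ V, Vbar.T @ E @ Vbar

H = Vbar @ E21                                  # d x r
L1 = np.diag(lam[:r]) + E11                     # r x r, Lambda_1 + E_11
L2 = Vbar @ (np.diag(lam[r:]) + E22) @ Vbar.T   # d x d, Vbar (Lambda_2 + E_22) Vbar^T

Qbar = np.zeros((d, r))                         # Qbar_0 = 0
for _ in range(50):
    # Linear step (32): Qbar L1 - L2 Qbar = H - Qbar_t H^T Qbar_t,
    # i.e. (-L2) X + X L1 = RHS, a Sylvester equation in X.
    Qbar = solve_sylvester(-L2, L1, H - Qbar @ H.T @ Qbar)

# Residual of the quadratic equation (30) at the limit (should be ~1e-13):
print(np.abs(Qbar @ L1 - L2 @ Qbar - (H - Qbar @ H.T @ Qbar)).max())
```

Each step of (32) is linear in the unknown $\bar Q$ because the quadratic term is frozen at the previous iterate; this is exactly the structure that the next lemma turns into a convergence guarantee.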

Lemma 5.2. Let $T$ be a bounded linear operator on a Banach space $\mathcal{B}$ equipped with a norm $\|\cdot\|$. Assume that $T$ has a bounded inverse, and define $\beta = \|T^{-1}\|^{-1}$. Let $\varphi: \mathcal{B} \to \mathcal{B}$ be a map that satisfies
\[
\|\varphi(x)\| \le \eta\|x\|^2, \quad\text{and}\quad \|\varphi(x) - \varphi(y)\| \le 2\eta\max\{\|x\|, \|y\|\}\,\|x - y\| \tag{33}
\]
for some $\eta \ge 0$. Suppose that $\mathcal{B}_0$ is a closed subspace of $\mathcal{B}$ such that $T^{-1}(\mathcal{B}_0) \subseteq \mathcal{B}_0$ and $\varphi(\mathcal{B}_0) \subseteq \mathcal{B}_0$, and suppose that $y \in \mathcal{B}_0$ satisfies $4\eta\|y\| < \beta^2$. Then, the sequence initialized with $x_0 = 0$ and iterated through
\[
x_{k+1} = T^{-1}\big(y + \varphi(x_k)\big), \qquad k \ge 0, \tag{34}
\]
converges to a solution $x_\star$ of $Tx = y + \varphi(x)$. Moreover, we have $x_\star \in \mathcal{B}_0$ and $\|x_\star\| \le 2\|y\|/\beta$.
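A one-dimensional toy instance (ours, purely illustrative) makes the mechanics concrete: take $\mathcal{B} = \mathcal{B}_0 = \mathbb{R}$, $Tx = \beta x$, and $\varphi(x) = -\eta x^2$, which satisfies (33) since $|\varphi(x) - \varphi(y)| = \eta|x + y||x - y|$.

```python
# Toy instance of Lemma 5.2 on B = R: T x = beta*x, phi(x) = -eta*x^2.
beta, eta, y = 2.0, 1.0, 0.5        # 4*eta*|y| = 2 < beta^2 = 4, as required
x = 0.0                             # x_0 = 0
for _ in range(60):
    x = (y - eta * x**2) / beta     # x_{k+1} = T^{-1}(y + phi(x_k)), cf. (34)
print(x)                            # ~0.2247, a root of beta*x + eta*x^2 = y
print(abs(x) <= 2 * abs(y) / beta)  # True, matching ||x_star|| <= 2||y||/beta
```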

To apply this lemma to the equation (30), we view $\mathcal{B}$ as the space of matrices $\mathbb{R}^{d\times r}$ equipped with the max-norm $\|\cdot\|_{\max}$, and $\mathcal{B}_0$ as the subspace of matrices of the form $\bar V Q$ with $Q \in \mathbb{R}^{(d-r)\times r}$. The linear operator $T$ is set to be $T(\bar Q) = \bar Q L_1 - L_2 \bar Q$, and the map $\varphi$ is set to be the quadratic function $\varphi(\bar Q) = -\bar Q H^\top \bar Q$ (so that $y = H$). Roughly speaking, under the assumptions of Lemma 5.2, the nonlinear effect caused by $\varphi$ is weak compared with the linear operator $T$. Therefore, it is crucial to show that $T$ is invertible, i.e., to give a good lower bound on $\|T^{-1}\|_{\max}^{-1} = \inf_{\|\bar Q\|_{\max}=1}\|T(\bar Q)\|_{\max}$. Since the max-norm is not orthogonal-invariant, a subtle issue arises when $A$ is not of exactly low rank; we discuss this at the end of the subsection.
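For the subspace conditions of Lemma 5.2 in this setting, note that $T$ maps $\mathcal{B}_0$ into itself: since $L_2\bar V = \bar V(\Lambda_2 + E_{22})$,
\[
T(\bar V Q) = \bar V Q L_1 - L_2 \bar V Q = \bar V\big(Q L_1 - (\Lambda_2 + E_{22})Q\big) \in \mathcal{B}_0,
\]
and similarly $\varphi(\bar V Q) = -\bar V\big(Q H^\top \bar V Q\big) \in \mathcal{B}_0$. As $\mathcal{B}_0$ is finite-dimensional, invertibility of $T$ then yields $T^{-1}(\mathcal{B}_0) \subseteq \mathcal{B}_0$.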

If there is no perturbation (i.e., $E = 0$), all the iterates $\bar Q_t$ are simply $0$, so $\check V$ is identical to $V$. If the perturbation is not too large, the next lemma shows that the column vectors of $\check V$ span the same space as $\operatorname{span}\{\tilde v_1, \ldots, \tilde v_r\}$. In other words, for a suitable orthogonal matrix $R$, the columns of $\check V R$ are $\tilde v_1, \ldots, \tilde v_r$.

Lemma 5.3. Suppose $|\lambda_r| - \varepsilon > \max\{3\tau,\, 64(1 + r\mu)r^{3/2}\mu^{1/2}\kappa\}$. Then, there exists an orthogonal matrix $R \in \mathbb{R}^{r\times r}$ such that the column vectors of $\check V R$ are $\tilde v_1, \ldots, \tilde v_r$.
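Numerically, the content of Lemma 5.3 can be checked directly: if $W$ collects the top-$r$ eigenvectors of $\tilde A = A + E$ and its span coincides with that of $\check V$, then (28) lets us recover $Q = (\bar V^\top W)(V^\top W)^{-1}$, which should solve (29). The sketch below (our illustration, with arbitrary test dimensions) confirms this:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 30, 2

U, _ = np.linalg.qr(rng.standard_normal((d, d)))
lam = np.concatenate([[20.0, 15.0], rng.uniform(-1.0, 1.0, d - r)])
A = U @ np.diag(lam) @ U.T
E = 0.05 * rng.standard_normal((d, d))
E = (E + E.T) / 2

V, Vbar = U[:, :r], U[:, r:]
w, W = np.linalg.eigh(A + E)
W = W[:, np.argsort(-np.abs(w))[:r]]       # top-r eigenvectors of A + E

# Recover Q from the perturbed eigenspace: columns of W lie in span(V + Vbar Q).
Q = (Vbar.T @ W) @ np.linalg.inv(V.T @ W)

# Check that Q solves the quadratic equation (29):
E11, E12 = V.T @ E @ V, V.T @ E @ Vbar
E21, E22 = Vbar.T @ E @ V, Vbar.T @ E @ Vbar
Lam1, Lam2 = np.diag(lam[:r]), np.diag(lam[r:])
resid = Q @ (Lam1 + E11) - (Lam2 + E22) @ Q - (E21 - Q @ E12 @ Q)
print(np.abs(resid).max())                 # ~1e-14: (29) holds
```

Here $V^\top W$ is invertible because the perturbation is small, so the perturbed eigenspace stays close to the span of $V$.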

Proof of Theorem 2.1. It is easy to check that under the assumptions of Theorem 2.1, the conditions required in Lemma 5.1 and Lemma 5.3 are satisfied. Hence, the two lemmas imply Theorem 2.1.

To study the perturbation of individual eigenvectors, we assume, in addition to the condition on $|\lambda_r|$, that $\lambda_1, \ldots, \lambda_r$ satisfy a uniform gap condition (namely $\delta > \|E\|_2$). This additional assumption is necessary because, otherwise, the perturbation may change the relative order of the eigenvalues, and we may be unable to match eigenvectors by the order of eigenvalues. Suppose $R \in \mathbb{R}^{r\times r}$ is an orthogonal matrix such that the columns of $\check V R$ are eigenvectors of $\tilde A$. Now, under the assumptions of Theorem 2.1, the column vectors of $\tilde V$ and $\check V R$ are identical up to sign, so we can rewrite the difference $\tilde V - V$ as
\[
\tilde V - V = \check V(R - I_r) + (\check V - V). \tag{35}
\]
We already provided a bound on $\|\check V - V\|_{\max}$ in Lemma 5.1, and by the triangle inequality this also gives a bound on $\|\check V\|_{\max}$. Thus, if we can prove a bound on $\|R - I_r\|_{\max}$, it will finally lead to a bound on $\|\tilde V - V\|_{\max}$. To do so, we use the Davis–Kahan theorem to obtain a bound on $\langle \tilde v_i, v_i\rangle$ for all $i \in [r]$, which leads to a max-norm bound on $R - I_r$ (at the price of potentially inflating the bound by a factor of $r$). The details of the proof of Theorem 2.2 are given in the appendix.
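The factor of $r$ mentioned above comes from a generic entrywise bound:
\[
\|\check V(R - I_r)\|_{\max} = \max_{i,j}\Big|\sum_{k=1}^r \check V_{ik}(R - I_r)_{kj}\Big| \le r\,\|\check V\|_{\max}\,\|R - I_r\|_{\max},
\]
so the entrywise control of $R - I_r$ obtained from the Davis–Kahan theorem transfers to $\tilde V - V$ through (35).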

We remark that the conditions we assume on $|\lambda_r| - \varepsilon$ in Theorem 2.1 and Theorem 2.2 are only useful in cases where $|\lambda_r| > \|A - A_r\|$. Ideally, we would like results whose assumptions involve only $\lambda_r$ and $\lambda_{r+1}$, since the Davis–Kahan theorem only requires a gap between neighboring eigenvalues. Unfortunately, unlike orthogonal-invariant norms, which depend only on the eigenvalues of a matrix, the max-norm $\|\cdot\|_{\max}$ is not orthogonal-invariant, and thus it also depends on the eigenvectors of a matrix. For this reason, it is not clear whether we could obtain a lower bound on $\|T^{-1}\|_{\max}^{-1}$ using only the eigenvalues $\lambda_r$ and $\lambda_{r+1}$, which would allow us to apply Lemma 5.2. The analysis appears difficult without such a bound on $\|T^{-1}\|_{\max}^{-1}$: even in the analysis of linear equations, invertibility and condition numbers are indispensable.
