HAL Id: hal-00651605
https://hal.archives-ouvertes.fr/hal-00651605
Submitted on 13 Dec 2011
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
A Fresh Variational-Analysis Look at the Positive
Semidefinite Matrices World
Jean-Baptiste Hiriart-Urruty, Jérôme Malick
To cite this version:
Jean-Baptiste Hiriart-Urruty, Jérôme Malick. A Fresh Variational-Analysis Look at the Positive Semidefinite Matrices World. Journal of Optimization Theory and Applications, Springer Verlag, 2012, 153 (3), pp.551-577. �10.1007/s10957-011-9980-6�. �hal-00651605�
Survey Paper
A Fresh Variational-Analysis Look at
the Positive Semidefinite Matrices World
Jean-Baptiste Hiriart-Urruty1 J´erˆome Malick2
Communicated by Michel Th´era November 11, 2011
Abstract
Engineering sciences and applications of mathematics show unambiguously that positive semidefiniteness of matrices is the most important generalization of non-negative real num-bers. This notion of non-negativity for matrices has been well-studied in the literature; it has been the subject of review papers and entire chapters of books.
This paper reviews some of the nice, useful properties of positive (semi)definite matrices, and insists in particular on (i) characterizations of positive (semi)definiteness and (ii) the geometrical properties of the set of positive semidefinite matrices. Some properties that turn out to be less well-known have here a special treatment. The use of these properties in optimization, as well as various references to applications, are spread all the way through.
The “raison d’ˆetre” of this paper is essentially pedagogical; it adopts the viewpoint of variational analysis, shedding new light on the topic. Important, fruitful, and subtle, the positive semidefinite world is a good place to start with this domain of applied mathematics. Keywords: positive semidefiniteness, optimization, convex analysis, eigenvalues, spectral functions, Riemannian geometry
AMS Classifications: 65K10, 90C22, 15A18
1
Positive (Semi)Definiteness: a Useful Notion
Positive (semi)definite matrices are fundamental objects in applied mathematics and engineering. For example, they appear as covariance matrices in statistics, as Lyapunov functions in control, as kernels in machine learning, as diffusion tensors in medical imaging, as lifting matrices in combinatorics – just to name a few of the various uses of the concept.
Even if a first generalization of positive real numbers in matrix spaces would be the entry-wise positive matrices (see [1, Chap. 8]), the numerous applications mentionned above show that the most important generalisation is the positive semidefinite matrices. This notion of non-negativity for matrices has been well-studied in the literature; it has been the topic of review papers and entire chapters of books, for example [1, Chap. 7], [2, Chap. 6], [3, Chap. 8], [4, Chap. 1] and [5, Sec. 6.5]. Besides, a fruitful branch of mathematical optimization is devoted to problems involving positive semidefinite matrices; it is called semidefinite programming and has proved to be a powerful tool to model and solve many problems in engineering or science [6],[7].
1
Institut de math´ematiques, Universit´e Paul Sabatier, Toulouse, France [email protected]
2CNRS, Lab. J. Kuntzmann, Grenoble, France, [email protected] (corresponding author) 3
Acknowledgement: The authors thank J.-B. Lasserre for having made us aware of the characterization of the positive semidefiniteness by non-negativity of the principal invariants (see Section 3.2).
In this article, we gather some results about positive (semi)definite matrices: some are well-known, some are more original, all have interest. The special feature of this paper is that we adopt the viewpoint of variational analysis, shedding new light on these topics. The term “variational” is not to be understood in its old historical meaning (calculus of variations), but in the broadest possible sense (optimization, convex analysis, nonsmooth analysis, complementary problems,...). We insist in particular on the harmonious interplay between matrix analysis and optimization.
The set of positive semidefinite matrices is a closed, convex cone in the space of symmetric matrices, which enjoys a nice geometry, exploited in several applications. We review some of these useful and sometimes subtle geometrical properties – again with a variational eye. During this study, we will illustrate convex, semialgebraic and Riemannian geometries. As a nice, concrete example of advanced mathematics, positive semidefiniteness also has some pedagogical interest – on top of all its practical uses.
2
Positive (Semi)Definiteness: a Multi-Facet Notion
We start with a bunch of simple ideas. The goal of this first section is, while sticking to basic properties, to show how different notions and various domains come into play when talking about positive semidefinite matrices. We also recall the background properties, introduce notation, and give a flavour of what comes next.
Let Sn(R) be the linear space of symmetric real matrices of size n×n. A matrix A ∈ Sn(R) is said to be positive semidefinite (denoted A 0), if
x⊤A x ≥ 0 for all x ∈ Rn, (1)
where the superscript ·⊤ denotes the transposition of matrices. The matrix A is further called positive definite (denoted by A ≻ 0) if the above inequality is strict for all nonzero x.
2.1 (Bi)Linear Algebra at Work.
Let us recall the key-result for symmetric matrices: the n eigenvalues λi of a matrix A ∈ Sn(R)
lie in R, and there exists an orthogonal matrix U such that U⊤AU is the diagonal matrix with the λi on the diagonal, denoted by diag(λ1, . . . , λn). This result can also be written as follows:
there exist n unit eigenvectors xiof A (the columns of U ) such that we have the so-called spectral
decomposition A = n X i=1 λixixi⊤.
This result mixes nicely linear and bilinear algebra: U allows both the diagonalization of A as U−1AU = diag(λ
1, . . . , λn) (linear algebra world), and the reduction of the quadratic form
associated with A as U⊤AU = diag(λ1, . . . , λn), such that hA(Uy), Uyi = hdiag(λ1, . . . , λn)y, yi
(bilinear algebra world). There are several proofs of this result; one is optimization-based (see e.g. [8, p.222]).
This decomposition is used to define the square root matrix: if A 0, the square root of A, denoted by A1/2, is the unique S 0 such that S2 = A (given explicitly by the decomposition U⊤SU = diag(√λ1, . . . ,√λn)). When A ≻ 0, we also have (A1/2)−1 = (A−1)1/2 so that the
notation A−1/2 (that we will often use) is not ambiguous.
We can also reduce two matrices at the same time: the so-called simultaneous reduction (see e.g. [1, 4.6.12]) says that for a symmetric matrix A and a positive definite matrix B, there exists an invertible matrix P such that
P⊤AP = diag(λ1, . . . , λn) and P⊤BP = diag(µ1, . . . , µn).
We are not aware of simple results of that kind for more than two matrices; the reason might be deeper than one would think (see Problem 12 in [9]).
2.2 Inner Products.
The space Rn is equipped with the canonical inner product
hx, yi :=
n
X
i=1
xiyi = x⊤y.
We prefer to use the notation h·, ·i, often more handy for calculation involving the transpose. Up to a double sum, this inner product is the same for symmetric matrices: we equip Sn(R) with the canonical inner product
hhA, Bii :=
n
X
i,j=1
AijBij = trace(A⊤B) = trace(AB).
A connection between these two scalar products is
hAx, xi = hhA, xx⊤ii.
This easy property has an important role in combinatorial optimization: the so-called “lifting” is the first step of the semidefinite relaxation of binary quadratic problems (see e.g. [10]). Another interesting connection between the inner products is the Ky-Fan inequality: A, B ∈ Sn(R)
satisfy the inequality
hhA, Bii ≤ hλ(A), λ(B)i,
where λ(A) := (λ1(A), . . . , λn(A)) ∈ Rn is the vector of the eigenvalues of A ordered
nonin-creasingly. Moreover equality holds in the above inequality if and only if A and B admit a simultaneous eigendecomposition with ordered eigenvalues (see e.g. [11] and references therein). This inequality is a key tool for the study of spectral functions and spectral sets [12]; we will come back to spectral sets from time to time in next sections.
2.3 Quadratic Forms and Differential Calculus.
Naturally associated with A ∈ Sn(R) is the quadratic form
q : x ∈ Rn7−→ q(x) := 12hAx, xi.
The definition of semidefiniteness (1) reads q(x) ≥ 0 for all x ∈ Rn. From an optimization point
of view, we have
inf
x∈Rnq(x) > −∞ ⇐⇒ A 0
and the set of minimizers is then ker A. The quadratic form q is obviously of class C∞ on Rn, and we have for all x ∈ Rn
∇q(x) = Ax and ∇2q(x) = A. (2)
This is easy to remember: we formally have the same formulae for q(x) = ax2/2 with x ∈ R. Note that the factor 1/2 in the definition of q aims at avoiding the factor 2 when differentiating q. When A is positive definite, q defines a norm on Rn. This can be seen for example by the
simultaneous reduction: there exists P invertible such that P⊤AP = In, or in other words, up
to the change of variables y = P−1x, the quadratic form q is just the simpler form k · k2: kyk2 = hy, yi = hP⊤AP y, yi = hAP y, P yi = hAx, xi.
2.4 When Convex Analysis Enters Into the Picture.
Recall that a function f : Rn→ R is convex if
∀x, y ∈ Rn, α ∈ [0, 1], f (αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).
When f is smooth, convexity is characterized by the positive semidefiniteness of its Hessian matrix ∇2f (x) for all x ∈ Rn (see e.g. [13, B.4.3.1]). From (2), we get that q is convex if and
only if A 0. (Let us also mention that strict and strong convexities of q coincide and are equivalent to A ≻ 0.)
During our visit to the world of the positive semidefinite matrices, we will meet several notions from convex analysis (polar cone, Moreau decomposition, tangent and normal cones,...). The first notion is the Legendre-Fenchel transformation, which is an important transformation in convex analysis (see [13, Chap.E]) introducing a duality in convex optimization. The Legendre-Fenchel transformation defined by
q∗: s ∈ Rn7−→ q∗(s) := sup
x∈Rn{hs, xi − q(x)} ,
has a simple explicit expression for q(x) = hAx, xi/2 (with A 0). In the case A ≻ 0, we obtain q∗(s) = 1
2hA
−1s, si.
Incidentally, this expression allows us to establish B−1 ≺ A−1if B ≻ A, with a quickly variational proof (first use the fact that q ≤ p implies q∗ ≥ p∗, and then the fact that B − A ≻ 0). This
latter result is the matrix counterpart of the decreasing of the inverse for positive numbers. Note moreover that the involution A 7→ A−1 of linear algebra corresponds to the involution q 7→ q∗ of convex analysis.
If we just have A 0, then (see e.g. [13, E.1.1.4]) q∗(s) =
1
2hx0, si where Ax0= s if s ∈ Im A
+∞ if s 6∈ Im A.
Note finally that making the transformation twice, we come back home: (q∗)∗ = q, as expected.
2.5 Do Not Forget Geometry.
A geometrical view is often used to introduce linear algebra (as in textbook [5]). The geometrical object also associated with a matrix A ≻ 0 is the ellipsoidal convex compact set
EA:= {x ∈ Rn: hAx, xi ≤ 1}.
The eigenvalues of A give the shape of EA (see [5, Chap. 6]); it is the unit ball of the norm
given by A. The linear function A−1/2 maps the canonical unit ball B to EA; so the volume
of EA is 1/
p
det(A) times the volume of B (see Problem 3 in [9] for a question on the volume of convex bodies). We mention that the inversion A 7→ A−1 coincides for E
A with the polarity
transformation of convex compact bodies having 0 in the interior (see [13, Chap. C]).
2.6 Convex Cone, Visualisation.
The set of positive semidefinite matrices Sn
+(R) =
n
A ∈ Sn(R) : x⊤A x ≥ 0 for all x ∈ Rno (3) enjoys a nice geometry. We give here a first flavour of its geometry when n = 2 and we will come back to this topic again in Section 4.
It is not difficult to show by means of definition (3) that S+n(R) is a (closed) convex cone. Recall that this means
α ∈ [0, 1], A, B ∈ S+n(R) =⇒ αA + (1 − α)B ∈ S+n(R), α ≥ 0, A ∈ S+n(R) =⇒ αA ∈ S+n(R).
Since S2(R) is of dimension 3, we can visualize the cone S+2(R) as a subset of R3. We identify S2(R) and R3 by an isomorphism ϕ : A = a b b c ∈ S2(R) 7−→ ϕ(A) ∈ R3.
We can choose the isometry ϕ(A) = (a,√2b, c), but the best looking isomorphism is ϕ(A) := √1
2(2b, c − a, c + a) that allows us to identify the cone S2(R)
a b b c : a ≥ 0, c ≥ 0, and ac − b2 ≥ 0 . with ϕ(S2(R)), which is n (x, y, z) ∈ R3, z ≥px2+ y2o.
Figure 1 shows that ϕ(S2(R)) is the “usual” cone K of R3, also called Lorentz cone or second-order cone in R3.
y z
x
Figure 1: The usual cone in R3, identified with S+2(R)
The boundary of K corresponds to the rank-one matrices, and the apex of K to the zero matrix. To avoid giving an incorrect intuition, we emphasize that even if the boundary of S+2(R) deprived from 0 appears to be a smooth manifold, this is not true in general when n > 2 (Sn
+(R)
is in fact the union of smooth manifolds; see Section 4.4).
3
Characterizations of Positive (Semi)Definiteness
3.1 Characterizations of Positive DefinitenessThe characterizations of positive definiteness are numerous; each of them has its own usefulness, depending on the context. For example, the Gram formulation appears naturally in statistics, the characterization by invariants in mechanics, or the exponential in dynamical systems. We discuss some of these characterizations here: we start with a couple of characterizations that use decompositions; then, we put emphasis on the characterizations by the positivity of n real numbers; we finish with a more original characterization.
3.1.1 Factorization-Like Characterizations
One of the most useful and most basic characterizations is the following factorization: A is positive definite if and only if there exists B invertible such that A = BB⊤ (indeed in this case hAx, xi = kBxk2 for all x ∈ Rn). There exist in fact different factorizations of this form, among
those: the Cholesky factorization with B upper-triangular with positive diagonal elements; and the square root factorization, with B symmetric positive definite (see e.g. [2, Chap. 6]).
Other famous decomposition-like characterizations use exponential or Gram matrices: the property A ≻ 0 is characterized by each of the two properties:
• Exponential form (see e.g. [14, Ex.5-28]): there exists B ∈ Sn(R) such that A = exp(B).
• Gram form (see e.g. [1, 7.2.40]): there exists a n independant vectors {v1, . . . , vn} of Rn
(hence forming a basis of Rn) such that A
ij = hvi, vji for all i, j.
3.1.2 Positivity of n Real Numbers
Important characterizations of positive definiteness rely on the positivity of n real numbers associated with symmetric matrices. For example, it is well-known that positive eigenvalues characterize positive definiteness; we gather in next theorem three similar properties.
Theorem 3.1 (Positive definiteness by positivity of n real numbers). Each of the following properties is equivalent to A ≻ 0:
(i) the eigenvalues λ1(A), . . . , λn(A) of A are positive;
(ii) the leading principal minors ∆1(A), . . . , ∆n(A) of A are positive;
(iii) the principal invariants i1(A), . . . , in(A) are positive.
We recall that the leading principal minors are defined as ∆k(A) := det Ak for k = 1, . . . , n,
where Ak is the submatrix of A made of the first k rows and first k columns of A. Recall also
that the principal invariants of A are (up to a change of sign) the coefficients of the monomials of the characteristic polynomial PA(x) of A; more precisely
PA(x) = (−x)n+ i1(A)(−x)n−1+ · · · + ik(A)(−x)n−k+ · · · + in(A),
whence
ik(A) :=
X
1≤n1<···<nk≤n
λn1(A)λn2(A) · · · λnk(A) (4)
where the eigenvalues λi(A) are the roots of PA. For example
i1(A) = trace(A) = λ1(A) + · · · + λn(A) and in(A) = det(A) = λ1(A) · · · λn(A).
Another definition of ik(A) is
ik(A) := sum of the principal minors of order k of A.
A principal minor of order k is the determinant of the k × k matrix extracted from A by considering the rows and columns n1, . . . , nk with 1 ≤ n1 < · · · < nk≤ n.
Proof. (of Theorem 3.1). There are several ways to prove (ii): for instance, by purely linear algebra techniques, by induction [1, p.404], or by quadratic optimization techniques [8, p.220], [15, p.409]. The result also follows from the factorization of a matrix in a product of triangular matrices; we sketch this proof (see more in [14]). Let A ∈ Sn(R) such that ∆
for all k = 1, . . . , n. Then there exists a (unique) lower triangular matrix with ones on the diagonal, denoted by S, such that
A = SDS⊤ where D = diag∆1(A), ∆2(A)/∆1(A), . . . , ∆n(A)/∆n−1(A)
. We call d1 := ∆1(A), dk:= ∆k(A)/∆k−1(A) the so-called (principal) pivots of A. So we have
A ≻ 0 ⇐⇒ (dk> 0 for all k = 1, . . . , n) ⇐⇒ (∆k(A) > 0 for all k = 1, . . . , n).
Then it is easy to establish characterization (ii).
Let us prove here characterization (iii) which is less usual. One implication comes easily: if A ≻ 0, then all the eigenvalues of A are positive; each ik(A) is then positive as well (recall (4)). To
establish the reverse implication, we assume that ik(A) > 0 for all k, so in particular det(A) > 0.
Suppose now that there exists a negative eigenvalue of A, call it λi0(A) < 0. Then observe that
0 < PA(λi0(A)) = (−λi0(A))
n+ · · · + i
k(A)(−λi0(A))
n−k + · · · + i n(A)
which contradicts the fact that PA(λi0(A)) = 0. Thus we have λi(A) ≥ 0 for all i, and since
0 < det(A) = λ1(A) · · · λn(A), we get that λi(A) > 0 for all i, which guarantees that A is
positive definite.
Example 3.1 (Invariants in dimension 2 and 3). Let us illustrate in small dimension the third point of the previous theorem. In the case n = 2, we have the simple formulation
PA(x) = x2− trace(A)x + det A.
We note by the way that this is a particular case of the development
det(A + B) = det A + hhcof A, Bii + det B = det A + hhA, cof Bii + det B,
where cof M stands for the cofactors matrix of the matrix M (in general, this development is obtained by differential calculus with the fact that the determinant is multilinear, see [16]). Let us come back to our situation: the two principal invariants of A are trace A and det A, so we
have a b b c ≻ 0 ⇐⇒ a > 0 ac − b2 > 0 ⇐⇒ a + c > 0 ac − b2 > 0. For the case n = 3, we have
PA(x) = −x3+ trace(A)x2− trace(cof A)x − det A.
which is again a particular case of the nice formula
det(A + B) = det A + hhcof A, Bii + hhA, cof Bii + det B. The three principal invariants are
i1(A) = traceA = λ1(A) + λ2(A) + λ3(A)
i2(A) = (traceA)2− trace(A2)/2
= λ1(A)λ2(A) + λ2(A)λ3(A) + λ1(A)λ3(A) = trace(cof(A))
i3(A) = (traceA)3− 3trace(A)trace(A2) + 2trace(A3)
/6 = λ1(A)λ2(A)λ3(A) = det A.
Thus we have a b d b c e d e f ≻ 0 ⇐⇒ a > 0 ac − b2 > 0 det A > 0 ⇐⇒ a + c + f > 0 (cf − e2) + (af − d2) + (ac − b2) > 0 det A > 0 , which gives a handy characterization in dimension 3. These principal invariants are widely used for n = 3 in mechanics where they often have a physical interpretation (for example as stresses and strains); see the textbook [17].
3.1.3 Characterization by Polar Cones
Here is now a more original characterization, using convex cones in Rn. We start by recalling some convex analysis that is needed; those notions (polar cones and Moreau polar decomposition) have some interest of their own, and turn out to be less known than they deserve (see, for more details, [13]).
If K is a closed, convex cone in a Euclidean space (E , h·, ·i), then the polar cone K◦ of K is the closed, convex cone of E made of all the points whose projection onto K is 0; in other terms,
K◦:= {s ∈ E : hs, xi ≤ 0, for all x ∈ K}. (5)
Examples: for K = Rn+, we have K◦= Rn−; for a linear subspace K = V , we have K◦ = V⊥. We see that polarity is to (closed) convex cones what orthogonality is to linear subspaces. The fundamental result on this polarity is its reflexivity: doing the polar transformation twice leads back home, i.e. (K◦)◦ = K. Another result that we will need later (and that is easy to prove from the definition) is that for an invertible matrix B, we have
[B(K)]◦ = B−⊤(K◦). (6)
The most important result on polar cones is with no doubt the following Moreau decompo-sition. Let x, x1 and x2 be three elements of E ; then the properties (i) and (ii) are equivalent:
(i) x = x1+ x2 with x1 ∈ K, x2 ∈ K◦ and hx1, x2i = 0;
(ii) x1= ProjK(x) and x2= ProjK◦(x).
Here ProjC stands for the projection onto the closed, convex set C. The decomposition of Rn
on K and K◦ generalizes the decomposition of Rn as the direct sum of a linear space L and its orthogonal complement L⊥. Note that if we know how to project onto K, we get as a bonus the projection onto K◦, which could be interesting in practical cases when K◦ is more complicated
than K (or the other way around).
Let us come back to matrices and prove the following original characterization of positive definiteness.
Proposition 3.1. Let A ∈ Sn(R) be invertible and K be a closed, convex cone of Rn; then
A ≻ 0 ⇐⇒
hAx, xi > 0 for all x ∈ K r{0} and hA−1x, xi > 0 for all x ∈ K◦r{0}.
Note the nice symmetry of the result since (A−1)−1 = A and (K◦)◦= K. This characteriza-tion, whose proof is not straightforward, is due to [18]. We present here a proof, suggested by X. Bonnefond, that uses the Moreau decomposition, as expected.
Proof. The fact that the condition be necessary follows directly from the definition of positive definiteness; we focus on sufficiency. Let A be an invertible matrix satisfying the condition, and consider an eigendecomposition A = U DU⊤(where D = diag(λ1, . . . , λn) with nonzero λi).
Observe first that up to a change of cone (K ← U⊤K), we can assume that A be diagonal: the
condition can indeed be written, with the help of (6), as
hDx, xi > 0 for all x ∈ U⊤K r{0}
hD−1x, xi > 0 for all x ∈ U⊤K◦r{0} = (U⊤K)◦r{0}.
So we consider that A = D is diagonal, and we just have to prove that the condition n X i=1 λix2i > 0 for all x ∈ K r{0} n X i=1 1 λi x2i > 0 for all x ∈ K◦r{0}
yields that all the λi are strictly positive.
For the sake of contradiction, assume that only the k (with k < n) first eigenvalues λ1, . . . , λk
be positive, while the n − k last λk+1, . . . , λnbe negative. We do one more step of simplification
of the expression by scaling the variables by S = diag(p|λ1|, . . . ,
p |λn|): introducing eK := SK, we observe that k X i=1 y2i > n X i=k+1 yi2 for all y ∈ eK r{0} k X i=1 zi2 > n X i=k+1 zi2 for all z ∈ S−1K◦r{0} = eK◦r{0}. (7)
After multiplying these two inequalities together, the Cauchy-Schwarz inequality yields
∀y ∈ eK r{0}, z ∈ eK◦r{0}, k X i=1 yi2 k X i=1 zi2 > n X i=k+1 ziyi !2 . (8)
We apply this inequality to a well-chosen couple of vectors y and z to get a contradiction. Consider the basis vector e = (0, . . . , 0, 1) and note that e does not lie in eK nor in eK◦. So its Moreau decomposition gives
e = y + z with y ∈ eK r{0}, z ∈ eK◦r{0} and hy, zi = 0. Observe that we have zi = −yi for all i = 1, . . . , k so that
k X i=1 zi2 = k X i=1 y2i.
Now the orthogonality of y and z gives
n X i=k+1 ziyi = − k X i=1 ziyi = k X i=1 yi2 ! .
This contradicts (8), so proves that k = n, which establishes the characterization.
3.2 Characterizations of Positive Semidefiniteness
Most of the characterizations of positive definiteness have their positive semidefiniteness counter-parts. In this section, we briefly highlight some similarities and differences between the positive definite and semidefinitenesses.
The property A 0 is equivalent to each of the following statements.
• There exists a matrix B such that A = BB⊤. This can also be read as: the quadratic form qA(x) = kB⊤xk2 is zero on ker B = ker A.
• The eigenvalues λ1, . . . , λn of A are non-negative. The set S+n(R) can thus be seen as the
inverse image of Rn +
Sn
+(R) = λ−1(Rn+)
by the eigenvalue function λ : Sn(R) → Rn assigning to a symmetric A its eigenvalues in a nonincreasing order. The convex set Sn
+(R) is then a convex spectral set in the sense of
[12]. Note that, more generally, we can prove that a spectral set λ−1(C) is convex if and
• For k = 1, . . . , n, all the principal minors of order k of A are non-negative. This means that the determinants of all the submatrices made of k rows and k columns should be non-negative – and not only the n leading principal minors ∆k(A) of Theorem 3.1. Having only
∆k(A) ≥ 0 does not guarantee semidefiniteness, as shown by the following counter-example
where A is not semidefinite positive (it is actually semidefinite negative) A =
0 0
0 −1
, and det A1 = det A2= 0.
In general, checking all the principal minors to draw a conclusion about the positive semidefiniteness would mean checking 2n− 1 polynomial relations – and not only n as for the positive definiteness, which is a surprising gap! In the next characterization though, only n polynomial relations come into play.
• The principal invariants i1(A), . . . , in(A) are non-negative. This is a sort of aggregate form
of the previous condition, since for all k, the invariant ik(A) is the sum of nk
principal minors. Checking the positivity of those n polynomial relations does yield semidefinite positiveness. The proof is conducted in the same way as for (iii) in Theorem 3.1 (see [1, p.403]). Counting the number of principal minors involved in this characterization by the ik(A), we naturally retrieve 2n− 1 since
n 1 + n 2 + · · · + n n = 2n− 1.
4
Geometry of the Set of Positive (Semi)Definite Matrices
The set S+n(R) of positive semidefinite matrices is a closed, convex cone of Sn(R) that enjoys nice, subtle geometrical properties. Some aspects are exposed in this section; we see the cone Sn
+(R) as a concrete example to illustrate various notions of convex geometry and smooth
ge-ometry.
4.1 Closedness by Matrix Operations
Closedness by Additions. As a convex cone, Sn
+(R) is closed by addition and by
multipli-cation by α ≥ 0. We can add moreover the following property (easy to see by definitions) (A 0, B ≻ 0) =⇒ A + B ≻ 0.
The cone S+n(R) is also closed by another addition, the so-called parallel sum. When A ≻ 0 and B ≻ 0, the parallel addition is defined by
A //B := (A−1+ B−1)−1.
Since both the inversion and the sum preserve S++n (R), the parallel addition preserves it as well. This operation was introduced [19] by analogy with electrical networks: by the Kirchhoff law, two wires of resistances r1 and r2 connected in parallel have a total resistance r such
that 1/r = 1/r1 + 1/r2. When A, B 0, we could define A //B using pseudo-inversion (see
e.g. [4]). It turns out that this sum has also a natural variational-analysis definition (see e.g. [20, Prob. 25]) as
h(A //B)x, xi = inf
y+z=x{hAy, yi + hBz, zi}.
We recognize this as the infimal convolution of the two quadratic forms qAand qB; this operation
Closedness by Multiplications. The product of two positive semidefinite matrices is not positive semidefinite in general: the matrix product destroys symmetry. There is no way to get out of this since we have the following (rather surprising) result:
Any n × n-matrix A can be described as the product of two symmetric matrices.
This result is presented by [21] and [22] that give credit to G. Frobenius (1910). A quick way to prove it is to use the fact that a matrix A and its transpose are similar: there exists an invertible S ∈ Sn(R) (non-unique, though) such that A⊤ = S−1AS [23]. As a consequence, we can decompose A = S1S2 with S1 = SA⊤ (which is symmetric by choice of S) and S2 = S−1
which is symmetric.
Let us come back to (non)closedness by multiplication and give a positive result. Imposing a posteriori symmetry solves indeed the problem: if A, B 0 then the product AB is positive semidefinite whenever it is symmetric – that is exactly when A and B commute. We add here two (not well-known) results of the same kind:
• Result of E. Wigner [24]. If A1, A2, A3≻ 0 and the product A1A2A3 is symmetric, then it
is positive definite.
• Result of C. Ballantine [25]. Except when A = −λIn with λ > 0 and n even, any square
matrix with positive determinant can be written as the product of four positive definite matrices. (For the case left aside, five positive definite matrices are needed.)
We also mention that, for another nonstandard matrix product, S+n(R) is closed without any further assumption. If A = (aij) and B = (bij), we define the matrix product C = A ◦ B = (cij)
by cij = aijbij for all i, j. Result: if A and B are positive definite (resp. semidefinite) then so is
the product A ◦ B (see [1, 7.5.3]).
4.2 Convex Sets Attached with Sn +(R)
4.2.1 Interior and Boundary
The (relative) interior of a convex set is always convex; here the interior of S+n(R) turns out to be exactly Sn
++(R). This is another appearance of the analogy with R+, since the interior of
(R+)n is (R++)n. The boundary of S+n(R) is exactly made up of the singular matrices A 0.
There we find the positive semidefinite matrices of rank k, for 1 ≤ k ≤ n − 1, forming smooth manifolds fitting together nicely (see Section 4.4).
4.2.2 The Facial Structure of S+n(R)
A convex subset of F ⊂ C is a face of C if every segment [a, b] ⊂ C such that there exists an element of F in ]a, b[ is entirely contained in F . All the faces of Sn
+(R) are moreover exposed
faces: a face F of S+n(R) is the intersection of S+n(R) with a supporting hyperplane (that is a hyperplane H such that S+n(R) is entirely contained in one of the closed halfspaces delimited by H). More precisely, the result is the following (see [26] pointing to earlier references and discussing about the generalization to spectral sets): F is a face of S+n(R) if and only if F is the convex cone spanned byvv⊤: v ∈ V where V is a subspace of Rn. The dimension of F is then d(d + 1)/2 if d is the dimension of V . For example with n = 2: there is a face of dimension 0 (the apex of the cone Sn
+(R)), faces of dimension 1 (extremal rays of S2+(R) directed by
vectors xx⊤ with nonzero x ∈ Rn), and a face of dimension 3 (the whole S2++(R)).
There is another way to generate all the faces of S+n(R), as follows. Let L be a linear subspace of Rn of dimension m, and set
FL:=A ∈ S+n(R) : L ⊂ ker A
;
then FL is a face of S+n(R) of dimension r(r + 1)/2 with r = n − m. When L ranges all the
Let us mention that the knowledge of the faces of S+n(R) is useful to for some optimisation problems with semidefinite constraints. Indeed, the so-called “facial reduction technique” uses the explicit form of the faces to reformulate degenerate semidefinite optimization problems into problems in smaller dimension; see the initial article [27], and the recent [28] for an application to sensor network problems.
4.2.3 Polar Cone and Projection
In the space Sn(R) equipped with the inner product hh·, ·ii, the polar cone (5) of K = Sn +(R)
is simply K◦ = −S+n(R). It is also easy to determine the Moreau decomposition of a matrix A ∈ Sn(R) onto S+n(R) and its polar. Indeed consider an orthogonal eigendecomposition U⊤AU = diag(λ
1, . . . , λn); then the two matrices
A+ := U diag max{0, λ1}, . . . , max{0, λn}U⊤
A− := U diag min{0, λ1}, . . . , min{0, λn}U⊤
are such that A = A++ A− and hA+, A−i = 0. Thus A+(resp. A−) is the projection of A onto
Sn
+(R) (resp. onto its polar −S+n(R)). Said otherwise, to project A onto S+n(R), we just have to
compute an eigendecomposition of A and cut off negative eigenvalues by 0. It is remarkable that we have an explicit expression of the projection onto S+n(R), and that this projection is easy to compute (essentially by computing an eigendecomposition). More sophisticated projections onto subsets of S+n(R) are also computable using standard tools of numerical optimization [29]; those projections have many applications in statistics and finance (see [30] for an early reference of the projection onto S+n(R); see also [29, section 5]).
Finally note that we can interpret the projection of A onto S+n(R) = λ−1(Rn+) with respect to the projection of its eigenvalues onto Rn
+, namely
ProjSn
+(R)(A) = U diag ProjR n
+(λ1, . . . , λn)
U⊤.
This result has a nice generalization: we can compute a projection onto a spectral set λ−1(C)
as soon as we know how to project onto the underlying C (see [31]). 4.2.4 Tangent and Normal Cones
Normal cones in convex geometry play the role of normal spaces in smooth geometry: they give “orthogonal” directions to a set at a point of the set. Their “duals”, the tangent cones, then give a simple conic approximation of a convex set around a point.
The normal cone to S+n(R) at X ∈ S+n(R) can be defined as the set of directions S ∈ Sn(R) such that the projection of X + S onto S+n(R) is X itself. In mathematical terms, this means
NA:= {S ∈ Sn(R) : hhS, B − Aii ≤ 0 for all B ∈ S+n(R)}.
The tangent cone is then defined as the polar of NA and admits the characterization:
TA:= NA◦= R+(S+n(R) − A).
To fix ideas, we give a illustrative representation of the cone with its tangent and normal space at a point (beware of the fact that this is not a real representation, since for n = 2 the boundary of the cone is smooth).
For the special case A = 0, we have simply N0= S+n(R)◦= −S+n(R) and T0= S+n(R). For
the general case A ∈ Sn
+(R), we have
TA = {M ∈ Sn(R) : hMu, ui ≥ 0 for all u ∈ ker A} (9)
NA = {M 0 : hhM, Aii = 0}
= {M 0 : MA = 0} = {M 0 : AM = 0}
NA
A TA
Sn +(R)
Figure 2: (Nonrealistic) illustration of the normal and tangent cones to S+n(R) at A
The first expression of NA comes from the general result for closed, convex cones (see [13,
A.5.2.6]). The second one follows from the (quite surprising) property: for A, B ∈ S+n(R), we have: hhA, Bii = 0 ⇐⇒ A B = 0.
which has a one-line proof using a classical trick, as follows
hhA, Bii = traceA1/2A1/2B1/2B1/2= traceB1/2A1/2A1/2B1/2= kA1/2B1/2k2.
The third expression comes easily from the second. We can also describe NA one step further
from the last expression. Take A ∈ S+n(R) of rank r (< n). The dimension of ker A is n − r, let
{ur+1, . . . , un} be a basis of it. Then
M ∈ NA ⇐⇒ M =
n
X
i=r+1
αi uiui⊤ with αi ≤ 0 (i = r + 1, . . . , n).
4.2.5 Spectahedron, the Spectral Simplex
A special subset of S+n(R) plays an important role in the variational analysis of the eigenvalues (see [32], [33]) and in some applications of semidefinite programming (as for example for sparse principal component analysis [34]). The so-called spectahedron is defined by
Ω1 := {A 0 : trace(A) = 1}. (10)
This set is convex and compact; its extreme points are exactly the matrices xx⊤with unit-vectors x ∈ Rn. The spectahedron equivalently defined through the eigenvalues: Ω
1 = λ−1(Π1) is the
spectral set of Π1 := {(λ1, · · · , λn) : λi ≥ 0,Pni=1λi = 1} the unit-simplex of Rn.
This set and its properties generalize as follows. For an integer m ∈ {1, . . . , n}, set Ωm:= {A 0 : trace(A) = m and λmax(A) ≤ 1}.
For example, Ωn is obviously reduced to a single element, the identity matrix In. Note that
we did not add λmax(A) ≤ 1 in the definition of Ω1, since it was automatically satisfied. The
general result is the following: the set Ωm is convex and compact, and it is the convex hull
of the matrices XX⊤, where X is n × m-matrix such that X⊤X = I
m. In other words: the
convex hull of the orthogonal projection matrices of rank m is exactly the set of symmetric matrices whose eigenvalues are between 0 and 1, and whose trace is m. The proof of this result
trace(·) = 1 Ω1
Sn +(R)
In
Figure 3: (Nonrealistic) representation of the spectahedron
starts with showing that Ωm is a spectral Ωm= λ−1(Πm) with the compact convex polyhedron
Πm := {(λ1, · · · , λn) : 0 ≤ λi≤ 1,Pni=1λi = m}. Then we can determine the extremal points
of Πm: the point (¯λ1, . . . , ¯λn) is extremal in Πm if and only if all the ¯λi are zeros, expected m
of them that are ones.
4.3 Representation as Inequality Constrained Set: Nonsmooth Viewpoint
A common, pleasant situation in optimization is when a constraint set C is represented with the help of inequalities
g1(x) ≤ 0, . . . , gp(x) ≤ 0,
and the interior of C with the help of strict inequalities with the same functions g1(x) < 0, . . . , gp(x) < 0.
Does there exist such a representation for C = S+n(R) and its interior S++n (R) ? The answer is yes. We highlight two types of representing functions: a nonsmooth one in this section, and polynomial ones in next section.
Consider the function g : Sn(R) → R defined by
g(A) := λmax(−A) = −λmin(A). (11)
This function is convex and positively homogeneous, as a maximum of linear functions. Indeed, we have the well-known Rayleigh quotient
λmax(A) = max
kxk=1hAx, xi = maxkxk=1hhA, xx
⊤ii = max
X∈Ω1hhA, Xii
,
where Ω1 is defined by (10). More precisely, the last expression says that λmax is the
sup-port function of Ω1 (the notion of support function is fundamental in convex analysis; see [13,
Chap.C]). Let us come back to the representation of S+n(R) as an inequality constrained set: we have a first adequate representation as
Sn
+(R) = {A ∈ Sn(R) : g(A) ≤ 0},
Sn
++(R) = {A ∈ Sn(R) : g(A) < 0}.
We can thus represent Sn
+(R) as a constrained set with a single inequality, but we have to
use the nonsmooth function λmax. On the other hand, in Section 4.4, we will represent it with
several smooth functions.
We mention briefly that smooth approximations of the nonsmooth representing function λmax
give smooth approximations of S+n(R). We start from S+n(R) = {A ∈ Sn(R) : λmax(−A) ≤ 0}
and, following [35], we define for all µ > 0
X ∈ Sn(R) 7−→ fµ(X) := µ log n X i=1 exp(λi(X)/µ) !
where the λi(X) are the eigenvalues of X. This expression of fµ can be transformed to
fµ(X) = λmax(X) + µ log (trace(E(X))) with E(X) = exp
X − λmax(X)In
µ
. In [35], it is shown that fµ is globally Lipschitz, of class C1 with gradient
∇fµ(X) =
E(X) trace(E(X)), and that we have the uniform approximation
λmax(X) ≤ fµ(X) ≤ λmax(X) + µ log n.
Therefore, S+n(R) could be uniformly approximated by {A ∈ Sn(R) : fµ(−A) ≤ 0} which is an
inner (µ log n)-approximation.
4.4 Representation as Inequality Constrained Sets: Semialgebraic Viewpoint
The adequate representation of the previous section used a nonsmooth function. The algebraic nature of Sn
+(R) gives a representation with polynomials, and it is tempting to consider the
positively homogenous polynomial functions ∆k(A). However we cannot use them to have the desired decomposition since (recall Section 3.2)
Sn
+(R) ( {A ∈ Sn(R) : ∆k(A) ≥ 0 for all k = 1, . . . , n},
Sn
++(R) = {A ∈ Sn(R) : ∆k(A) > 0 for all k = 1, . . . , n}.
We still have an adequate polynomial representation: setting gk= −ik, we have indeed
Sn
+(R) = {A ∈ Sn(R) : gk(A) ≤ 0 for all k = 1, . . . , n}, (12)
Sn
++(R) = {A ∈ Sn(R) : gk(A) < 0 for all k = 1, . . . , n}.
The constraint functions gk are polynomial, positively homogeneous of degree k; obviously they
are of class C∞, whereas g in (11) is nonsmooth. Note that that S+n(R) represented by (12) is clearly a cone, but its convexity is less clear (and comes from subtle reasons: hyperbolic polynomials and the convexity theorem of G˚arding; see [12]).
Also note that in the representation (12) we have the a priori knowledge of the active con-straints at a matrix A (those such that gk(A) = 0). Usually, we only know this a posteriori;
here we directly know from the rank r of A that:
gk(A) < 0 for k = 1, . . . , r and gk(A) = 0 for k = r + 1, . . . , n.
We now briefly discuss the fact that Sn
+(R), as it appears in (12), is a semialgebraic set.
A so-called semialgebraic set is a set defined by unions and intersections of a finite number of polynomial inequalities. It turns out that these sets enjoy very nice closedness properties (almost any “finite”’ operation preserves semialgebraicity; see [36]). These properties are useful tools for the analysis of structured nonsmooth optimization problems (see e.g. [37], [38]). One of the main properties of semialgebraic sets is that they can be decomposed as a union of connected smooth manifolds (so-called strata) that fit together nicely. Denoting by Rr the smooth submanifold
made up from the positive semidefinite matrices of fixed rank r (for r = 0, . . . , n), we observe that the semialgebraic cone S+n(R) is the union of the n + 1 manifolds Rr, which are themselves
the union of their connected components, the strata of Sn
+(R). The two extreme cases are:
r = 0 the apex of the cone and r = n the interior of the cone. The decomposition as union of manifolds is explicit in the representation of S2(R) in R3: the point (r = 0), the boundary
deprived from the origin (r = 1) and the interior of the cone (r = 2).
The tangent space to Rr at A (denoted by TA) is connected to the tangent cone to S+n(R)
at A. In fact, TA is the largest subspace included in TA(recall(9)), namely
4.5 Natural Riemannian Metric on S++n (R)
In practice, the computations dealing with positive definite matrices are numerous and involve various operations: averaging, approximation, filtering, estimation, etc. It has been noticed that the Euclidean geometry is not always best suited for some of these operations (as for example in image processing; see e.g. [39] and references therein). This section presents another geometry in S++n (R) having different operations (for example for averaging). Next section connects this geometry to optimization. We refer to [4, Chap. 6] for more details and for the proofs of the results. Note that similar developments hold for Rr the fixed rank positive definite matrix
manifold (see [40]), but they require more involved techniques. Though we use the language of Riemannian geometry, we stay here at a very basic level.
The cone Sn
++(R) is open, and has two natural geometries: the Euclidean geometry
associ-ated with the inner product hh·, ·ii, where the distance between A and B is d2(A, B) := kA − Bk2 =
p
hhA − B, A − Bii;
and a Riemannian metric defined from ds = kA−1/2(dA)A−1/2k2 (the “infinitesimal length” at
A) by the following standard way. Let A, B ∈ Sn
++(R) and a piecewise C1 path γ : [a, b] → S++n (R) from A = γ(a) to B = γ(b);
then the length of γ is
L(γ) := Z b
a kγ
−1/2(t)γ′(t)γ−1/2(t)k 2dt,
and the Riemannian distance is
dR(A, B) := inf {L(γ) : γ paths from A to B} . (13)
It turns out that we have nice explicit expressions of dR and associated “geodesics” (locally,
the shortest paths between two points). The unique geodesic that connects A and B is the following path
[0, 1] 7−→ ¯γ(t) := A1/2(A−1/2BA−1/2)tA1/2 that reaches the minimum in (13), so that
dR(A, B)2 = k log(A−1/2BA−1/2)k2 = n X i=1 log λi(A−1/2BA−1/2) 2 = n X i=1 log λi(A−1B) 2 .
The last inequality comes from the fact that A−1B and A1/2BA1/2are similar matrices, so they
have the same eigenvalues. We can verify from this expression that we have indeed the desirable property dR(A, B) = dR(B, A) (since (A−1B)−1= B−1A, the squared log of eigenvalues are the
same). We give a flavour on why this distance has a better behaviour for some applications. Distance of inverses. We get easily that the inversion does not change the distance: for any A and B in S++n (R), we have
dR(A, B) = dR(A−1, B−1). (14)
This property does not hold for the distance d2. Here it simply follows from the fact that
(A−1B)⊤ = BA−1 have the same eigenvalues.
Geometric average (and a little more). Let A1, . . . , Ambe m matrices of S++n (R). Writing
the optimality conditions, we can prove that there exists a unique matrix M2(A1, . . . , Am) in
Sn
++(R) that minimizes X 7→
Pm
i=1(kX − Aik2)2, and we get explictly that M2 is the usual
(arithmetic) average
M2(A1, . . . , Am) :=
A1+ · · · + Am
For the Riemannian geometry, we have similarly that there exists a unique matrix X in S++n (R) that minimizes X 7−→ m X i=1 dR(X, Ai)2, (15)
and this matrix, denoted by MR(A1, . . . , Am), is called the Riemannian average. An interesting
property (not shared by M2) comes from (14): we have
MR(A−11 , . . . , A−1m ) is the inverse of MR(A1, . . . , Am). (16)
Finally, the optimality condition of (15) yields that Pmi=1log(Ai−1X) = 0 characterizes the
Riemannian average.
For m = 2, we see that MR(A1, A2) = A1(A1−1A2)1/2 = A2(A2−1A1)1/2 is the solution of
the equation (see [4, Chap. 4] for the details). If furthermore A1 and A2 commute, we have
MR(A1, A2) = A1/21 A 1/2 2 .
This last property explains why the Riemmanian average is sometimes called the geometric average.
For a general m, computing MR(A1, . . . , Am) might be less direct. As far as the symmetry
property (16) is concerned, one could consider instead another matrix average, the so-called the resolvent average Mres(A1, . . . , Am) of [41]. It can be defined as the minimum of a function
looking like (15) (with a Bregman distance replacing dR; see [41, Prop. 2.8]), but it has also a
simple, easy-to-compute expression that can be written
(Mres(A1, . . . , Am) + In)−1 = ((A1+ In)−1+ · · · + (A1+ In)−1)/n.
The above equality interprets that the resolvent of the average is the arithmetic average of the resolvents of the Ai’s, which give the name “resolvent average”. The fact that the resolvent
average satisfies (16) comes from techniques of variational analysis (see [41, Th. 4.8]).
4.6 Barrier Function of Sn ++(R)
As a function of the real variable x > 0, the logarithm x 7→ log x has a matrix relative which turns out to have a central role in optimization. The celebrated (negative) log-function for matrices is
X ≻ 0 7−→ F (X) := − log(det(X)) = log(det X−1). (17) Differential calculus for F in Sn
++(R) gives the “same” results as for the log in R+. The composite
function F is of class C∞ on S++n (R); its derivative DF (X) : H ∈ Sn(R) 7→ DF (X)[H] is such that
DF (X)[H] = −trace(X−1H) = hh−X−1, Hii = −trace(X−1/2HX−1/2), and gives the gradient ∇F (X) = −X−1. We have furthermore
D2F (X)[H, H] = trace (X−1/2HX−1/2)2 = hhX−1HX−1, Hii D3F (X)[H, H, H] = −2trace (X−1/2HX1/2)3.
We can also prove that F is strictly convex on S++n (R). This fundamental property is the topic of several exercises in [42].
As mentioned above, this function has a very special role in optimization, more precisely in semidefinite programming [6]. The function F is indeed a self-concordant barrier-function for Sn
++(R). The role of barrier is intuitive: F (X) → +∞ when X ≻ 0 approaches the boundary of
Sn
++(R), which consists of the singular matrices A 0. Self-concordance is a technical property,
namely
for a constant α (here α = 2 is valid). In the nineties, Y. Nesterov and A. Nemirovski developed a theory [43], that has had tremedous consequences in convex optimization. They showed that self-concordant barrier-functions allow to design algorithms (called interior-point methods) for linear optimization problems with conic constraints, and that moreover the complexity of those algorithms is fully understood. In particular, the interior-point algorithms for semidefinite programming made a little revolution during the nineties: they provided numerical methods to solve a wide range of engineering problems, for example in control [44], or in combinatorial optimization [45].
The log-function F in (17) is at the heart of interior point methods for semidefinite program-ming; it also has a remarkable connection with the Riemmanian metric of Section 4.5. Observe indeed that the norm in S+n(R) associated to D2F (X) corresponds nicely with the infinitesimal norm defined by the Riemannian structure,
kHkD2F (X) = (D2F (X)[H, H])1/2 = kX−1/2HX−1/2k.
It follows that the Riemannian distance naturally associated with the barrier is exactly the natu-ral Riemannian distance (13). This geometrical interpretation of the barrier function then shows that interior-point methods have an intrinsic appeal, and may explain their strong complexity results (see more in [46]).
4.7 Unit Partition in Sn ++(R)
In this last section, we discuss the so-called unit partition problem in Sn
++(R). Motivated by
a problem originated from Economy, I. Ekeland posed it as a challenge in a conference in 1997 [47]. Given k + 1 nonzeros vectors x1, . . . , xk, y in Rn, when is it possible to find positive definite matrices M1, . . . , Mk such that
Miy = xi for all i = 1, . . . , k (18) and k X i=1 Mi = In ? (19)
The first equation is of the quasi-Newton type [48]; the second equation gives the name of the problem.
A condition that guarantees existence of such matrices should obviously depend on the vectors x1, . . . , xk, y. It is easy to get from (18), (19) that a necessary condition is
hxi, yi > 0 for all i = 1, . . . , k and
k
X
i=1
xi = y. (20)
Unfortunately, this condition is not sufficient, as shown by the following counter-example. Con-sider in R2, the three vectors
x1 = 1/2 1 , x2 = 1/2 −1 and y = 1 0 . Then observe that for both i = 1, 2, the property Mixi = y with Mi ≻ 0 yields
M1 = 1/2 1 1 α and M2 = 1/2 −1 −1 β ,
with α, β > 2. Thus it is impossible to have M1+ M2 = I2. Note also that condition (20) is
even not sufficient to get a weaker version of the partition (19) with Mi 0.
We give here a constructive way to get the unit partition of S+n(R) (and later a variant of the result). The result is due to A. Inchakov, as noted by [49].
Theorem 4.1 (Condition for unit partition of S++n (R)). Let k + 1 nonzeros vectors x1, . . . , xk, y in Rn, satisfying (20). Then a necessary and sufficient condition for the existence of Mi ≻ 0
satisfying (18) and (19) is that
A0 := In− k X i=1 xixi⊤ hxi, yi be positive definite on y ⊥. (21)
Proof. Let us prove first that the condition (21) is necessary. The Cauchy-Schwarz inequality gives for all i and for all z ∈ Rn
hMiy, zi2 ≤ hMiy, yihMiz, zi with equality if and only if z and y are collinear,
which yields
hxi, zi2 ≤ hxi, yihMiz, zi with equality if and only if z and y collinear.
As a consequence, we have k X i=1 hxi, zi2 hxi, yi − hMiz, zi ! = k X i=1 hxi, zi2 hxi, yi − kzk 2 ≤ 0
with equality if and only if z and y are collinear which means
qA0(z) ≥ 0 with equality if and only if z and y are collinear.
Thus A0 is positive semidefinite and of kernel reduced to Ry, and we get (21).
To prove sufficiency of condition (21), we propose the matrices Mi:=
A0
k +
xixi⊤
hxi, yi.
By construction, we have Pki=1Mi = In; and moreover, since A0y = 0, we also have Miy = xi
for all i = 1, . . . , k. There remains to prove the positive definiteness of the Mi’s.
It is clear that Mi 0, since it is sum of two positive semidefinite matrices. Let us take
z ∈ Rn such that hM
iz, zi = 0, and let us prove that z = 0. We have
hA0z, zi
k +
hxi, zi2
hxi, yi = 0,
which yields
hA0z, zi = 0 (hence z and y are collinear by (21))
hxi, zi = 0.
The conclusion follows easily: there exists α ∈ R such that z = αy, and the condition hxi, yi > 0
implies α = 0 and then z = 0, so Mi is definite positive.
It is interesting to note that (21) is of the type (E0) hAx, xi > 0 for all x 6= 0 orthogonal to y,
for A ∈ Sn(R) and y 6= 0 in Rn, which is a frequently encountered property in matrix analysis and optimization. Here is below four formulations equivalent to (E0) (see e.g. [50]).
(E2) Condition on the augmented matrix: the matrix ¯A ∈ Sn+1(R) defined by blocks as ¯ A = A y y⊤ 0
has exactly n positive eigenvalues.
(E3) Condition on determinants (that the economists are keen of). For I ⊂ {1, . . . , n}, we note
AI the matrix extracted from A in taking only columns and rows of indices in I. Similarly
yI is obtained from y by taking the entries yi for i ∈ I. We set
¯ AI= AI yI yI⊤ 0
and the condition is det AI < 0 for all I = {1, 2}, {1, 2, 3}, . . . , {1, . . . , n}.
(E4) Condition on the inverse of the augmented matrix. The matrix ¯A is invertible and the
n×n-matrix extracted from ¯A−1 by taking the first n columns and rows is positive semidefinite.
We finish with a variant of Theorem 4.1 showing that starting from a weaker assumption, we get a weaker result.
Theorem 4.2 (Condition for unit partition of S+n(R)). Let x1, . . . , xkand y 6= 0 be k+1 vectors of Rn, satisfying hxi, yi > 0 or xi = 0 for all i = 1, . . . , k k X i=1 xi = y. (22)
Then a necessary and sufficient condition for the existence of Mi 0 satisfying (18) and (19)
is that A0:= In− X {i: xi 6=0} xixi⊤ hxi, yi is positive semidefinite. (23)
Proof. Note that under the assumption of the theorem, we have hxi, yi ≥ 0 for all i = 1, . . . , k
and there is xi 6= 0 because Pk
i=1xi = y 6= 0. The proof is similar to (and easier than) the
previous one. Let us start with the necessity of the condition (23). We have A0y = 0 since
Pn i=1xi =
P
{i: xi6=0}xi = y. With the Cauchy-Schwarz inequality, we write
hxi, zi2 ≤ hxi, yihMiz, zi, and then X {i: xi 6=0} hxi, zi2 hxi, yi − kzk 2 ≤ 0.
Thus we get A0 0. As for sufficiency, we propose the semidefinite matrices
Mi:= A0 k if x i = 0 A0 k + xixi⊤ hxi, yi if hxi, yi > 0.
5
Concluding Remarks
This article highlights nice variational properties of (the set of the) positive semidefinite matrices, and gives pointers to some of their applications. Even if the notion of positive semidefiniteness is basic and taught everywhere, there are still many open questions related to it, and more generally to the interplay between matrix analysis and optimization; see the commented problems in [9] and [51]. The “variational” viewpoint adopted here also opens the way to another notion of positivity, the so-called “copositivity”, which also has rich and useful properties: see the recent review [52] for example.
References
[1] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 1989. (New edition, 1999).
[2] F. Zhang. Matrix Theory. Springer, 1999.
[3] D. Bernstein. Matrix mathematics. Princeton University Press, 2005.
[4] R. Bhatia. Positive definite matrices. Series in Applied Mathematics. Princeton University Press, 2007.
[5] G. Strang. Introduction to linear algebra. Wellesley-Cambridge Press, 2009. (4th interna-tional edition).
[6] R. Saigal, L. Vandenberghe, and H. Wolkowicz. Handbook of Semidefinite Programming. Kluwer, 2000.
[7] M. Anjos and J.B. Lasserre. Handbook of semidefinite, conic and polynomial optimization. Springer, 2011.
[8] V. Alexeev, E. Galeev, and V. Tikhomirov. Recueil de probl`emes d’optimisation. Mir Editions, 1987.
[9] J.-B. Hiriart-Urruty. Potpourri of conjectures and open questions in nonlinear analysis and optimization. SIAM Review, 49(2):255–273, 2007.
[10] S. Poljak, F. Rendl, and H. Wolkowicz. A recipe for semidefinite relaxation for (0,1)-quadratic programming. Journal of Global Optimization, 7:51–73, 1995.
[11] J.M. Borwein and A.S. Lewis. Convex Analysis and Nonlinear Optimization. Springer Verlag, New York, 2000.
[12] A.S. Lewis. Nonsmooth analysis of eigenvalues. Mathematical Programming, 84(1):1–24, 1999.
[13] J.-B. Hiriart-Urruty and C. Lemar´echal. Fundamentals of Convex Analysis. Springer Verlag, Heidelberg, 2001.
[14] J.-B. Hiriart-Urruty and Y. Plusquellec. Exercices d’alg`ebre lin´eaire et bilin´eaire. CEPADUES Editions, 1988.
[15] J. Brinkhuis and V. Tikhomirov. Optimization: Insigths and Applications. Series in Applied Mathematics. Princeton University Press, 2005.
[16] D. Az´e, G. Constans, and J.-B. Hiriart-Urruty. Calcul diff´erentiel et ´equations diff´erentielles. Exercices corrig´es. Re-edited by EDP Sciences, 2010.
[17] C. Truesdell. A First Course in Rational Continuum Mechanics: General Concepts. Aca-demic Press Inc, 1991. 2nd Revised edition.
[18] S.-P. Han and O. Mangasarian. Conjugate cone characterization of positive definite and semidefinite matrices. Linear Algebra and its Applications, 56:89–103, 1984.
[19] W.N. Anderson and R.J. Duffin. Series and parallel addition of matrices. Journal of Mathematical Analysis and Applications, 26(3):576 – 594, 1969.
[20] D. Az´e and J.-B. Hiriart-Urruty. Analyse variationnelle et optimisation. CEPADUES Edition, 2010.
[21] O. Taussky. The role of symmetric matrices in the study of general matrices. Linear Algebra and its Applications, 5:147–154, 1972.
[22] A. Bosch. The factorization of a square matrix into two symmetric matrices. American Math. Monthly, pages 462–464, 1986.
[23] O. Taussky and H. Zassenhaus. On the similarity transformation between a matrix and its transpose. Pacific Journal of Mathematics, pages 893–896, 1959.
[24] E.P. Wigner. On weakly positive matrices. Canadian Journal of Mathematics, 15:313 – 317, 1963.
[25] C.S. Ballantine. Products of positive definite matrices iii. Journal of Algebra, 10:174 – 182, 1968.
[26] A.S. Lewis. Eigenvalue-constrained faces. Linear Algebra and Applications, 269:18–1, 1997. [27] J.M. Borwein and H. Wolkowicz. Facial reduction for a cone-convex programming problem.
Journal of Australian Mathematical Society, 30(3):369–380, 1981.
[28] N. Krislock and H. Wolkowicz. Explicit sensor network localization using semidefinite representations and facial reductions. SIAM Journal on Optimization, 20(5):2679–2708, 2010.
[29] J. Malick. A dual approach to semidefinite least-squares problems. SIAM Journal on Matrix Analysis and Applications, 26, Number 1:272–284, 2004.
[30] N. Schwertman and D. Allen. Smoothing an indefinite variance-covariance matrix. Journal of Statistical Computation and Simulation, 9:183–194, 1979.
[31] A.S. Lewis and J. Malick. Alternating projections on manifolds. Mathematics of Operations Research, 33(1):216–234, 2008.
[32] J.-B. Hiriart-Urruty and D. Ye. Sensitivity analysis of all eigenvalues of a symmetric matrix. Numerische Mathematik, 70(1):45–72, 1995.
[33] M. Overton and R. Womersley. Optimality conditions and duality theory for minimiz-ing sums of the largest eigenvalues of symmetric matrices. Mathematical Programmminimiz-ing, 62(2):321–358, 1993.
[34] A. d’Aspremont, L. El Ghaoui, M.I. Jordan, and G.R. Lanckriet. A direct formulation for sparse PCA using SDP. SIAM Review, 49:434–448, 2007.
[35] Y. Nesterov. Smoothing technique and its applications in semidefinite optimization. Math-ematical Programming, 110(2):245–259, 2007.
[36] J. Bochnak, M. Coste, and M-F. Roy. Real Semialgebraic geometry. Number 36 in Results in Mathematics and Related Areas. Springer-Verlag, 1998.
[37] J. Bolte, A. Daniilidis, and A. Lewis. The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. on Optimization, 17(4):1205–1223, 2006.
[38] A. Ioffe. An invitation to tame optimization. SIAM Journal on Optimization, 19(4):1894– 1917, 2009.
[39] M. Moakher. Symmetric positive-definite matrices: From geometry to applications and visualization. In Visualization and Processing of Tensor Fields, 2006.
[40] S. Bonnabel and R. Sepulchre. Riemannian metric and geometric mean for positive semidefi-nite matrices of fixed rank. SIAM Journal on Matrix Analysis and Applications, 31(3):1055– 1070, 2009.
[41] H.H. Bauschke, S.M. Moffat, and X. Wang. The resolvent average for positive semidefinite matrices. Linear Algebra and Applications, 432(7):1757–1771, 2010.
[42] J.-B. Hiriart-Urruty. Optimisation et analyse convexe. Exercices corrig´es. Re-edited by EDP Sciences, 2009.
[43] Y. Nesterov and A.S. Nemirovski. Interior-Point Polynomial Algorithms in Convex Pro-gramming. Number 13 in SIAM Studies in Applied Mathematics. SIAM, Philadelphia, 1994.
[44] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. Studies in Applied Mathematics. SIAM, 1994.
[45] M. Goemans and D. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 6:1115–1145, 1995.
[46] Y. Nesterov and M.J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics, 2:333– 361, 2002.
[47] I. Ekeland. Plenary talk at the worskshop of the group MODE of the French applied mathematical society, Institut Henri Poincar´e, Paris, 1997.
[48] J.F. Bonnans, J.Ch. Gilbert, C. Lemar´echal, and C. Sagastiz´abal. Numerical Optimization. Springer Verlag, 2003.
[49] I. Ekeland. Some applications of the Cartan-K¨ahler theorem to economic the-ory. In R. Berndt and O. Riemenschneider, editors, Erich K¨ahler: Mathematische Werke/Mathematical Works, 2003.
[50] Y. Chabrillac and J.-P. Crouzeix. Definiteness and semidefiniteness of quadratic forms revisited. Linear Algebra and its Applications, 63:283–292, 1984.
[51] J.-B. Hiriart-Urruty. A new series of conjectures and open questions in optimization and matrix analysis. ESAIM: Control, Optimisation and Calculus of Variations, 15:454–470, 2009.
[52] J.-B. Hiriart-Urruty and A. Seeger. A variational approach to copositive matrices. SIAM Review, 52(4):593–629, 2010.