Composable core-sets for determinant maximization problems via spectral spanners

The MIT Faculty has made this article openly available.

Citation: Indyk, Piotr, et al. "Composable core-sets for determinant maximization problems via spectral spanners." In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, Salt Lake City, UT, January 5-8, 2020, Association for Computing Machinery: 1675-1694. © 2019 The Author(s)

Publisher: Association for Computing Machinery
Version: Original manuscript
Citable link: https://hdl.handle.net/1721.1/129406
Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike


arXiv:1807.11648v2 [cs.DS] 16 Nov 2019

Composable Core-sets for Determinant Maximization Problems via Spectral Spanners

Piotr Indyk (MIT, indyk@mit.edu), Sepideh Mahabadi (TTIC, mahabadi@ttic.edu), Shayan Oveis Gharan (University of Washington, shayan@cs.washington.edu), Alireza Rezaei (University of Washington, arezaei@cs.washington.edu)

Abstract

We study a generalization of classical combinatorial graph spanners to the spectral setting. Given a set of vectors V ⊆ R^d, we say a set U ⊆ V is an α-spectral k-spanner, for k ≤ d, if for every v ∈ V there is a probability distribution µ_v supported on U such that

vv⊺ ⪯_k α · E_{u∼µ_v}[uu⊺],

where for two matrices A, B ∈ R^{d×d} we write A ⪯_k B iff the sum of the bottom d − k + 1 eigenvalues of B − A is nonnegative. In particular, A ⪯_d B iff A ⪯ B. We show that any set V has an Õ(k)-spectral k-spanner of size Õ(k), and that this bound is almost optimal in the worst case.

We use spectral spanners to study composable core-sets for spectral problems. We show that for many objective functions one can use a spectral spanner, independent of the underlying function, as a core-set and obtain almost optimal composable core-sets. For example, for the k-determinant maximization problem we obtain an Õ(k)^k-composable core-set, and we show that this is almost optimal in the worst case.

Our algorithm is a spectral analogue of the classical greedy algorithm for finding (combinatorial) spanners in graphs. We expect that our spanners will find many other applications in distributed and parallel models of computation. Our proof is spectral. As a side result of our techniques, we show that the rank of diagonally dominant lower-triangular matrices is robust under "small perturbations," which could be of independent interest.


1 Introduction

Given a graph G with n vertices {1, . . . , n}, we say a subgraph H is an α-(combinatorial) spanner if for every pair of vertices u, v of G,

dist_H(u, v) ≤ α · dist_G(u, v),

where dist_G(u, v) is the shortest-path distance between u and v in G. It has been shown that for any α, G has an α-spanner with only n^{1+O(1)/α} edges, and that such a spanner can be found efficiently [EP04]. It can be found by a simple algorithm which repeatedly finds and adds an edge f = (u, v) with dist_H(u, v) > α. Combinatorial spanners have many applications in distributed computing [Pel00, FJJ+01, KK00], optimization [DK99, AGK+98], etc.

In this paper we define and study a spectral generalization of this property. Given a set of vectors V ⊆ R^d, we say a set U ⊆ V is an α-spectral d-spanner of V if for any vector v ∈ V there exists a probability distribution µ_v on the vectors in U such that

vv⊺ ⪯ α · E_{u∼µ_v}[uu⊺], equivalently ⟨x, v⟩² ≤ α · E_{u∼µ_v}[⟨x, u⟩²] for all x ∈ R^d.

To see that this is a generalization of the graph case, let b_{u,v} = e_u − e_v be the vector corresponding to an edge {u, v} of G, where e_u is the indicator vector of the vertex u. It is an exercise to show that for V = {b_e}_{e∈E(G)} and for any α-combinatorial spanner H of G, the set U = {b_e}_{e∈E(H)} is an α²-spectral spanner of V.
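As a quick numerical sanity check of this reduction (our own toy example, not from the paper): take the 4-cycle, keep the spanning path as a 3-combinatorial spanner, and verify the α²-spectral domination for the dropped chord edge with µ uniform on the path edges.

```python
import numpy as np

def e(i, d=4):
    v = np.zeros(d)
    v[i] = 1.0
    return v

def b(u, v):
    return e(u) - e(v)

# 4-cycle 0-1-2-3-0; H = the path 0-1-2-3 is a 3-combinatorial spanner,
# so U = {b_e : e in E(H)} should be a 9-spectral spanner of {b_e : e in E(G)}.
path = [b(0, 1), b(1, 2), b(2, 3)]   # edges kept in H
chord = b(0, 3)                      # the edge of G missing from H

# mu for the chord: uniform on the path edges connecting its endpoints.
E_uu = sum(np.outer(x, x) for x in path) / len(path)
gap = 3 ** 2 * E_uu - np.outer(chord, chord)
min_eig = np.linalg.eigvalsh(gap).min()   # nonnegative up to roundoff
```

Here b(0, 3) is exactly the sum of the path-edge vectors, so the domination follows from Cauchy-Schwarz, as in the exercise mentioned above.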

The following theorem is a special case of our main theorem.

Theorem 1.1 (Main theorem for k = d). There is an algorithm that for any set of vectors V ⊆ R^d finds an Õ(d)-spectral d-spanner of size Õ(d) in time polynomial in d and |V|.

Our algorithm is a spectral generalization of the greedy algorithm mentioned above for finding combinatorial spanners.

We further study generalizations of our spectral spanners to weaker forms of PSD inequalities. For two matrices A, B we write A ⪯_k B if for every projection matrix Π onto a (d − k + 1)-dimensional linear subspace, ⟨A, Π⟩ ≤ ⟨B, Π⟩. For example, if A ⪯_k B, then the sum of the top k eigenvalues of A is at most the sum of the top k eigenvalues of B. Analogously, we say U ⊆ V is an α-spectral k-spanner of V if for any v ∈ V there is a distribution µ_v on U such that vv⊺ ⪯_k α · E_{u∼µ_v}[uu⊺]. In our main theorem we generalize the above statement to all k ≤ d, and we show that to construct an Õ(k)-spectral k-spanner we only need Õ(k) vectors, independent of the ambient dimension of the space.

Theorem 1.2 (Main). There is an algorithm that for any set of vectors V ⊆ R^d finds an Õ(k)-spectral k-spanner of size Õ(k).

Furthermore, for any ε > 0 and k ≤ d, there exists a set V ⊆ R^d of size e^{Ω(kε)} such that any k^{1−ε}-spectral spanner of V must contain all vectors of V.

Composable core-sets. Our main application of spectral spanners is to design (composable) core-sets for spectral problems. A function c(V) that maps V ⊆ R^d into one of its subsets is called an α-composable core-set of size t for the function f(·) [AHPV05, IMMM14] if, for any collection of sets V_1, . . . , V_p ⊂ R^d, we have

f( c(V_1) ∪ . . . ∪ c(V_p) ) ≥ (1/α) · f( V_1 ∪ . . . ∪ V_p )


and |c(V_i)| ≤ t for every V_i. A composable core-set of small size immediately yields a communication-efficient distributed approximation algorithm: if each set V_i is stored on a separate machine, then all machines can compute and transmit their core-sets c(V_i) to a central server, which then performs the final computation over the union. Similarly, core-sets make it possible to design a streaming algorithm which processes N vectors in one pass using only O(√(Nt)) storage: divide the stream into blocks of size √(Nt), compute and store a core-set for each block, and then perform the computation over the union of the core-sets.
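The distributed composition pattern is easy to state in code. The sketch below uses a toy objective of our own (maximum squared norm), for which keeping the single largest vector per machine is an exact, i.e. 1-composable, core-set; the paper's objectives are spectral, but the composition step is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(V):
    """Toy objective: the largest squared norm among the vectors."""
    return max(float(v @ v) for v in V)

def coreset(V, t=1):
    """Keep the t vectors of largest norm: an exact core-set for this f."""
    return sorted(V, key=lambda v: float(v @ v), reverse=True)[:t]

# Data split across p = 4 "machines"; each transmits only its core-set
# to the server, which solves the problem on the union of the core-sets.
parts = [list(rng.normal(size=(100, 5))) for _ in range(4)]
composed = f([v for part in parts for v in coreset(part)])
exact = f([v for part in parts for v in part])
```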

In this paper we show that, for a given set V_i ⊆ R^d, an α-spectral spanner of V_i, for a proper value of α, provides a good core-set. Specifically, we show that for many (spectral) optimization problems, such as determinant maximization, D-optimal design, or min-eigenvalue maximization, this approach leads to almost the best possible composable core-set in the worst case.

In what follows we discuss a specific application, to determinant maximization, in more detail.

1.1 Composable Core-sets for Determinant Maximization

Determinantal point processes (DPPs) are widely popular probabilistic models. Given a set of vectors V ⊆ R^d and a parameter k, a DPP samples a k-subset S of V with probability P(S) proportional to the squared volume of the parallelepiped spanned by the elements of S. That is,

P(S) ∝ det_k( Σ_{v∈S} vv⊺ ).

This distribution formalizes a notion of diversity, as sets of vectors that are "substantially different" from each other are assigned higher probability. One can then find the "most diverse" k-subset of V by computing the S that maximizes P(S), i.e., solving the maximum a posteriori (MAP) decoding problem:

max_{S⊆V, |S|=k} P(S).

We also refer to this problem as the k-determinant maximization problem. Since their introduction to machine learning at the beginning of this decade [KT10, KT11a, KT+12], DPPs have found applications in video summarization [MJK17, GCGS14], document summarization [KT+12, CGGS15, KT11b], tweet timeline generation [YFZ+16], object detection [LCYO16], and elsewhere. All of the aforementioned applications involve MAP decoding.
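For intuition, MAP decoding on small instances can be done by brute force. The objective det_k(Σ_{v∈S} vv⊺) equals the determinant of the k × k Gram matrix of S (this is the identity (5) in the preliminaries). A sketch with our own toy data:

```python
import itertools
import numpy as np

def map_decode(V, k):
    """Brute-force MAP decoding: maximize the squared k-volume over all
    k-subsets. det_k(sum_{v in S} v v^T) equals det of the Gram matrix of S."""
    best_val, best_S = -1.0, None
    for S in itertools.combinations(range(len(V)), k):
        gram = np.array([[float(V[i] @ V[j]) for j in S] for i in S])
        val = float(np.linalg.det(gram))
        if val > best_val:
            best_val, best_S = val, S
    return best_S, best_val

rng = np.random.default_rng(1)
V = list(rng.normal(size=(8, 4)))
S, vol2 = map_decode(V, k=3)
```

The exhaustive loop is exponential in k, which is exactly why the hardness and approximation results discussed next matter.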

Here we use our results on spectral spanners to construct an almost optimal composable core-set for the MAP problem. Before presenting our result, let us briefly discuss relevant previous work. The MAP problem is hard to approximate up to a factor of 2^{−ck} for some constant c > 0, unless P = NP [ÇMI09a, ÇMI13]. This lower bound was matched qualitatively by a recent paper of [Nik15], which gave an algorithm with an e^k-approximation guarantee. Since the data sets in the aforementioned applications can be large, there has been considerable effort to develop efficient algorithms in distributed, streaming, and parallel models of computation [MJK17, WIB14, PJG+14, MKSK13, MKBK15, MZ15, BENW15]. All of these algorithms rely on the fact that the logarithm of the volume is a submodular function, which makes it possible to obtain multiplicative-factor approximation algorithms (assuming some lower bound on the volume, as otherwise the logarithm of the volume can be negative). See Section 1.3 for an overview. However, this generality comes at a price: a multiplicative approximation guarantee for the logarithm of the volume translates into an "exponential" guarantee for the volume itself and necessitates the aforementioned extra lower-bound assumptions. As a result, to the best of our knowledge, no multiplicative-factor approximation algorithms were previously known for this problem in streaming, distributed, or parallel models of computation.


In this paper we present the first (composable) core-set construction for the determinant maximization problem. Our main contributions are:

Theorem 1.3. There exists a polynomial-time algorithm for computing an Õ(k)^k-composable core-set of size Õ(k) for the k-determinant maximization problem.

Let us discuss the proof of the above theorem for the case k = d using our main Theorem 1.1. Given sets of vectors V_1, . . . , V_m, let U_1, . . . , U_m be their Õ(d)-spectral spanners, respectively. Let S = argmax_{S⊆∪_i V_i, |S|=d} P(S). Consider the matrix

A = Σ_{v∈S} E_{u∼µ_v}[uu⊺],

that is, we substitute each vector v in S by a convex combination of the vectors in the spectral spanner(s). Then, by the definition of a spectral spanner,

(1/α) Σ_{v∈S} vv⊺ ⪯ A.

Since the determinant is a monotone function with respect to the Loewner order on PSD matrices,

(1/α^d) P(S) = det( (1/α) Σ_{v∈S} vv⊺ ) ≤ det(A).

The matrix A can be seen as a fractional solution to the determinant maximization problem. In fact, [Nik15] showed that A can be rounded to a set T of size |T| = d such that det(A) ≤ e^d det( Σ_{v∈T} vv⊺ ). Therefore, we obtain an (eα)^d approximation for determinant maximization (see subsection 6.1 for more details).

The technique discussed above applies to many optimization problems. In general, if instead of the determinant we want to maximize any function f : R^{d×d} → R_+ that is monotone with respect to the Loewner order on PSD matrices, we can use the above approach to construct a fractional solution A, supported on the spectral spanners, such that f(A) is at least the optimum (up to a loss that depends on α). Then we can use randomized rounding ideas to round the matrix A to an integral solution of f. See subsection 6.2 for further examples.

We complement the above theorem by showing that the above guarantee is essentially the best possible.

Theorem 1.4. Any composable core-set of size at most kβ for the k-determinant maximization problem must incur an approximation factor of at least (k/β)^{k(1−o(1))}, for any β ≥ 1.

Note that our lower bound of (k/β)^{k(1−o(1))} for the approximation factor achievable by composable core-sets is substantially higher than the approximation factor e^k of the best off-line algorithm, demonstrating a large gap between the two models.

1.2 Overview of the Techniques

In this part we give a high-level overview of the proof of Theorem 1.2. Our proof has two steps. First, we solve the "full-dimensional" version of the problem, i.e., we construct an Õ(d)-spectral d-spanner of size Õ(d) for a given set of vectors in R^d, as promised in Theorem 1.1. Then we reduce the "low-dimensional" version of the problem, i.e., finding k-spanners for k < d, to the full-dimensional version in an Õ(k)-dimensional space.


Step 1: Our high-level plan is to "augment" the classical greedy algorithm for finding combinatorial spanners in graphs to the spectral setting. First, we rewrite the combinatorial algorithm in spectral language. Let G be a graph with vertex set V(G) and edge set E(G). Recall that for any edge e = {u, v} ∈ E(G), b_e = e_u − e_v. As alluded to in the introduction, if H is an α-combinatorial spanner of G, then U = {b_e}_{e∈E(H)} is an α²-spectral spanner of {b_e}_{e∈E(G)}. The following algorithm gives an α-combinatorial spanner with n^{1+O(1)/α} edges: start with an empty graph H; while there is an edge f = {u, v} in G with dist_H(u, v) > α, add it to H. One can observe that dist_H(u, v) > α iff, for every distribution µ on E(H),

b_f b_f⊺ ⋠ α² · E_{e∼µ}[b_e b_e⊺].

This observation suggests a natural algorithm in the spectral setting: at each step find a vector v ∈ V such that for all µ supported on the set of vectors already chosen for the spanner, vv⊺ ⋠ α · E_{u∼µ}[uu⊺], and add it to the spanner. We can implement such an algorithm in polynomial time, but we cannot directly bound the size of the spectral spanner it constructs using our current techniques.

So, we take a detour. First, we solve a seemingly easier problem by changing the order of quantifiers in the definition of the spectral spanner. For V ⊆ R^d, a subset U ⊆ V is a weak α-spectral spanner of V if for all v ∈ V and x ∈ R^d there is a distribution µ_{v,x} on U such that

⟨v, x⟩² ≤ α · E_{u∼µ_{v,x}}[⟨u, x⟩²], equivalently ⟨v, x⟩² ≤ α · max_{u∈U} ⟨u, x⟩².

To find a weak spectral spanner, we use the analogue of the greedy algorithm: let U be the set of vectors already chosen; while there are a vector v ∈ V and a direction x ∈ R^d such that ⟨x, v⟩² > α · max_{u∈U} ⟨u, x⟩², add argmax_v ⟨x, v⟩² to U.
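The loop structure of this greedy procedure can be sketched as follows. Finding a genuinely violating direction x is itself an optimization problem; the sketch below simply tests a fixed pool of random candidate directions, so it is a heuristic illustration of the loop, not the exact algorithm analyzed in the paper.

```python
import numpy as np

def weak_spanner(V, alpha, directions):
    """Greedy sketch: grow U while some v and some candidate direction x
    violate <x, v>^2 <= alpha * max_{u in U} <x, u>^2."""
    U = []
    changed = True
    while changed:
        changed = False
        for x in directions:
            proj = np.array([(x @ v) ** 2 for v in V])
            best = proj.max()
            cover = max(((x @ u) ** 2 for u in U), default=0.0)
            if best > alpha * cover:
                U.append(V[int(proj.argmax())])   # add argmax_v <x, v>^2
                changed = True
    return U

rng = np.random.default_rng(2)
V = [v for v in rng.normal(size=(50, 6))]
directions = list(rng.normal(size=(200, 6)))
U = weak_spanner(V, alpha=6.0, directions=directions)
```

The loop terminates because a vector already achieving the maximum for x can never be re-added (that would require best > α · best with α ≥ 1), so U grows by distinct vectors of V.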

We prove that for α = Õ(d) the above algorithm stops in Õ(d) steps. Suppose that the algorithm finds vectors u_1, . . . , u_m together with corresponding "bad" directions x_1, . . . , x_m, where x_i being a bad direction for u_i means that

⟨u_i, x_i⟩² > α · ⟨u_j, x_i⟩², for all 1 ≤ i ≤ m and all 1 ≤ j < i. (1)

We need to show that m = Õ(d). We consider the matrix M ∈ R^{m×m} where M_{i,j} = ⟨u_i, x_j⟩. By the above constraints, M is diagonally dominant and approximately lower triangular. But since M has rank at most d, being the inner-product matrix of vectors lying in R^d, we conclude that m = Õ(d). Note that in the extreme case where M is truly lower triangular, the latter fact obviously holds, because then rank(M) = m. As a side result, we also show that the rank of lower-triangular matrices is robust under small perturbations (see Lemma 4.3).

The above argument shows that the spectral greedy algorithm gives a weak spectral spanner with α = Õ(d) of size Õ(d). To finish the proof of Theorem 1.1 we need to obtain a (strong) α-spectral spanner from our weak spanner. We use a duality argument to show that any weak spectral spanner is in fact an α-spectral spanner. Let U be a weak spectral spanner. To verify that U is an α-spectral spanner, we need to find, for every v ∈ V, a distribution µ_v supported on U such that vv⊺ ⪯ α · E_{u∼µ_v}[uu⊺]. We can find the best distribution µ_v using an SDP with variables p_u for all u ∈ U, denoting P_{µ_v}(u). Instead of directly bounding the primal, we write down the dual of the SDP and use the separating hyperplane theorem to show that such a distribution indeed exists.

It was pointed out to us by an anonymous reviewer that one can use an approximate John ellipsoid [B+97] to find an O(d)-weak-spectral d-spanner of size Õ(d). This improves the guarantee of our algorithm by a log d factor. We discuss the details at the end of Section 4.1. Let us briefly mention the advantages of our algorithm over the John ellipsoid method. First, in finding the weak spanner one can tune the value of α in (1) based on the structure of the given data points and the ideal size of the core-set; we also expect that in many real-world applications one can use our algorithm to obtain polylog(d)-spectral d-spanners of


size Õ(d). Second, to implement our algorithm we only need to solve linear programs with O(d) variables; this requires a polynomially smaller amount of memory than the SDP solvers needed to compute the John ellipsoid. Lastly, our algorithm is easier to parallelize.

Step 2: To reduce the k-spanner problem to the full-dimensional case, we use the greedy algorithm of [ÇMI09a] to find a set of vectors W ⊂ V of size Õ(k) such that for any v ∈ V,

v_{⟨W⟩⊥} v_{⟨W⟩⊥}⊺ ⪯_k O(1) · E_w[ww⊺], (2)

where the expectation is over the uniform distribution on W, and v_{⟨W⟩⊥} denotes the projection of v onto the space orthogonal to the linear subspace spanned by W. Then we project all vectors in V onto the space ⟨W⟩ and solve the full-dimensional version; i.e., we find U ⊆ V of size Õ(|W|) such that for any v ∈ V there exists a distribution µ_v supported on U which satisfies

v_{⟨W⟩} v_{⟨W⟩}⊺ ⪯ Õ(k) · E_{u∼µ_v}[ u_{⟨W⟩} u_{⟨W⟩}⊺ ]. (3)

Ideally, on the RHS of the above we would like uu⊺ instead of u_{⟨W⟩} u_{⟨W⟩}⊺, which can be achieved, at the cost of an extra constant factor, by applying (2). It is not hard to see from the above two equations that U ∪ W is an Õ(k)-spectral k-spanner for V.

It remains to find a set W ⊆ V satisfying (2). We use the following algorithm: let W = ∅; for i = 1, . . . , Õ(k), add argmax_{v∈V} ‖v_{⟨W⟩⊥}‖ to W. Intuitively, we greedily choose a set of vectors of size Õ(k) to minimize the projection of the remaining vectors onto the orthogonal complement of ⟨W⟩. To prove (2), we need to show

v_{⟨W⟩⊥} v_{⟨W⟩⊥}⊺ ⪯_k O(1) · E_{w∼µ}[ww⊺].

Equivalently, after choosing the worst projection matrix onto a (d − k + 1)-dimensional linear subspace, it is enough to show

‖v_{⟨W⟩⊥}‖² ≤ O(1) · Σ_{i=k}^{d} λ_i( E_{w∼µ}[ww⊺] ). (4)

To prove this inequality, we use properties of the greedy algorithm to study the singular values of the matrix obtained by applying the Gram-Schmidt process to the vectors in W.

Lower bounds. As we discussed in the introduction, it is not hard to prove that the guarantee of Theorem 1.1 is tight in the worst case. However, one might wonder whether it is possible to design better composable core-sets for determinant maximization and related spectral problems. We show that for many such problems we obtain the best possible composable core-set in the worst case. Let us discuss the main ideas of Theorem 1.4.

We consider the case k = d for simplicity of exposition. For a set V ⊆ R^d and a linear transformation Q ∈ R^{d×d}, define QV = {Qv}_{v∈V}. Choose a set V ⊆ R^d of unit vectors such that for any distinct u, v ∈ V, ⟨u, v⟩² ≤ 1/d^{1−o(1)}. This can be achieved with high probability by selecting the points of V independently and uniformly at random from the unit sphere; note that V can then contain exponentially many (in d) vectors. Consider sets A_1, . . . , A_d and B_1, . . . , B_{d−1} in a (2d − 1)-dimensional space such that:

• For each 1 ≤ i ≤ d, A_i = R_i V, where R_i is a rotation matrix which maps R^d to ⟨e_1, e_2, . . . , e_{d−1}, e_{(d−1)+i}⟩ and maps a uniformly randomly chosen vector of V to e_{(d−1)+i}.


• For each 1 ≤ i ≤ d − 1, B_i = {M e_i}, where M is a "large" number.

Our instance of determinant maximization is simply QA_1, . . . , QA_d, QB_1, . . . , QB_{d−1} for a random rotation matrix Q ∈ R^{(2d−1)×(2d−1)}.

Observe that the optimal set of 2d − 1 vectors contains the Qe_{(d−1)+i}'s from the QA_i's and the QMe_i's from the QB_i's, and has value equal to (M^{d−1})². However, since Q is a random rotation, the core-set function cannot determine which vector in QA_i was aligned with Qe_{(d−1)+i}. Recall that the core-set algorithm must choose a core-set of A_i by observing only the vectors in A_i. Thus, unless the core-sets are exponentially large in d, there is a good probability that, for all i, the core-set for QA_i does not contain Qe_{(d−1)+i}. For a sufficiently large M, all vectors QMe_i from QB_i must be included in any solution with a non-trivial approximation factor. It follows that, with constant probability, any core-set-induced solution is sub-optimal by at least a factor of d^{Θ(d)}.
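The near-orthogonality used in this construction is a standard fact about random unit vectors: pairwise squared inner products concentrate around 1/d, so even a large collection satisfies ⟨u, v⟩² ≤ 1/d^{1−o(1)}. A quick empirical check (the parameters are ours):

```python
import numpy as np

rng = np.random.default_rng(10)
d, n = 200, 1000
V = rng.normal(size=(n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # n random unit vectors

G = V @ V.T
np.fill_diagonal(G, 0.0)                        # ignore <v, v> = 1 terms
max_sq = float((G ** 2).max())  # worst pairwise <u, v>^2; roughly log(n)/d scale
```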

Paper organization. In Section 3, we formally define spectral (d-)spanners and their generalization, spectral k-spanners. In Section 4, we introduce our algorithm for finding spectral spanners and prove Theorem 1.1. Then, in Section 5, we generalize the result of Theorem 1.1 to spectral k-spanners and show the reduction from k < d to the full-dimensional case k = d, proving Theorem 1.2. We discuss applications of spectral spanners to designing composable core-sets for several optimization problems, including k-determinant maximization, in Section 6; in particular, we prove Theorem 1.3. Finally, in Section 7, we present our lower bound results and prove Theorem 1.4.

1.3 Related work

As mentioned earlier, multiple papers have developed composable core-sets (or similar constructions) for the case where the objective function is the logarithm of the volume. In particular, [MKSK13] showed that core-sets obtained via a greedy algorithm guarantee an approximation factor of min(k, n). The approximation ratio can be further improved to a constant if the input points are assigned to the sets V_i uniformly at random [MZ15, BENW15]. However, these guarantees do not translate into a multiplicative approximation factor for the volume objective function.²

Core-set constructions are known for a wide variety of geometric and metric problems, and several algorithms have found practical applications. Some of those constructions are relevant in our context. In particular, core-sets for approximating the directional width [AHPV04] have functionality similar to weak spanners. However, that paper considered the problem for low-dimensional points, and as a result the core-set size was exponential in the dimension. Another line of research [AAYI+13, IMMM14, CPPU17] considered core-sets for maximizing metric diversity measures, such as the minimum inter-point distance. Those measures rely only on relationships between pairs of points, and thus have quite different properties from the volume-induced measure considered in this paper.

We also remark that one can consider generalizations of our problem to settings where we want to maximize the volume under additional constraints. Over the last few years several such extensions have been studied extensively and many new algorithmic ideas have been developed [NS16, AO17, SV17, ESV17]. In this paper, we study composable core-sets for the basic version of the determinant maximization problem, where no additional constraints are present.

² It is possible to show that the greedy method achieves composable core-sets with a multiplicative approximation factor of 2^{O(k²)}. Since this bound is substantially larger than the bound we obtain via spectral spanners, we do not include the proof in this paper.


2 Preliminaries

2.1 Linear Algebra

Throughout the paper, all vectors we consider are column vectors in R^d, unless otherwise specified. For a vector v, we write v(i) for its i-th coordinate and ‖v‖ for its ℓ₂ norm. Vectors v_1, . . . , v_k are called orthonormal if ‖v_i‖ = 1 for every i and ⟨v_i, v_j⟩ = 0 for every i ≠ j. For a set of vectors V, we let ⟨V⟩ denote the linear subspace spanned by the vectors of V. For a linear subspace S, we use S⊥ to denote the linear subspace orthogonal to S.

The notation ⟨·, ·⟩ is also used for the Frobenius inner product of matrices: for matrices A, B ∈ R^{d×d},

⟨A, B⟩ = Σ_{i=1}^{d} Σ_{j=1}^{d} A_{i,j} B_{i,j} = tr(AB⊺),

where A_{i,j} denotes the entry of the matrix A in row i and column j. A matrix A is positive semi-definite (PSD), denoted A ⪰ 0, if it is symmetric and for any vector v we have v⊺Av = ⟨A, vv⊺⟩ ≥ 0. For PSD matrices A, B we write A ⪯ B if B − A ⪰ 0. We denote the set of d × d PSD matrices by S^d_+.

Projection Matrices. A matrix Π ∈ R^{d×d} is a projection matrix if Π² = Π. It is easy to see that for any v ∈ R^d,

⟨vv⊺, Π⟩ = v⊺Πv = ⟨Πv, Πv⟩ = ‖Πv‖².

For a linear subspace S, we let Π_S denote the matrix projecting vectors from R^d onto S; that is, for any vector v, Π_S v is the projection of v onto S. If S is k-dimensional and v_1, . . . , v_k form an arbitrary orthonormal basis of S, then one can see that Π_S = Σ_{i=1}^{k} v_i v_i⊺. We denote the set of all projection matrices onto k-dimensional subspaces by P_k.

Fact 2.1. For any vectors u, v ∈ R^d and any projection matrix Π ∈ R^{d×d},

⟨(u + v)(u + v)⊺, Π⟩ ≤ 2 · ⟨uu⊺ + vv⊺, Π⟩.

Proof. Let a = Πu and b = Πv. Since Π² = Π, we have ⟨uu⊺, Π⟩ = ‖a‖², ⟨vv⊺, Π⟩ = ‖b‖², and ⟨(u + v)(u + v)⊺, Π⟩ = ‖a + b‖². The assertion is thus equivalent to

‖a + b‖² ≤ 2(‖a‖² + ‖b‖²),

which follows from the Cauchy-Schwarz inequality: ⟨a, b⟩ ≤ ‖a‖‖b‖ ≤ (‖a‖² + ‖b‖²)/2.
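A two-line numerical check of Fact 2.1 (random instance of our choosing):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
u, v = rng.normal(size=d), rng.normal(size=d)

# Random orthogonal projection onto a 3-dimensional subspace.
Q, _ = np.linalg.qr(rng.normal(size=(d, 3)))
Pi = Q @ Q.T

lhs = (u + v) @ Pi @ (u + v)          # <(u+v)(u+v)^T, Pi> = ||Pi(u+v)||^2
rhs = 2 * (u @ Pi @ u + v @ Pi @ v)   # 2 <uu^T + vv^T, Pi>
```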

Eigenvalues and Singular Values. For a symmetric matrix A, λ_1(A) ≥ . . . ≥ λ_d(A) denote the eigenvalues of A. The following characterization of the eigenvalues of symmetric matrices, known as the min-max or variational theorem, will be useful for us.

Theorem 2.2 (Min-max Characterization of Eigenvalues). Let A ∈ R^{d×d} be a symmetric matrix with eigenvalues λ_1 ≥ λ_2 ≥ . . . ≥ λ_d. Then

λ_k = max { min_{x∈U} x⊺Ax / ‖x‖² : U is a k-dimensional linear subspace },

or, equivalently,

λ_k = min { max_{x∈U} x⊺Ax / ‖x‖² : U is a (d − k + 1)-dimensional linear subspace }.

The following theorem, known as the Cauchy interlacing theorem, relates the eigenvalues of a symmetric matrix to the eigenvalues of its principal submatrices.

Theorem 2.3 (Cauchy Interlacing Theorem). Let A be a symmetric d × d matrix, and let B be an m × m principal submatrix of A. Then for any 1 ≤ i ≤ m, we have

λ_{d−m+i}(A) ≤ λ_i(B) ≤ λ_i(A).

We also use the following lemma, an easy implication of the min-max characterization, to bound the eigenvalues of a sum of two matrices in terms of their eigenvalues.

Lemma 2.4. Let A, B ∈ R^{d×d} be two symmetric matrices. Then λ_{i+j−d}(A + B) ≥ λ_i(A) + λ_j(B) for any i, j with i + j − d > 0.

Proof. Following the min-max characterization of eigenvalues, let S_A and S_B be i-dimensional and j-dimensional linear subspaces, respectively, for which

λ_i(A) = min_{x∈S_A} x⊺Ax / ‖x‖²  and  λ_j(B) = min_{x∈S_B} x⊺Bx / ‖x‖².

Let S = S_A ∩ S_B. The dimension of S is at least i + j − d, and by the min-max characterization of eigenvalues,

λ_{i+j−d}(A + B) ≥ min_{x∈S} x⊺(A + B)x / ‖x‖² ≥ min_{x∈S_A} x⊺Ax / ‖x‖² + min_{x∈S_B} x⊺Bx / ‖x‖² = λ_i(A) + λ_j(B),

which completes the proof.

Moreover, we take advantage of the following simple lemma, also known as the extremal partial trace; a proof can be found in [Tao10].

Lemma 2.5. Let L ∈ R^{d×d} be a symmetric matrix. Then for any integer n ≤ d,

min_{Π∈P_n} ⟨Π, L⟩ = Σ_{i=d−n+1}^{d} λ_i(L).

In particular, we use it to conclude that if x_1, . . . , x_n ∈ R^d are orthonormal vectors, then

Σ_{i=1}^{n} x_i⊺ L x_i ≥ Σ_{i=d−n+1}^{d} λ_i(L).
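Concretely, the minimizer in Lemma 2.5 is the projection onto the span of the n bottom eigenvectors, and any other rank-n projection does no better; a quick check on a random instance of ours:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 6, 3
L = rng.normal(size=(d, d))
L = (L + L.T) / 2

evals, evecs = np.linalg.eigh(L)      # ascending eigenvalues
bottom = evals[:n].sum()              # sum of the n smallest eigenvalues of L

# Projection onto the span of the n bottom eigenvectors attains the minimum.
Q = evecs[:, :n]
attained = float(np.trace(Q @ Q.T @ L))

# Random rank-n projections can only give a larger value of <Pi, L>.
others = []
for _ in range(100):
    R, _ = np.linalg.qr(rng.normal(size=(d, n)))
    others.append(float(np.trace(R @ R.T @ L)))
```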

For a matrix A, we use σ_1(A) ≥ . . . ≥ σ_d(A) ≥ 0 to denote the singular values of A (for symmetric PSD matrices they coincide with the eigenvalues). Many of the matrices we work with in this paper are not symmetric. Given such a matrix, we use the following simple construction to obtain a symmetric matrix whose eigenvalues are the singular values of the input matrix together with their negations. Define a symmetrization operator S_d : R^{d×d} → R^{2d×2d} where for any matrix A ∈ R^{d×d},

S_d(A) = [ 0  A ; A⊺  0 ].


Fact 2.6. For any matrix A ∈ R^{d×d}, S_d(A) has eigenvalues σ_1(A) ≥ . . . ≥ σ_d(A) ≥ −σ_d(A) ≥ . . . ≥ −σ_1(A).

Proof. Let u_1, . . . , u_d and v_1, . . . , v_d be the right and left singular vectors of A, respectively. Then Au_i = σ_i v_i and A⊺v_i = σ_i u_i for any 1 ≤ i ≤ d. It is then easy to see that [v_i; u_i] and [−v_i; u_i] are eigenvectors of S_d(A) with eigenvalues σ_i and −σ_i, respectively, for any 1 ≤ i ≤ d.
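Fact 2.6 can be verified directly (random matrix of our choosing):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 5
A = rng.normal(size=(d, d))

S = np.block([[np.zeros((d, d)), A], [A.T, np.zeros((d, d))]])
eig = np.sort(np.linalg.eigvalsh(S))                 # ascending
sig = np.sort(np.linalg.svd(A, compute_uv=False))    # ascending

# Eigenvalues of S_d(A): the singular values of A and their negations.
expected = np.concatenate([-sig[::-1], sig])
```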

Throughout the paper, we work with several matrix norms. The ℓ₂ norm of a matrix A, denoted ‖A‖₂ or simply ‖A‖, is max_{‖x‖₂=1} ‖Ax‖₂. Also, ‖A‖_∞ = max_{i,j} |A_{i,j}| denotes the ℓ_∞ norm. The Frobenius norm of A, denoted ‖A‖_F, is defined by ‖A‖_F = √⟨A, A⟩. We use the following identity relating the Frobenius norm to the singular values.

Fact 2.7. For any matrix A ∈ R^{d×d},

‖A‖_F² = Σ_{i=1}^{d} σ_i(A)².

Determinant Maximization Problem. We use the determinant of a subset of vectors as a measure of their diversity. From a geometric point of view, for a subset of vectors V = {v_1, . . . , v_d} ⊂ R^d, det( Σ_{i=1}^{d} v_i v_i⊺ ) equals the square of the volume of the parallelepiped spanned by V. For S, T ⊆ [d], let A_{S,T} denote the |S| × |T| submatrix formed by intersecting the rows and columns corresponding to S and T, respectively. The notation det_k is a generalization of the determinant, defined by

det_k(A) = Σ_{S⊆[d], |S|=k} det(A_{S,S}).

In particular, for vectors v_1, . . . , v_k ∈ R^d, det_k( Σ_{i=1}^{k} v_i v_i⊺ ) equals the square of the k-dimensional volume of the parallelepiped spanned by v_1, . . . , v_k. The k-determinant maximization problem is defined as follows.

Definition 2.8 (k-Determinant Maximization). Let V = {v_1, . . . , v_n} ⊂ R^d be a set of vectors, and let M ∈ R^{n×n} be the Gram matrix obtained from V, i.e., M_{i,j} = ⟨v_i, v_j⟩. For an integer k ≤ d, the goal of the k-determinant maximization problem is to choose a subset S ⊆ V such that |S| = k and the determinant of M_{S,S} is maximized.

Throughout the paper, we extensively use the Cauchy-Binet identity, which states that for any integer k ≤ d, B ∈ R^{k×d}, and C ∈ R^{d×k}, we have

det(BC) = Σ_{S⊆[d], |S|=k} det(B_{[k],S}) det(C_{S,[k]}).

For any S ⊂ [n] with |S| = k, if we let V_S ∈ R^{k×d} be the matrix with {v_i}_{i∈S} as its rows, then we have

det(M_{S,S}) = det(V_S V_S⊺) = det_k(V_S⊺ V_S) = det_k( Σ_{v∈S} vv⊺ ), (5)

where the middle equality uses the Cauchy-Binet identity. The k-determinant maximization problem is also known as the maximum-volume k-simplex problem, since det_k( Σ_{v∈S} vv⊺ ) equals the square of the volume spanned by


{v_i}_{i∈S}. Throughout the paper, we also use the following identity, which can be derived from the Cauchy-Binet formula by taking the columns of B ∈ R^{d×n} to be the v_i's and C = B⊺:

det( Σ_{i=1}^{n} v_i v_i⊺ ) = Σ_{S⊆[n], |S|=d} det( Σ_{i∈S} v_i v_i⊺ ). (6)

We use it to deduce the following simple lemma.

Lemma 2.9. For any set of vectors v_1, . . . , v_n ∈ R^d and any integer 1 ≤ k ≤ d,

det_k( Σ_{i=1}^{n} v_i v_i⊺ ) = Σ_{S⊆[n], |S|=k} det_k( Σ_{i∈S} v_i v_i⊺ ).

Proof. For a set T ⊂ [d] with |T| = k and any 1 ≤ i ≤ n, let v_{i,T} ∈ R^k denote the restriction of v_i to its coordinates in T. Then

det_k( Σ_{i=1}^{n} v_i v_i⊺ )
 = Σ_{T⊆[d], |T|=k} det( Σ_{i=1}^{n} v_{i,T} v_{i,T}⊺ )   (by definition of det_k)
 = Σ_{T⊆[d], |T|=k} Σ_{S⊆[n], |S|=k} det( Σ_{i∈S} v_{i,T} v_{i,T}⊺ )   (by (6))
 = Σ_{S⊆[n], |S|=k} Σ_{T⊆[d], |T|=k} det( Σ_{i∈S} v_{i,T} v_{i,T}⊺ )
 = Σ_{S⊆[n], |S|=k} det_k( Σ_{i∈S} v_i v_i⊺ )   (by definition of det_k).
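Lemma 2.9 is easy to confirm numerically; det_k below is implemented literally as the sum of k × k principal minors, per the definition above (the random instance is ours):

```python
import itertools
import numpy as np

def det_k(A, k):
    """det_k(A) = sum of all k x k principal minors of A."""
    d = A.shape[0]
    return sum(float(np.linalg.det(A[np.ix_(S, S)]))
               for S in itertools.combinations(range(d), k))

rng = np.random.default_rng(8)
n, d, k = 5, 4, 2
V = rng.normal(size=(n, d))

lhs = det_k(sum(np.outer(v, v) for v in V), k)
rhs = sum(det_k(sum(np.outer(V[i], V[i]) for i in S), k)
          for S in itertools.combinations(range(n), k))
```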

We also use the following identities about determinants. For a d × d matrix A, we have

|det(A)| = Π_{i=1}^{d} σ_i(A).

If A is lower (or upper) triangular, i.e., A_{i,j} = 0 for j > i (resp. j < i), then det(A) = Π_{i=1}^{d} A_{i,i}.

2.2 Core-sets

The notion of core-sets was introduced in [AHPV04]. Informally, a core-set for an optimization problem is a subset of the data with the property that solving the underlying problem on the core-set gives an approximate solution for the original data. This notion is somewhat generic, and many variations of core-sets exist.

The specific notion of composable core-sets was explicitly formulated in [IMMM14].

Definition 2.10 (α-Composable Core-sets). A function c(V) that maps the input set V ⊂ R^d into one of its subsets is called an α-composable core-set for a maximization problem with respect to a function f : 2^{R^d} → R if, for any collection of sets V_1, . . . , V_m ⊂ R^d, we have

f( c(V_1) ∪ · · · ∪ c(V_m) ) ≥ (1/α) · f( V_1 ∪ · · · ∪ V_m ).

For simplicity, we will often refer to the set c(P) as the core-set for P and use the term "core-set function" for c(·). The size of c(·) is defined as the smallest number t such that |c(P)| ≤ t for all sets P (assuming it exists). Unless otherwise stated, whenever we use the term "core-set," we mean a composable core-set.

3 Spectral Spanners

In this section we introduce the notion of spectral spanners and review their properties. In the following, we define the special case of spectral (d-)spanners; later, in Definition 3.4, we introduce its generalization, spectral k-spanners.

Definition 3.1 (Spectral Spanner). Let V ⊂ Rd be a set of vectors. We say U ⊆ V is an α-spectral

d-spanner forV if for any v∈ V , there exists a probability distribution µvon the vectors inU so that vv⊺  α · E

u∼µv[uu

] . (7)

We study spectral spanners in Section 4, and propose polynomial-time algorithms for finding Õ(d)-spectral spanners of size Õ(d). Considering (7) for all v ∈ V implies that if U ⊆ V is an α-spectral spanner of V, then for any probability distribution μ : V → R_+, there exists a distribution μ̃ : U → R_+ such that

E_{v∼μ}[vv^⊺] ⪯ α · E_{u∼μ̃}[uu^⊺]. (8)

We crucially take advantage of this property in Section 6 to develop core-sets for the experimental design problem. Let f : S_d^+ → R_+ be a monotone function, i.e., f(A) ≤ f(B) whenever A ⪯ B. Roughly speaking, we use the monotonicity of f together with (8) to reduce optimizing f over all matrices of the form E_{v∼μ}[vv^⊺], for some distribution μ, to optimizing it over distributions supported only on the small set U. A wide range of matrix functions used in practice are monotone, e.g., the determinant and the trace. More generally, λ_i(·) for any i is a monotone function, and consequently so is any elementary symmetric polynomial of the eigenvalues. For polynomial functions of lower degree, e.g., the trace and det_k, monotonicity is guaranteed by weaker conditions. Therefore, one should expect to find smaller core-sets with better guarantees for those functions. Motivated by this, we introduce the notion of spectral k-spanners. Let us first define the notation ⪯_k that generalizes ⪯.

Definition 3.2 (⪯_k notation). For two matrices A, B ∈ R^{d×d}, we say A ⪯_k B if for any Π ∈ P_{d−k+1}, we have ⟨A, Π⟩ ≤ ⟨B, Π⟩.

In particular, note that A ⪯_d B is equivalent to A ⪯ B (as P_1 is the set of rank-one projection matrices), and A ⪯_1 B is the same as tr(A) ≤ tr(B) (as P_d = {I}). More generally, the following lemma can be used to check whether A ⪯_k B.

Lemma 3.3. Let A, B ∈ R^{d×d} be two symmetric matrices. Then A ⪯_k B if and only if ∑_{i=k}^d λ_i(B − A) ≥ 0.

Proof. Suppose that A ⪯_k B. Then by definition, for any Π ∈ P_{d−k+1} we have ⟨B − A, Π⟩ ≥ 0, so combining with Lemma 2.5 we get

0 ≤ min_{Π∈P_{d−k+1}} ⟨B − A, Π⟩ = ∑_{i=k}^d λ_i(B − A).

Conversely, if ∑_{i=k}^d λ_i(B − A) ≥ 0, then by Lemma 2.5 the minimum of ⟨B − A, Π⟩ over Π ∈ P_{d−k+1} is nonnegative, so ⟨A, Π⟩ ≤ ⟨B, Π⟩ for every Π ∈ P_{d−k+1}, i.e., A ⪯_k B.
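Lemma 3.3 gives a direct computational test for the ⪯_k order. A minimal sketch (assuming numpy; `preceq_k` is our own helper name):

```python
import numpy as np

def preceq_k(A, B, k, tol=1e-9):
    # Lemma 3.3: A <=_k B iff the sum of the bottom d - k + 1 eigenvalues
    # of B - A is nonnegative.
    lam = np.sort(np.linalg.eigvalsh(B - A))[::-1]  # lam[0] >= ... >= lam[d-1]
    return float(lam[k - 1:].sum()) >= -tol

d = 4
rng = np.random.default_rng(2)
X = rng.standard_normal((d, d))
A = X @ X.T
B = A + np.eye(d)

# <=_d coincides with the PSD order, and <=_1 compares traces.
assert preceq_k(A, B, d)      # B - A = I is PSD
assert preceq_k(A, B, 1)      # tr(A) <= tr(B)
assert not preceq_k(B, A, 1)  # tr(B) > tr(A)
```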

Now, we are ready to define spectralk-spanners.

Definition 3.4 (Spectral k-Spanner). Let V ⊂ R^d be a set of vectors. We say U ⊆ V is an α-spectral k-spanner for V if for any v ∈ V, there exists a probability distribution μ_v on the vectors in U so that

vv^⊺ ⪯_k α · E_{u∼μ_v}[uu^⊺]. (9)

We may drop k whenever it is clear from the context. Finally, we remark that spectral k-spanners have the composability property: if U_1, U_2 are α-spectral spanners of V_1, V_2 respectively, then U_1 ∪ U_2 is an α-spectral spanner of V_1 ∪ V_2. This property will be useful for constructing composable core-sets.

We will prove the first part of Theorem 1.2 in Section 5. The second part of the theorem shows the near-optimality of our results: we cannot do better than an Ω(d)-spectral spanner in the worst case unless the spectral spanner has size exponential in d. Here, we prove the second part of the theorem.

First, let us prove the claim for k = d. Let V be a set of (1/2)·e^{d^ε/8} independently chosen random ±1 vectors in R^d. By the Azuma–Hoeffding inequality and the union bound, we get that

P[ ∀u ≠ v ∈ V : |⟨u, v⟩| ≤ √(d^{1+ε}/2) ] ≥ 1 − |V|² e^{−d^ε/4} ≥ 1/2.

So, let V be a set where ⟨u, v⟩² ≤ d^{1+ε}/2 for all distinct u, v ∈ V. We claim that any d^{1−ε}-spectral spanner of V must contain all of V. Let U be such a spanner and suppose some v ∈ V is not in U. We observe that vv^⊺ ⋠ d^{1−ε} · E_{u∼μ} uu^⊺ for any μ supported on U. Indeed, testing the spanner inequality in the direction v, for any μ supported on U we have

d^{1−ε} · E_{u∼μ} ⟨v, u⟩² ≤ d^{1−ε} · (d^{1+ε}/2) = d²/2 < d² = ⟨v, v⟩²,

as desired.

Now, let us extend the above proof to k < d. First, construct a set V ⊆ R^k of (1/2)·e^{k^ε/8} independently chosen random ±1 vectors in R^k. By the above argument, any k^{1−ε}-spectral k-spanner of V must contain all of V. Now define V′ ⊆ R^d by appending d − k zeros to each vector in V. It is not hard to see that any α-spectral k-spanner of V corresponds to an α-spectral k-spanner of V′, and vice versa. Therefore, any k^{1−ε}-spectral k-spanner of V′ contains all vectors of V′.
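The concentration step behind this lower bound is easy to observe empirically. The following experiment (an illustration, not the formal proof; numpy assumed) draws random ±1 vectors and checks that all pairwise inner products are far smaller than the squared norm d:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 400, 50
V = rng.choice([-1.0, 1.0], size=(n, d))   # n random sign vectors in R^d

G = V @ V.T                                # Gram matrix; diagonal entries are all d
off = G - np.diag(np.diag(G))

assert np.all(np.diag(G) == d)
# Off-diagonal inner products concentrate at scale sqrt(d), far below d, so
# no vector in the set is spectrally covered by the others without a
# blow-up factor of order d.
assert np.max(np.abs(off)) < d / 2
```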

4 Spectral Spanners in the Full-Dimensional Case

In this section we prove Theorem 1.2 for the case k = d; in this case we obtain a slightly better bound, and indeed we prove Theorem 1.1. As alluded to in the introduction, we design a greedy algorithm that can be seen as a spectral analogue of the classical greedy algorithms for finding combinatorial spanners in graphs. The details of our algorithm are given in Algorithm 1.

1 Function Spectral-d-Spanner(V, α)
2 Let U = ∅
3 While there is a vector v ∈ V such that the polytope
    P_v = {x | ∀u ∈ U, ⟨x, v⟩ > √α · |⟨x, u⟩|}
  is nonempty: find such a vector v, let x be any point in P_v, and add argmax_{u∈V} ⟨u, x⟩² to U
4 If there are no such vectors, terminate the algorithm and output U

Algorithm 1: Greedy algorithm for computing a spectral d-spanner

Note that for any vector v we can test whether P_v is empty using a linear program. Therefore, the above algorithm runs in time polynomial in |V| and d.
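To make the procedure concrete, here is a simplified executable sketch of the greedy loop. One deliberate simplification: instead of deciding emptiness of P_v exactly with a linear program as the algorithm prescribes, this sketch merely searches for a witness x ∈ P_v among randomly sampled directions, so it is a heuristic stand-in rather than the exact algorithm (numpy assumed; all names are ours).

```python
import numpy as np

def greedy_weak_spanner(V, alpha, n_dirs=2000, seed=0):
    # Heuristic sketch of Algorithm 1. A direction x lies in P_v when
    # <x, v> > sqrt(alpha) * |<x, u>| for every u already in U.
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, V.shape[1]))   # candidate witnesses
    sqrt_a = np.sqrt(alpha)
    U = []
    while True:
        witness = None
        for v in V:
            proj_v = dirs @ v
            if U:
                bound = sqrt_a * np.max(np.abs(dirs @ np.array(U).T), axis=1)
            else:
                bound = np.zeros(n_dirs)
            hits = proj_v > bound
            if hits.any():
                witness = dirs[int(np.argmax(hits))]   # first sampled point of P_v
                break
        if witness is None:
            return np.array(U)
        # Add the vector with the largest projection on the witness direction;
        # for alpha > 1 this is never a vector already in U, so the loop
        # terminates after at most |V| iterations.
        U.append(V[int(np.argmax((V @ witness) ** 2))])

rng = np.random.default_rng(4)
V = rng.standard_normal((40, 3))
U = greedy_weak_spanner(V, alpha=9.0)
assert 1 <= len(U) <= len(V)
```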

As alluded to in Section 1.2, we first prove that our algorithm constructs a weak Õ(d)-spectral spanner. Let us recall the definition of a weak spectral spanner.

Definition 4.1 (Weak Spectral Spanner). A subset U ⊆ V ⊂ R^d is a weak α-spectral spanner of V if for all v ∈ V and x ∈ R^d there is a probability distribution μ_{v,x} on U such that

⟨v, x⟩² ≤ α · E_{u∼μ_{v,x}} ⟨u, x⟩², or equivalently, ⟨v, x⟩² ≤ α · max_{u∈U} ⟨u, x⟩².

In the rest of this section, we may call spectral spanners strong to emphasize their difference from the weak spectral spanners defined above. The rest of this section is organized as follows: in Section 4.1 we prove that the output of the algorithm is a weak α-spectral spanner of size O(d log d) for α = Ω(d log² d). Then, in Section 4.2 we prove that for any α, any weak α-spectral spanner is a strong α-spectral spanner.

4.1 Construction of a Weak Spectral Spanner

In this section we show that Algorithm 1 returns a weak α-spectral d-spanner when α is sufficiently larger than d. At the end of this section we discuss an alternative algorithm for finding a weak spanner that achieves a slightly better approximation guarantee. However, we believe that Algorithm 1 is simpler to implement, easier to parallelize, and can be tuned for practical applications.

Proposition 4.2. There is a universal constant C > 0 such that for α ≥ C · d log² d, Algorithm 1 returns a weak α-spectral spanner of size O(d log d).

First, we observe that for any α, the output of the algorithm is a weak α-spectral spanner. For the sake of contradiction, suppose the output set U is not a weak α-spectral spanner. Then there are v ∈ V and x ∈ R^d such that

⟨x, v⟩² > α · max_{u∈U} ⟨x, u⟩². (10)

We show that P_v is nonempty, which implies U cannot be the output. We may assume ⟨x, v⟩ > 0, perhaps after multiplying x by −1. Then the above equation is equivalent to ⟨x, v⟩ > √α · max_{u∈U} |⟨u, x⟩|, which implies x ∈ P_v.

It remains to bound the size of the output set U. As alluded to in the introduction, the main technical part of the proof is to show that the rank of a lower triangular matrix is robust under small perturbations. To bound the size of U we will construct such a matrix and use Lemma 4.3 (see below) to bound its rank. Let u_1, u_2, …, u_m be the sequence of vectors added to our spectral spanner by the algorithm, i.e., u_i is the i-th vector added to the set U. By the description of the algorithm, for any u_i there exists a “bad” vector x_i ∈ R^d such that

⟨u_i, x_i⟩² > α · max_{1≤j<i} ⟨u_j, x_i⟩².

Furthermore, by construction, u_i is the vector with the largest projection onto x_i, i.e., u_i = argmax_{u∈V} ⟨x_i, u⟩². Define the inner product matrix M ∈ R^{m×m} by

M_{i,j} = ⟨u_i, x_j⟩.

By the above conditions on the vectors u_i, x_j, the matrix M is diagonally dominant, and for all 1 ≤ i ≤ m and 1 ≤ j < i we have M_{j,i} ≤ M_{i,i}/√α. So the assumption of Lemma 4.3 holds for M with ε = 1/√α. By the lemma,

rank(M) ≥ C · min( 4α/log²α , m/log m )

for some constant C > 0. On the other hand, rank(M) ≤ d, since M can be written as the product of an m×d matrix and a d×m matrix. Choosing α = Θ(d log² d) large enough that C · 4α/log²α > d forces the minimum to be attained by m/log m; hence C · m/log m ≤ d, which gives |U| = m ≤ O(d log d) for large enough d, as desired. It remains to prove the following lemma.

Lemma 4.3. Let M ∈ R^{m×m} be a diagonally dominant and approximately lower triangular matrix in the following sense:

M_{j,i} ≤ ε · M_{i,i}, ∀ 1 ≤ j < i ≤ m. (11)

Then there is a universal constant C > 0 such that

rank(M) ≥ C · min( (1/(ε log(1/ε)))² , m/log m ).

Proof. Without loss of generality, perhaps after scaling each column of M by its diagonal entry, we assume M_{i,i} = 1 for all i. Note that the rank and condition (11) are invariant under such scaling, so it is enough to prove the statement for such a matrix. Let M_s denote the top-left s×s principal submatrix of M, for some integer s ≤ m that we specify later. Taking principal submatrices does not increase the rank, and (11) is preserved under taking principal submatrices, so proving the assertion of the lemma for rank(M_s) proves the lemma. We can write M_s = L + E such that

• L ∈ R^{s×s} is a lower triangular matrix with L_{i,i} = 1 and |L_{i,j}| ≤ 1 for any 1 ≤ j ≤ i ≤ s. In particular, ‖L‖_∞ = 1.

• ‖E‖_∞ ≤ ε (we may further assume E is upper triangular, but we do not use this in our proof).

Let σ_1(M_s) ≥ … ≥ σ_s(M_s) denote the singular values of M_s. Obviously, σ_i(M_s) > 0 implies rank(M_s) ≥ i for any 1 ≤ i ≤ s. With this fact in mind, let us give some intuition for why M_s has large rank. Since L is lower triangular with nonzero diagonal entries, it is a full-rank matrix. Moreover, the entries of E are much smaller than the (diagonal) entries of L. The singular values of E are thus on average much smaller than those of L, so adding E to L can only make a small fraction of the singular values of L vanish. This implies that M_s = L + E must have high rank. Now we make this argument rigorous.

Let S(M_s), S(L), S(E) be the symmetrized versions of M_s, L and E respectively (see Subsection 2.1). By Fact 2.6, to show σ_i(M_s) > 0 for some i, we can equivalently prove λ_i(S(M_s)) > 0. We use Lemma 2.4: setting A = S(L) and B = S(E), for any pair of integers ℓ < k ≤ s such that

λ_k(S(L)) + λ_{2s−ℓ}(S(E)) > 0 (12)

we have λ_{k−ℓ}(S(M_s)) > 0. So to prove the lemma, it suffices to find s, k and ℓ satisfying the above with

k − ℓ ≥ C · min( (1/(ε log(1/ε)))² , m/log m )

for some constant C.

To find proper values of k and ℓ, we use the following two claims.

Claim 4.4. For any ℓ ≤ s,

λ_{2s−ℓ}(S(E)) ≥ λ_{2s−ℓ+1}(S(E)) = −σ_ℓ(E) ≥ −‖E‖_F/√ℓ ≥ −ε·s/√ℓ.

Claim 4.5. For any k < s/2,

λ_k(S(L)) = σ_k(L) ≥ ((k−1)/s²)^{(k−1)/s}.

Therefore, to show (12) it is enough to show

s · log(√ℓ/(εs)) > (k−1) · log(s²/(k−1)), (13)

for k, ℓ ≤ s/2. We analyze the above in two cases. If m log m ≤ 1/ε², then one can check that for s = m, k = ⌊m/(4 log m)⌋, ℓ = ⌊m/(8 log m)⌋, and large enough m, (13) holds. This implies that in this case rank(M_s) ≥ k − ℓ ≥ m/(8 log m), and we are done. Now suppose that 1/ε² ≤ m log m. We set s < m to be the largest integer such that s log s ≤ 1/(16ε²). Next, we let ℓ = ⌊4ε²s²⌋. Note that s log s ≤ 1/(16ε²) implies ℓ ≤ s/(4 log s). Now, plugging ℓ = ⌊4ε²s²⌋ into (13) turns it into

s > (k−1) · log(s²/(k−1)).

So, for k = ⌊s/(2 log s)⌋, the above is satisfied. Furthermore, in this case

k − ℓ = ⌊s/(2 log s)⌋ − ⌊4ε²s²⌋ ≥ s/(4 log s) ≥ 1/(256 ε² log²(1/ε)),

as s is the largest number such that s log s ≤ 1/(16ε²), and log s ≤ 2 log(1/ε). So the lemma holds for C ≤ 1/256.

Proof of Claim 4.4. By Fact 2.7 we know that

∑_{i=1}^s σ_i(E)² = ‖E‖_F² ≤ ‖E‖_∞² · s² ≤ ε² · s².

Now, by Markov's inequality we get σ_ℓ(E)² ≤ ε²s²/ℓ, which proves the claim.

Proof of Claim 4.5. Since L is lower triangular, we have

∏_{i=1}^s σ_i(L) = det(L) = ∏_{i=1}^s L_{i,i} = 1. (14)

It follows that for any k ≤ s,

∏_{i=1}^{k−1} σ_i(L) = 1/∏_{j=k}^s σ_j(L) ≥ 1/σ_k(L)^{s−k+1}. (15)

Now, we use the Frobenius norm to prove an upper bound on the first k−1 singular values. By Fact 2.7,

∑_{i=1}^s σ_i(L)² = ‖L‖_F² ≤ ‖L‖_∞² · s² = s². (16)

By the AM–GM inequality we get

∏_{i=1}^{k−1} σ_i(L) ≤ ( ∑_{i=1}^{k−1} σ_i(L)²/(k−1) )^{(k−1)/2} ≤ ( ‖L‖_F²/(k−1) )^{(k−1)/2} ≤ ( s²/(k−1) )^{(k−1)/2}.

The above together with (15) proves σ_k(L) ≥ ((k−1)/s²)^{(k−1)/(2(s−k+1))}. Noting that for k ≤ s/2 we have 2(s−k+1) ≥ s completes the proof of the claim.

We would like to thank an anonymous reviewer for suggesting an alternative algorithm for finding weak spanners. It offers slightly better guarantees, finding a weak d-spectral spanner of size O(d log d). However, as we argue below, our algorithm can be made more efficient in practice, in particular in a distributed setting.

An alternative algorithm. For a subset V ⊂ R^d, define sym(V) to be the symmetric set sym(V) = V ∪ {−x | x ∈ V}. A direct application of the separating hyperplane theorem shows that a subset U ⊆ V is a weak α-spectral spanner of V if conv(sym(V)) ⊆ √α · conv(sym(U)), where conv refers to the convex hull of a set. Knowing this, we can apply the celebrated result of F. John [B+97] to get a weak d-spectral spanner. Letting MVEE of a set denote the minimum-volume ellipsoid enclosing the set, the result implies that there exists a subset U ⊆ V of size O(d²) and an ellipsoid E with E = MVEE(sym(U)) = MVEE(sym(V)) and E/√d ⊆ conv(sym(U)). Therefore, U is a weak d-spectral spanner of V with size O(d²). Moreover, the size of U can be reduced to O(d) by using the spectral sparsification machinery of [BSS12]. We also note that these ideas have been extended in [Bar14] to show that for any d-dimensional symmetric convex body C and any 0 < ε < 1, there is a polytope P with roughly d^{1/ε} vertices such that P ⊆ C ⊆ √(εd) · P. In our terminology, this gives an algorithm to find a weak d-spectral spanner of V of size O(d). Although the approximation guarantee can be improved by a log factor by this algorithm, the improvement comes at a cost. First of all, finding John's ellipsoid requires solving a semidefinite program with O(d²) variables, whereas in Algorithm 1 we only need to solve linear programs with O(d) variables; this requires polynomially less memory. Furthermore, note that the main computational task in each step of Algorithm 1 is to solve |V| feasibility LPs, each with Õ(d) variables and constraints. These LPs can be solved in parallel: with access to O(|V|) processors, our greedy algorithm runs in poly(d) time in the PRAM model of computation. This extreme parallelism cannot be achieved using the above approach. Finally, when finding the weak spanner one can tune the value of α in (1) based on the structure of the given data points and the desired size of the core-set, making the algorithm more suitable for applications.

4.2 From Weak Spectral Spanners to Strong Spectral Spanners

In this section, we prove that if U is a weak α-spectral spanner of V, then it is a strong α-spectral spanner of V. Combined with Proposition 4.2, this proves Theorem 1.1.

Lemma 4.6. For any set of vectors V ⊂ R^d, any weak α-spectral spanner of V is a strong α-spectral spanner of V.

Proof. Let U be a weak α-spectral spanner of V. Fix a vector v ∈ V; we write a program to find a probability distribution μ_v : U → R_+ such that vv^⊺ ⪯ (1/δ) · E_{u∼μ_v} uu^⊺ for the largest possible δ. It turns out that this is a semidefinite program, in which we have a variable p_u = P_{μ_v}(u) for the probability of each vector u ∈ U; see (17) for details.

max δ
s.t. δ · vv^⊺ ⪯ E_{u∼μ_v} uu^⊺,
     μ_v is a distribution on U. (17)

To prove the lemma, it suffices to show that the optimum of the program is at least 1/α. To do so, we analyze the dual of the program. We first show that the set of feasible solutions of the program has a nonempty interior; this implies that the Slater condition is satisfied, so the duality gap is zero. Then we show that any feasible solution of the dual has value at least 1/α.

For the first assertion, let μ_v be the uniform distribution on U and δ ≤ 1/(α|U|). It is not hard to see that this is a feasible solution of the program, since U is a weak α-spectral spanner.

Next, we prove the second statement. First, we write down the dual:

min λ
s.t. u^⊺Xu ≤ λ, ∀u ∈ U
     v^⊺Xv ≥ 1
     X ⪰ 0

Let (X, λ) be a feasible solution of the dual. Our goal is to show λ ≥ 1/α. Let E = {x ∈ R^d | x^⊺Xx ≤ λ} be the ellipsoid of radius √λ defined by X. The set E has the following properties:

• Convexity;

• Symmetry: if x ∈ E, then −x ∈ E;

• U ⊆ E: by the dual constraints, u^⊺Xu ≤ λ for all u ∈ U.

Let v̄ = v/√α. We claim that v̄ ∈ E. Note that if v̄ ∈ E, then

λ ≥ v̄^⊺Xv̄ ≥ 1/α,

which completes the proof.

For the sake of contradiction, suppose v̄ ∉ E. We show that then U is not a weak α-spectral spanner. By convexity of E, there is a hyperplane separating v̄ from E; that is, there is a vector e ∈ R^d such that

⟨v, e⟩ = √α · ⟨v̄, e⟩ ≥ √α and ∀x ∈ E, ⟨x, e⟩ < 1.

Moreover, by symmetry of E, for any x ∈ E,

⟨x, e⟩² ≤ max{⟨x, e⟩, ⟨−x, e⟩}² < 1.

Finally, since U ⊆ E, we obtain max_{u∈U} ⟨u, e⟩² < 1. Therefore, ⟨v, e⟩² > α · max_{u∈U} ⟨u, e⟩², which implies that U is not a weak α-spectral spanner.

5 Construction of Spectral k-Spanners

In this section we extend our proof for spectral d-spanners to spectral k-spanners for k < d; this proves our main result, Theorem 1.2. Here is the high-level plan of the proof: first, we use the greedy algorithm of [ÇMI09a] for volume maximization to find an Õ(k)-dimensional linear subspace of R^d onto which the input vectors have a “large” projection. Next, we apply Theorem 1.1 within this Õ(k)-dimensional space to obtain the desired spectral k-spanner.


5.1 Greedy Algorithm for Volume Maximization

In this subsection, we prove the following statement.

Proposition 5.1. For any set of vectors V ⊂ R^d, and any k < d and m > 2k, there is a set U ⊆ V of size m such that for all v ∈ V we have

v_{U^⊥} v_{U^⊥}^⊺ ⪯_k 2m^{2k/m} · E_{u∼μ} uu^⊺, (18)

where v_{U^⊥} = Π_{⟨U⟩^⊥}(v) is the projection of v onto the space orthogonal to the span of U, and μ is the uniform distribution on the set U.

As we will see in the next subsection, for m = Θ(k log k), the set U promised above will be part of our spectral k-spanner. Roughly speaking, to obtain a spectral spanner of V, it then suffices to additionally add a spectral spanner of {Π_{⟨U⟩}(v)}_{v∈V}; in the next subsection, we use Theorem 1.1 for the latter part.

First, we describe an algorithm to find the set U promised in the proposition; then we prove its correctness. We use the greedy algorithm of [ÇMI09b] for volume maximization to find the set U.

1 Function Volume-Maximization(V, m)
2 Let U = ∅
3 While |U| < m, add argmax_{v∈V} ‖Π_{⟨U⟩^⊥}(v)‖ to U
4 Return U

Algorithm 2: Greedy Algorithm for Volume Maximization
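A compact implementation of this greedy is straightforward (a sketch assuming numpy; it maintains the residuals Π_{⟨U⟩^⊥}(v) incrementally via Gram–Schmidt):

```python
import numpy as np

def volume_maximization(V, m):
    # Greedy volume maximization (Algorithm 2): repeatedly pick the vector
    # whose projection orthogonal to the span of the chosen set is longest.
    V = np.asarray(V, dtype=float)
    R = V.copy()                 # residuals: components orthogonal to span(U)
    chosen = []
    for _ in range(m):
        i = int(np.argmax((R * R).sum(axis=1)))
        norm = np.linalg.norm(R[i])
        if norm < 1e-12:         # V has rank < m; nothing left to add
            break
        u_hat = R[i] / norm      # next Gram-Schmidt direction
        chosen.append(i)
        R = R - np.outer(R @ u_hat, u_hat)   # project all residuals off u_hat
    return V[chosen]

V = np.array([[10.0, 0, 0], [0, 5.0, 0], [0, 0, 2.0], [1.0, 1.0, 1.0]])
U = volume_maximization(V, 2)
# The two longest (mutually orthogonal) vectors are picked first.
assert np.allclose(U[0], [10.0, 0, 0]) and np.allclose(U[1], [0, 5.0, 0])
```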

Let U = {u_1, …, u_m} be the output of the algorithm, where u_i is the i-th vector added to the set, and let μ be the uniform distribution on U. Fix a vector v ∈ V for which we will verify the assertion of the proposition. If v ∈ U the statement obviously holds, so assume v ∉ U.

Fix a Π ∈ P_{d−k+1}. Observe that ⟨v_{U^⊥} v_{U^⊥}^⊺, Π⟩ ≤ ‖Π_{⟨U⟩^⊥}(v)‖². On the other hand, by Lemma 2.5, ⟨E_{u∼μ} uu^⊺, Π⟩ ≥ ∑_{i=k}^d λ_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_d are the eigenvalues of E_{u∼μ} uu^⊺. Therefore, to prove (18), it suffices to prove

‖Π_{⟨U⟩^⊥}(v)‖² ≤ 2m^{2k/m} · ∑_{i=k}^d λ_i. (19)

Define û_1, û_2, …, û_m to be the orthonormal basis of ⟨U⟩ obtained by the Gram–Schmidt process on u_1, …, u_m, i.e., û_1 = u_1/‖u_1‖, û_2 = Π_{⟨u_1⟩^⊥}(u_2)/‖Π_{⟨u_1⟩^⊥}(u_2)‖, and so on. Define M ∈ R^{m×m} to be the matrix whose j-th column is the representation of u_j in the orthonormal basis {û_1, …, û_m}, i.e., for all 1 ≤ i, j ≤ m,

M_{i,j} = ⟨u_j, û_i⟩.

Note that E_{u∼μ} uu^⊺ is the same as (1/m)·MM^⊺ up to a rotation of the space; in other words, both matrices have the same set of nonzero eigenvalues. Since the eigenvalues of (1/m)·MM^⊺ are the squares of the singular values of m^{−1/2}·M, to prove (19) it is enough to show

‖Π_{⟨U⟩^⊥}(v)‖² ≤ 2m^{2k/m} · ∑_{i=k}^m σ_i²(m^{−1/2}M). (20)

Since v ∈ V and v ∉ U, we get ‖Π_{⟨U⟩^⊥}(v)‖² ≤ ‖Π_{⟨û_1,…,û_{m−1}⟩^⊥}(u_m)‖² = M_{m,m}². So to prove (20), it suffices to show

M_{m,m}² ≤ 2m^{2k/m} · ∑_{i=k}^m σ_i²(m^{−1/2}M). (21)

Note that the above inequality can be viewed purely as a property of the matrix M. Let us first list the properties of M that we will use to prove it:

(I) M is upper triangular, as u_i ∈ ⟨û_1, …, û_i⟩.

(II) By the description of the algorithm, for any i < j ≤ m we have

M_{i,i}² = ‖Π_{⟨û_1,…,û_{i−1}⟩^⊥}(u_i)‖² ≥ ‖Π_{⟨û_1,…,û_{i−1}⟩^⊥}(u_j)‖² = ∑_{ℓ=i}^j M_{ℓ,j}². (22)

The following lemma completes the proof of Proposition 5.1.

Lemma 5.2. Let M ∈ R^{m×m} satisfy (I) and (II). For any k < m/2, we have

M_{m,m}² ≤ 2m^{2k/m} · ∑_{i=k}^m (1/m)·σ_i²(M).

Proof. Here is the main idea of the proof. First, we use the Cauchy interlacing theorem along with property (II) to deduce that σ_i cannot be much larger than M_{i,i}. Then, we combine this with the fact that M is upper triangular, so det(M) = ∏_{i=1}^m M_{i,i} = ∏_{i=1}^m σ_i, to upper-bound M_{m,m}² by a multiple of ∑_{i=k}^m σ_i²(M).

First, we show that for all 1 ≤ i ≤ m,

σ_i²(M) ≤ (m − i + 1) · M_{i,i}². (23)

Define M_i to be the (m−i+1)×(m−i+1) matrix obtained by removing the first i−1 rows and columns of M. First, the Cauchy interlacing theorem tells us σ_i(M) ≤ σ_1(M_i). Secondly, by Fact 2.7 and property (II) we have

σ_1(M_i)² ≤ ∑_{j=1}^{m−i+1} σ_j(M_i)² = ‖M_i‖_F² ≤ (m − i + 1) · M_{i,i}².

This proves (23). Since M is upper triangular,

det(M)² = ∏_{i=1}^m M_{i,i}² = ∏_{i=1}^m σ_i²(M) ≤ ( ∑_{j=k}^m σ_j(M)²/(m−k+1) )^{m−k+1} · ∏_{i=1}^{k−1} σ_i² =: β · ∏_{i=1}^{k−1} σ_i²,

where the inequality follows by the AM–GM inequality and β = ( ∑_{j=k}^m σ_j(M)²/(m−k+1) )^{m−k+1}. By (23),

∏_{i=1}^m σ_i²(M) ≤ β · ∏_{i=1}^{k−1} (m−i+1)·M_{i,i}² ≤ m^k · β · ∏_{i=1}^{k−1} M_{i,i}². (24)

Using ∏_{i=1}^m σ_i² = ∏_{i=1}^m M_{i,i}² again, we get

∏_{i=k}^m M_{i,i}² ≤ m^k · β.

Using property (II) again, we have M_{i,i}² ≥ M_{m,m}² for all i. Therefore ∏_{i=k}^m M_{i,i}² ≥ M_{m,m}^{2(m−k+1)}, and we get

M_{m,m}^{2(m−k+1)} ≤ m^k · β.

The lemma follows by raising both sides to the power 1/(m−k+1) and using that m − k + 1 ≥ m/2 since k < m/2.


5.2 Main Algorithm

In this section we prove Theorem 1.2. The details of our algorithm are described in Algorithm 3.

1 Function Spectral-k-Spanner(V, α, k)
2 Set m = O(k log k) such that m^{2k/m} = O(1)
3 Run Volume-Maximization(V, m) of Algorithm 2 and let U be the output, i.e., a set of vectors satisfying (18)
4 Run Spectral-d-Spanner({Π_{⟨U⟩}(v)}_{v∈V}, α) of Algorithm 1 and let W be the output, the corresponding spectral m-spanner
5 Return U ∪ {v : Π_{⟨U⟩}(v) ∈ W}

Algorithm 3: Finds an α-Spectral k-Spanner

First of all, let us analyze the size of the output. By definition, Algorithm 2 returns m vectors. Then, by Theorem 1.1, the output of Algorithm 1 has size at most O(m log m). Since m = O(k log k), the size of the output is |U| + |W| ≤ O(k log² k), as desired.

In the rest of this section we prove correctness. Fix a vector v ∈ V; we need to find a distribution μ_v on U ∪ W such that vv^⊺ ⪯_k α · E_{u∼μ_v} uu^⊺ for some α = Õ(k).

First, by Fact 2.1,

vv^⊺ ⪯_k 2·( Π_{⟨U⟩^⊥}(v)Π_{⟨U⟩^⊥}(v)^⊺ + Π_{⟨U⟩}(v)Π_{⟨U⟩}(v)^⊺ ).

So, it is enough to prove that

Π_{⟨U⟩^⊥}(v)Π_{⟨U⟩^⊥}(v)^⊺ + Π_{⟨U⟩}(v)Π_{⟨U⟩}(v)^⊺ ⪯_k (α/2) · E_{u∼μ_v} uu^⊺. (25)

We proceed by upper-bounding the LHS term by term. By Proposition 5.1,

Π_{⟨U⟩^⊥}(v)Π_{⟨U⟩^⊥}(v)^⊺ ⪯_k O(1) · E_{u∼μ} uu^⊺, (26)

where μ is the uniform distribution on U. So, to prove (25), it is enough to find a distribution μ_v on U ∪ W such that

Π_{⟨U⟩}(v)Π_{⟨U⟩}(v)^⊺ ⪯_k α · E_{u∼μ_v} uu^⊺ (27)

for some α = Õ(k). From now on, for any vector v ∈ V we write v̂ = Π_{⟨U⟩}(v).

By the description of the algorithm, {v̂}_{v∈W} is an O(m log² m)-spectral m-spanner for {v̂}_{v∈V}. So, there exists a probability distribution ν_v on W such that

v̂v̂^⊺ ⪯ O(m log² m) · E_{w∼ν_v} ŵŵ^⊺, or equivalently, ∀x ∈ ⟨U⟩ : ⟨x, v̂⟩² ≤ O(m log² m) · E_{w∼ν_v} ⟨ŵ, x⟩². (28)

In fact, the above holds for any x ∈ R^d, as ⟨x, û⟩ = ⟨Π_{⟨U⟩}(x), û⟩ for any vector u ∈ V. Therefore, for any Π ∈ P_{d−k+1}, by summing (28) over an orthonormal basis of Π and noting that m = O(k log k), we get

⟨Π, v̂v̂^⊺⟩ ≤ Õ(k) · ⟨E_{w∼ν_v} ŵŵ^⊺, Π⟩,

which by definition implies

v̂v̂^⊺ ⪯_k Õ(k) · E_{w∼ν_v} ŵŵ^⊺. (29)

Therefore, to show (27) for α = Õ(k), it suffices to find a distribution μ_v on U ∪ W such that

E_{w∼ν_v} ŵŵ^⊺ ⪯_k O(1) · E_{u∼μ_v} uu^⊺.

But observe that for any w ∈ W, we can write

ŵŵ^⊺ ⪯_k 2·( ww^⊺ + Π_{⟨U⟩^⊥}(w)Π_{⟨U⟩^⊥}(w)^⊺ ) ⪯_k O(1)·( ww^⊺ + E_{u∼μ} uu^⊺ ),

where μ is the uniform distribution over U. The first inequality follows by Fact 2.1, and the second by equation (26), which holds for all vectors v ∈ V. Averaging the above inequality with respect to the distribution ν_v yields a distribution μ_v on U ∪ W as required, completing the proof.

6 Applications

In this section we discuss applications of Theorem 1.2 to designing composable core-sets. As we discussed in the introduction, we show that for many problems spectral spanners provide almost the best possible composable core-sets in the worst case. Next, we show that for any function f that is “monotone” on PSD matrices, spectral spanners provide a composable core-set for a fractional budgeted minimization problem with respect to f. Later, in Sections 6.1 and 6.2, we show that for a large class of monotone functions the optimum of the fractional budgeted minimization problem is within a small factor of the optimum of the corresponding integral problem. So, spectral spanners provide almost optimal composable core-sets for several spectral budgeted minimization problems.

Let V ⊂ R^d be a set of vectors. For a function f : S_d^+ → R_+ on PSD matrices and a positive integer B denoting the budget, the fractional budgeted minimization problem is to choose a mass B of the vectors of V, i.e., weights {s_v}_{v∈V} with ∑_v s_v ≤ B, such that f(∑_v s_v vv^⊺) is minimized. This can be modeled as the following continuous optimization problem, which we refer to as BM:

inf f( ∑_{v∈V} s_v vv^⊺ )
s.t. ∑_{v∈V} s_v ≤ B,
     s_v ≥ 0, ∀v ∈ V. (BM)

Definition 6.1 (k-monotone functions). We say a function f : S_d^+ → R_+ is k-monotone for an integer 1 ≤ k ≤ d if for all PSD matrices A, B ⪰ 0, A ⪯_k B implies f(A) ≥ f(B).

We say f is vector k-monotone if for all PSD matrices A, B and all vectors v ∈ R^d, vv^⊺ ⪯_k B implies f(A + vv^⊺) ≥ f(A + B). Note that any k-monotone function is obviously vector k-monotone, as A + vv^⊺ ⪯_k A + B.

We show that an algorithm for finding α-spectral k-spanners gives an α-composable core-set function for the fractional budgeted minimization problem for any function f that is vector k-monotone. We emphasize that our composable core-set does not depend on the choice of f, as long as f is vector monotone.

Proposition 6.2. For any 1 ≤ k ≤ d and any vector k-monotone function f : S_d^+ → R_+, Algorithm 3 gives a β(f, α)-composable core-set for BM(V, f, B), where for any t > 0,

β(f, t) = sup_{A∈S_d^+} f(A)/f(tA).

Proof. Let V_1, V_2, …, V_p be the given input sets for an arbitrary integer p, and let V = ∪_{i=1}^p V_i. For each 1 ≤ i ≤ p, let U_i be the output of Spectral-k-Spanner(V_i, α, k). By Theorem 1.2, for α = Õ(k) we have |U_i| ≤ Õ(k). Let U = U_1 ∪ ⋯ ∪ U_p.

Fix a vector k-monotone function f and a budget B > 0, and let s = {s_v}_{v∈V} be a feasible solution of BM(V, f, B). To prove the assertion, we need to show that there exists a feasible solution s̃ of BM(U, f, B) such that

f( ∑_{u∈U} s̃_u uu^⊺ ) ≤ β(f, α) · f( ∑_{v∈V} s_v vv^⊺ ). (30)

By the composability property of spanners, U is an α-spectral k-spanner of V. Therefore, for any v ∈ V there exists a probability distribution μ_v on U such that vv^⊺ ⪯_k α · E_{u∼μ_v} uu^⊺. Say V = {v_1, …, v_m}. It follows, by the vector k-monotonicity of f and U being an α-spectral k-spanner, that

f( ∑_{i=1}^m s_{v_i} v_i v_i^⊺ ) ≥ f( α s_{v_1} E_{u∼μ_{v_1}} uu^⊺ + ∑_{i=2}^m s_{v_i} v_i v_i^⊺ ) ≥ f( ∑_{i=1}^2 α s_{v_i} E_{u∼μ_{v_i}} uu^⊺ + ∑_{i=3}^m s_{v_i} v_i v_i^⊺ ) ≥ ⋯ ≥ f( ∑_{i=1}^m α s_{v_i} E_{u∼μ_{v_i}} uu^⊺ ).

Now, define s̃ by s̃_u = ∑_{v∈V} s_v P_{μ_v}[u] for any u ∈ U. It is straightforward to see that this is a feasible solution of BM(U, f, B), since ∑_{u∈U} s̃_u = ∑_{v∈V} s_v ∑_{u∈U} P_{μ_v}[u] = ∑_{v∈V} s_v ≤ B. Therefore,

f( ∑_{v∈V} s_v vv^⊺ ) ≥ f( ∑_{v∈V} α s_v E_{u∼μ_v} uu^⊺ ) = f( ∑_{u∈U} α s̃_u uu^⊺ ) ≥ (1/β(f, α)) · f( ∑_{u∈U} s̃_u uu^⊺ ),

where the last inequality uses the definition of β(f, α) with A = ∑_{u∈U} s̃_u uu^⊺. This proves (30), as desired.

In general, we may not be able to solve BM efficiently if f is not a convex function. It turns out that if f is convex and has a certain reciprocal multiplicity property, then the integrality gap of the program is small; so, assuming further that f is (vector) k-monotone, the above proposition yields a composable core-set for the corresponding integral budgeted minimization problems. In the next two sections we discuss two such families of functions, arising in determinant maximization and optimal design.

6.1 Determinant Maximization

In this section, we use Proposition 6.2 to prove Theorem 1.3. Throughout this section, for an integer 1 ≤ k ≤ d we let f : S_d^+ → R_+ be the map A ↦ −det_k(A)^{1/k}. It follows from the theory of hyperbolic polynomials that for any 1 ≤ k ≤ d, −det_k(A)^{1/k} is a convex function [Gül97], so one can solve BM(−det_k^{1/k}, V, k) using convex programming. Furthermore, observe that BM(−det_k^{1/k}, V, k) gives a relaxation of the k-determinant maximization problem. Nikolov [Nik15] showed that any fractional solution can be rounded to an integral solution incurring only a multiplicative error of e.


Theorem 6.3 ([Nik15]). There is a randomized algorithm that, for any set V ⊆ R^d, 1 ≤ k ≤ d, and any feasible solution x of BM(−det_k^{1/k}, V, k), outputs S ⊂ V of size k such that

det_k( ∑_{v∈S} vv^⊺ ) ≥ e^{−k} · max_{T∈(V choose k)} det_k( ∑_{u∈T} uu^⊺ ). (31)

Proof. We include the proof for the sake of completeness. First, we describe the algorithm: for 1 ≤ i ≤ k, choose a vector v with probability x_v/k (with replacement) and call it u_i. It follows that

E[ det_k( ∑_{i=1}^k u_i u_i^⊺ ) ] = ∑_{S∈(V choose k)} (k!/k^k) · ∏_{v∈S} x_v · det_k( ∑_{v∈S} vv^⊺ ) ≥ e^{−k} · ∑_{S∈(V choose k)} det_k( ∑_{v∈S} x_v vv^⊺ ),

where the equality holds since there are k! different orderings for selecting a fixed set S of k distinct vectors (and samples containing a repeated vector contribute zero), and the inequality uses k!/k^k ≥ e^{−k}. By the Cauchy–Binet identity, the RHS is equal to e^{−k} · det_k( ∑_{v∈V} x_v vv^⊺ ), as desired.
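A single trial of this rounding scheme can be sketched as follows (numpy assumed; `det_k` and `round_once` are our own helper names, with `det_k` computed as a sum of principal minors; the theorem's guarantee is only in expectation, so one trial may well return a value near 0):

```python
import itertools
import numpy as np

def det_k(A, k):
    # det_k: sum of the k x k principal minors of A.
    d = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(T, T)])
               for T in itertools.combinations(range(d), k))

def round_once(V, x, k, seed=0):
    # Draw k indices independently with probabilities x_v / k (with
    # replacement), as in the proof; repeated vectors make det_k vanish.
    rng = np.random.default_rng(seed)
    p = np.asarray(x, dtype=float) / k     # x sums to k, so p sums to 1
    idx = rng.choice(len(V), size=k, replace=True, p=p)
    S = V[idx]
    return det_k(S.T @ S, k)

rng = np.random.default_rng(5)
V = rng.standard_normal((8, 4))
x = np.full(8, 3 / 8)                      # feasible fractional solution, sum = k = 3
val = round_once(V, x, k=3)
assert val >= -1e-9                        # det_k of a PSD matrix is nonnegative
```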

Note that the algorithm discussed in the above proof may have an exponentially small probability of success, but [Nik15] also gives a de-randomization using the method of conditional expectations. From this point on, we do not need convexity. To use Proposition 6.2, we need to verify that −det_k^{1/k} is (vector) k-monotone.

Lemma 6.4. For any integer 1 ≤ k ≤ d, the function −det_k^{1/k} is vector k-monotone.

Proof. Equivalently, we show that −det_k is vector k-monotone. Fix A ⪰ 0 and decompose it as A = ∑_{a∈A} aa^⊺, where we abuse notation and also use A to denote the set of vectors in the decomposition of A. Also, fix a vector v and suppose vv^⊺ ⪯_k B for some B ⪰ 0. We need to show det_k(A + vv^⊺) ≤ det_k(A + B). By Lemma 2.9,

det_k(A + vv^⊺) − det_k(A) = ∑_{S∈(A choose k−1)} det_k( ∑_{a∈S} aa^⊺ + vv^⊺ ) = ∑_{S∈(A choose k−1)} det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩.

The second equality follows from the fact that det_k( ∑_{a∈S} aa^⊺ + vv^⊺ ) equals the determinant of the k×k inner product matrix of these k vectors; using the Gram–Schmidt orthogonalization process, the latter can be rewritten as det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩. Since vv^⊺ ⪯_k B, for any such S we have ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩ ≤ ⟨B, Π_{⟨S⟩^⊥}⟩. Therefore,

det_k(A + vv^⊺) = det_k(A) + ∑_{S∈(A choose k−1)} det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩ ≤ det_k(A) + ∑_{S∈(A choose k−1)} det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨B, Π_{⟨S⟩^⊥}⟩ ≤ det_k(A + B).
