Composable core-sets for determinant maximization problems via spectral spanners

The MIT Faculty has made this article openly available.

Citation: Indyk, Piotr, et al. "Composable core-sets for determinant maximization problems via spectral spanners." In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, Salt Lake City, UT, January 5-8, 2020, Association for Computing Machinery: 1675-1694. © 2019 The Author(s)

Publisher: Association for Computing Machinery
Version: Original manuscript
Citable link: https://hdl.handle.net/1721.1/129406
Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike


arXiv:1807.11648v2 [cs.DS] 16 Nov 2019

Composable Core-sets for Determinant Maximization Problems via Spectral Spanners

Piotr Indyk (MIT, indyk@mit.edu), Sepideh Mahabadi (TTIC, mahabadi@ttic.edu), Shayan Oveis Gharan (University of Washington, shayan@cs.washington.edu), Alireza Rezaei (University of Washington, arezaei@cs.washington.edu)

Abstract

We study a generalization of classical combinatorial graph spanners to the spectral setting. Given a set of vectors V ⊆ R^d, we say a set U ⊆ V is an α-spectral k-spanner, for k ≤ d, if for every v ∈ V there is a probability distribution µ_v supported on U such that

vv⊺ ⪯_k α · E_{u∼µ_v}[uu⊺],

where for two matrices A, B ∈ R^{d×d} we write A ⪯_k B iff the sum of the bottom d − k + 1 eigenvalues of B − A is nonnegative. In particular, A ⪯_d B iff A ⪯ B. We show that any set V has an Õ(k)-spectral k-spanner of size Õ(k), and that this bound is almost optimal in the worst case.

We use spectral spanners to study composable core-sets for spectral problems. We show that for many objective functions one can use a spectral spanner, independent of the underlying function, as a core-set and obtain almost optimal composable core-sets. For example, for the k-determinant maximization problem we obtain an Õ(k)^k-composable core-set, and we show that this is almost optimal in the worst case.

Our algorithm is a spectral analogue of the classical greedy algorithm for finding (combinatorial) spanners in graphs. We expect that our spanners will find many other applications in distributed and parallel models of computation. Our proof is spectral. As a side result of our techniques, we show that the rank of diagonally dominant lower-triangular matrices is robust under "small perturbations," which could be of independent interest.


1 Introduction

Given a graph G with n vertices {1, . . . , n}, we say a subgraph H is an α-(combinatorial) spanner if for every pair of vertices u, v of G,

dist_H(u, v) ≤ α · dist_G(u, v),

where dist_G(u, v) is the shortest-path distance between u and v in G. It has been shown that for any α, G has an α-spanner with only n^{1+O(1)/α} edges, and that such a spanner can be found efficiently [EP04]. It can be found by a simple algorithm which repeatedly finds and adds an edge f = (u, v) with dist_H(u, v) > α. Combinatorial spanners have many applications in distributed computing [Pel00, FJJ+01, KK00], optimization [DK99, AGK+98], etc.

In this paper we define and study a spectral generalization of this property. Given a set of vectors V ⊆ R^d, we say a set U ⊆ V is an α-spectral d-spanner of V if for any vector v ∈ V there exists a probability distribution µ_v on the vectors in U such that

vv⊺ ⪯ α · E_{u∼µ_v}[uu⊺], equivalently ⟨x, v⟩² ≤ α · E_{u∼µ_v}[⟨x, u⟩²] for all x ∈ R^d.

To see that this is a generalization of the graph case, let b_{u,v} = e_u − e_v be the vector corresponding to an edge {u, v} of G, where e_u is the indicator vector of the vertex u. It is an exercise to show that for V = {b_e}_{e∈E(G)} and for any α-combinatorial spanner H of G, the set U = {b_e}_{e∈E(H)} is an α²-spectral spanner of V.
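As a quick numerical sanity check of this reduction (our own toy example, not from the paper): take the 4-cycle, keep the spanning path as a 3-combinatorial spanner, and verify the α²-spectral domination for the dropped chord edge with µ uniform on the path edges.

```python
import numpy as np

def e(i, d=4):
    v = np.zeros(d)
    v[i] = 1.0
    return v

def b(u, v):
    return e(u) - e(v)

# 4-cycle 0-1-2-3-0; H = the path 0-1-2-3 is a 3-combinatorial spanner,
# so U = {b_e : e in E(H)} should be a 9-spectral spanner of {b_e : e in E(G)}.
path = [b(0, 1), b(1, 2), b(2, 3)]   # edges kept in H
chord = b(0, 3)                      # the edge of G missing from H

# mu for the chord: uniform on the path edges connecting its endpoints.
E_uu = sum(np.outer(x, x) for x in path) / len(path)
gap = 3 ** 2 * E_uu - np.outer(chord, chord)
min_eig = np.linalg.eigvalsh(gap).min()   # nonnegative up to roundoff
```

Here b(0, 3) is exactly the sum of the path-edge vectors, so the domination follows from Cauchy-Schwarz, as in the exercise mentioned above.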

The following theorem is a special case of our main theorem.

Theorem 1.1 (Main theorem for k = d). There is an algorithm that for any set of vectors V ⊆ R^d finds an Õ(d)-spectral d-spanner of size Õ(d) in time polynomial in d and |V|.

Our algorithm is a spectral generalization of the greedy algorithm mentioned above for finding combinatorial spanners.

We further study generalizations of our spectral spanners to weaker forms of PSD inequalities. For two matrices A, B we write A ⪯_k B if for every projection matrix Π onto a (d − k + 1)-dimensional linear subspace, ⟨A, Π⟩ ≤ ⟨B, Π⟩. For example, if A ⪯_k B, then the sum of the top k eigenvalues of A is at most the sum of the top k eigenvalues of B. Analogously, we say U ⊆ V is an α-spectral k-spanner of V if for any v ∈ V there is a distribution µ_v on U such that vv⊺ ⪯_k α · E_{u∼µ_v}[uu⊺]. In our main theorem we generalize the above statement to all k ≤ d, and we show that to construct an Õ(k)-spectral k-spanner we only need Õ(k) vectors, independent of the ambient dimension of the space.

Theorem 1.2 (Main). There is an algorithm that for any set of vectors V ⊆ R^d finds an Õ(k)-spectral k-spanner of size Õ(k).

Furthermore, for any ε > 0 and k ≤ d, there exists a set V ⊆ R^d of size e^{Ω(kε)} such that any k^{1−ε}-spectral spanner of V must contain all vectors of V.

Composable core-sets. Our main application of spectral spanners is to design (composable) core-sets for spectral problems. A function c(V) that maps V ⊆ R^d into one of its subsets is called an α-composable core-set of size t for the function f(·) [AHPV05, IMMM14] if, for any collection of sets V_1, . . . , V_p ⊂ R^d, we have

f( c(V_1) ∪ . . . ∪ c(V_p) ) ≥ (1/α) · f( V_1 ∪ . . . ∪ V_p )


and |c(V_i)| ≤ t for every V_i. A composable core-set of small size immediately yields a communication-efficient distributed approximation algorithm: if each set V_i is stored on a separate machine, then all machines can compute and transmit their core-sets c(V_i) to a central server, which then performs the final computation over the union. Similarly, core-sets make it possible to design a streaming algorithm which processes N vectors in one pass using only O(√(Nt)) storage: divide the stream into blocks of size √(Nt), compute and store a core-set for each block, and then perform the computation over the union of the core-sets.
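The distributed composition pattern is easy to state in code. The sketch below uses a toy objective of our own (maximum squared norm), for which keeping the single largest vector per machine is an exact, i.e. 1-composable, core-set; the paper's objectives are spectral, but the composition step is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(V):
    """Toy objective: the largest squared norm among the vectors."""
    return max(float(v @ v) for v in V)

def coreset(V, t=1):
    """Keep the t vectors of largest norm: an exact core-set for this f."""
    return sorted(V, key=lambda v: float(v @ v), reverse=True)[:t]

# Data split across p = 4 "machines"; each transmits only its core-set
# to the server, which solves the problem on the union of the core-sets.
parts = [list(rng.normal(size=(100, 5))) for _ in range(4)]
composed = f([v for part in parts for v in coreset(part)])
exact = f([v for part in parts for v in part])
```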

In this paper we show that, for a given set V_i ⊆ R^d, an α-spectral spanner of V_i, for a proper value of α, provides a good core-set. Specifically, we show that for many (spectral) optimization problems, such as determinant maximization, D-optimal design, or min-eigenvalue maximization, this approach leads to almost the best possible composable core-set in the worst case.

In what follows we discuss a specific application, to determinant maximization, in more detail.

1.1 Composable Core-sets for Determinant Maximization

Determinantal point processes (DPPs) are widely popular probabilistic models. Given a set of vectors V ⊆ R^d and a parameter k, a DPP samples a k-subset S of V with probability P(S) proportional to the squared volume of the parallelepiped spanned by the elements of S. That is,

P(S) ∝ det_k( Σ_{v∈S} vv⊺ ).

This distribution formalizes a notion of diversity, as sets of vectors that are "substantially different" from each other are assigned higher probability. One can then find the "most diverse" k-subset of V by computing the S that maximizes P(S), i.e., solving the maximum a posteriori (MAP) decoding problem:

max_{S⊆V, |S|=k} P(S).

We also refer to this problem as the k-determinant maximization problem. Since their introduction to machine learning at the beginning of this decade [KT10, KT11a, KT+12], DPPs have found applications in video summarization [MJK17, GCGS14], document summarization [KT+12, CGGS15, KT11b], tweet timeline generation [YFZ+16], object detection [LCYO16], and elsewhere. All of the aforementioned applications involve MAP decoding.
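For intuition, MAP decoding on small instances can be done by brute force. The objective det_k(Σ_{v∈S} vv⊺) equals the determinant of the k × k Gram matrix of S (this is the identity (5) in the preliminaries). A sketch with our own toy data:

```python
import itertools
import numpy as np

def map_decode(V, k):
    """Brute-force MAP decoding: maximize the squared k-volume over all
    k-subsets. det_k(sum_{v in S} v v^T) equals det of the Gram matrix of S."""
    best_val, best_S = -1.0, None
    for S in itertools.combinations(range(len(V)), k):
        gram = np.array([[float(V[i] @ V[j]) for j in S] for i in S])
        val = float(np.linalg.det(gram))
        if val > best_val:
            best_val, best_S = val, S
    return best_S, best_val

rng = np.random.default_rng(1)
V = list(rng.normal(size=(8, 4)))
S, vol2 = map_decode(V, k=3)
```

The exhaustive loop is exponential in k, which is exactly why the hardness and approximation results discussed next matter.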

Here we use our results on spectral spanners to construct an almost optimal composable core-set for the MAP problem. Before presenting our result, let us briefly discuss relevant previous work. The MAP problem is hard to approximate up to a factor of 2^{−ck} for some constant c > 0, unless P = NP [ÇMI09a, ÇMI13]. This lower bound was matched qualitatively by a recent paper of [Nik15], which gave an algorithm with an e^k-approximation guarantee. Since the data sets in the aforementioned applications can be large, there has been considerable effort to develop efficient algorithms in distributed, streaming, and parallel models of computation [MJK17, WIB14, PJG+14, MKSK13, MKBK15, MZ15, BENW15]. All of these algorithms rely on the fact that the logarithm of the volume is a submodular function, which makes it possible to obtain multiplicative-factor approximation algorithms (assuming some lower bound on the volume, as otherwise the logarithm of the volume can be negative). See Section 1.3 for an overview. However, this generality comes at a price: a multiplicative approximation guarantee for the logarithm of the volume translates into an "exponential" guarantee for the volume itself and necessitates the aforementioned extra lower-bound assumptions. As a result, to the best of our knowledge, no multiplicative-factor approximation algorithms were previously known for this problem in streaming, distributed, or parallel models of computation.


In this paper we present the first (composable) core-set construction for the determinant maximization problem. Our main contributions are:

Theorem 1.3. There exists a polynomial-time algorithm for computing an Õ(k)^k-composable core-set of size Õ(k) for the k-determinant maximization problem.

Let us discuss the proof of the above theorem for the case k = d using our main Theorem 1.1. Given sets of vectors V_1, . . . , V_m, let U_1, . . . , U_m be their Õ(d)-spectral spanners, respectively. Let S = argmax_{S⊆∪_i V_i, |S|=d} P(S). Consider the matrix

A = Σ_{v∈S} E_{u∼µ_v}[uu⊺],

that is, we substitute each vector v in S by a convex combination of the vectors in the spectral spanner(s). Then, by the definition of a spectral spanner,

(1/α) Σ_{v∈S} vv⊺ ⪯ A.

Since the determinant is a monotone function with respect to the Loewner order on PSD matrices,

(1/α^d) P(S) = det( (1/α) Σ_{v∈S} vv⊺ ) ≤ det(A).

The matrix A can be seen as a fractional solution to the determinant maximization problem. In fact, [Nik15] showed that A can be rounded to a set T of size |T| = d such that det(A) ≤ e^d det( Σ_{v∈T} vv⊺ ). Therefore, we obtain an (eα)^d approximation for determinant maximization (see subsection 6.1 for more details).

The technique discussed above applies to many optimization problems. In general, if instead of the determinant we want to maximize any function f : R^{d×d} → R_+ that is monotone with respect to the Loewner order on PSD matrices, we can use the above approach to construct a fractional solution A, supported on the spectral spanners, such that f(A) is at least the optimum (up to a loss that depends on α). Then we can use randomized rounding ideas to round the matrix A to an integral solution of f. See subsection 6.2 for further examples.

We complement the above theorem by showing that the above guarantee is essentially the best possible.

Theorem 1.4. Any composable core-set of size at most kβ for the k-determinant maximization problem must incur an approximation factor of at least (k/β)^{k(1−o(1))}, for any β ≥ 1.

Note that our lower bound of (k/β)^{k(1−o(1))} for the approximation factor achievable by composable core-sets is substantially higher than the approximation factor e^k of the best off-line algorithm, demonstrating a large gap between the two models.

1.2 Overview of the Techniques

In this part we give a high-level overview of the proof of Theorem 1.2. Our proof has two steps. First, we solve the "full-dimensional" version of the problem, i.e., we construct an Õ(d)-spectral d-spanner of size Õ(d) for a given set of vectors in R^d, as promised in Theorem 1.1. Then we reduce the "low-dimensional" version of the problem, i.e., finding k-spanners for k < d, to the full-dimensional version in an Õ(k)-dimensional space.


Step 1: Our high-level plan is to "augment" the classical greedy algorithm for finding combinatorial spanners in graphs to the spectral setting. First, we rewrite the combinatorial algorithm in spectral language. Let G be a graph with vertex set V(G) and edge set E(G). Recall that for any edge e = {u, v} ∈ E(G), b_e = e_u − e_v. As alluded to in the introduction, if H is an α-combinatorial spanner of G, then U = {b_e}_{e∈E(H)} is an α²-spectral spanner of {b_e}_{e∈E(G)}. The following algorithm gives an α-combinatorial spanner with n^{1+O(1)/α} edges: start with an empty graph H; while there is an edge f = {u, v} in G with dist_H(u, v) > α, add it to H. One can observe that dist_H(u, v) > α iff, for every distribution µ on E(H),

b_f b_f⊺ ⋠ α² · E_{e∼µ}[b_e b_e⊺].

This observation suggests a natural algorithm in the spectral setting: at each step find a vector v ∈ V such that for all µ supported on the set of vectors already chosen for the spanner, vv⊺ ⋠ α · E_{u∼µ}[uu⊺], and add it to the spanner. We can implement such an algorithm in polynomial time, but we cannot directly bound the size of the spectral spanner it constructs using our current techniques.

So, we take a detour. First, we solve a seemingly easier problem by changing the order of quantifiers in the definition of the spectral spanner. For V ⊆ R^d, a subset U ⊆ V is a weak α-spectral spanner of V if for all v ∈ V and x ∈ R^d there is a distribution µ_{v,x} on U such that

⟨v, x⟩² ≤ α · E_{u∼µ_{v,x}}[⟨u, x⟩²], equivalently ⟨v, x⟩² ≤ α · max_{u∈U} ⟨u, x⟩².

To find a weak spectral spanner, we use the analogue of the greedy algorithm: let U be the set of vectors already chosen; while there are a vector v ∈ V and a direction x ∈ R^d such that ⟨x, v⟩² > α · max_{u∈U} ⟨u, x⟩², add argmax_v ⟨x, v⟩² to U.
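The loop structure of this greedy procedure can be sketched as follows. Finding a genuinely violating direction x is itself an optimization problem; the sketch below simply tests a fixed pool of random candidate directions, so it is a heuristic illustration of the loop, not the exact algorithm analyzed in the paper.

```python
import numpy as np

def weak_spanner(V, alpha, directions):
    """Greedy sketch: grow U while some v and some candidate direction x
    violate <x, v>^2 <= alpha * max_{u in U} <x, u>^2."""
    U = []
    changed = True
    while changed:
        changed = False
        for x in directions:
            proj = np.array([(x @ v) ** 2 for v in V])
            best = proj.max()
            cover = max(((x @ u) ** 2 for u in U), default=0.0)
            if best > alpha * cover:
                U.append(V[int(proj.argmax())])   # add argmax_v <x, v>^2
                changed = True
    return U

rng = np.random.default_rng(2)
V = [v for v in rng.normal(size=(50, 6))]
directions = list(rng.normal(size=(200, 6)))
U = weak_spanner(V, alpha=6.0, directions=directions)
```

The loop terminates because a vector already achieving the maximum for x can never be re-added (that would require best > α · best with α ≥ 1), so U grows by distinct vectors of V.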

We prove that for α = Õ(d) the above algorithm stops in Õ(d) steps. Suppose that the algorithm finds vectors u_1, . . . , u_m together with corresponding "bad" directions x_1, . . . , x_m, where x_i being a bad direction for u_i means that

⟨u_i, x_i⟩² > α · ⟨u_j, x_i⟩², for all 1 ≤ i ≤ m and all 1 ≤ j < i. (1)

We need to show that m = Õ(d). We consider the matrix M ∈ R^{m×m} where M_{i,j} = ⟨u_i, x_j⟩. By the above constraints, M is diagonally dominant and approximately lower triangular. But since M has rank at most d, being the inner-product matrix of vectors lying in R^d, we conclude that m = Õ(d). Note that in the extreme case where M is truly lower triangular, the latter fact obviously holds, because then rank(M) = m. As a side result, we also show that the rank of lower-triangular matrices is robust under small perturbations (see Lemma 4.3).

The above argument shows that the spectral greedy algorithm gives a weak spectral spanner with α = Õ(d) of size Õ(d). To finish the proof of Theorem 1.1 we need to obtain a (strong) α-spectral spanner from our weak spanner. We use a duality argument to show that any weak spectral spanner is in fact an α-spectral spanner. Let U be a weak spectral spanner. To verify that U is an α-spectral spanner, we need to find, for every v ∈ V, a distribution µ_v supported on U such that vv⊺ ⪯ α · E_{u∼µ_v}[uu⊺]. We can find the best distribution µ_v using an SDP with variables p_u for all u ∈ U, denoting P_{µ_v}(u). Instead of directly bounding the primal, we write down the dual of the SDP and use the separating hyperplane theorem to show that such a distribution indeed exists.

It was pointed out to us by an anonymous reviewer that one can use an approximate John ellipsoid [B+97] to find an O(d)-weak-spectral d-spanner of size Õ(d). This improves the guarantee of our algorithm by a log d factor. We discuss the details at the end of Section 4.1. Let us briefly mention the advantages of our algorithm over the John ellipsoid method. First, in finding the weak spanner one can tune the value of α in (1) based on the structure of the given data points and the ideal size of the core-set; we also expect that in many real-world applications one can use our algorithm to obtain polylog(d)-spectral d-spanners of


size Õ(d). Second, to implement our algorithm we only need to solve linear programs with O(d) variables; this requires a polynomially smaller amount of memory than the SDP solvers needed to compute the John ellipsoid. Lastly, our algorithm is easier to parallelize.

Step 2: To reduce the k-spanner problem to the full-dimensional case, we use the greedy algorithm of [ÇMI09a] to find a set of vectors W ⊂ V of size Õ(k) such that for any v ∈ V,

v_{⟨W⟩⊥} v_{⟨W⟩⊥}⊺ ⪯_k O(1) · E_w[ww⊺], (2)

where the expectation is over the uniform distribution on W, and v_{⟨W⟩⊥} denotes the projection of v onto the space orthogonal to the linear subspace spanned by W. Then we project all vectors in V onto the space ⟨W⟩ and solve the full-dimensional version; i.e., we find U ⊆ V of size Õ(|W|) such that for any v ∈ V there exists a distribution µ_v supported on U which satisfies

v_{⟨W⟩} v_{⟨W⟩}⊺ ⪯ Õ(k) · E_{u∼µ_v}[ u_{⟨W⟩} u_{⟨W⟩}⊺ ]. (3)

Ideally, on the RHS of the above we would like uu⊺ instead of u_{⟨W⟩} u_{⟨W⟩}⊺, which can be achieved, at the cost of an extra constant factor, by applying (2). It is not hard to see from the above two equations that U ∪ W is an Õ(k)-spectral k-spanner for V.

It remains to find a set W ⊆ V satisfying (2). We use the following algorithm: let W = ∅; for i = 1, . . . , Õ(k), add argmax_{v∈V} ‖v_{⟨W⟩⊥}‖ to W. Intuitively, we greedily choose a set of vectors of size Õ(k) to minimize the projection of the remaining vectors onto the orthogonal complement of ⟨W⟩. To prove (2), we need to show

v_{⟨W⟩⊥} v_{⟨W⟩⊥}⊺ ⪯_k O(1) · E_{w∼µ}[ww⊺].

Equivalently, after choosing the worst projection matrix onto a (d − k + 1)-dimensional linear subspace, it is enough to show

‖v_{⟨W⟩⊥}‖² ≤ O(1) · Σ_{i=k}^{d} λ_i( E_{w∼µ}[ww⊺] ). (4)

To prove this inequality, we use properties of the greedy algorithm to study the singular values of the matrix obtained by applying the Gram-Schmidt process to the vectors in W.

Lower bounds. As we discussed in the introduction, it is not hard to prove that the guarantee of Theorem 1.1 is tight in the worst case. However, one might wonder whether it is possible to design better composable core-sets for determinant maximization and related spectral problems. We show that for many such problems we obtain the best possible composable core-set in the worst case. Let us discuss the main ideas of Theorem 1.4.

We consider the case k = d for simplicity of exposition. For a set V ⊆ R^d and a linear transformation Q ∈ R^{d×d}, define QV = {Qv}_{v∈V}. Choose a set V ⊆ R^d of unit vectors such that for any distinct u, v ∈ V, ⟨u, v⟩² ≤ 1/d^{1−o(1)}. This can be achieved with high probability by selecting the points of V independently and uniformly at random from the unit sphere; note that V can then contain exponentially many (in d) vectors. Consider sets A_1, . . . , A_d and B_1, . . . , B_{d−1} in a (2d − 1)-dimensional space such that:

• For each 1 ≤ i ≤ d, A_i = R_i V, where R_i is a rotation matrix which maps R^d to ⟨e_1, e_2, . . . , e_{d−1}, e_{(d−1)+i}⟩ and maps a uniformly randomly chosen vector of V to e_{(d−1)+i}.


• For each 1 ≤ i ≤ d − 1, B_i = {M e_i}, where M is a "large" number.

Our instance of determinant maximization is simply QA_1, . . . , QA_d, QB_1, . . . , QB_{d−1} for a random rotation matrix Q ∈ R^{(2d−1)×(2d−1)}.

Observe that the optimal set of 2d − 1 vectors contains the Qe_{(d−1)+i}'s from the QA_i's and the QMe_i's from the QB_i's, and has value equal to (M^{d−1})². However, since Q is a random rotation, the core-set function cannot determine which vector in QA_i was aligned with Qe_{(d−1)+i}. Recall that the core-set algorithm must choose a core-set of A_i by observing only the vectors in A_i. Thus, unless the core-sets are exponentially large in d, there is a good probability that, for all i, the core-set for QA_i does not contain Qe_{(d−1)+i}. For a sufficiently large M, all vectors QMe_i from QB_i must be included in any solution with a non-trivial approximation factor. It follows that, with constant probability, any core-set-induced solution is sub-optimal by at least a factor of d^{Θ(d)}.
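The near-orthogonality used in this construction is a standard fact about random unit vectors: pairwise squared inner products concentrate around 1/d, so even a large collection satisfies ⟨u, v⟩² ≤ 1/d^{1−o(1)}. A quick empirical check (the parameters are ours):

```python
import numpy as np

rng = np.random.default_rng(10)
d, n = 200, 1000
V = rng.normal(size=(n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # n random unit vectors

G = V @ V.T
np.fill_diagonal(G, 0.0)                        # ignore <v, v> = 1 terms
max_sq = float((G ** 2).max())  # worst pairwise <u, v>^2; roughly log(n)/d scale
```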

Paper organization. In Section 3, we formally define spectral (d-)spanners and their generalization, spectral k-spanners. In Section 4, we introduce our algorithm for finding spectral spanners and prove Theorem 1.1. Then, in Section 5, we generalize the result of Theorem 1.1 to spectral k-spanners and show the reduction from k < d to the full-dimensional case k = d, proving Theorem 1.2. We discuss applications of spectral spanners to designing composable core-sets for several optimization problems, including k-determinant maximization, in Section 6; in particular, we prove Theorem 1.3. Finally, in Section 7, we present our lower bound results and prove Theorem 1.4.

1.3 Related work

As mentioned earlier, multiple papers have developed composable core-sets (or similar constructions) for the case where the objective function is the logarithm of the volume. In particular, [MKSK13] showed that core-sets obtained via a greedy algorithm guarantee an approximation factor of min(k, n). The approximation ratio can be further improved to a constant if the input points are assigned to the sets V_i uniformly at random [MZ15, BENW15]. However, these guarantees do not translate into a multiplicative approximation factor for the volume objective function.²

Core-set constructions are known for a wide variety of geometric and metric problems, and several algorithms have found practical applications. Some of those constructions are relevant in our context. In particular, core-sets for approximating the directional width [AHPV04] have functionality similar to weak spanners. However, that paper considered the problem for low-dimensional points, and as a result the core-set size was exponential in the dimension. Another line of research [AAYI+13, IMMM14, CPPU17] considered core-sets for maximizing metric diversity measures, such as the minimum inter-point distance. Those measures rely only on relationships between pairs of points, and thus have quite different properties from the volume-induced measure considered in this paper.

We also remark that one can consider generalizations of our problem to settings where we want to maximize the volume under additional constraints. Over the last few years several such extensions have been studied extensively and many new algorithmic ideas have been developed [NS16, AO17, SV17, ESV17]. In this paper, we study composable core-sets for the basic version of the determinant maximization problem, where no additional constraints are present.

² It is possible to show that the greedy method achieves composable core-sets with a multiplicative approximation factor of 2^{O(k²)}. Since this bound is substantially larger than the bound we obtain via spectral spanners, we do not include the proof in this paper.


2 Preliminaries

2.1 Linear Algebra

Throughout the paper, all vectors we consider are column vectors in R^d, unless otherwise specified. For a vector v, we write v(i) for its i-th coordinate and ‖v‖ for its ℓ₂ norm. Vectors v_1, . . . , v_k are called orthonormal if ‖v_i‖ = 1 for every i and ⟨v_i, v_j⟩ = 0 for every i ≠ j. For a set of vectors V, we let ⟨V⟩ denote the linear subspace spanned by the vectors of V. For a linear subspace S, we use S⊥ to denote the linear subspace orthogonal to S.

The notation ⟨·, ·⟩ is also used for the Frobenius inner product of matrices: for matrices A, B ∈ R^{d×d},

⟨A, B⟩ = Σ_{i=1}^{d} Σ_{j=1}^{d} A_{i,j} B_{i,j} = tr(AB⊺),

where A_{i,j} denotes the entry of the matrix A in row i and column j. A matrix A is positive semi-definite (PSD), denoted A ⪰ 0, if it is symmetric and for any vector v we have v⊺Av = ⟨A, vv⊺⟩ ≥ 0. For PSD matrices A, B we write A ⪯ B if B − A ⪰ 0. We denote the set of d × d PSD matrices by S^d_+.

Projection Matrices. A matrix Π ∈ R^{d×d} is a projection matrix if Π² = Π. It is easy to see that for any v ∈ R^d,

⟨vv⊺, Π⟩ = v⊺Πv = ⟨Πv, Πv⟩ = ‖Πv‖².

For a linear subspace S, we let Π_S denote the matrix projecting vectors from R^d onto S; that is, for any vector v, Π_S v is the projection of v onto S. If S is k-dimensional and v_1, . . . , v_k form an arbitrary orthonormal basis of S, then one can see that Π_S = Σ_{i=1}^{k} v_i v_i⊺. We denote the set of all projection matrices onto k-dimensional subspaces by P_k.

Fact 2.1. For any vectors u, v ∈ R^d and any projection matrix Π ∈ R^{d×d},

⟨(u + v)(u + v)⊺, Π⟩ ≤ 2 · ⟨uu⊺ + vv⊺, Π⟩.

Proof. Let a = Πu and b = Πv. Since Π² = Π, we have ⟨uu⊺, Π⟩ = ‖a‖², ⟨vv⊺, Π⟩ = ‖b‖², and ⟨(u + v)(u + v)⊺, Π⟩ = ‖a + b‖². The assertion is thus equivalent to

‖a + b‖² ≤ 2(‖a‖² + ‖b‖²),

which follows from the Cauchy-Schwarz inequality: ⟨a, b⟩ ≤ ‖a‖‖b‖ ≤ (‖a‖² + ‖b‖²)/2.
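A two-line numerical check of Fact 2.1 (random instance of our choosing):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
u, v = rng.normal(size=d), rng.normal(size=d)

# Random orthogonal projection onto a 3-dimensional subspace.
Q, _ = np.linalg.qr(rng.normal(size=(d, 3)))
Pi = Q @ Q.T

lhs = (u + v) @ Pi @ (u + v)          # <(u+v)(u+v)^T, Pi> = ||Pi(u+v)||^2
rhs = 2 * (u @ Pi @ u + v @ Pi @ v)   # 2 <uu^T + vv^T, Pi>
```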

Eigenvalues and Singular Values. For a symmetric matrix A, λ_1(A) ≥ . . . ≥ λ_d(A) denote the eigenvalues of A. The following characterization of the eigenvalues of symmetric matrices, known as the min-max or variational theorem, will be useful for us.

Theorem 2.2 (Min-max Characterization of Eigenvalues). Let A ∈ R^{d×d} be a symmetric matrix with eigenvalues λ_1 ≥ λ_2 ≥ . . . ≥ λ_d. Then

λ_k = max { min_{x∈U} x⊺Ax / ‖x‖² : U is a k-dimensional linear subspace },

or, equivalently,

λ_k = min { max_{x∈U} x⊺Ax / ‖x‖² : U is a (d − k + 1)-dimensional linear subspace }.

The following theorem, known as the Cauchy interlacing theorem, relates the eigenvalues of a symmetric matrix to the eigenvalues of its principal submatrices.

Theorem 2.3 (Cauchy Interlacing Theorem). Let A be a symmetric d × d matrix, and let B be an m × m principal submatrix of A. Then for any 1 ≤ i ≤ m, we have

λ_{d−m+i}(A) ≤ λ_i(B) ≤ λ_i(A).

We also use the following lemma, an easy implication of the min-max characterization, to bound the eigenvalues of a sum of two matrices in terms of their eigenvalues.

Lemma 2.4. Let A, B ∈ R^{d×d} be two symmetric matrices. Then λ_{i+j−d}(A + B) ≥ λ_i(A) + λ_j(B) for any i, j with i + j − d > 0.

Proof. Following the min-max characterization of eigenvalues, let S_A and S_B be i-dimensional and j-dimensional linear subspaces, respectively, for which

λ_i(A) = min_{x∈S_A} x⊺Ax / ‖x‖²  and  λ_j(B) = min_{x∈S_B} x⊺Bx / ‖x‖².

Let S = S_A ∩ S_B. The dimension of S is at least i + j − d, and by the min-max characterization of eigenvalues,

λ_{i+j−d}(A + B) ≥ min_{x∈S} x⊺(A + B)x / ‖x‖² ≥ min_{x∈S_A} x⊺Ax / ‖x‖² + min_{x∈S_B} x⊺Bx / ‖x‖² = λ_i(A) + λ_j(B),

which completes the proof.

Moreover, we take advantage of the following simple lemma, also known as the extremal partial trace; a proof can be found in [Tao10].

Lemma 2.5. Let L ∈ R^{d×d} be a symmetric matrix. Then for any integer n ≤ d,

min_{Π∈P_n} ⟨Π, L⟩ = Σ_{i=d−n+1}^{d} λ_i(L).

In particular, we use it to conclude that if x_1, . . . , x_n ∈ R^d are orthonormal vectors, then

Σ_{i=1}^{n} x_i⊺ L x_i ≥ Σ_{i=d−n+1}^{d} λ_i(L).
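Concretely, the minimizer in Lemma 2.5 is the projection onto the span of the n bottom eigenvectors, and any other rank-n projection does no better; a quick check on a random instance of ours:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 6, 3
L = rng.normal(size=(d, d))
L = (L + L.T) / 2

evals, evecs = np.linalg.eigh(L)      # ascending eigenvalues
bottom = evals[:n].sum()              # sum of the n smallest eigenvalues of L

# Projection onto the span of the n bottom eigenvectors attains the minimum.
Q = evecs[:, :n]
attained = float(np.trace(Q @ Q.T @ L))

# Random rank-n projections can only give a larger value of <Pi, L>.
others = []
for _ in range(100):
    R, _ = np.linalg.qr(rng.normal(size=(d, n)))
    others.append(float(np.trace(R @ R.T @ L)))
```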

For a matrix A, we use σ_1(A) ≥ . . . ≥ σ_d(A) ≥ 0 to denote the singular values of A (for symmetric PSD matrices they coincide with the eigenvalues). Many of the matrices we work with in this paper are not symmetric. Given such a matrix, we use the following simple construction to obtain a symmetric matrix whose eigenvalues are the singular values of the input matrix together with their negations. Define a symmetrization operator S_d : R^{d×d} → R^{2d×2d} where for any matrix A ∈ R^{d×d},

S_d(A) = [ 0  A ; A⊺  0 ].


Fact 2.6. For any matrix A ∈ R^{d×d}, S_d(A) has eigenvalues σ_1(A) ≥ . . . ≥ σ_d(A) ≥ −σ_d(A) ≥ . . . ≥ −σ_1(A).

Proof. Let u_1, . . . , u_d and v_1, . . . , v_d be the right and left singular vectors of A, respectively. Then Au_i = σ_i v_i and A⊺v_i = σ_i u_i for any 1 ≤ i ≤ d. It is then easy to see that [v_i; u_i] and [−v_i; u_i] are eigenvectors of S_d(A) with eigenvalues σ_i and −σ_i, respectively, for any 1 ≤ i ≤ d.
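Fact 2.6 can be verified directly (random matrix of our choosing):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 5
A = rng.normal(size=(d, d))

S = np.block([[np.zeros((d, d)), A], [A.T, np.zeros((d, d))]])
eig = np.sort(np.linalg.eigvalsh(S))                 # ascending
sig = np.sort(np.linalg.svd(A, compute_uv=False))    # ascending

# Eigenvalues of S_d(A): the singular values of A and their negations.
expected = np.concatenate([-sig[::-1], sig])
```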

Throughout the paper, we work with several matrix norms. The ℓ₂ norm of a matrix A, denoted ‖A‖₂ or simply ‖A‖, is max_{‖x‖₂=1} ‖Ax‖₂. Also, ‖A‖_∞ = max_{i,j} |A_{i,j}| denotes the ℓ_∞ norm. The Frobenius norm of A, denoted ‖A‖_F, is defined by ‖A‖_F = √⟨A, A⟩. We use the following identity relating the Frobenius norm to the singular values.

Fact 2.7. For any matrix A ∈ R^{d×d},

‖A‖_F² = Σ_{i=1}^{d} σ_i(A)².

Determinant Maximization Problem. We use the determinant of a subset of vectors as a measure of their diversity. From a geometric point of view, for a subset of vectors V = {v_1, . . . , v_d} ⊂ R^d, det( Σ_{i=1}^{d} v_i v_i⊺ ) equals the square of the volume of the parallelepiped spanned by V. For S, T ⊆ [d], let A_{S,T} denote the |S| × |T| submatrix formed by intersecting the rows and columns corresponding to S and T, respectively. The notation det_k is a generalization of the determinant, defined by

det_k(A) = Σ_{S⊆[d], |S|=k} det(A_{S,S}).

In particular, for vectors v_1, . . . , v_k ∈ R^d, det_k( Σ_{i=1}^{k} v_i v_i⊺ ) equals the square of the k-dimensional volume of the parallelepiped spanned by v_1, . . . , v_k. The k-determinant maximization problem is defined as follows.

Definition 2.8 (k-Determinant Maximization). Let V = {v_1, . . . , v_n} ⊂ R^d be a set of vectors, and let M ∈ R^{n×n} be the Gram matrix obtained from V, i.e., M_{i,j} = ⟨v_i, v_j⟩. For an integer k ≤ d, the goal of the k-determinant maximization problem is to choose a subset S ⊆ V such that |S| = k and the determinant of M_{S,S} is maximized.

Throughout the paper, we extensively use the Cauchy-Binet identity, which states that for any integer k ≤ d, B ∈ R^{k×d}, and C ∈ R^{d×k}, we have

det(BC) = Σ_{S⊆[d], |S|=k} det(B_{[k],S}) det(C_{S,[k]}).

For any S ⊂ [n] with |S| = k, if we let V_S ∈ R^{k×d} be the matrix with {v_i}_{i∈S} as its rows, then we have

det(M_{S,S}) = det(V_S V_S⊺) = det_k(V_S⊺ V_S) = det_k( Σ_{v∈S} vv⊺ ), (5)

where the middle equality uses the Cauchy-Binet identity. The k-determinant maximization problem is also known as the maximum-volume k-simplex problem, since det_k( Σ_{v∈S} vv⊺ ) equals the square of the volume spanned by


{v_i}_{i∈S}. Throughout the paper, we also use the following identity, which can be derived from the Cauchy-Binet formula by taking the columns of B ∈ R^{d×n} to be the v_i's and C = B⊺:

det( Σ_{i=1}^{n} v_i v_i⊺ ) = Σ_{S⊆[n], |S|=d} det( Σ_{i∈S} v_i v_i⊺ ). (6)

We use it to deduce the following simple lemma.

Lemma 2.9. For any set of vectors v_1, . . . , v_n ∈ R^d and any integer 1 ≤ k ≤ d,

det_k( Σ_{i=1}^{n} v_i v_i⊺ ) = Σ_{S⊆[n], |S|=k} det_k( Σ_{i∈S} v_i v_i⊺ ).

Proof. For a set T ⊂ [d] with |T| = k and any 1 ≤ i ≤ n, let v_{i,T} ∈ R^k denote the restriction of v_i to its coordinates in T. Then

det_k( Σ_{i=1}^{n} v_i v_i⊺ )
 = Σ_{T⊆[d], |T|=k} det( Σ_{i=1}^{n} v_{i,T} v_{i,T}⊺ )   (by definition of det_k)
 = Σ_{T⊆[d], |T|=k} Σ_{S⊆[n], |S|=k} det( Σ_{i∈S} v_{i,T} v_{i,T}⊺ )   (by (6))
 = Σ_{S⊆[n], |S|=k} Σ_{T⊆[d], |T|=k} det( Σ_{i∈S} v_{i,T} v_{i,T}⊺ )
 = Σ_{S⊆[n], |S|=k} det_k( Σ_{i∈S} v_i v_i⊺ )   (by definition of det_k).
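Lemma 2.9 is easy to confirm numerically; det_k below is implemented literally as the sum of k × k principal minors, per the definition above (the random instance is ours):

```python
import itertools
import numpy as np

def det_k(A, k):
    """det_k(A) = sum of all k x k principal minors of A."""
    d = A.shape[0]
    return sum(float(np.linalg.det(A[np.ix_(S, S)]))
               for S in itertools.combinations(range(d), k))

rng = np.random.default_rng(8)
n, d, k = 5, 4, 2
V = rng.normal(size=(n, d))

lhs = det_k(sum(np.outer(v, v) for v in V), k)
rhs = sum(det_k(sum(np.outer(V[i], V[i]) for i in S), k)
          for S in itertools.combinations(range(n), k))
```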

We also use the following identities about determinants. For a d × d matrix A, we have

|det(A)| = Π_{i=1}^{d} σ_i(A).

If A is lower (or upper) triangular, i.e., A_{i,j} = 0 for j > i (resp. j < i), then det(A) = Π_{i=1}^{d} A_{i,i}.

2.2 Core-sets

The notion of core-sets was introduced in [AHPV04]. Informally, a core-set for an optimization problem is a subset of the data with the property that solving the underlying problem on the core-set gives an approximate solution for the original data. This notion is somewhat generic, and many variations of core-sets exist.

The specific notion of composable core-sets was explicitly formulated in [IMMM14].

Definition 2.10 (α-Composable Core-sets). A function c(V) that maps the input set V ⊂ R^d into one of its subsets is called an α-composable core-set for a maximization problem with respect to a function f : 2^{R^d} → R if, for any collection of sets V_1, . . . , V_m ⊂ R^d, we have

f( c(V_1) ∪ · · · ∪ c(V_m) ) ≥ (1/α) · f( V_1 ∪ · · · ∪ V_m ).

For simplicity, we will often refer to the set c(P) as the core-set for P and use the term "core-set function" for c(·). The size of c(·) is defined as the smallest number t such that |c(P)| ≤ t for all sets P (assuming it exists). Unless otherwise stated, whenever we use the term "core-set," we mean a composable core-set.

3 Spectral Spanners

In this section we introduce the notion of spectral spanners and review their properties. In the following, we define the special case of spectral (d-)spanners; later, in Definition 3.4, we introduce its generalization, spectral k-spanners.

Definition 3.1 (Spectral Spanner). Let V ⊂ Rd be a set of vectors. We say U ⊆ V is an α-spectral

d-spanner forV if for any v∈ V , there exists a probability distribution µvon the vectors inU so that vv⊺  α · E

u∼µv[uu

] . (7)

We study spectral spanners in Section 4, and propose polynomial-time algorithms for finding Õ(d)-spectral spanners of size Õ(d). Considering (7) for all v ∈ V implies that if U ⊆ V is an α-spectral spanner of V, then for any probability distribution μ : V → R_+, there exists a distribution μ̃ : U → R_+ such that

E_{v∼μ}[vv^⊺] ⪯ α · E_{u∼μ̃}[uu^⊺]. (8)

We crucially take advantage of this property in Section 6 to develop core-sets for the experimental design problem. Let f : S_d^+ → R_+ be a monotone function, i.e., f(A) ≤ f(B) whenever A ⪯ B. Roughly speaking, we use the monotonicity of f together with (8) to reduce optimizing f over all matrices of the form E_{v∼μ}[vv^⊺], for some distribution μ, to optimizing it over distributions supported only on the small set U. A wide range of matrix functions used in practice are monotone, e.g., the determinant and the trace. More generally, λ_i(·) for any i is a monotone function, and consequently so is any elementary symmetric polynomial of the eigenvalues. For polynomial functions of lower degree, e.g., the trace and det_k, monotonicity is guaranteed by weaker conditions. Therefore, one should expect to find smaller core-sets with better guarantees for those functions. Motivated by this, we introduce the notion of spectral k-spanners. Let us first define the notation ⪯_k that generalizes ⪯.

Definition 3.2 (⪯_k notation). For two matrices A, B ∈ R^{d×d}, we say A ⪯_k B if for any Π ∈ P_{d−k+1}, we have ⟨A, Π⟩ ≤ ⟨B, Π⟩.

In particular, note that A ⪯_d B is equivalent to A ⪯ B (as P_1 is the set of rank-one projection matrices), and A ⪯_1 B is the same as tr(A) ≤ tr(B) (as P_d = {I}). More generally, the following lemma can be used to check whether A ⪯_k B.

Lemma 3.3. Let A, B ∈ R^{d×d} be two symmetric matrices. Then A ⪯_k B if and only if ∑_{i=k}^d λ_i(B − A) ≥ 0.

Proof. Suppose that A ⪯_k B. Then by definition, for any Π ∈ P_{d−k+1} we have ⟨B − A, Π⟩ ≥ 0, so combining with Lemma 2.5 we get

0 ≤ min_{Π∈P_{d−k+1}} ⟨B − A, Π⟩ = ∑_{i=k}^d λ_i(B − A).

Conversely, if ∑_{i=k}^d λ_i(B − A) ≥ 0, then by Lemma 2.5 the minimum of ⟨B − A, Π⟩ over Π ∈ P_{d−k+1} is nonnegative, so ⟨A, Π⟩ ≤ ⟨B, Π⟩ for every Π ∈ P_{d−k+1}, i.e., A ⪯_k B.
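Lemma 3.3 gives a direct computational test for the ⪯_k order. A minimal sketch (assuming numpy; `preceq_k` is our own helper name):

```python
import numpy as np

def preceq_k(A, B, k, tol=1e-9):
    # Lemma 3.3: A <=_k B iff the sum of the bottom d - k + 1 eigenvalues
    # of B - A is nonnegative.
    lam = np.sort(np.linalg.eigvalsh(B - A))[::-1]  # lam[0] >= ... >= lam[d-1]
    return float(lam[k - 1:].sum()) >= -tol

d = 4
rng = np.random.default_rng(2)
X = rng.standard_normal((d, d))
A = X @ X.T
B = A + np.eye(d)

# <=_d coincides with the PSD order, and <=_1 compares traces.
assert preceq_k(A, B, d)      # B - A = I is PSD
assert preceq_k(A, B, 1)      # tr(A) <= tr(B)
assert not preceq_k(B, A, 1)  # tr(B) > tr(A)
```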

Now, we are ready to define spectralk-spanners.

Definition 3.4 (Spectral k-Spanner). Let V ⊂ R^d be a set of vectors. We say U ⊆ V is an α-spectral k-spanner for V if for any v ∈ V, there exists a probability distribution μ_v on the vectors in U so that

vv^⊺ ⪯_k α · E_{u∼μ_v}[uu^⊺]. (9)

We may drop k whenever it is clear from the context. Finally, we remark that spectral k-spanners have the composability property: if U_1, U_2 are α-spectral spanners of V_1, V_2 respectively, then U_1 ∪ U_2 is an α-spectral spanner of V_1 ∪ V_2. This property will be useful for constructing composable core-sets.

We will prove the first part of Theorem 1.2 in Section 5. The second part of the theorem shows the near-optimality of our results: we cannot do better than an Ω(d)-spectral spanner in the worst case unless the spectral spanner has size exponential in d. Here, we prove the second part of the theorem.

First, let us prove the claim for k = d. Let V be a set of (1/2)·e^{d^ε/8} independently chosen random ±1 vectors in R^d. By the Azuma–Hoeffding inequality and the union bound, we get that

P[ ∀u ≠ v ∈ V : |⟨u, v⟩| ≤ √(d^{1+ε}/2) ] ≥ 1 − |V|² e^{−d^ε/4} ≥ 1/2.

So, let V be a set where ⟨u, v⟩² ≤ d^{1+ε}/2 for all distinct u, v ∈ V. We claim that any d^{1−ε}-spectral spanner of V must contain all of V. Let U be such a spanner and suppose some v ∈ V is not in U. We observe that vv^⊺ ⋠ d^{1−ε} · E_{u∼μ} uu^⊺ for any μ supported on U. Indeed, testing the spanner inequality in the direction v, for any μ supported on U we have

d^{1−ε} · E_{u∼μ} ⟨v, u⟩² ≤ d^{1−ε} · (d^{1+ε}/2) = d²/2 < d² = ⟨v, v⟩²,

as desired.

Now, let us extend the above proof to k < d. First, construct a set V ⊆ R^k of (1/2)·e^{k^ε/8} independently chosen random ±1 vectors in R^k. By the above argument, any k^{1−ε}-spectral k-spanner of V must contain all of V. Now define V′ ⊆ R^d by appending d − k zeros to each vector in V. It is not hard to see that any α-spectral k-spanner of V corresponds to an α-spectral k-spanner of V′, and vice versa. Therefore, any k^{1−ε}-spectral k-spanner of V′ contains all vectors of V′.
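The concentration step behind this lower bound is easy to observe empirically. The following experiment (an illustration, not the formal proof; numpy assumed) draws random ±1 vectors and checks that all pairwise inner products are far smaller than the squared norm d:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 400, 50
V = rng.choice([-1.0, 1.0], size=(n, d))   # n random sign vectors in R^d

G = V @ V.T                                # Gram matrix; diagonal entries are all d
off = G - np.diag(np.diag(G))

assert np.all(np.diag(G) == d)
# Off-diagonal inner products concentrate at scale sqrt(d), far below d, so
# no vector in the set is spectrally covered by the others without a
# blow-up factor of order d.
assert np.max(np.abs(off)) < d / 2
```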

4 Spectral Spanners in the Full-Dimensional Case

In this section we prove Theorem 1.2 for the case k = d; in this case we obtain a slightly better bound, and indeed we prove Theorem 1.1. As alluded to in the introduction, we design a greedy algorithm that can be seen as a spectral analogue of the classical greedy algorithms for finding combinatorial spanners in graphs. The details of our algorithm are given in Algorithm 1.

1 Function Spectral-d-Spanner(V, α)
2 Let U = ∅
3 While there is a vector v ∈ V such that the polytope
    P_v = {x | ∀u ∈ U, ⟨x, v⟩ > √α · |⟨x, u⟩|}
  is nonempty: find such a vector v, let x be any point in P_v, and add argmax_{u∈V} ⟨u, x⟩² to U
4 If there are no such vectors, terminate the algorithm and output U

Algorithm 1: Greedy algorithm for computing a spectral d-spanner

Note that for any vector v we can test whether P_v is empty using a linear program. Therefore, the above algorithm runs in time polynomial in |V| and d.
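To make the procedure concrete, here is a simplified executable sketch of the greedy loop. One deliberate simplification: instead of deciding emptiness of P_v exactly with a linear program as the algorithm prescribes, this sketch merely searches for a witness x ∈ P_v among randomly sampled directions, so it is a heuristic stand-in rather than the exact algorithm (numpy assumed; all names are ours).

```python
import numpy as np

def greedy_weak_spanner(V, alpha, n_dirs=2000, seed=0):
    # Heuristic sketch of Algorithm 1. A direction x lies in P_v when
    # <x, v> > sqrt(alpha) * |<x, u>| for every u already in U.
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, V.shape[1]))   # candidate witnesses
    sqrt_a = np.sqrt(alpha)
    U = []
    while True:
        witness = None
        for v in V:
            proj_v = dirs @ v
            if U:
                bound = sqrt_a * np.max(np.abs(dirs @ np.array(U).T), axis=1)
            else:
                bound = np.zeros(n_dirs)
            hits = proj_v > bound
            if hits.any():
                witness = dirs[int(np.argmax(hits))]   # first sampled point of P_v
                break
        if witness is None:
            return np.array(U)
        # Add the vector with the largest projection on the witness direction;
        # for alpha > 1 this is never a vector already in U, so the loop
        # terminates after at most |V| iterations.
        U.append(V[int(np.argmax((V @ witness) ** 2))])

rng = np.random.default_rng(4)
V = rng.standard_normal((40, 3))
U = greedy_weak_spanner(V, alpha=9.0)
assert 1 <= len(U) <= len(V)
```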

As alluded to in Section 1.2, we first prove that our algorithm constructs a weak Õ(d)-spectral spanner. Let us recall the definition of a weak spectral spanner.

Definition 4.1 (Weak Spectral Spanner). A subset U ⊆ V ⊂ R^d is a weak α-spectral spanner of V if for all v ∈ V and x ∈ R^d there is a probability distribution μ_{v,x} on U such that

⟨v, x⟩² ≤ α · E_{u∼μ_{v,x}} ⟨u, x⟩², or equivalently, ⟨v, x⟩² ≤ α · max_{u∈U} ⟨u, x⟩².

In the rest of this section, we may call spectral spanners strong to emphasize their difference from the weak spectral spanners defined above. The rest of this section is organized as follows: in Section 4.1 we prove that the output of the algorithm is a weak α-spectral spanner of size O(d log d) for α = Ω(d log² d). Then, in Section 4.2 we prove that for any α, any weak α-spectral spanner is a strong α-spectral spanner.

4.1 Construction of a Weak Spectral Spanner

In this section we show that Algorithm 1 returns a weak α-spectral d-spanner when α is sufficiently larger than d. At the end of this section we discuss an alternative algorithm for finding a weak spanner that achieves a slightly better approximation guarantee. However, we believe that Algorithm 1 is simpler to implement, easier to parallelize, and can be tuned for practical applications.

Proposition 4.2. There is a universal constant C > 0 such that for α ≥ C · d log² d, Algorithm 1 returns a weak α-spectral spanner of size O(d log d).

First, we observe that for any α, the output of the algorithm is a weak α-spectral spanner. For the sake of contradiction, suppose the output set U is not a weak α-spectral spanner. Then there are v ∈ V and x ∈ R^d such that

⟨x, v⟩² > α · max_{u∈U} ⟨x, u⟩². (10)

We show that P_v is nonempty, which implies U cannot be the output. We may assume ⟨x, v⟩ > 0, perhaps after multiplying x by −1. Then the above equation is equivalent to ⟨x, v⟩ > √α · max_{u∈U} |⟨u, x⟩|, which implies x ∈ P_v.

It remains to bound the size of the output set U. As alluded to in the introduction, the main technical part of the proof is to show that the rank of a lower triangular matrix is robust under small perturbations. To bound the size of U we will construct such a matrix and use Lemma 4.3 (see below) to bound its rank. Let u_1, u_2, …, u_m be the sequence of vectors added to our spectral spanner by the algorithm, i.e., u_i is the i-th vector added to the set U. By the description of the algorithm, for any u_i there exists a “bad” vector x_i ∈ R^d such that

⟨u_i, x_i⟩² > α · max_{1≤j<i} ⟨u_j, x_i⟩².

Furthermore, by construction, u_i is the vector with the largest projection onto x_i, i.e., u_i = argmax_{u∈V} ⟨x_i, u⟩². Define the inner product matrix M ∈ R^{m×m} by

M_{i,j} = ⟨u_i, x_j⟩.

By the above conditions on the vectors u_i, x_j, the matrix M is diagonally dominant, and for all 1 ≤ i ≤ m and 1 ≤ j < i we have M_{j,i} ≤ M_{i,i}/√α. So the assumption of Lemma 4.3 holds for M with ε = 1/√α. By the lemma,

rank(M) ≥ C · min( 4α/log²α , m/log m )

for some constant C > 0. On the other hand, rank(M) ≤ d, since M can be written as the product of an m×d matrix and a d×m matrix. Choosing α = Θ(d log² d) large enough that C · 4α/log²α > d forces the minimum to be attained by m/log m; hence C · m/log m ≤ d, which gives |U| = m ≤ O(d log d) for large enough d, as desired. It remains to prove the following lemma.

Lemma 4.3. Let M ∈ R^{m×m} be a diagonally dominant and approximately lower triangular matrix in the following sense:

M_{j,i} ≤ ε · M_{i,i}, ∀ 1 ≤ j < i ≤ m. (11)

Then there is a universal constant C > 0 such that

rank(M) ≥ C · min( (1/(ε log(1/ε)))² , m/log m ).

Proof. Without loss of generality, perhaps after scaling each column of M by its diagonal entry, we assume M_{i,i} = 1 for all i. Note that the rank and condition (11) are invariant under such scaling, so it is enough to prove the statement for such a matrix. Let M_s denote the top-left s×s principal submatrix of M, for some integer s ≤ m that we specify later. Taking principal submatrices does not increase the rank, and (11) is preserved under taking principal submatrices, so proving the assertion of the lemma for rank(M_s) proves the lemma. We can write M_s = L + E such that

• L ∈ R^{s×s} is a lower triangular matrix with L_{i,i} = 1 and |L_{i,j}| ≤ 1 for any 1 ≤ j ≤ i ≤ s. In particular, ‖L‖_∞ = 1.

• ‖E‖_∞ ≤ ε (we may further assume E is upper triangular, but we do not use this in our proof).

Let σ_1(M_s) ≥ … ≥ σ_s(M_s) denote the singular values of M_s. Obviously, σ_i(M_s) > 0 implies rank(M_s) ≥ i for any 1 ≤ i ≤ s. With this fact in mind, let us give some intuition for why M_s has large rank. Since L is lower triangular with nonzero diagonal entries, it is a full-rank matrix. Moreover, the entries of E are much smaller than the (diagonal) entries of L. The singular values of E are thus on average much smaller than those of L, so adding E to L can only make a small fraction of the singular values of L vanish. This implies that M_s = L + E must have high rank. Now we make this argument rigorous.

Let S(M_s), S(L), S(E) be the symmetrized versions of M_s, L and E respectively (see Subsection 2.1). By Fact 2.6, to show σ_i(M_s) > 0 for some i, we can equivalently prove λ_i(S(M_s)) > 0. We use Lemma 2.4: setting A = S(L) and B = S(E), for any pair of integers ℓ < k ≤ s such that

λ_k(S(L)) + λ_{2s−ℓ}(S(E)) > 0 (12)

we have λ_{k−ℓ}(S(M_s)) > 0. So to prove the lemma, it suffices to find s, k and ℓ satisfying the above with

k − ℓ ≥ C · min( (1/(ε log(1/ε)))² , m/log m )

for some constant C.

To find proper values of k and ℓ, we use the following two claims.

Claim 4.4. For any ℓ ≤ s,

λ_{2s−ℓ}(S(E)) ≥ λ_{2s−ℓ+1}(S(E)) = −σ_ℓ(E) ≥ −‖E‖_F/√ℓ ≥ −ε·s/√ℓ.

Claim 4.5. For any k < s/2,

λ_k(S(L)) = σ_k(L) ≥ ((k−1)/s²)^{(k−1)/s}.

Therefore, to show (12) it is enough to show

s · log(√ℓ/(εs)) > (k−1) · log(s²/(k−1)), (13)

for k, ℓ ≤ s/2. We analyze the above in two cases. If m log m ≤ 1/ε², then one can check that for s = m, k = ⌊m/(4 log m)⌋, ℓ = ⌊m/(8 log m)⌋, and large enough m, (13) holds. This implies that in this case rank(M_s) ≥ k − ℓ ≥ m/(8 log m), and we are done. Now suppose that 1/ε² ≤ m log m. We set s < m to be the largest integer such that s log s ≤ 1/(16ε²). Next, we let ℓ = ⌊4ε²s²⌋. Note that s log s ≤ 1/(16ε²) implies ℓ ≤ s/(4 log s). Now, plugging ℓ = ⌊4ε²s²⌋ into (13) turns it into

s > (k−1) · log(s²/(k−1)).

So, for k = ⌊s/(2 log s)⌋, the above is satisfied. Furthermore, in this case

k − ℓ = ⌊s/(2 log s)⌋ − ⌊4ε²s²⌋ ≥ s/(4 log s) ≥ 1/(256 ε² log²(1/ε)),

as s is the largest number such that s log s ≤ 1/(16ε²), and log s ≤ 2 log(1/ε). So the lemma holds for C ≤ 1/256.

Proof of Claim 4.4. By Fact 2.7 we know that

∑_{i=1}^s σ_i(E)² = ‖E‖_F² ≤ ‖E‖_∞² · s² ≤ ε² · s².

Now, by Markov's inequality we get σ_ℓ(E)² ≤ ε²s²/ℓ, which proves the claim.

Proof of Claim 4.5. Since L is lower triangular, we have

∏_{i=1}^s σ_i(L) = det(L) = ∏_{i=1}^s L_{i,i} = 1. (14)

It follows that for any k ≤ s,

∏_{i=1}^{k−1} σ_i(L) = 1/∏_{j=k}^s σ_j(L) ≥ 1/σ_k(L)^{s−k+1}. (15)

Now, we use the Frobenius norm to prove an upper bound on the first k−1 singular values. By Fact 2.7,

∑_{i=1}^s σ_i(L)² = ‖L‖_F² ≤ ‖L‖_∞² · s² = s². (16)

By the AM–GM inequality we get

∏_{i=1}^{k−1} σ_i(L) ≤ ( ∑_{i=1}^{k−1} σ_i(L)²/(k−1) )^{(k−1)/2} ≤ ( ‖L‖_F²/(k−1) )^{(k−1)/2} ≤ ( s²/(k−1) )^{(k−1)/2}.

The above together with (15) proves σ_k(L) ≥ ((k−1)/s²)^{(k−1)/(2(s−k+1))}. Noting that for k ≤ s/2 we have 2(s−k+1) ≥ s completes the proof of the claim.

We would like to thank an anonymous reviewer for suggesting an alternative algorithm for finding weak spanners. It offers slightly better guarantees, finding a weak d-spectral spanner of size O(d log d). However, as we argue below, our algorithm can be made more efficient in practice, in particular in a distributed setting.

An alternative algorithm. For a subset V ⊂ R^d, define sym(V) to be the symmetric set sym(V) = V ∪ {−x | x ∈ V}. A direct application of the separating hyperplane theorem shows that a subset U ⊆ V is a weak α-spectral spanner of V if conv(sym(V)) ⊆ √α · conv(sym(U)), where conv refers to the convex hull of a set. Knowing this, we can apply the celebrated result of F. John [B+97] to get a weak d-spectral spanner. Letting MVEE of a set denote the minimum-volume ellipsoid enclosing the set, the result implies that there exists a subset U ⊆ V of size O(d²) and an ellipsoid E with E = MVEE(sym(U)) = MVEE(sym(V)) and E/√d ⊆ conv(sym(U)). Therefore, U is a weak d-spectral spanner of V with size O(d²). Moreover, the size of U can be reduced to O(d) by using the spectral sparsification machinery of [BSS12]. We also note that these ideas have been extended in [Bar14] to show that for any d-dimensional symmetric convex body C and any 0 < ε < 1, there is a polytope P with roughly d^{1/ε} vertices such that P ⊆ C ⊆ √(εd) · P. In our terminology, this gives an algorithm to find a weak d-spectral spanner of V of size O(d). Although the approximation guarantee can be improved by a log factor by this algorithm, the improvement comes at a cost. First of all, finding John's ellipsoid requires solving a semidefinite program with O(d²) variables, whereas in Algorithm 1 we only need to solve linear programs with O(d) variables; this requires polynomially less memory. Furthermore, note that the main computational task in each step of Algorithm 1 is to solve |V| feasibility LPs, each with Õ(d) variables and constraints. These LPs can be solved in parallel: with access to O(|V|) processors, our greedy algorithm runs in poly(d) time in the PRAM model of computation. This extreme parallelism cannot be achieved using the above approach. Finally, when finding the weak spanner one can tune the value of α in (1) based on the structure of the given data points and the desired size of the core-set, making the algorithm more suitable for applications.

4.2 From Weak Spectral Spanners to Strong Spectral Spanners

In this section, we prove that if U is a weak α-spectral spanner of V, then it is a strong α-spectral spanner of V. Combined with Proposition 4.2, this proves Theorem 1.1.

Lemma 4.6. For any set of vectors V ⊂ R^d, any weak α-spectral spanner of V is a strong α-spectral spanner of V.

Proof. Let U be a weak α-spectral spanner of V. Fix a vector v ∈ V; we write a program to find a probability distribution μ_v : U → R_+ such that vv^⊺ ⪯ (1/δ) · E_{u∼μ_v} uu^⊺ for the largest possible δ. It turns out that this is a semidefinite program, in which we have a variable p_u = P_{μ_v}(u) for the probability of each vector u ∈ U; see (17) for details.

max δ
s.t. δ · vv^⊺ ⪯ E_{u∼μ_v} uu^⊺,
     μ_v is a distribution on U. (17)

To prove the lemma, it suffices to show that the optimum of the program is at least 1/α. To do so, we analyze the dual of the program. We first show that the set of feasible solutions of the program has a nonempty interior; this implies that the Slater condition is satisfied, so the duality gap is zero. Then we show that any feasible solution of the dual has value at least 1/α.

For the first assertion, let μ_v be the uniform distribution on U and δ ≤ 1/(α|U|). It is not hard to see that this is a feasible solution of the program, since U is a weak α-spectral spanner.

Next, we prove the second statement. First, we write down the dual:

min λ
s.t. u^⊺Xu ≤ λ, ∀u ∈ U
     v^⊺Xv ≥ 1
     X ⪰ 0

Let (X, λ) be a feasible solution of the dual. Our goal is to show λ ≥ 1/α. Let E = {x ∈ R^d | x^⊺Xx ≤ λ} be the ellipsoid of radius √λ defined by X. The set E has the following properties:

• Convexity;

• Symmetry: if x ∈ E, then −x ∈ E;

• U ⊆ E: by the dual constraints, u^⊺Xu ≤ λ for all u ∈ U.

Let v̄ = v/√α. We claim that v̄ ∈ E. Note that if v̄ ∈ E, then

λ ≥ v̄^⊺Xv̄ ≥ 1/α,

which completes the proof.

For the sake of contradiction, suppose v̄ ∉ E. We show that then U is not a weak α-spectral spanner. By convexity of E, there is a hyperplane separating v̄ from E; that is, there is a vector e ∈ R^d such that

⟨v, e⟩ = √α · ⟨v̄, e⟩ ≥ √α and ∀x ∈ E, ⟨x, e⟩ < 1.

Moreover, by symmetry of E, for any x ∈ E,

⟨x, e⟩² ≤ max{⟨x, e⟩, ⟨−x, e⟩}² < 1.

Finally, since U ⊆ E, we obtain max_{u∈U} ⟨u, e⟩² < 1. Therefore, ⟨v, e⟩² > α · max_{u∈U} ⟨u, e⟩², which implies that U is not a weak α-spectral spanner.

5 Construction of Spectral k-Spanners

In this section we extend our proof for spectral d-spanners to spectral k-spanners for k < d; this proves our main result, Theorem 1.2. Here is the high-level plan of the proof: first, we use the greedy algorithm of [ÇMI09a] for volume maximization to find an Õ(k)-dimensional linear subspace of R^d onto which the input vectors have a “large” projection. Next, we apply Theorem 1.1 within this Õ(k)-dimensional space to obtain the desired spectral k-spanner.


5.1 Greedy Algorithm for Volume Maximization

In this subsection, we prove the following statement.

Proposition 5.1. For any set of vectors V ⊂ R^d, and any k < d and m > 2k, there is a set U ⊆ V of size m such that for all v ∈ V we have

v_{U^⊥} v_{U^⊥}^⊺ ⪯_k 2m^{2k/m} · E_{u∼μ} uu^⊺, (18)

where v_{U^⊥} = Π_{⟨U⟩^⊥}(v) is the projection of v onto the space orthogonal to the span of U, and μ is the uniform distribution on the set U.

As we will see in the next subsection, for m = Θ(k log k), the set U promised above will be part of our spectral k-spanner. Roughly speaking, to obtain a spectral spanner of V, it then suffices to additionally add a spectral spanner of {Π_{⟨U⟩}(v)}_{v∈V}; in the next subsection, we use Theorem 1.1 for the latter part.

First, we describe an algorithm to find the set U promised in the proposition; then we prove its correctness. We use the greedy algorithm of [ÇMI09b] for volume maximization to find the set U.

1 Function Volume-Maximization(V, m)
2 Let U = ∅
3 While |U| < m, add argmax_{v∈V} ‖Π_{⟨U⟩^⊥}(v)‖ to U
4 Return U

Algorithm 2: Greedy Algorithm for Volume Maximization
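A compact implementation of this greedy is straightforward (a sketch assuming numpy; it maintains the residuals Π_{⟨U⟩^⊥}(v) incrementally via Gram–Schmidt):

```python
import numpy as np

def volume_maximization(V, m):
    # Greedy volume maximization (Algorithm 2): repeatedly pick the vector
    # whose projection orthogonal to the span of the chosen set is longest.
    V = np.asarray(V, dtype=float)
    R = V.copy()                 # residuals: components orthogonal to span(U)
    chosen = []
    for _ in range(m):
        i = int(np.argmax((R * R).sum(axis=1)))
        norm = np.linalg.norm(R[i])
        if norm < 1e-12:         # V has rank < m; nothing left to add
            break
        u_hat = R[i] / norm      # next Gram-Schmidt direction
        chosen.append(i)
        R = R - np.outer(R @ u_hat, u_hat)   # project all residuals off u_hat
    return V[chosen]

V = np.array([[10.0, 0, 0], [0, 5.0, 0], [0, 0, 2.0], [1.0, 1.0, 1.0]])
U = volume_maximization(V, 2)
# The two longest (mutually orthogonal) vectors are picked first.
assert np.allclose(U[0], [10.0, 0, 0]) and np.allclose(U[1], [0, 5.0, 0])
```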

Let U = {u_1, …, u_m} be the output of the algorithm, where u_i is the i-th vector added to the set, and let μ be the uniform distribution on U. Fix a vector v ∈ V for which we will verify the assertion of the proposition. If v ∈ U the statement obviously holds, so assume v ∉ U.

Fix a Π ∈ P_{d−k+1}. Observe that ⟨v_{U^⊥} v_{U^⊥}^⊺, Π⟩ ≤ ‖Π_{⟨U⟩^⊥}(v)‖². On the other hand, by Lemma 2.5, ⟨E_{u∼μ} uu^⊺, Π⟩ ≥ ∑_{i=k}^d λ_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_d are the eigenvalues of E_{u∼μ} uu^⊺. Therefore, to prove (18), it suffices to prove

‖Π_{⟨U⟩^⊥}(v)‖² ≤ 2m^{2k/m} · ∑_{i=k}^d λ_i. (19)

Define û_1, û_2, …, û_m to be the orthonormal basis of ⟨U⟩ obtained by the Gram–Schmidt process on u_1, …, u_m, i.e., û_1 = u_1/‖u_1‖, û_2 = Π_{⟨u_1⟩^⊥}(u_2)/‖Π_{⟨u_1⟩^⊥}(u_2)‖, and so on. Define M ∈ R^{m×m} to be the matrix whose j-th column is the representation of u_j in the orthonormal basis {û_1, …, û_m}, i.e., for all 1 ≤ i, j ≤ m,

M_{i,j} = ⟨u_j, û_i⟩.

Note that E_{u∼μ} uu^⊺ is the same as (1/m)·MM^⊺ up to a rotation of the space; in other words, both matrices have the same set of nonzero eigenvalues. Since the eigenvalues of (1/m)·MM^⊺ are the squares of the singular values of m^{−1/2}·M, to prove (19) it is enough to show

‖Π_{⟨U⟩^⊥}(v)‖² ≤ 2m^{2k/m} · ∑_{i=k}^m σ_i²(m^{−1/2}M). (20)

Since v ∈ V and v ∉ U, we get ‖Π_{⟨U⟩^⊥}(v)‖² ≤ ‖Π_{⟨û_1,…,û_{m−1}⟩^⊥}(u_m)‖² = M_{m,m}². So to prove (20), it suffices to show

M_{m,m}² ≤ 2m^{2k/m} · ∑_{i=k}^m σ_i²(m^{−1/2}M). (21)

Note that the above inequality can be viewed purely as a property of the matrix M. Let us first list the properties of M that we will use to prove it:

(I) M is upper triangular, as u_i ∈ ⟨û_1, …, û_i⟩.

(II) By the description of the algorithm, for any i < j ≤ m we have

M_{i,i}² = ‖Π_{⟨û_1,…,û_{i−1}⟩^⊥}(u_i)‖² ≥ ‖Π_{⟨û_1,…,û_{i−1}⟩^⊥}(u_j)‖² = ∑_{ℓ=i}^j M_{ℓ,j}². (22)

The following lemma completes the proof of Proposition 5.1.

Lemma 5.2. Let M ∈ R^{m×m} satisfy (I) and (II). For any k < m/2, we have

M_{m,m}² ≤ 2m^{2k/m} · ∑_{i=k}^m (1/m)·σ_i²(M).

Proof. Here is the main idea of the proof. First, we use the Cauchy interlacing theorem along with property (II) to deduce that σ_i cannot be much larger than M_{i,i}. Then, we combine this with the fact that M is upper triangular, so det(M) = ∏_{i=1}^m M_{i,i} = ∏_{i=1}^m σ_i, to upper-bound M_{m,m}² by a multiple of ∑_{i=k}^m σ_i²(M).

First, we show that for all 1 ≤ i ≤ m,

σ_i²(M) ≤ (m − i + 1) · M_{i,i}². (23)

Define M_i to be the (m−i+1)×(m−i+1) matrix obtained by removing the first i−1 rows and columns of M. First, the Cauchy interlacing theorem tells us σ_i(M) ≤ σ_1(M_i). Secondly, by Fact 2.7 and property (II) we have

σ_1(M_i)² ≤ ∑_{j=1}^{m−i+1} σ_j(M_i)² = ‖M_i‖_F² ≤ (m − i + 1) · M_{i,i}².

This proves (23). Since M is upper triangular,

det(M)² = ∏_{i=1}^m M_{i,i}² = ∏_{i=1}^m σ_i²(M) ≤ ( ∑_{j=k}^m σ_j(M)²/(m−k+1) )^{m−k+1} · ∏_{i=1}^{k−1} σ_i² =: β · ∏_{i=1}^{k−1} σ_i²,

where the inequality follows by the AM–GM inequality and β = ( ∑_{j=k}^m σ_j(M)²/(m−k+1) )^{m−k+1}. By (23),

∏_{i=1}^m σ_i²(M) ≤ β · ∏_{i=1}^{k−1} (m−i+1)·M_{i,i}² ≤ m^k · β · ∏_{i=1}^{k−1} M_{i,i}². (24)

Using ∏_{i=1}^m σ_i² = ∏_{i=1}^m M_{i,i}² again, we get

∏_{i=k}^m M_{i,i}² ≤ m^k · β.

Using property (II) again, we have M_{i,i}² ≥ M_{m,m}² for all i. Therefore ∏_{i=k}^m M_{i,i}² ≥ M_{m,m}^{2(m−k+1)}, and we get

M_{m,m}^{2(m−k+1)} ≤ m^k · β.

The lemma follows by raising both sides to the power 1/(m−k+1) and using that m − k + 1 ≥ m/2 since k < m/2.


5.2 Main Algorithm

In this section we prove Theorem 1.2. The details of our algorithm are described in Algorithm 3.

1 Function Spectral-k-Spanner(V, α, k)
2 Set m = O(k log k) such that m^{2k/m} = O(1)
3 Run Volume-Maximization(V, m) of Algorithm 2 and let U be the output, i.e., a set of vectors satisfying (18)
4 Run Spectral-d-Spanner({Π_{⟨U⟩}(v)}_{v∈V}, α) of Algorithm 1 and let W be the output, the corresponding spectral m-spanner
5 Return U ∪ {v : Π_{⟨U⟩}(v) ∈ W}

Algorithm 3: Finds an α-Spectral k-Spanner

First of all, let us analyze the size of the output. By definition, Algorithm 2 returns m vectors. Then, by Theorem 1.1, the output of Algorithm 1 has size at most O(m log m). Since m = O(k log k), the size of the output is |U| + |W| ≤ O(k log² k), as desired.

In the rest of this section we prove correctness. Fix a vector v ∈ V; we need to find a distribution μ_v on U ∪ W such that vv^⊺ ⪯_k α · E_{u∼μ_v} uu^⊺ for some α = Õ(k).

First, by Fact 2.1,

vv^⊺ ⪯_k 2·( Π_{⟨U⟩^⊥}(v)Π_{⟨U⟩^⊥}(v)^⊺ + Π_{⟨U⟩}(v)Π_{⟨U⟩}(v)^⊺ ).

So, it is enough to prove that

Π_{⟨U⟩^⊥}(v)Π_{⟨U⟩^⊥}(v)^⊺ + Π_{⟨U⟩}(v)Π_{⟨U⟩}(v)^⊺ ⪯_k (α/2) · E_{u∼μ_v} uu^⊺. (25)

We proceed by upper-bounding the LHS term by term. By Proposition 5.1,

Π_{⟨U⟩^⊥}(v)Π_{⟨U⟩^⊥}(v)^⊺ ⪯_k O(1) · E_{u∼μ} uu^⊺, (26)

where μ is the uniform distribution on U. So, to prove (25), it is enough to find a distribution μ_v on U ∪ W such that

Π_{⟨U⟩}(v)Π_{⟨U⟩}(v)^⊺ ⪯_k α · E_{u∼μ_v} uu^⊺ (27)

for some α = Õ(k). From now on, for any vector v ∈ V we write v̂ = Π_{⟨U⟩}(v).

By the description of the algorithm, {v̂}_{v∈W} is an O(m log² m)-spectral m-spanner for {v̂}_{v∈V}. So, there exists a probability distribution ν_v on W such that

v̂v̂^⊺ ⪯ O(m log² m) · E_{w∼ν_v} ŵŵ^⊺, or equivalently, ∀x ∈ ⟨U⟩ : ⟨x, v̂⟩² ≤ O(m log² m) · E_{w∼ν_v} ⟨ŵ, x⟩². (28)

In fact, the above holds for any x ∈ R^d, as ⟨x, û⟩ = ⟨Π_{⟨U⟩}(x), û⟩ for any vector u ∈ V. Therefore, for any Π ∈ P_{d−k+1}, by summing (28) over an orthonormal basis of Π and noting that m = O(k log k), we get

⟨Π, v̂v̂^⊺⟩ ≤ Õ(k) · ⟨E_{w∼ν_v} ŵŵ^⊺, Π⟩,

which by definition implies

v̂v̂^⊺ ⪯_k Õ(k) · E_{w∼ν_v} ŵŵ^⊺. (29)

Therefore, to show (27) for α = Õ(k), it suffices to find a distribution μ_v on U ∪ W such that

E_{w∼ν_v} ŵŵ^⊺ ⪯_k O(1) · E_{u∼μ_v} uu^⊺.

But observe that for any w ∈ W, we can write

ŵŵ^⊺ ⪯_k 2·( ww^⊺ + Π_{⟨U⟩^⊥}(w)Π_{⟨U⟩^⊥}(w)^⊺ ) ⪯_k O(1)·( ww^⊺ + E_{u∼μ} uu^⊺ ),

where μ is the uniform distribution over U. The first inequality follows by Fact 2.1, and the second by equation (26), which holds for all vectors v ∈ V. Averaging the above inequality with respect to the distribution ν_v yields a distribution μ_v on U ∪ W as required, completing the proof.

6 Applications

In this section we discuss applications of Theorem 1.2 to designing composable core-sets. As we discussed in the introduction, we show that for many problems spectral spanners provide almost the best possible composable core-sets in the worst case. Next, we show that for any function f that is “monotone” on PSD matrices, spectral spanners provide a composable core-set for a fractional budgeted minimization problem with respect to f. Later, in Sections 6.1 and 6.2, we show that for a large class of monotone functions the optimum of the fractional budgeted minimization problem is within a small factor of the optimum of the corresponding integral problem. So, spectral spanners provide almost optimal composable core-sets for several spectral budgeted minimization problems.

Let V ⊂ R^d be a set of vectors. For a function f : S_d^+ → R_+ on PSD matrices and a positive integer B denoting the budget, the fractional budgeted minimization problem is to choose a mass B of the vectors of V, i.e., weights {s_v}_{v∈V} with ∑_v s_v ≤ B, such that f(∑_v s_v vv^⊺) is minimized. This can be modeled as the following continuous optimization problem, which we refer to as BM:

inf f( ∑_{v∈V} s_v vv^⊺ )
s.t. ∑_{v∈V} s_v ≤ B,
     s_v ≥ 0, ∀v ∈ V. (BM)

Definition 6.1 (k-monotone functions). We say a function f : S_d^+ → R_+ is k-monotone for an integer 1 ≤ k ≤ d if for all PSD matrices A, B ⪰ 0, A ⪯_k B implies f(A) ≥ f(B).

We say f is vector k-monotone if for all PSD matrices A, B and all vectors v ∈ R^d, vv^⊺ ⪯_k B implies f(A + vv^⊺) ≥ f(A + B). Note that any k-monotone function is obviously vector k-monotone, as A + vv^⊺ ⪯_k A + B.

We show that an algorithm for finding α-spectral k-spanners gives an α-composable core-set function for the fractional budgeted minimization problem for any function f that is vector k-monotone. We emphasize that our composable core-set does not depend on the choice of f, as long as f is vector monotone.

Proposition 6.2. For any 1 ≤ k ≤ d and any vector k-monotone function f : S_d^+ → R_+, Algorithm 3 gives a β(f, α)-composable core-set for BM(V, f, B), where for any t > 0,

β(f, t) = sup_{A∈S_d^+} f(A)/f(tA).

Proof. Let V_1, V_2, …, V_p be the given input sets for an arbitrary integer p, and let V = ∪_{i=1}^p V_i. For each 1 ≤ i ≤ p, let U_i be the output of Spectral-k-Spanner(V_i, α, k). By Theorem 1.2, for α = Õ(k) we have |U_i| ≤ Õ(k). Let U = U_1 ∪ ⋯ ∪ U_p.

Fix a vector k-monotone function f and a budget B > 0, and let s = {s_v}_{v∈V} be a feasible solution of BM(V, f, B). To prove the assertion, we need to show that there exists a feasible solution s̃ of BM(U, f, B) such that

f( ∑_{u∈U} s̃_u uu^⊺ ) ≤ β(f, α) · f( ∑_{v∈V} s_v vv^⊺ ). (30)

By the composability property of spanners, U is an α-spectral k-spanner of V. Therefore, for any v ∈ V there exists a probability distribution μ_v on U such that vv^⊺ ⪯_k α · E_{u∼μ_v} uu^⊺. Say V = {v_1, …, v_m}. It follows, by the vector k-monotonicity of f and U being an α-spectral k-spanner, that

f( ∑_{i=1}^m s_{v_i} v_i v_i^⊺ ) ≥ f( α s_{v_1} E_{u∼μ_{v_1}} uu^⊺ + ∑_{i=2}^m s_{v_i} v_i v_i^⊺ ) ≥ f( ∑_{i=1}^2 α s_{v_i} E_{u∼μ_{v_i}} uu^⊺ + ∑_{i=3}^m s_{v_i} v_i v_i^⊺ ) ≥ ⋯ ≥ f( ∑_{i=1}^m α s_{v_i} E_{u∼μ_{v_i}} uu^⊺ ).

Now, define s̃ by s̃_u = ∑_{v∈V} s_v P_{μ_v}[u] for any u ∈ U. It is straightforward to see that this is a feasible solution of BM(U, f, B), since ∑_{u∈U} s̃_u = ∑_{v∈V} s_v ∑_{u∈U} P_{μ_v}[u] = ∑_{v∈V} s_v ≤ B. Therefore,

f( ∑_{v∈V} s_v vv^⊺ ) ≥ f( ∑_{v∈V} α s_v E_{u∼μ_v} uu^⊺ ) = f( ∑_{u∈U} α s̃_u uu^⊺ ) ≥ (1/β(f, α)) · f( ∑_{u∈U} s̃_u uu^⊺ ),

where the last inequality uses the definition of β(f, α) with A = ∑_{u∈U} s̃_u uu^⊺. This proves (30), as desired.

In general, we may not be able to solve BM efficiently if f is not a convex function. It turns out that if f is convex and has a certain reciprocal multiplicity property, then the integrality gap of the program is small; so, assuming further that f is (vector) k-monotone, the above proposition yields a composable core-set for the corresponding integral budgeted minimization problems. In the next two sections we discuss two such families of functions, arising in determinant maximization and optimal design.

6.1 Determinant Maximization

In this section, we use Proposition 6.2 to prove Theorem 1.3. Throughout this section, for an integer 1 ≤ k ≤ d we let f : S_d^+ → R_+ be the map A ↦ −det_k(A)^{1/k}. It follows from the theory of hyperbolic polynomials that for any 1 ≤ k ≤ d, −det_k(A)^{1/k} is a convex function [Gül97], so one can solve BM(−det_k^{1/k}, V, k) using convex programming. Furthermore, observe that BM(−det_k^{1/k}, V, k) gives a relaxation of the k-determinant maximization problem. Nikolov [Nik15] showed that any fractional solution can be rounded to an integral solution incurring only a multiplicative error of e.


Theorem 6.3 ([Nik15]). There is a randomized algorithm that, for any set V ⊆ R^d, 1 ≤ k ≤ d, and any feasible solution x of BM(−det_k^{1/k}, V, k), outputs S ⊂ V of size k such that

det_k( ∑_{v∈S} vv^⊺ ) ≥ e^{−k} · max_{T∈(V choose k)} det_k( ∑_{u∈T} uu^⊺ ). (31)

Proof. We include the proof for the sake of completeness. First, we describe the algorithm: for 1 ≤ i ≤ k, choose a vector v with probability x_v/k (with replacement) and call it u_i. It follows that

E[ det_k( ∑_{i=1}^k u_i u_i^⊺ ) ] = ∑_{S∈(V choose k)} (k!/k^k) · ∏_{v∈S} x_v · det_k( ∑_{v∈S} vv^⊺ ) ≥ e^{−k} · ∑_{S∈(V choose k)} det_k( ∑_{v∈S} x_v vv^⊺ ),

where the equality holds since there are k! different orderings for selecting a fixed set S of k distinct vectors (and samples containing a repeated vector contribute zero), and the inequality uses k!/k^k ≥ e^{−k}. By the Cauchy–Binet identity, the RHS is equal to e^{−k} · det_k( ∑_{v∈V} x_v vv^⊺ ), as desired.
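A single trial of this rounding scheme can be sketched as follows (numpy assumed; `det_k` and `round_once` are our own helper names, with `det_k` computed as a sum of principal minors; the theorem's guarantee is only in expectation, so one trial may well return a value near 0):

```python
import itertools
import numpy as np

def det_k(A, k):
    # det_k: sum of the k x k principal minors of A.
    d = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(T, T)])
               for T in itertools.combinations(range(d), k))

def round_once(V, x, k, seed=0):
    # Draw k indices independently with probabilities x_v / k (with
    # replacement), as in the proof; repeated vectors make det_k vanish.
    rng = np.random.default_rng(seed)
    p = np.asarray(x, dtype=float) / k     # x sums to k, so p sums to 1
    idx = rng.choice(len(V), size=k, replace=True, p=p)
    S = V[idx]
    return det_k(S.T @ S, k)

rng = np.random.default_rng(5)
V = rng.standard_normal((8, 4))
x = np.full(8, 3 / 8)                      # feasible fractional solution, sum = k = 3
val = round_once(V, x, k=3)
assert val >= -1e-9                        # det_k of a PSD matrix is nonnegative
```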

Note that the algorithm discussed in the above proof may have an exponentially small probability of success, but [Nik15] also gives a de-randomization using the method of conditional expectations. From this point on, we do not need convexity. To use Proposition 6.2, we need to verify that −det_k^{1/k} is (vector) k-monotone.

Lemma 6.4. For any integer 1 ≤ k ≤ d, the function −det_k^{1/k} is vector k-monotone.

Proof. Equivalently, we show that −det_k is vector k-monotone. Fix A ⪰ 0 and decompose it as A = ∑_{a∈A} aa^⊺, where we abuse notation and also use A to denote the set of vectors in the decomposition of A. Also, fix a vector v and suppose vv^⊺ ⪯_k B for some B ⪰ 0. We need to show det_k(A + vv^⊺) ≤ det_k(A + B). By Lemma 2.9,

det_k(A + vv^⊺) − det_k(A) = ∑_{S∈(A choose k−1)} det_k( ∑_{a∈S} aa^⊺ + vv^⊺ ) = ∑_{S∈(A choose k−1)} det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩.

The second equality follows from the fact that det_k( ∑_{a∈S} aa^⊺ + vv^⊺ ) equals the determinant of the k×k inner product matrix of these k vectors; using the Gram–Schmidt orthogonalization process, the latter can be rewritten as det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩. Since vv^⊺ ⪯_k B, for any such S we have ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩ ≤ ⟨B, Π_{⟨S⟩^⊥}⟩. Therefore,

det_k(A + vv^⊺) = det_k(A) + ∑_{S∈(A choose k−1)} det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨vv^⊺, Π_{⟨S⟩^⊥}⟩ ≤ det_k(A) + ∑_{S∈(A choose k−1)} det_{k−1}( ∑_{a∈S} aa^⊺ ) · ⟨B, Π_{⟨S⟩^⊥}⟩ ≤ det_k(A + B).
