Problem
Nguyen Kim Thang
IBISC, Univ Evry, University Paris Saclay
Evry, France
https://orcid.org/0000-0002-6085-9453
Abstract
In the subspace approximation problem, given $m$ points in $\mathbb{R}^n$ and an integer $k \le n$, the goal is to find a $k$-dimensional subspace of $\mathbb{R}^n$ that minimizes the $\ell_p$-norm of the Euclidean distances to the given points. This problem generalizes several subspace approximation problems and has applications ranging from statistics, machine learning, and signal processing to biology. Deshpande et al. [4] gave a randomized $O(\sqrt{p})$-approximation, and this bound was proved to be tight assuming $\mathrm{NP} \ne \mathrm{P}$ by Guruswami et al. [7]. Determining the performance guarantee of deterministic algorithms for the problem is an intriguing question. In this paper, we present a simple deterministic $O(\sqrt{p})$-approximation algorithm with an equally simple analysis. This settles the status of the problem in terms of approximation up to a constant factor. Moreover, the simplicity of the algorithm makes it practically appealing.
2012 ACM Subject Classification Theory of computation → Approximation algorithms analysis
Keywords and phrases Approximation Algorithms, Subspace Approximation
Digital Object Identifier 10.4230/LIPIcs.SWAT.2018.30
Funding Research supported by the ANR project OATA no. ANR-15-CE40-0015-01.
1 Introduction
Massive high-dimensional data emerge naturally in many domains, from machine learning to biology. It has been observed that although data lie in high-dimensional spaces, in practice they have low intrinsic dimension. Dimension-reduction algorithms are therefore essential in many domains such as image processing and personalized medicine. In this paper, we consider the following subspace approximation problem in the context of capturing the underlying low-dimensional structure of given data.
Subspace Problem. Given points $a_1, \dots, a_m \in \mathbb{R}^n$ and integers $p \ge 1$ and $0 \le k \le n$, find a $k$-dimensional linear subspace $W$ that minimizes the $\ell_p$-norm of the Euclidean distances of these points to $W$, i.e.,
\[
\min_{W : \dim(W) = k} \left( \sum_{i=1}^{m} d(a_i, W)^p \right)^{1/p}.
\]
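To make the objective concrete, the following is a minimal sketch that evaluates it for a fixed subspace. It assumes that $W$ is given by an orthonormal basis; the function names `distance_to_subspace` and `subspace_cost` and the sample points are hypothetical and chosen only for illustration.

```python
import math

def distance_to_subspace(a, basis):
    """Euclidean distance from point a to span(basis); basis is assumed orthonormal."""
    # Subtract the orthogonal projection of a onto the subspace.
    residual = list(a)
    for w in basis:
        coeff = sum(x * y for x, y in zip(a, w))
        residual = [r - coeff * y for r, y in zip(residual, w)]
    return math.sqrt(sum(r * r for r in residual))

def subspace_cost(points, basis, p):
    """l_p-norm of the distances d(a_i, W), where W = span(basis)."""
    return sum(distance_to_subspace(a, basis) ** p for a in points) ** (1.0 / p)

# Illustrative data: three points in R^3 and W = span(e_1), so k = 1.
points = [(3.0, 4.0, 0.0), (1.0, 0.0, 2.0), (0.0, 0.0, 1.0)]
x_axis = [(1.0, 0.0, 0.0)]
cost_p2 = subspace_cost(points, x_axis, 2)   # distances are 4, 2, 1
cost_p8 = subspace_cost(points, x_axis, 8)   # close to the largest distance
```

As $p$ grows, the cost is increasingly dominated by the farthest point, which is why the case $p = \infty$ corresponds to a worst-case distance.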
The Subspace Problem, introduced by Deshpande et al. [4], generalizes several widely studied subspace approximation problems. For example, the well-known Least Square Fit Problem is a particular case. In the latter, given a matrix $A \in \mathbb{R}^{m \times n}$ and $0 \le k \le n$, the goal is to find a matrix $B \in \mathbb{R}^{m \times n}$ of rank at most $k$ that minimizes the Frobenius norm of the difference, $\|A - B\|_F := \left( \sum_{i,j} (A_{ij} - B_{ij})^2 \right)^{1/2}$. Taking the rows of $A$ to be $a_1, \dots, a_m$ and $p = 2$, the Subspace Problem reduces to the Least Square Fit Problem. Another special case of the Subspace Problem is the Radii Problem. In the latter, given points $a_1, \dots, a_m \in \mathbb{R}^n$, their outer $(n-k)$-radius is the minimum, over all $k$-dimensional linear subspaces, of the maximum Euclidean distance of these points to the subspace. This problem is equivalent to the Subspace Problem with $p = \infty$. Moreover, the Subspace Problem is related to other problems such as the $L_p$-Grothendieck Problem [8] and the $\ell_p$-Regression Problem [1, 5, 3]. We refer the reader to [4] for more details about the connections between these problems.
© Nguyen Kim Thang;
licensed under Creative Commons License CC-BY
16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018).
Editor: David Eppstein; Article No. 30; pp. 30:1–30:7
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
Deshpande et al. [4] introduced the Subspace Problem and gave a randomized $O(\sqrt{p})$-approximation algorithm. Their approach considers a convex relaxation that optimizes over positive semidefinite matrices, with the rank constraint replaced by a trace constraint. The solution $X$ of the convex relaxation is then rounded to a matrix of suitable rank. Intuitively, the authors divide the singular vectors of $X$ into several bins and construct one vector for each bin by taking a Bernoulli random linear combination of the vectors within that bin. The analysis relies on powerful techniques coupled with properties of the $p$th moment of sums of Bernoulli random variables. In addition, Deshpande et al. [4] proved that the Subspace Problem is hard to approximate within a factor $\Omega(\sqrt{p})$ assuming the Unique Games Conjecture (UGC). Later, bypassing the need for UGC, Guruswami et al. [7] showed that the problem is indeed NP-hard to approximate within a factor $\Omega(\sqrt{p})$.
Our Contribution. In this paper, we present a deterministic greedy algorithm with the same $O(\sqrt{p})$-approximation guarantee. Informally, at each step, the algorithm greedily extends the subspace so as to minimize the marginal increase of the objective function. The algorithm (and also its analysis) is extremely simple, which makes it practically appealing. Moreover, our algorithm is deterministic whereas the one in [4] is randomized. The analysis is based on a smooth inequality (Lemma 2), which was originally used in algorithmic game theory to bound the quality of equilibria (price of anarchy) in scheduling games [2]. This allows us to give a deterministic algorithm, in contrast to the randomized one of [4], which crucially relies on concentration inequalities from functional analysis to bound the moments of sums of Bernoulli random variables. Our result settles the status of the problem in terms of approximation up to a constant factor.
Related works. The work most closely related to ours is [4], whose results have been summarized above. For the Least Square Fit Problem, the optimal subspace is spanned by the top $k$ right singular vectors of $A$, which can be computed in time $O(\min\{n^2 m, n m^2\})$ [6]. For the Radii Problem, an $O(\sqrt{\log m})$-approximation for $k = n-1$ follows from the works of Nesterov [10] and Nemirovski et al. [9]. Later, Varadarajan et al. [11] gave an $O(\sqrt{\log m})$-approximation algorithm for arbitrary $k$. It is well known that the $\ell_\infty$-norm can be approximated by the $\ell_{\log m}$-norm up to a constant factor. Hence, an $O(\sqrt{\log m})$-approximation can be deduced from [4] (and thus from our work) by choosing $p = \log m$.
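The approximation of the $\ell_\infty$-norm by the $\ell_{\log m}$-norm rests on the standard bound $\|x\|_\infty \le \|x\|_p \le m^{1/p} \|x\|_\infty$ for $x \in \mathbb{R}^m$, where $m^{1/p} \le e$ once $p \ge \ln m$. The sketch below checks this numerically; the list `distances` is hypothetical data, and rounding $p$ up to an integer is an assumption made here to match the integrality of $p$ in the problem statement.

```python
import math

def lp_norm(values, p):
    """l_p-norm of a list of real numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

# Hypothetical distances d(a_i, W) of m = 8 points to some subspace W.
distances = [0.5, 2.0, 1.5, 2.0, 0.1, 1.0, 2.0, 0.3]
m = len(distances)

# Smallest integer p with p >= ln m, so that m ** (1 / p) <= e.
p = math.ceil(math.log(m))

linf = max(distances)          # l_infinity-norm (the radius-type objective)
lp = lp_norm(distances, p)     # l_p-norm with p of order log m
ratio = lp / linf              # lies between 1 and e
```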
2 Greedy Algorithm
As in [4], we use a formulation of the Subspace Problem in terms of the orthogonal complement of the subspace $W$. Specifically, let $z_1, \dots, z_{n-k}$ be an orthonormal basis of the orthogonal complement, and let $Z \in \mathbb{R}^{n \times (n-k)}$ be the matrix whose $j$th column is $z_j$. Then $d(a_i, W) = \|a_i^T Z\|_2$. Hence, the problem is to find an orthonormal basis $z_1, \dots, z_{n-k}$ of an $(n-k)$-dimensional vector space $V$ such that the corresponding matrix $Z$ minimizes
\begin{align*}
\sum_{i=1}^{m} \|a_i^T Z\|_2^p
&= \sum_{i=1}^{m} \left( \sum_{\ell=1}^{n-k} (a_i^T z_\ell)^2 \right)^{p/2} \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n-k} \left[ \left( \sum_{\ell=1}^{j} (a_i^T z_\ell)^2 \right)^{p/2} - \left( \sum_{\ell=1}^{j-1} (a_i^T z_\ell)^2 \right)^{p/2} \right],
\end{align*}
where, by convention, an empty sum equals 0.
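The two facts used in this reformulation, namely $d(a_i, W) = \|a_i^T Z\|_2$ and the telescoping decomposition of the objective, can be checked numerically on a small instance. The following sketch does so for a single point in $\mathbb{R}^3$ with $k = 1$; the vectors `w`, `z`, the point `a`, and the value of `p` are illustrative assumptions.

```python
import math

def dot(x, y):
    return sum(s * t for s, t in zip(x, y))

# Orthonormal basis of R^3: w spans W (k = 1); z[0], z[1] span its complement.
s = 1.0 / math.sqrt(2.0)
w = (s, s, 0.0)
z = [(s, -s, 0.0), (0.0, 0.0, 1.0)]
p = 3
a = (2.0, 1.0, -2.0)

# Distance to W computed directly, by removing the component of a along w.
residual = [ai - dot(a, w) * wi for ai, wi in zip(a, w)]
direct = math.sqrt(dot(residual, residual))

# The same distance through the orthogonal complement: ||a^T Z||_2.
via_complement = math.sqrt(sum(dot(a, zl) ** 2 for zl in z))

# Telescoping form: sum over j of (prefix_j)^(p/2) - (prefix_{j-1})^(p/2).
telescoped, prefix = 0.0, 0.0
for zl in z:
    new_prefix = prefix + dot(a, zl) ** 2
    telescoped += new_prefix ** (p / 2) - prefix ** (p / 2)
    prefix = new_prefix
```

Each summand of the telescoped form is the marginal increase caused by adding one basis vector, which is the quantity the greedy rule minimizes.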
Algorithm. Initially, the subspace $U_0 = \emptyset$. For $1 \le j \le n-k$, choose a unit vector $u_j$ orthogonal to the subspace $U_{j-1}$ spanned by $u_1, \dots, u_{j-1}$ that minimizes the marginal increase of the objective, i.e.,
\[
u_j \in \arg\min_{z \perp U_{j-1}} \sum_{i=1}^{m} \left[ \left( (a_i^T z)^2 + \sum_{\ell=1}^{j-1} (a_i^T u_\ell)^2 \right)^{p/2} - \left( \sum_{\ell=1}^{j-1} (a_i^T u_\ell)^2 \right)^{p/2} \right].
\]
In fact, the vector $u_j$ can be computed by solving a convex program. Specifically, let $\{e_1, \dots, e_{n-j+1}\}$ be an arbitrary orthonormal basis of the vector space $U_{j-1}^{\perp}$ (the orthogonal complement of $U_{j-1}$). Computing $u_j$ is equivalent to computing the coefficients $b_1, \dots, b_{n-j+1}$ of the decomposition of $u_j$ in the basis $\{e_1, \dots, e_{n-j+1}\}$. The convex program is
\begin{align*}
\min_{b_1, \dots, b_{n-j+1}} \quad & \sum_{i=1}^{m} \left[ \left( \left( a_i^T \cdot \sum_{h=1}^{n-j+1} b_h e_h \right)^2 + \sum_{\ell=1}^{j-1} (a_i^T u_\ell)^2 \right)^{p/2} - \left( \sum_{\ell=1}^{j-1} (a_i^T u_\ell)^2 \right)^{p/2} \right] \\
\text{subject to} \quad & \sum_{h=1}^{n-j+1} b_h^2 = 1, \\
& b_1, \dots, b_{n-j+1} \in \mathbb{R}.
\end{align*}
Note that the variables of the program are $b_1, \dots, b_{n-j+1}$; the vectors $u_\ell$ and $e_h$ have already been determined.
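The greedy rule can be illustrated by the following minimal sketch. It is not the procedure described above: instead of solving the program over all unit vectors of $U_{j-1}^{\perp}$, it assumes a finite list of candidate directions, projects each one onto $U_{j-1}^{\perp}$, and keeps the candidate with the smallest marginal increase. The names `project_out`, `marginal_cost`, `greedy_complement` and the parameter `candidates` are hypothetical.

```python
import math

def dot(x, y):
    return sum(s * t for s, t in zip(x, y))

def project_out(z, chosen):
    """Remove from z its components along the orthonormal vectors in chosen, then normalize."""
    r = list(z)
    for u in chosen:
        c = dot(r, u)
        r = [ri - c * ui for ri, ui in zip(r, u)]
    norm = math.sqrt(dot(r, r))
    return None if norm < 1e-12 else [ri / norm for ri in r]

def marginal_cost(points, z, chosen, p):
    """Increase of sum_i (sum_l (a_i^T u_l)^2)^(p/2) when z is appended to chosen."""
    total = 0.0
    for a in points:
        base = sum(dot(a, u) ** 2 for u in chosen)
        total += (dot(a, z) ** 2 + base) ** (p / 2) - base ** (p / 2)
    return total

def greedy_complement(points, n, k, p, candidates):
    """Greedily pick n - k orthonormal vectors, restricted to the given candidates."""
    chosen = []
    for _ in range(n - k):
        options = [project_out(c, chosen) for c in candidates]
        options = [z for z in options if z is not None]
        chosen.append(min(options, key=lambda z: marginal_cost(points, z, chosen, p)))
    return chosen

# Illustrative instance: the points lie almost on the first coordinate axis.
points = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.1, 0.0)]
standard_basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
U = greedy_complement(points, n=3, k=1, p=3, candidates=standard_basis)
```

On this instance the sketch first picks the third coordinate direction (marginal increase 0) and then the second one, so the returned complement is orthogonal to the first axis, along which the points concentrate. With a finite candidate list the approximation guarantee of the analysis does not carry over; the restriction only keeps the example short.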
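Analysis. Let $V$ be an optimal $n \times (n-k)$ matrix and let $\mathcal{V}$ be the corresponding vector space spanned by the column vectors of $V$. Let $U$ denote the $n \times (n-k)$ matrix whose $j$th column is the vector $u_j$ chosen by the algorithm. In the remainder, we show that
\[
\left( \sum_{i=1}^{m} \|a_i^T U\|_2^p \right)^{1/p} \le O(\sqrt{p}) \cdot \left( \sum_{i=1}^{m} \|a_i^T V\|_2^p \right)^{1/p}.
\]
First, we recall the following standard lemma.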
▶ Lemma 1. For any $(n-k)$-dimensional subspace $V$, there exists an orthonormal basis $\{v_1, \dots, v_{n-k}\}$ of $V$ such that $v_j$ is orthogonal to $U_{j-1}$ for $1 \le j \le n-k$.
Proof. We construct an orthogonal basis $\{v_1, \dots, v_{n-k}\}$ of $V$ by induction; the orthonormal basis is then obtained by the standard normalization procedure. For $j = 1$, any vector $v_1 \in V$ is perpendicular to $U_0$. Assume that vectors $v_1, \dots, v_j$ for $j < n-k$ have been constructed so that they satisfy the lemma. Since the subspace $V$ has dimension $n-k$, which is strictly larger than $j$, there exists a vector $w_{j+1} \in V$ that is linearly independent of $v_1, \dots, v_j$. Define the vector $v_{j+1} := w_{j+1} - \mathrm{Pr}_{U_j}(w_{j+1})$, where $\mathrm{Pr}_{U_j}(w_{j+1})$ is the projection of $w_{j+1}$ onto the subspace $U_j$. Then $v_{j+1}$ is orthogonal to $U_j$ and is independent of $v_1, \dots, v_j$. ◀
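▶ Lemma 2. For any given vector $a$ and arbitrary vectors $u_\ell$ and $v_\ell$ with $1 \le \ell \le n-k$, it holds that
\begin{align*}
\sum_{j=1}^{n-k} \left[ \left( (a^T v_j)^2 + \sum_{\ell=1}^{j-1} (a^T u_\ell)^2 \right)^{p/2} - \left( \sum_{\ell=1}^{j-1} (a^T u_\ell)^2 \right)^{p/2} \right]
\le \mu \left( \sum_{\ell=1}^{n-k} (a^T u_\ell)^2 \right)^{p/2} + \lambda \left( \sum_{\ell=1}^{n-k} (a^T v_\ell)^2 \right)^{p/2},
\end{align*}
where $\lambda = O\left( \left( \alpha \cdot \frac{p}{2} \right)^{\frac{p}{2} - 1} \right)$ for some constant $\alpha$ and $\mu = \frac{p/2 - 1}{p/2}$.
Proof. Denote $b_j = (a^T v_j)^2$ and $c_j = (a^T u_j)^2$ for $1 \le j \le n-k$. The inequality of the lemma reads
\[
\sum_{j=1}^{n-k} \left[ \left( b_j + \sum_{\ell=1}^{j-1} c_\ell \right)^{p/2} - \left( \sum_{\ell=1}^{j-1} c_\ell \right)^{p/2} \right]
\le \mu \left( \sum_{\ell=1}^{n-k} c_\ell \right)^{p/2} + \lambda \left( \sum_{\ell=1}^{n-k} b_\ell \right)^{p/2}.
\]
This inequality holds for $\lambda = O\left( \left( \alpha \cdot \frac{p}{2} \right)^{\frac{p}{2} - 1} \right)$ for some constant $\alpha$ and $\mu = \frac{p/2 - 1}{p/2}$, as proved in [2] in the context of algorithmic game theory. For completeness, we give the proof of the above inequality in the appendix (Lemma 5). ◀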
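▶ Theorem 3. The greedy algorithm is an $O(\sqrt{p})$-approximation.
Proof. Recall that $U$ is the solution of the algorithm, whose $j$th column vector is $u_j$ for $1 \le j \le n-k$. Let $v_1, \dots, v_{n-k}$ be an orthonormal basis of the optimal subspace (the column space of $V$) that satisfies Lemma 1. We have
\begin{align*}
\sum_{i=1}^{m} \|a_i^T U\|_2^p
&= \sum_{i=1}^{m} \sum_{j=1}^{n-k} \left[ \left( \sum_{\ell=1}^{j} (a_i^T u_\ell)^2 \right)^{p/2} - \left( \sum_{\ell=1}^{j-1} (a_i^T u_\ell)^2 \right)^{p/2} \right] \\
&\le \sum_{i=1}^{m} \sum_{j=1}^{n-k} \left[ \left( (a_i^T v_j)^2 + \sum_{\ell=1}^{j-1} (a_i^T u_\ell)^2 \right)^{p/2} - \left( \sum_{\ell=1}^{j-1} (a_i^T u_\ell)^2 \right)^{p/2} \right] \\
&\le \sum_{i=1}^{m} \left[ \mu \left( \sum_{\ell=1}^{n-k} (a_i^T u_\ell)^2 \right)^{p/2} + \lambda \left( \sum_{\ell=1}^{n-k} (a_i^T v_\ell)^2 \right)^{p/2} \right] \\
&= \mu \sum_{i=1}^{m} \|a_i^T U\|_2^p + \lambda \sum_{i=1}^{m} \|a_i^T V\|_2^p.
\end{align*}
The first inequality is due to the choice of the algorithm at each step $j$ (note that $v_j \perp U_{j-1}$, so $v_j$ is a candidate at step $j$). The second inequality holds by Lemma 2, where $\lambda = O\left( \left( \alpha \cdot \frac{p}{2} \right)^{\frac{p}{2} - 1} \right)$ for some constant $\alpha$ and $\mu = \frac{p/2 - 1}{p/2}$. Rearranging the terms and taking the $p$th root, we get
\[
\left( \sum_{i=1}^{m} \|a_i^T U\|_2^p \right)^{1/p} \le O(\sqrt{p}) \cdot \left( \sum_{i=1}^{m} \|a_i^T V\|_2^p \right)^{1/p}.
\]
◀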
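The final rearrangement can be spelled out as follows. This is a sketch: the shorthands $S_U$ and $S_V$ and the constant $C$ hidden in the $O(\cdot)$ bound on $\lambda$ are notation introduced here for illustration only.

```latex
\begin{align*}
&\text{Let } S_U := \sum_{i=1}^{m} \|a_i^T U\|_2^p, \quad
  S_V := \sum_{i=1}^{m} \|a_i^T V\|_2^p, \quad
  \lambda \le C \left( \alpha \cdot \tfrac{p}{2} \right)^{\frac{p}{2} - 1}. \\
&S_U \le \mu\, S_U + \lambda\, S_V
  \;\Longrightarrow\; (1 - \mu)\, S_U \le \lambda\, S_V,
  \qquad 1 - \mu = 1 - \frac{p/2 - 1}{p/2} = \frac{2}{p}, \\
&S_U \le \frac{p}{2}\, \lambda\, S_V
  \le \frac{C}{\alpha} \left( \alpha \cdot \tfrac{p}{2} \right)^{\frac{p}{2}} S_V, \\
&S_U^{1/p} \le \left( \frac{C}{\alpha} \right)^{1/p}
  \left( \alpha \cdot \tfrac{p}{2} \right)^{1/2} S_V^{1/p}
  = O(\sqrt{p}) \cdot S_V^{1/p}.
\end{align*}
```

The last step uses that $(C/\alpha)^{1/p} \le \max\{1, C/\alpha\}$ for $p \ge 1$, so this factor is bounded by a constant independent of $p$.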
References
1 Kenneth L Clarkson. Subgradient and sampling algorithms for $\ell_1$ regression. In Proc. 16th Symposium on Discrete Algorithms, pages 257–266, 2005.
2 Johanne Cohen, Christoph Dürr, and Nguyen Kim Thang. Smooth inequalities and equilibrium inefficiency in scheduling games. In Internet and Network Economics, pages 350–363, 2012.
3 Anirban Dasgupta, Petros Drineas, Boulos Harb, Ravi Kumar, and Michael W Mahoney. Sampling algorithms and coresets for $\ell_p$ regression. SIAM Journal on Computing, 38(5):2060–2078, 2009.
4 Amit Deshpande, Madhur Tulsiani, and Nisheeth K Vishnoi. Algorithms and hardness for subspace approximation. In Proc. 22nd Symposium on Discrete Algorithms, pages 482–496, 2011.
5 Petros Drineas, Michael W Mahoney, and S Muthukrishnan. Sampling algorithms for $\ell_2$ regression and applications. In Proc. 17th Symposium on Discrete Algorithms, pages 1127–1136, 2006.
6 Gene H Golub and Charles F Van Loan. Matrix computations. Johns Hopkins University Press, 2012.
7 Venkatesan Guruswami, Prasad Raghavendra, Rishi Saket, and Yi Wu. Bypassing UGC from some optimal geometric inapproximability results. ACM Transactions on Algorithms, 12(1):6, 2016.
8 Guy Kindler, Assaf Naor, and Gideon Schechtman. The UGC hardness threshold of the $L_p$ Grothendieck problem. Mathematics of Operations Research, 35(2):267–283, 2010.
9 Arkadi Nemirovski, Cornelis Roos, and Tamás Terlaky. On maximization of quadratic form over intersection of ellipsoids with common center. Mathematical Programming, 86(3):463–473, 1999.
10 Yuri Nesterov. Global quadratic optimization via conic relaxation. Université Catholique de Louvain. Center for Operations Research and Econometrics [CORE], 1998.
11 Kasturi Varadarajan, Srinivasan Venkatesh, Yinyu Ye, and Jiawei Zhang. Approximating the radii of point sets. SIAM Journal on Computing, 36(6):1764–1776, 2007.
Appendix: Technical Lemmas
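In this section, we prove the technical lemmas. The following lemma was proved in [2]; we give it here for completeness.
▶ Lemma 4 ([2]). Let $k$ be a positive integer and let $0 < a(k) \le 1$ be a function of $k$. Then, for any $x, y > 0$, it holds that
\[
y (x + y)^k \le \frac{k}{k+1}\, a(k)\, x^{k+1} + b(k)\, y^{k+1},
\]
where $\alpha$ is some constant and
\[
b(k) =
\begin{cases}
\Theta\left( \alpha^k \cdot \left( \dfrac{k}{\log (k a(k))} \right)^{k-1} \right) & \text{if } \lim_{k \to \infty} (k-1) a(k) = \infty, \quad \text{(1a)} \\[2ex]
\Theta\left( \alpha^k \cdot k^{k-1} \right) & \text{if } (k-1) a(k) \text{ is bounded for all } k, \quad \text{(1b)} \\[2ex]
\Theta\left( \alpha^k \cdot \dfrac{1}{k\, a(k)^k} \right) & \text{if } \lim_{k \to \infty} (k-1) a(k) = 0. \quad \text{(1c)}
\end{cases}
\]
Proof. Let $f(z) := \frac{k}{k+1}\, a(k)\, z^{k+1} - (1+z)^k + b(k)$. Dividing both sides of the claimed inequality by $y^{k+1}$ and setting $z = x/y$, the claim is equivalent to $f(z) \ge 0$ for all $z > 0$.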
We have $f'(z) = k\, a(k)\, z^k - k (1+z)^{k-1}$. We claim that the equation $f'(z) = 0$ has a unique positive root $z_0$. For $z > 0$, the equation $f'(z) = 0$ is equivalent to
\[
\left( \frac{1}{z} + 1 \right)^{k-1} \cdot \frac{1}{z} = a(k).
\]
The left-hand side is a strictly decreasing function of $z$ whose limits as $z$ tends to $0$ and $\infty$ are $\infty$ and $0$, respectively. As $a(k)$ is a positive constant, there exists a unique root $z_0 > 0$. Observe that the function $f$ is decreasing on $(0, z_0)$ and increasing on $(z_0, +\infty)$, so $f(z) \ge f(z_0)$ for all $z > 0$. Hence, by choosing
\[
b(k) = \left| \frac{k}{k+1}\, a(k)\, z_0^{k+1} - (1 + z_0)^k \right| = (1 + z_0)^{k-1} \left( 1 + \frac{z_0}{k+1} \right) \tag{2}
\]
it follows that $f(z) \ge 0$ for all $z > 0$.
We study the positive root $z_0$ of the equation
\[
a(k)\, z^k - (1+z)^{k-1} = 0. \tag{3}
\]
Note that $f'(1) = k (a(k) - 2^{k-1}) < 0$ since $0 < a(k) \le 1$. Thus, $z_0 > 1$. For the sake of simplicity, we define the function $g(k)$ such that $z_0 = \frac{k-1}{g(k)}$, where $0 < g(k) < k-1$. Equation (3) is equivalent to
\[
\left( 1 + \frac{g(k)}{k-1} \right)^{k-1} g(k) = (k-1)\, a(k).
\]
Note that $e^{w/2} < 1 + w < e^{w}$ for $w \in (0, 1)$. For $w := \frac{g(k)}{k-1}$, we obtain the following upper and lower bounds for the term $(k-1) a(k)$:
\[
e^{g(k)/2}\, g(k) < (k-1)\, a(k) < e^{g(k)}\, g(k). \tag{4}
\]
Recall the definition of the Lambert $W$ function. For each $y \in \mathbb{R}_+$, $W(y)$ is defined to be the solution of the equation $x e^x = y$. Note that $x e^x$ is increasing with respect to $x$, hence $W(\cdot)$ is increasing.
By the definition of the Lambert $W$ function and Equation (4), we get
\[
W\big( (k-1) a(k) \big) < g(k) < 2\, W\left( \frac{(k-1) a(k)}{2} \right). \tag{5}
\]
First, consider the case where $\lim_{k \to \infty} (k-1) a(k) = \infty$. The asymptotic expansion of $W(x)$ as $x \to +\infty$ is the following: $W(x) = \ln x - \ln \ln x + \frac{\ln \ln x}{\ln x} + O\left( \left( \frac{\ln \ln x}{\ln x} \right)^2 \right)$. So, for large enough $k$, $W((k-1) a(k)) = \Theta(\log((k-1) a(k)))$. Since $z_0 = \frac{k-1}{g(k)}$, from Equation (5) we get $z_0 = \Theta\left( \frac{k}{\log (k a(k))} \right)$. Therefore, by (2), we have $b(k) = \Theta\left( \alpha^k \cdot \left( \frac{k}{\log (k a(k))} \right)^{k-1} \right)$ for some constant $\alpha$.
Second, consider the case where $(k-1) a(k)$ is bounded by some constants. By (5), we have $g(k) = \Theta(1)$. Therefore $z_0 = \Theta(k)$, which again implies $b(k) = \Theta\left( \alpha^k \cdot k^{k-1} \right)$ for some constant $\alpha$.
Third, consider the case where $\lim_{k \to \infty} (k-1) a(k) = 0$. We focus on the Taylor series $W_0$ of $W$ around $0$. It can be found using Lagrange inversion and is given by
\[
W_0(x) = \sum_{i=1}^{\infty} \frac{(-i)^{i-1}}{i!}\, x^i = x - x^2 + O(1)\, x^3.
\]
Thus, for $k$ large enough, $g(k) = \Theta((k-1) a(k))$. Hence, $z_0 = \Theta(1/a(k))$. Once again, this implies $b(k) = \Theta\left( \alpha^k \cdot \frac{1}{k\, a(k)^k} \right)$ for some constant $\alpha$. ◀
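The choice of $b(k)$ in Equation (2) can be checked numerically for fixed values of $k$ and $a(k)$. The sketch below locates $z_0$ by bisection, which is an assumption of this example rather than part of the proof, and then evaluates the slack of the inequality of Lemma 4 on a grid. The names `smooth_constant` and `slack` and the choice $k = 3$, $a(k) = 1/(k+1)$ are illustrative.

```python
def smooth_constant(k, a):
    """Return (z0, b) where z0 solves a*z^k = (1+z)^(k-1) and b follows Equation (2)."""
    def h(z):
        # h has the same sign as f'(z) = k*a*z^k - k*(1+z)^(k-1).
        return a * z ** k - (1.0 + z) ** (k - 1)

    lo, hi = 0.0, 1.0
    while h(hi) < 0.0:          # enlarge the bracket until f' changes sign
        hi *= 2.0
    for _ in range(200):        # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    z0 = 0.5 * (lo + hi)
    return z0, (1.0 + z0) ** (k - 1) * (1.0 + z0 / (k + 1))

k = 3
a = 1.0 / (k + 1)
z0, b = smooth_constant(k, a)

def slack(x, y):
    """Right-hand side minus left-hand side of the inequality of Lemma 4."""
    return k / (k + 1) * a * x ** (k + 1) + b * y ** (k + 1) - y * (x + y) ** k

# Normalized slack equals f(x / y); it should never be negative.
grid = [0.1 * t for t in range(1, 101)]
worst = min(slack(x, y) / y ** (k + 1) for x in grid for y in grid)
```

The slack vanishes only at the ratio $x/y = z_0$, which reflects that $b(k)$ is the smallest constant for which $f(z) \ge 0$ holds.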
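▶ Lemma 5. For any sequences of non-negative real numbers $\{a_1, a_2, \dots, a_n\}$ and $\{b_1, b_2, \dots, b_n\}$ and for any polynomial $g$ of degree $k$ with non-negative coefficients, it holds that
\[
\sum_{i=1}^{n} \left[ g\left( b_i + \sum_{j=1}^{i-1} a_j \right) - g\left( \sum_{j=1}^{i-1} a_j \right) \right]
\le \lambda(k) \cdot g\left( \sum_{i=1}^{n} b_i \right) + \mu(k) \cdot g\left( \sum_{i=1}^{n} a_i \right),
\]
where $\mu(k) = \frac{k-1}{k}$ and $\lambda(k) = \Theta\left( k^{k-1} \right)$. The same inequality holds for $\mu(k) = \frac{k-1}{k \ln k}$ and $\lambda(k) = \Theta\left( (\alpha \cdot k \ln k)^{k-1} \right)$ for some constant $\alpha$.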
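Proof. Let $g(z) = g_0 z^k + g_1 z^{k-1} + \cdots + g_k$ with $g_t \ge 0$ for all $t$. The lemma holds since the inequality holds for every monomial $z^t$ with $0 \le t \le k$. Specifically,
\begin{align*}
\sum_{i=1}^{n} \left[ g\left( b_i + \sum_{j=1}^{i-1} a_j \right) - g\left( \sum_{j=1}^{i-1} a_j \right) \right]
&= \sum_{t=1}^{k} g_{k-t} \cdot \sum_{i=1}^{n} \left[ \left( b_i + \sum_{j=1}^{i-1} a_j \right)^{t} - \left( \sum_{j=1}^{i-1} a_j \right)^{t} \right] \\
&\le \sum_{t=1}^{k} g_{k-t} \cdot \sum_{i=1}^{n} \left[ t \cdot b_i \cdot \left( b_i + \sum_{j=1}^{i-1} a_j \right)^{t-1} \right] \\
&\le \sum_{t=1}^{k} g_{k-t} \cdot \left[ \lambda(t) \left( \sum_{i=1}^{n} b_i \right)^{t} + \mu(t) \left( \sum_{i=1}^{n} a_i \right)^{t} \right] \\
&\le \lambda(k) \cdot g\left( \sum_{i=1}^{n} b_i \right) + \mu(k) \cdot g\left( \sum_{i=1}^{n} a_i \right).
\end{align*}
The first inequality follows from the convexity inequality $(x+y)^{k+1} - x^{k+1} \le (k+1)\, y\, (x+y)^k$. The second inequality follows from Lemma 4 (Case (1b) with $a(k) = 1/(k+1)$). The last inequality holds since $\mu(t) \le \mu(k)$ and $\lambda(t) \le \lambda(k)$ for $t \le k$. ◀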
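For the quadratic case $g(z) = z^2$ (that is, $k = 2$), the inequality of Lemma 5 can be checked on concrete sequences. The sketch below uses $\mu = 1/2$, as given by $\mu(k) = \frac{k-1}{k}$, together with $\lambda = 4$. The value $\lambda = 4$ is an assumption worked out by hand for this example: Lemma 4 with exponent $1$ and $a(1) = 1/2$ gives $z_0 = 2$ and $b(1) = 2$ through Equation (2), and multiplying by $t = 2$ yields $4$. The lemma itself only states $\lambda(k) = \Theta(k^{k-1})$. The function name `lemma5_sides` and the sample sequences are hypothetical.

```python
def lemma5_sides(a, b, g, lam, mu):
    """Return (lhs, rhs) of the inequality of Lemma 5 for the sequences a and b."""
    lhs, prefix = 0.0, 0.0
    for ai, bi in zip(a, b):
        # prefix holds a_1 + ... + a_{i-1} at this point.
        lhs += g(bi + prefix) - g(prefix)
        prefix += ai
    rhs = lam * g(sum(b)) + mu * g(sum(a))
    return lhs, rhs

def square(z):
    """g(z) = z^2, a polynomial of degree k = 2 with non-negative coefficients."""
    return z * z

# Illustrative non-negative sequences.
a_seq = [1.0, 0.5, 2.0, 0.0, 3.0]
b_seq = [0.2, 1.5, 0.0, 2.5, 0.1]
lhs, rhs = lemma5_sides(a_seq, b_seq, square, lam=4.0, mu=0.5)
```

In the notation of Lemma 2 this corresponds to $p = 4$, with $b_j = (a^T v_j)^2$ and $c_j = (a^T u_j)^2$ playing the roles of the two sequences.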