Nguyen Kim Thang

(1)

Problem

2

Nguyen Kim Thang

3

IBISC, Univ Evry, University Paris Saclay

4

Evry, France

5

[email protected]

6

https://orcid.org/0000-0002-6085-9453

7

Abstract

8

In the subspace approximation problem, given m points in Rⁿ and an integer k ≤ n, the goal

9

is to find ak-dimension subspace ofRⁿ that minimizes the`p-norm of the Euclidean distances

10

to the given points. This problem generalizes several subspace approximation problems and has

11

applications from statistics, machine learning, signal processing to biology. Deshpande et al. [4]

12

gave a randomizedO(√

p)-approximation and this bound is proved to be tight assuming NP6= P

13

by Guruswami et al. [7]. It is an intriguing question of determining the performance guarantee

14

of deterministic algorithms for the problem. In this paper, we present a simple deterministic

15

O(√

p)-approximation algorithm with also a simple analysis. That definitely settles the status

16

of the problem in term of approximation up to a constant factor. Besides, the simplicity of the

17

algorithm makes it practically appealing.

18

2012 ACM Subject Classification Theory of computation Approximation algorithms analysis

19

Keywords and phrases Approximation Algorithms, Subspace Approximation

20

Digital Object Identifier 10.4230/LIPIcs.SWAT.2018.30

21

Funding Research supported by the ANR project OATA n^oANR-15-CE40-0015-01.

22

1 Introduction

23

Massive data in high dimension emerge naturally in many domains from machine learning to

24

biology. It has been observed that although data lie in high-dimensional spaces, in practice

25

they have low intrinsic dimension. Dimension-reduction algorithms are essential in many

26

domains such as image processing, personalized medicine, etc. In this paper, we consider

27

the following subspace approximation problem in the context of capturing the underlying

28

low-dimensional structures of given data.

29

Subspace Problem. Given pointsa₁, . . . ,a_m∈Rⁿ and integersp≥1 and 0≤k≤n. Find ak-dimensional linear subspaceW that minimizes the`p-norms of Euclidean distances of these points to W, i.e.,

min

W:dim(W)=k

^m X

i=1

d(ai, W)^p 1/p

The Subspace Problem, which is introduced by Deshpande et al. [4], is a generalization of

30

several sub-space approximation problems which have been widely studied. For example,

31

the well-knownLeast Square Fit Problemis a particular case. In the latter, given a matrix

32

A∈R^m×n and 0≤k≤n, find a matrixB∈R^m×n of rank at most kthat minimizes the

33

Frobenius norm of the differencekA−BkF := P

i,j(Aij−Bij)²1/2. Taking the rows ofA

34

to bea₁, . . . ,amandp= 2,Subspace Problemreduces toLeast Square Fit Problem. Another

35

licensed under Creative Commons License CC-BY

16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018).

Editor: David Eppstein; Article No. 30; pp. 30:1–30:7

Leibniz International Proceedings in Informatics

Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany

(2)

special case ofSubspace ProblemisRadii Problem. In the latter, given pointsa₁, . . . ,am∈Rⁿ,

36

theirouter (n−k) radius is the minimum, over allk-dimensional linear subspaces, of the

37

maximum Euclidean distances of these points to the subspace. This problem is equivalent

38

toSubspace Problemwithp=∞. Moreover,Subspace Problem is related to other problems

39

such as theL_p-Grothendieck Problem[8] and`_p-Regression Problem[1, 5, 3]. We refer the

40

reader to [4] for more details about the connection between these problems.

41

Deshpande et al. [4] introduced theSubspace Problem and gave a randomizedO(√ p)-

42

approximation algorithm. In their approach, they consider a convex relaxation that optimizes

43

over positive semidefinite matrices and the rank constraint is replaced by a trace constraint.

44

Subsequently, the solution X of the convex relaxation is rounded to a matrix of suitable

45

rank. Intuitively, the authors divide singular vectors of the solutionX into several bins and

46

constructs one vector for each bin by taking a Bernoulli random linear combination of vectors

47

within each bin. The analysis is carried out by powerful techniques coupled with properties

48

of thep^th-moment of sums of Bernoulli random variables. Besides, Deshpande et al. [4]

49

proved that theSubspace Problemis hard to approximate within a factor Ω(√

p) assuming

50

the Unique Games Conjecture (UGC). Later on, bypassing the need for UGC, Guruswami et

51

al. [7] showed that the problem is indeed NP-hard to approximate within a factor Ω(√ p).

52

Our Contribution. In this paper, we present a deterministic greedy algorithm with the same

53

O(√

p)-approximation guarantee. Informally, at any step, the algorithm greedily extends the

54

subspace in order to minimizes the marginal cost of the objective functions. The algorithm

55

(and also the analysis) is extremely simple, which makes it practically appealing. Besides,

56

our algorithm is deterministic whereas the one in [4] is randomized. The analysis is based on

57

a smooth inequality (Lemma 2), which has been originally used in the context of algorithmic

58

game theory in order to bound the quality of equilibrium (price of anarchy) in scheduling

59

games [2]. This allows us to give a deterministic algorithm instead of the randomized one [4]

60

which crucially relies on concentration inequalities in functional analysis in order to bound

61

the moments of sums of Bernoulli random variables. Our result definitely settles the status

62

of the problem in term of approximation up to a constant factor.

63

Related works. The most closely related to our paper is [4] where the results have been

64

summarized earlier. For theLeast Square Fit Problem, the optimal subspace is spanned by the

65

topkright singular vector ofAand that can be computed in timeO(min{n²m, nm²}) [6]. For

66

theRadii Problem,O(√

logm)-approximation withk=n−1 can be implied from the works of

67

Nesterov [10] and Nemirovski et al. [9]. Later on, Varadarajan et al. [11] gave anO(√ logm)-

68

approximation algorithm for arbitraryk. Note that it is well-known that`∞-norm can be

69

approximated by`_logm-norm up to a constant factor. Hence,O(√

logm)-approximation can

70

be deduced from [4] (so our work) by choosingp= logm.

71

2 Greedy Algorithm

72

As in [4], we use a formulation of theSubspace Problemin terms of the orthogonal complement

73

of the subspaceW. Specifically, letz₁, . . . ,zn−k be an orthonormal basis for the orthogonal

74

complement. Let Z ∈ R^n×⁽^n−k⁾ be the matrix with the j^th column vector zj. Then

75

d(ai, W) =ka^T_iZk₂. Hence, the problem is to find an orthonormal basisz₁, . . . ,zn−k of a

76

(3)

(n−k)-dim vector spaceV so that the corresponding matrixZ minimizes

77

m

X

i=1

ka^T_i Zk^p₂=

m

X

i=1

^n−k X

`=1

a^T_iz_`2p/2

78

=

m

X

i=1 n−k

X

j=1

" ^j X

`=1

a^T_i z`

2p/2

− ^j−1

X

`=1

a^T_i z`

2p/2# ,

79 80

where conventionally the sum with no term equals 0.

81

Algorithm. Initially, the subspace U₀ = ∅. For 1≤ j ≤ n−k, choose a vectoruj 6=0

82

orthonormal to the subspace U_j−₁ spanned by u₁, . . . ,u_j−₁ such that it minimizes the

83

marginal increase of the objective, i.e.,

84

uj∈arg min

z⊥Uj−1

m

X

i=1

"

(a^T_iz)²+

j−1

X

`=1

a^T_iu`2^p/2

− ^j−1

X

`=1

a^T_iu`2^p/2#

85 86

In fact, vectoru_jcan be computed by solving a convex program. Specifically, let{e₁, . . . ,e_n−j₊₁}

87

be an arbitrary orthogonal basis of the vector spaceU_j−^⊥₁ (which is orthogonal toU_j−₁).

88

Computingu_j is equivalent to computing the coefficientsb₁, . . . , b_n−j₊₁ in the decomposition

89

ofuj in the basis{e₁, . . . ,e_n−j₊₁}. The convex program is

90

b₁,...,bmin_n−j+1

m

X

i=1

a^T_i ·

n−j+1

X

h=1

bheh

2

+

j−1

X

`=1

a^T_iu`

2p/2

− ^j−1

X

`=1

a^T_iu`

2p/2

91

n−j+1

X

h=1

b²_h= 1

92

b₁, . . . , b_n−j₊₁∈R

9394

Note that in the convex program, variables areb₁, . . . , b_n−j₊₁ (the vectorsu_`’s, e_h’s have

95

been already determined).

96

Analysis. LetV be an optimaln×(n−k) matrix andV be the corresponding vector space spanned by column vectors ofV. In the remaining, we will show that

^m X

i=1

ka^T_iUk^p₂ 1/p

≤O(γp)· ^m

X

i=1

ka^T_iVk^p₂ 1/p

First, we recall the following standard lemma.

97

ILemma 1.For any(n−k)-dim subspaceV, there exists an orthonormal basis{v₁, . . . ,v_n−k}

98

ofV such thatv_j is orthogonal to U_j−₁ for1≤j≤n−k.

99

Proof. We construct an orthogonal basis{v₁, . . . ,v_n−k}ofV by induction. The orthonormal

100

basis is obtained by standard normalizing procedure. Forj= 1, any arbitrary vectorv₁∈V

101

is perpendicular toU₀. Assume that vectorsv₁, . . . ,vj forj <(n−k) have been constructed

102

so that they satisfy the lemma. Since the subspaceV has dimension (n−k) which is strictly

103

larger than j, there exits a vectorwj+1 ∈ V which is independent tov₁, . . . ,vj. Define

104

vectorvj+1:=wj+1−Pru_j(wj+1) wherePru_j(wj+1) is the projection of vectorwj+1 onto

105

the subspaceUj. Sovj+1 is orthogonal touj and is independent tov₁, . . . ,vj. J

106

S WAT 2 0 1 8

(4)

ILemma 2. For any given vectora, for arbitrary vectorsu` and v` with 1≤`≤n−k, it

107

holds that

108

n−k

X

j=1

"

(a^Tv_j)²+

j−1

X

`=1

a^Tu_`2p/2

− j−1

X

`=1

a^Tu_`2p/2#

109

≤µ ^n−k

X

`=1

a^Tu`2^p/2

+λ ^n−k

X

`=1

a^Tv`2^p/2

110 111

whereλ=O (α·^p₂)^p²⁻¹

for some constant αandµ= ^p/_p/²⁻₂¹.

112

Proof. Denoteb_j = (a^Tv_j)² andc_j = (a^Tu_j)² for 1≤j ≤n−k. The lemma inequality

113

reads

114

n−k

X

j=1

"

bj+

j−1

X

`=1

c`

^p/2

− ^j−1

X

`=1

c`

^p/2#

≤µ ^n−k

X

`=1

c`

^p/2

+λ ^n−k

X

`=1

b`

^p/2

115 116

The inequality holds for λ=O (α·^p₂)^p²⁻¹

for some constantαandµ=^p/_p/²⁻₂¹, which has

117

been proved in [2] in the context of algorithmic game theory. For completeness, we put the

118

proof of the above inequality in the appendix (Lemma 5). J

119

ITheorem 3. The greedy algorithm isO(√

p)-approximation.

120

Proof. Recall thatU be the solution of the algorithm where thej^th column vector isuj for

121

1≤j ≤n−k. Letv₁, . . . ,v_n−k be an orthonormal basis ofV that satisfies Lemma 1. We

122

have

123

m

X

i=1

ka^T_iUk^p₂=

m

X

i=1 n−k

X

j=1

"

^j X

`=1

a^T_iu`2^p/2

− ^j−1

X

`=1

a^T_iu`2^p/2#

124

≤

m

X

i=1 n−k

X

j=1

"

(a^T_i vj)²+

j−1

X

`=1

a^T_i u`2^p/2

− ^j−1

X

`=1

a^T_i u`2^p/2#

125

≤

m

X

i=1

"

µ ^n−k

X

`=1

a^T_iu_`2p/2

+λ ^n−k

X

`=1

a^T_i v_`2p/2#

126

=µ

m

X

i=1

ka^T_iUk^p₂+λ

m

X

i=1

ka^T_iVk^p₂

127 128

The first inequality is due to the choice of the algorithm at any stepj(note thatv_j⊥Uj−1 so

129

vjis a candidate at stepj). The second inequality holds by Lemma 2 whereλ=O (α·^p₂)^p²⁻¹

130

for some constantαandµ=^p/_p/²⁻₂¹. Rearranging the terms and taking the p^th-root, we get

131

m

X

i=1

ka^T_iUk^p₂

!1/p

≤O(√ p)·

m

X

i=1

ka^T_iVk^p₂

!1/p

132 133

J

134

References

135

1 Kenneth L Clarkson. Subgradient and sampling algorithms for`₁regression. InProc. 16th

136

Symposium on Discrete algorithms, pages 257–266, 2005.

137

(5)

2 Johanne Cohen, Christoph Dürr, and Nguyen Kim Thang. Smooth inequalities and equilib-

138

rium inefficiency in scheduling games. InInternet and Network Economics, pages 350–363,

139

2012.

140

3 Anirban Dasgupta, Petros Drineas, Boulos Harb, Ravi Kumar, and Michael W Ma-

141

honey. Sampling algorithms and coresets for`p regression. SIAM Journal on Computing,

142

38(5):2060–2078, 2009.

143

4 Amit Deshpande, Madhur Tulsiani, and Nisheeth K Vishnoi. Algorithms and hardness for

144

subspace approximation. InProc. 22nd Symposium on Discrete Algorithms, pages 482–496,

145

2011.

146

5 Petros Drineas, Michael W Mahoney, and S Muthukrishnan. Sampling algorithms for`₂

147

regression and applications. InProc. 17th Symposium on Discrete algorithm, pages 1127–

148

1136, 2006.

149

6 Gene H Golub and Charles F Van Loan. Matrix computations. Johns Hopkins University

150

Press, 2012.

151

7 Venkatesan Guruswami, Prasad Raghavendra, Rishi Saket, and Yi Wu. Bypassing UGC

152

from some optimal geometric inapproximability results. ACM Transactions on Algorithms,

153

12(1):6, 2016.

154

8 Guy Kindler, Assaf Naor, and Gideon Schechtman. The UGC hardness threshold of the

155

Lpgrothendieck problem. Mathematics of Operations Research, 35(2):267–283, 2010.

156

9 Arkadi Nemirovski, Cornelis Roos, and Tamás Terlaky. On maximization of quadratic form

157

over intersection of ellipsoids with common center. Mathematical Programming, 86(3):463–

158

473, 1999.

159

10 Yuri Nesterov. Global quadratic optimization via conic relaxation. Université Catholique

160

de Louvain. Center for Operations Research and Econometrics [CORE], 1998.

161

11 Kasturi Varadarajan, Srinivasan Venkatesh, Yinyu Ye, and Jiawei Zhang. Approximating

162

the radii of point sets. SIAM Journal on Computing, 36(6):1764–1776, 2007.

163

S WAT 2 0 1 8

(6)

Appendix: Technical Lemmas

164

In this section, we show technical lemmas. The following lemma has been proved in [2]. We

165

give it here for completeness.

166

ILemma 4 ([2]). Letk be a positive integer. Let 0< a(k)≤1be a function on k. Then, for anyx, y >0, it holds that

y(x+y)^k ≤ k

k+ 1a(k)x^k⁺¹+b(k)y^k⁺¹ whereαis some constant and

b(k) =











Θ α^k· k

logka(k) k−1!

if limk→∞(k−1)a(k) =∞, (1a) Θ α^k·k^k−¹

if (k−1)a(k)are bounded∀k, (1b) Θ

α^k· 1 ka(k)^k

if lim_k→∞(k−1)a(k) = 0. (1c) Proof. Letf(z) := _k₊₁^k a(k)z^k⁺¹−(1 +z)^k+b(k). To show the claim, it is equivalent to

167

prove thatf(z)≥0 for allz >0.

168

We havef⁰(z) =ka(k)z^k−k(1 +z)^k−¹. We claim that the equationf⁰(z) = 0 has an unique positive rootz₀. Consider the equationf⁰(z) = 0 forz >0. It is equivalent to

1 z + 1

k

·1 z =a(k)

The left-hand side is a strictly decreasing function and the limits whenztends to 0 and∞

169

are∞and 0, respectively. Asa(k) is a positive constant, there exists an unique rootz₀>0.

170

Observe that functionfis decreasing in (0, z₀) and increasing in (z₀,+∞), sof(z)≥f(z₀)

171

for allz >0. Hence, by choosing

172

b(k) =

k

k+ 1a(k)z₀^k⁺¹−(1 +z₀)^k

= (1 +z₀)^k−¹ 1 + z₀

k+ 1

(2)

173

it follows thatf(z)≥0∀z >0.

174

We study the positive rootz₀ of equation

175

a(k)z^k−(1 +z)^k−¹= 0 (3)

176

Note that f⁰(1) = k(a(k)−2^k−¹) < 0 since 0 < a(k) ≤1. Thus, z₀ > 1. For the sake

177

of simplicity, we define the function g(k) such that z₀ = ^k−_g₍_k¹₎ where 0 < g(k) < k−1.

178

Equation (3) is equivalent to

179

1 + g(k) k−1

k−1

g(k) = (k−1)a(k)

180

Note thate^w/²<1 +w < e^w forw∈(0,1). For w:= ^g_k−⁽^k₁⁾, we obtain the following upper

181

and lower bounds for the term (k−1)a(k):

182

e^g⁽^k⁾^/²g(k)<(k−1)a(k)< e^g⁽^k⁾g(k) (4)

183

(7)

Recall the definition of Lambert W function. For each y ∈R⁺, W(y) is defined to be

184

solution of the equationxe^x=y. Note that,xe^xis increasing with respect to x, henceW(·)

185

is increasing.

186

By definition of the Lambert W function and Equation (4), we get that

187

W((k−1)a(k))< g(k)<2W

(k−1)a(k) 2

(5)

188

First, consider the case where lim_k→∞(k−1)a(k) =∞. The asymptotic sequence for

189

W(x) asx→+∞ is the following: W(x) = lnx−ln lnx+ ^{ln ln}_ln_x^x+O

ln lnx lnx

2

. So, for

190

large enough k,W((k−1)a(k)) = Θ(log((k−1)a(k))). Sincez₀= ^k−_g₍_k¹₎, from Equation (5),

191

we get z₀ = Θ

k log(ka(k))

. Therefore, by (2) we haveb(k) = Θ

α^k·

k logka(k)

k−1 for

192

some constantα.

193

Second, consider the case where (k−1)a(k) is bounded by some constants. So by (5), we

194

haveg(k) = Θ(1). Thereforez₀= Θ(k) which again impliesb(k) = Θ α^k·k^k−¹

for some

195

constantα.

196

Third, we consider the case where lim_k→∞(k−1)a(k) = 0. We focus on the Taylor series W₀ ofW around 0. It can be found using the Lagrange inversion and is given by

W₀(x) =

∞

X

i=1

(−i)ⁱ⁻¹

i! xⁱ=x−x²+O(1)x³.

Thus, forklarge enoughg(k) = Θ((k−1)a(k)). Hence,z₀= Θ(1/a(k)). Once again that

197

impliesb(k) = Θ

α^k· _ka₍¹_k₎_k

for some constantα. J

198

ILemma 5.For any sequences of non-negative real numbers{a₁, a₂, . . . , a_n}and{b₁, b₂, . . . , b_n}

199

and for any polynomial g of degreekwith non-negative coefficients, it holds that

200

n

X

i=1



g

bi+

i−1

X

j=1

aj

−g ⁱ⁻1

X

j=1

aj



≤λ(k)·g ⁿ

X

i=1

bi

+µ(k)·g ⁿ

X

i=1

ai

201 202

where µ(k) = ^k−_k¹ and λ(k) = Θ k^k−¹

. The same inequality holds for µ(k) = _k^k−_ln¹_k and

203

λ(k) = Θ (α·klnk)^k−¹

for some constant α.

204

Proof. Letg(z) =g₀z^k+g₁z^k−¹+·+g_k withg_t≥0∀t. The lemma holds since it holds for

205

every z^tfor 0≤t≤k. Specifically,

206

n

X

i=1



g

bi+

i−1

X

j=1

aj

−g ⁱ⁻1

X

j=1

aj



=

k

X

t=1

gk−t·

n

X

i=1





bi+

i−1

X

j=1

aj

t

− ⁱ⁻1

X

j=1

aj

t

207 

≤

k

X

t=1

g_k−t·



t·b_i·

b_i+

i−1

X

j=1

a_j t−1

≤

k

X

t=1

g_k−t·

"

λ(t) ⁿ

X

i=1

b_i t

+µ(t) ⁿ

X

i=1

a_i t#

208

≤λ(k)·g ⁿ

X

i=1

b_i

+µ(k)·g ⁿ

X

i=1

a_i

209 210

The first inequality follows the convex inequality (x+y)^k⁺¹−x^k⁺¹≤(k+ 1)y(x+y)^k. The

211

second inequality follows Lemma 4 (Case (1b) and a(k) = 1/(k+ 1)). The last inequality

212

holds sinceµ(t)≤µ(k) andλ(t)≤λ(k) fort≤k. J

213

S WAT 2 0 1 8