HAL Id: hal-02399129
https://hal.archives-ouvertes.fr/hal-02399129v12
Preprint submitted on 23 May 2021 (v12), last revised 14 Dec 2021 (v15)
Self concordant Perceptron for exact computation in linear feasibility
Adrien Chan-Hon-Tong
To cite this version:
Adrien Chan-Hon-Tong. Self concordant Perceptron for exact computation in linear feasibility. 2021. hal-02399129v12
Adrien CHAN-HON-TONG May 23, 2021
Abstract
This paper offers a new polynomial algorithm for linear feasibility.
Although the arithmetic complexity of this algorithm is slightly higher than that of the best state-of-the-art path-following algorithm, the offered algorithm is well suited to exact computation, as a simple transformation allows it to handle integer matrices.
All bounds required for establishing the complexity have simple explicit values (except for the number of steps, which is classically linked with a sub determinant of the input matrix). In particular, the rounding process is explicit.
1 Purpose
Linear programming is a central optimization problem. Today, the state-of-the-art algorithms are central-path log-barrier [10] and/or path-following [12] algorithms, which solve a linear program with $M$ variables and constraints and total binary size $L$ in fewer than $\tilde{O}(\sqrt{M}L)$ Newton steps. As each Newton step is mainly the resolution of an $M \times M$ linear system, the arithmetic time complexity of those algorithms is $\tilde{O}(M^{\omega}\sqrt{M}L)$, where $\omega$ is the exponent of matrix multiplication (3 with the simple algorithm but 2.38 with [1]). There are even faster randomized algorithms such as [5], but they are outside the scope of this paper, which is about exact computation.
There are also algorithms with higher complexity but with interesting features. [11] requires $\tilde{O}(M^2\sqrt{M}L)$ Perceptron steps, and thus $\tilde{O}(M^4\sqrt{M}L)$ arithmetic operations. But it is related to the Perceptron [13] and, in particular, does not rely on matrix inversion (contrary to Newton-based methods). Another is [4]: it requires $\tilde{O}(M^4L)$ arithmetic operations but does not require matrix inversion, and it is linked with other algorithms which are strongly polynomial on special linear program families [3].
This paper aims to offer a new algorithm which lies between those two groups of algorithms. This algorithm links the Perceptron and self concordance theory, and it requires $ML$ Newton steps (i.e. $M^{\omega}ML$ arithmetic operations). Thus, at first glance, it is slower than [10, 12] and requires matrix inversion, contrary to [11, 4]. Yet, it deals straightforwardly with integer matrices.
This last point is important for exact computation because matrix inversion can be done in $\tilde{O}(M^{\omega})$ arithmetic operations, i.e. by counting an operation on $\mathbb{Z}$ or even on $\mathbb{Q}$ as 1 operation. Now, from an exact computation perspective, it is required to consider binary operations. In this case, the binary complexity depends on how large the matrices/vectors handled during the computation become.
Typically, most state-of-the-art methods rely on scaling, i.e. at some point in the algorithm there is an operation like $A = A(I + xx^T)$ in [11], or $column(A,k) = \frac{1}{2} \times column(A,k)$ in [4], or $\mu = \frac{1}{2} \times \mu$ in [2] (a classical implementation of central path). This means that the binary size of the matrices manipulated during the algorithm increases quickly. Potentially, the total binary size becomes much larger than the initial one, in particular as the number of steps is larger than $L$. Indeed, if one variable is scaled by 2 at each of the $\sqrt{M}L$ steps, then the final value is scaled by $2^{\sqrt{M}L}$, which is larger than the initial total binary size $L$ of the matrix.
Currently, only a careful path-following implementation seems to maintain a frozen binary size $L$, because the scaling is $(1 + \frac{1}{\sqrt{M}})$ [12] (as $(1 + \frac{1}{\sqrt{M}})^{\sqrt{M}L} \approx 2^{\log(e)L}$, this leads to a binary size of $O(L)$). However, [12] admits that this algorithm requires a precise rounding process depending on several hidden constants.
So, although it is possible to get an exact algorithm for linear programming with arithmetic complexity $\tilde{O}(M^{\omega}\sqrt{M}L)$ and controlled binary size, there is an interest in presenting an algorithm with a simpler rounding scheme like [8] (or even with no matrix inversion like [11]).
Thus, the offered algorithm lies between [11] and [12]: faster than [11] (but requiring heavy linear algebra), and simpler than [12] (but with an extra $\sqrt{M}$ factor in the number of iterations). Currently, it seems even simpler than [8], which has the same $\tilde{O}(M^{\omega}ML)$ complexity. This situation is summarized by table 1.
Algorithm   Q time complexity                  inversion   L binary size   easy rounding
[13]        exponential                        no          no              no
[7]         exponential                        yes         yes             yes
[11]        $\tilde{O}(M^2N^2\sqrt{M}L)$       no          no              probably
[4]         $\tilde{O}(N^4L)$                  no          yes             yes
this        $\tilde{O}(M^{\omega}ML)$          yes         yes             yes
[2]         $\tilde{O}(N^{\omega}\sqrt{N}L)$   yes         no              no
[12]        $\tilde{O}(N^{\omega}\sqrt{N}L)$   yes         yes             no
Bold highlights good features such as not depending on matrix inversion (hard to implement). $\omega$ is the exponent of matrix multiplication.
Table 1: Comparison of self concordant Perceptron with state of the art.
2 Sketch of the algorithm
This section proves arithmetic convergence of the algorithm. Section 3 will focus on the rounding process.
2.1 Underlying theories
First, the offered algorithm works with linear programs in the form of linear feasibility:
finding $x$ such that $Ax > 0$ for a given matrix $A \in \mathbb{Z}^{M\times N}$, with the prior that one solution to this set of strict inequalities exists.
This is not a limitation for exact computation, as any linear program can be encoded in a linear feasibility instance (with the assumption of the existence of a solution) with the same binary size and number of variables (see appendix).
Then, the algorithm relies on self concordance theory (see [9] for a complete presentation): if $G$ is a self concordant function (mainly a sum of quadratic, linear, constant and $-\log$ terms) with a minimum $G^*$, then Newton descent starting from $x_{start}$ finds $x$ such that $G(x) - G^* \le \varepsilon$ in $\tilde{O}(G(x_{start}) - G^* + \log\log(\frac{1}{\varepsilon}))$ damped Newton steps. Each step consists in $x = x - \frac{1}{1+\lambda_G(x)}(\nabla_x^2G)^{-1}(\nabla_xG)$ with $\lambda_G(x) = \sqrt{(\nabla_xG)^T(\nabla_x^2G)^{-1}(\nabla_xG)}$. Precisely,
• while $\lambda(x) \ge \frac{1}{4}$, each damped Newton step decreases $G$ by at least $\frac{1}{4} - \log(\frac{5}{4}) \ge \frac{1}{50}$; thus this so-called phase 1 cannot last more than $50 \times (G(x_{start}) - G^*)$ damped Newton steps.
• as soon as one has $x_{phase}$ such that $\lambda(x_{phase}) \le \frac{1}{4}$, then $\tilde{O}(\log\log(\frac{1}{\varepsilon}))$ additional steps are required to get $x$ such that $G(x) - G^* \le \varepsilon$ (this is the so-called phase 2 with quadratic convergence)
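The two phases can be illustrated on a one-dimensional toy example. The sketch below is not taken from the paper: the function $G(t) = \frac{t^2}{2c} - \log(t)$, the constant `c`, the starting point and the helper name `damped_newton_1d` are illustrative assumptions. It records the decrease of $G$ at every step where $\lambda \ge \frac{1}{4}$.

```python
import math

def damped_newton_1d(c, t, tol=1e-12, max_steps=1000):
    """Minimize G(t) = t^2/(2c) - log(t) with damped Newton steps.

    G is a hypothetical 1D self concordant function chosen for illustration;
    its minimizer is t = sqrt(c).
    """
    G = lambda s: s * s / (2.0 * c) - math.log(s)
    decreases = []                      # decrease of G at each phase-1 step
    for _ in range(max_steps):
        grad = t / c - 1.0 / t
        hess = 1.0 / c + 1.0 / (t * t)
        lam = math.sqrt(grad * grad / hess)       # Newton decrement lambda(t)
        if lam < tol:
            break
        t_new = t - grad / ((1.0 + lam) * hess)   # damped Newton step
        if lam >= 0.25:                            # still in phase 1
            decreases.append(G(t) - G(t_new))
        t = t_new
    return t, decreases

t_final, phase1 = damped_newton_1d(c=9.0, t=0.01)
```

In this toy run every recorded phase-1 decrease should be at least $\frac{1}{50}$, and `t_final` should be close to $\sqrt{c} = 3$.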
2.2 Self concordant Perceptron
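Main definition: $\forall A \in \mathbb{Q}^{M\times N}$, let us introduce the self concordant function:
$$F_A(v) = \frac{v^TAA^Tv}{2} - \mathbf{1}^T\log(v) = \frac{1}{2}\sum_{i,j=1}^{M} v_iv_j \times A_iA_j^T - \sum_{m=1}^{M}\log(v_m)$$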
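Main theorem: For all linear feasibility instances $A \in \mathbb{Q}^{M\times N}$ with $A_mA_m^T = 1$:
• $\exists x^*_A$ / $Ax^*_A \ge \mathbf{1}$ (by definition)
• $F_A(\frac{1}{M}\mathbf{1}) \le 1 + M\log(M)$
• $F_A$ has a minimum (written $F_A^*$) with $-F_A^* \le M\log(x_A^{*T}x^*_A)$
• for all $v$, $F_A(v) - F_A^* \le \frac{1}{2Mx_A^{*T}x^*_A + 2} \Rightarrow AA^Tv > 0$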
Important remark: In linear feasibility, not every $A$ can be encountered, yet any linear program can be encoded into a linear feasibility instance with existing $x^*_A$. Indeed, if there is no $x^*_A$ such that $Ax^*_A \ge \mathbf{1}$, then there exists $y$ such that $A^Ty = 0$ and $y > 0$, and $F_A$ is not bounded (it can go to $-\infty$). But if the instance is a linear feasibility one, then $x^*_A$ exists, and this implies the existence of a minimum, as the log term maintains $v > 0$ and the Cauchy inequality bounds $v$ for a constant $v^TAA^Tv$ ($\frac{v^Tv}{x_A^{*T}x^*_A} \le \frac{(\mathbf{1}^Tv)^2}{x_A^{*T}x^*_A} \le v^TAA^Tv$ from $x_A^{*T}(A^Tv) \ge \mathbf{1}^Tv$).
Notation: $F_A$ and $x^*_A$ will be written $F$ and $x$ when it is not ambiguous.
Trivial corollary: self concordance theory applied to the main theorem directly states that $\tilde{O}(M\log(x^Tx) + M\log(M) + \log\log(2Mx^Tx)) = \tilde{O}(M\log(x^Tx))$ damped Newton steps (starting from $\frac{1}{M}\mathbf{1}$) allow to compute a solution of the linear feasibility instance.
Then, classical results allow to link $x$ with a vertex of the polytope $\{z \ / \ Az \ge \mathbf{1}\}$, which implies that $x$ is linked with a sub determinant of $A$, and thus $\log(x^Tx) = \tilde{O}(L)$, the total binary size of the matrix $A$ before normalization (the next section details this point precisely).
Let us stress that in this algorithm the so-called second phase is negligible because $\log(L)$ is not an issue; thus, all the optimization is done during the so-called first phase. This is the main difference with [8], and it explains why the self concordant Perceptron has better binary properties.
Indeed, rounding during the second phase seems harder than during the first phase, yet rounding is not even required when there are only $\log(L)$ second-phase steps. Thus this paper focuses on the rounding process of the first phase, which is simple.
Finally, let us stress that $x^*_A$ is the solution of the support vector machine problem on $A$, which exists as $A$ is linearly separable by assumption (let us recall that this does not restrict generality: any linear program can be encoded into such a linear feasibility instance with an existing solution, see appendix). Thus, $x_A^{*T}x^*_A$ is the norm of the support vector solution, which is the inverse of the margin $\rho$ in [11]. Thus, this algorithm is easily comparable with [11]: [11] requires $M^2\sqrt{M}\log(\frac{1}{\rho})$ Perceptron steps (which require computing $A \times$ current point), i.e. $M^4\sqrt{M}L$ arithmetic operations, while the self concordant Perceptron requires $M\log(\frac{1}{\rho})$ Newton steps, i.e. $M^{\omega}M\log(\frac{1}{\rho})$ arithmetic operations. So the self concordant Perceptron is much faster, but with the drawback of requiring heavy linear algebra.
2.3 Proof of the claim
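2.3.1 Existence of a minimum
First, $F(\frac{1}{M}\mathbf{1}) = \frac{1}{2M^2}\sum_{i,j}A_iA_j^T + M\log(M) \le 1 + M\log(M)$ (because $A$ is normalized and, from Cauchy, $A_iA_j^T \le \sqrt{A_iA_i^T \times A_jA_j^T} = 1$).
Then, from Cauchy, $(A^Tv)^Tx \le \sqrt{v^TAA^Tv \times x^Tx}$. But $(A^Tv)^Tx = v^T(Ax)$.
And, by definition, $Ax \ge \mathbf{1}$ ($x$ is unknown but the assumption is that it exists). Injecting this last inequality is interesting as $v \ge 0$: as each $A_mx \ge 1$, a fortiori $A_mx > 0$, and $v_m > 0$, so $v^TAx > 0$. Even more, $\forall v \ge 0$, $0 \le v^T\mathbf{1} \le v^T(Ax) = (A^Tv)^Tx \le \sqrt{v^TAA^Tv \times x^Tx}$. One can even take the square (because both sides are positive): $\forall v \ge 0$, $\frac{(v^T\mathbf{1})^2}{x^Tx} \le v^TAA^Tv$. And, independently, $(v^T\mathbf{1})^2 \ge v^Tv$ because $v \ge 0$. So $\forall v \ge 0$, $\frac{v^Tv}{x^Tx} \le v^TAA^Tv$.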
Let us introduce the 1D function $f(t) = \frac{t^2}{2x^Tx} - \log(t)$; from the previous inequality it holds that $F(v) \ge \sum_m f(v_m)$.
Now, $f$ is a single-variable function which goes to infinity when $t$ goes to 0 ($t^2 \to 0$ but $-\log(t) \to \infty$) or to infinity ($t^2$ grows faster than $\log(t)$). So $f$ has a minimum, and so does $F$. Let us call them $f^*$ and $F^*$.
As $f$ and $F$ are smooth, the minimums are characterized by a null derivative or gradient. $f'(t) = \frac{t}{x^Tx} - \frac{1}{t}$, so $f'(\sqrt{x^Tx}) = 0$, so $f^* = f(\sqrt{x^Tx}) = \frac{1}{2} - \frac{1}{2}\log(x^Tx) \ge -\log(x^Tx)$. Thus, the minimum of $F$ verifies $F^* \ge Mf^* \ge -M\log(x^Tx)$.
So the first two assertions of the main theorem are proven.
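The bounds of this subsection can be checked numerically on a small instance. The sketch below is only an illustration: the matrix `A` (with unit-norm rows), the feasible point `x` and the sample vectors are arbitrary assumptions, and any `x` with $Ax \ge \mathbf{1}$ can be used in place of $x^*_A$ for these inequalities.

```python
import math

A = [(1.0, 0.0), (0.8, 0.6), (-0.96, 0.28)]   # hypothetical instance, unit-norm rows
x = (1.0, 8.0)                                 # hypothetical point with A x >= 1
M = len(A)
s = sum(c * c for c in x)                      # x^T x

def F(v):
    """F_A(v) = v^T A A^T v / 2 - sum(log(v))."""
    u = [sum(v[m] * A[m][n] for m in range(M)) for n in range(len(x))]  # A^T v
    return 0.5 * sum(c * c for c in u) - sum(math.log(c) for c in v)

def f(t):
    """1D lower-bounding function f(t) = t^2 / (2 x^T x) - log(t)."""
    return t * t / (2.0 * s) - math.log(t)

f_star = 0.5 - 0.5 * math.log(s)               # claimed value of f at sqrt(x^T x)
start = [1.0 / M] * M                          # starting point (1/M) * 1
samples = [[0.1, 0.2, 0.3], [1.0, 1.0, 1.0], [5.0, 0.5, 8.0], start]
margins = [sum(a * b for a, b in zip(row, x)) for row in A]   # A x
```

On this instance, `F(start)` stays below $1 + M\log(M)$ and each sample satisfies $F(v) \ge \sum_m f(v_m) \ge Mf^*$.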
2.3.2 Normalization, linearization and lemmas
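Independently, let us remark that $\theta(t) = F(tv) = \frac{v^TAA^Tv}{2}t^2 - \mathbf{1}^T\log(v) - M\log(t)$ is minimal at $t = 1$ when $v^TAA^Tv = M$. So for any $w$, one can build $v = \mu w$ such that $v^TAA^Tv = M$ and $F(v) \le F(w)$. In other words, it holds that $F\left(\sqrt{\frac{M}{v^TAA^Tv}}\,v\right) \le F(v)$.
So, let us consider $v \ge 0$ such that $v^TAA^Tv = M$. As $v^TAA^Tv \ge \frac{(\mathbf{1}^Tv)^2}{x^Tx}$, no $v_m$ can be higher than $\sqrt{Mx^Tx}$, i.e. $0 \le v \le \sqrt{Mx^Tx}\,\mathbf{1}$.
Let us also remark that $F(v+w) = \frac{v^TAA^Tv}{2} + \frac{w^TAA^Tw}{2} + w^TAA^Tv - \mathbf{1}^T\log(v) - \mathbf{1}^T\log(1 + \frac{w}{v}) = F(v) + \frac{w^TAA^Tw}{2} + w^TAA^Tv - \mathbf{1}^T\log(1 + \frac{w}{v})$ (where $\frac{w}{v}$ is taken coordinate-wise).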
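Finally, let us consider the following lemmas from basic analysis:
1. $\varphi(t) = \frac{1}{2}\alpha t^2 - \log(1+t) \le \frac{1}{2}(\alpha+1)t^2 - t = \psi(t)$ for $t \ge 0$
2. $\psi(\frac{1}{\alpha+1}) \le -\frac{1}{2\alpha+2}$
3. $\varphi(\frac{1}{\alpha+1}) \le -\frac{1}{2\alpha+2}$, i.e. $\forall \alpha \ge 0$, $\frac{1}{2}\frac{\alpha}{(\alpha+1)^2} - \log(1 + \frac{1}{\alpha+1}) \le -\frac{1}{2\alpha+2}$
Lemma 1: $\psi'(t) - \varphi'(t) = (\alpha+1)t - 1 - \alpha t + \frac{1}{1+t} = t - 1 + \frac{1}{1+t} = \frac{t^2}{1+t} > 0$ for $t > 0$, so $\psi(t) - \varphi(t)$ always increases. But $\psi(0) = \varphi(0) = 0$, so $\psi(t) \ge \varphi(t)$ for $t \ge 0$. Lemma 2: $\psi(\frac{1}{\alpha+1}) = \frac{1}{2}(\alpha+1)\frac{1}{(\alpha+1)^2} - \frac{1}{\alpha+1} = -\frac{1}{2\alpha+2}$. Lemma 3 is just lemma 1 plus lemma 2.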
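As a sanity check, the three lemmas can be evaluated numerically. The sketch below is illustrative only; the helper names `phi`, `psi`, `lemma3_gap` and the grid of $\alpha$ values are assumptions, and `math.log1p` is used to limit floating-point cancellation.

```python
import math

def phi(alpha, t):
    """phi(t) = alpha * t^2 / 2 - log(1 + t)."""
    return 0.5 * alpha * t * t - math.log1p(t)

def psi(alpha, t):
    """psi(t) = (alpha + 1) * t^2 / 2 - t."""
    return 0.5 * (alpha + 1.0) * t * t - t

def lemma3_gap(alpha):
    """Return -1/(2 alpha + 2) - phi(1/(alpha + 1)); lemma 3 states it is >= 0."""
    t = 1.0 / (alpha + 1.0)
    return -1.0 / (2.0 * alpha + 2.0) - phi(alpha, t)

alphas = [0.0, 0.01, 0.5, 1.0, 10.0, 100.0, 1000.0]
gaps = [lemma3_gap(a) for a in alphas]
# Lemma 1 on a small grid of t >= 0: psi dominates phi
lemma1_ok = all(psi(a, t) >= phi(a, t) for a in alphas for t in [0.0, 0.1, 1.0, 5.0])
```

All entries of `gaps` should be non-negative, which is exactly the decrease used in the convergence argument with $\alpha = v_k^2$.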
2.3.3 Convergence
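Now, either $AA^Tv > 0$ (problem solved) or there exists $k$ such that $A_kA^Tv \le 0$.
Let us consider this case, $A_kA^Tv \le 0$ and $v^TAA^Tv = M$, and let us introduce $w = v + \frac{v_k}{v_k^2+1}\mathbf{1}_k$.
Then $F(w) = F(v + \frac{v_k}{v_k^2+1}\mathbf{1}_k) = F(v) + \frac{A_kA_k^T}{2}\left(\frac{v_k}{v_k^2+1}\right)^2 + A_kA^Tv \times \frac{v_k}{v_k^2+1} - \log(1 + \frac{1}{v_k^2+1})$. But $A_kA^Tv \le 0$ (by assumption) and $A_kA_k^T = 1$, so $F(w) \le F(v) + \frac{1}{2}\left(\frac{v_k}{v_k^2+1}\right)^2 - \log(1 + \frac{1}{v_k^2+1})$.
And, from the lemmas just above (consider $\alpha = v_k^2$), $F(w) \le F(v) - \frac{1}{2v_k^2+2}$.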
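But $v_k \le \sqrt{Mx^Tx}$, so $F(w) \le F(v) - \frac{1}{2Mx^Tx+2}$, which is impossible if $F(v) - F^* < \frac{1}{2Mx^Tx+2}$. So, $\forall v > 0$ such that $v^TAA^Tv = M$, $F(v) - F^* \le \frac{1}{2Mx^Tx+2} \Rightarrow AA^Tv > 0$.
Finally, the requirement that $v^TAA^Tv = M$ can be removed because normalizing decreases $F$: $\forall v > 0$, $F(v) - F^* \le \frac{1}{2Mx^Tx+2} \Rightarrow F\left(\sqrt{\frac{M}{v^TAA^Tv}}\,v\right) - F^* \le \frac{1}{2Mx^Tx+2} \Rightarrow \sqrt{\frac{M}{v^TAA^Tv}} \times AA^Tv > 0 \Rightarrow AA^Tv > 0$.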
This proves the main theorem.
Remark: this idea is Perceptron based: one can increase $v$ and decrease $||A^Tv||$ (precisely $F$ here) at the same time if there exists $k$ / $A_kA^Tv < 0$ (i.e. while convergence is not reached).
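The coordinate update used in this proof can be tried on a tiny example. This is a minimal sketch under assumptions: the 2×2 matrix `A` (unit-norm rows, feasible for $x = (1, 2)$), the vector `v` and the helper name `perceptron_like_step` are hypothetical and only serve to illustrate the decrease $F(w) \le F(v) - \frac{1}{2v_k^2+2}$.

```python
import math

A = [(1.0, 0.0), (-0.6, 0.8)]      # hypothetical instance with unit-norm rows

def At(v):
    """Return A^T v."""
    return [sum(v[m] * A[m][n] for m in range(len(A))) for n in range(2)]

def F(v):
    """F_A(v) = v^T A A^T v / 2 - sum(log(v))."""
    u = At(v)
    return 0.5 * sum(c * c for c in u) - sum(math.log(c) for c in v)

def perceptron_like_step(v):
    """If some row k has A_k A^T v <= 0, return (k, w) with
    w = v + v_k / (v_k^2 + 1) * 1_k; otherwise return (None, v)."""
    u = At(v)
    scores = [sum(A[m][n] * u[n] for n in range(2)) for m in range(len(A))]  # A A^T v
    for k, score in enumerate(scores):
        if score <= 0:
            w = list(v)
            w[k] = v[k] + v[k] / (v[k] ** 2 + 1.0)
            return k, w
    return None, list(v)

v = [1.0, 0.1]
k, w = perceptron_like_step(v)
decrease = F(v) - F(w)
```

Here the second row violates $A_kA^Tv > 0$, so the step increases $v_k$ and `decrease` should be at least $\frac{1}{2v_k^2+2}$.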
3 Binary property of self concordant Perceptron
The previous section proves that the offered algorithm requires $\tilde{O}(ML)$ damped Newton steps. This section focuses on how each step can be done with integer matrices, and how one can round $v$ to keep a low binary complexity.
3.1 Removing row normalization
Section 2 presents the algorithm after normalisation of the rows of $A$. This is classical for Perceptron-based algorithms. It also gives the same importance to all constraints, which is straightforward. Now, this normalization is not required, and it is not relevant for exact computation. Thus, this section considers the results of section 2 for a raw matrix $A \in \mathbb{Z}^{M\times N}$ with total binary size $L$ (one may scale rows with low norm to have all rows with similar norm, but it is not mandatory).
First, $x^*_A$ still exists (currently, $x^*_A$ can be divided by $\alpha$ if all rows are scaled by $\alpha$). So, the existence of $F^*$ does not change. $v_{start}$ should be updated because a too large $v_{start}$ leads to an exponential complexity. Yet, $F$ still decreases when one scales $v$ such that $v^TAA^Tv = M$ (this does not depend on $A$ being normalized). So using $v_{start} = \sqrt{\frac{M}{\mathbf{1}^TAA^T\mathbf{1}}}\,\mathbf{1}$ is still an optimal starting point: this leads to $F(v_{start}) \le M + \frac{M}{2}\log(\mathbf{1}^TAA^T\mathbf{1}) - \frac{M}{2}\log(M)$. This value may not be exactly computable, but just considering $v = t \times \mathbf{1}$ such that $v^TAA^Tv \in [\frac{M}{4}, 4M]$ is acceptable. Finally, $\log(\mathbf{1}^TAA^T\mathbf{1}) \le L$, so it does not change the number of steps of the first phase.
So, the first phase does not change at all whether $A$ is normalized or not.
For the second phase, it changes the bound for convergence because $F(v + t\mathbf{1}_k) \le F(v) + \frac{1}{2}A_kA_k^Tt^2 - \log(1 + \frac{t}{v_k})$ but $A_kA_k^T$ can be as high as $L$. Thus, $t$ should be lower than $\frac{v_k}{v_k^2+1}$ (close to $\frac{1}{A_kA_k^T} \times \frac{v_k}{v_k^2+1}$).
So, the resolution will happen only when $F(v) - F^* \le \tilde{O}(\frac{1}{M\Upsilon^2x^Tx})$ where $\Upsilon^2 = \max_m A_mA_m^T \le 2^{2L}$. Yet, this is not an issue for the algorithm because convergence is quadratic in phase 2. So, the number of additional steps is only $\tilde{O}(\log\log(M\Upsilon^2x^Tx))$, which is still $\tilde{O}(\log(L))$.
So the offered algorithm works with the raw matrix $A \in \mathbb{Z}^{M\times N}$ with total binary size $L$ and finds $v$ such that $AA^Tv > 0$ in fewer than $\tilde{O}(ML)$ Newton steps, almost all being in the so-called first phase where $\lambda(v) > \frac{1}{4}$ and each damped Newton step decreases $F$ by at least $\frac{1}{50}$.
3.2 Approximating root computation
Computing $\lambda$ is impossible on $\mathbb{Q}$, but only a coarse approximation is required: $F$ is convex, so $F(v - \theta(\nabla_v^2F)^{-1}(\nabla_vF)) \le \frac{1}{2}\left(F(v) + F(v - \frac{1}{1+\lambda(v)}(\nabla_v^2F)^{-1}(\nabla_vF))\right)$ if $\frac{1}{2}\frac{1}{1+\lambda(v)} \le \theta \le \frac{1}{1+\lambda(v)}$. It is therefore sufficient to approximate $\lambda$ within a factor 2 to get (at least) a decrease of $F$ by $\frac{1}{100}$ (against $\frac{1}{50}$ with perfect root computation).
Importantly, finding $\theta$ such that $\theta \le \sqrt{\rho} \le \theta + 1$ takes $\log(\rho)$ steps using bisection, which is too much here. But finding $\theta$ such that $\theta \le \sqrt{\rho} \le 2\theta$ takes $\log(\log(\rho))$ steps because bisection can be done on the exponent. So, computing an approximation of $\lambda$ can be done in $\tilde{O}(\log(L))$ even without a specific algorithm for approximating roots.
Then, with the same idea, normalizing $v^TAA^Tv = M$ is not possible exactly, but $v^TAA^Tv \in [\frac{M}{4}, 4M]$ is possible and still guarantees that $v \in ]0, \sqrt{4Mx^Tx}]^M$, and this decreases $F$.
3.3 Rounding
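This subsection contains one of the most important claims of the paper: the self concordant Perceptron allows a very simple rounding process.
Rounding strategy: Let $A \in \mathbb{Z}^{M\times N}$ with $\Upsilon^2 = \max_m A_mA_m^T$, and let $v$ be such that $v^TAA^Tv \le 4M$, then:
$$\forall w \in \left[0, \frac{1}{1000M\sqrt{M}\Upsilon}\right]^M, \quad F(v+w) \le F(v) + \frac{1}{200}$$
In particular, $\forall v$ / $v^TAA^Tv \le 4M$,
$$F\begin{pmatrix} \frac{floor(1000M\sqrt{M}\Upsilon \times v_1)}{1000M\sqrt{M}\Upsilon} \\ \vdots \\ \frac{floor(1000M\sqrt{M}\Upsilon \times v_M)}{1000M\sqrt{M}\Upsilon} \end{pmatrix} \le F(v) + \frac{1}{200}$$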
Proof:
First, the log part only decreases when adding $w$; thus, only the quadratic part should be considered. So $F(v+w) \le F(v) + \frac{1}{2}w^TAA^Tw + w^TAA^Tv$.
But $A^Tw = \sum_m w_mA_m^T$, so $||A^Tw|| \le \sum_m w_m||A_m^T|| \le ||w||_\infty M\Upsilon \le \frac{1}{500\sqrt{M}}$ and $||A^Tw||^2 = w^TAA^Tw \le \frac{1}{(1000)^2M}$.
So, $w^TAA^Tv \le \sqrt{w^TAA^Tw \times v^TAA^Tv} \le \sqrt{\frac{1}{(500)^2M} \times 4M} \le \frac{1}{250}$ (from Cauchy). And $\frac{1}{2}w^TAA^Tw \le \frac{1}{2 \times (1000)^2M} \le \frac{1}{1000}$. Thus, it holds that $F(v+w) \le F(v) + \frac{1}{200}$.
Then, flooring $t$ is a special case of adding $\tau \in [0,1]$, so the offered rounding scheme corresponds to adding $w \in \left[0, \frac{1}{1000M\sqrt{M}\Upsilon}\right]^M$.
Corollary:
During all the first phase, performing a damped Newton step, a normalization and a flooring with precision $1000M\sqrt{M}\sqrt{\max_m A_mA_m^T}$ still decreases $F$ by at least $\frac{1}{200}$ ($-\frac{1}{100}$ for the approximate damped Newton step, $+\frac{1}{200}$ for the rounding).
3.4 Performing the Newton step with integers
The 3 key points for a Newton-based algorithm to behave well when using exact computation are:
• having an explicit rounding strategy: this is done in section 3.3 ([12] admits that the rounding strategy is not trivial with path following)
• being sure that variables will not become too large: this is done in the main theorem, as $v^TAA^Tv \le 4M$ implies that $v \in ]0, \sqrt{4Mx^Tx}]^M$ with $\sqrt{4Mx^Tx} = \tilde{O}(L)$ (true for path following but not for central path [2], which has variables which become as high as $\left(\frac{1}{2}\right)^{\sqrt{M}L}$)
• having linear algebra computation on integers ([12] offers a sketch of such a strategy, but it is based on many non-explicit constants and seems to largely increase the size of the matrix to invert)
This last point is described here for the self concordant Perceptron. The Newton direction is given by $(\nabla_v^2F)^{-1}(\nabla_vF)$, but, here,
$$\nabla_v^2F = AA^T + diag\left(\frac{1}{v_1^2}, ..., \frac{1}{v_M^2}\right)$$
But for any non-singular matrix $D$, $(\nabla_v^2F)^{-1} = (\nabla_v^2F)^{-1}D^{-1}D = (D\nabla_v^2F)^{-1}D$.
Yet,
$$\beta \times diag(v_1^2, ..., v_M^2) \times \nabla_v^2F = \beta \times diag(v_1^2, ..., v_M^2)AA^T + \beta \times I$$
Using $\beta = (400)^2M^3\Upsilon^2$, the right part is entirely integer.
So, computing the inverse, written $H$, of the integer matrix $(400)^2M^3\Upsilon^2 \times diag(v_1^2, ..., v_M^2)AA^T + (400)^2M^3\Upsilon^2 \times I$ allows to extract the inverse of $\nabla_v^2F$, which is $H \times (400)^2M^3\Upsilon^2 \times diag(v_1^2, ..., v_M^2)$.
Finally, the Newton direction can be computed as
$$Newton = H \times (400)^2M^3\Upsilon^2 \times diag(v_1^2, ..., v_M^2) \times \left(\begin{pmatrix} \frac{1}{v_1} \\ \vdots \\ \frac{1}{v_M} \end{pmatrix} - AA^Tv\right)$$
Yet, by grouping all terms excluding $H$, one recovers an integer vector (up to a factor $400M\sqrt{M}\Upsilon$). So, all computations are done on integers.
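The identity $(\nabla_v^2F)^{-1} = (D\nabla_v^2F)^{-1}D$ with $D = diag(v_1^2, ..., v_M^2)$ can be exercised in exact arithmetic. The sketch below substitutes exact rationals (`fractions.Fraction`) for the scaled-integer bookkeeping with $\beta$ described above, which is a simplification and not the paper's scheme; the helper names `gram`, `solve_exact`, `newton_direction` and the small instance are hypothetical.

```python
from fractions import Fraction

def gram(A):
    """Return the Gram matrix A A^T."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]

def solve_exact(Mx, rhs):
    """Solve Mx * z = rhs over the rationals by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(c) for c in row] + [Fraction(r)] for row, r in zip(Mx, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [c / p for c in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def newton_direction(A, v):
    """Return n = (hessian)^-1 (1/v - A A^T v), i.e. minus the Newton vector,
    computed through (D * hessian) n = D (1/v - A A^T v) with D = diag(v^2),
    so that no entry of the system involves a division by v."""
    M = len(A)
    G = gram(A)
    Gv = [sum(G[i][j] * v[j] for j in range(M)) for i in range(M)]
    DH = [[v[i] ** 2 * G[i][j] + (1 if i == j else 0) for j in range(M)]
          for i in range(M)]                      # diag(v^2) A A^T + I
    rhs = [v[i] - v[i] ** 2 * Gv[i] for i in range(M)]
    return solve_exact(DH, rhs)

A = [[1, 0], [-3, 4]]                 # hypothetical integer instance
v = [Fraction(1), Fraction(1, 10)]    # hypothetical rational point on a grid
n = newton_direction(A, v)
```

Because every quantity is rational, the result satisfies $\nabla_v^2F \times n = \frac{1}{v} - AA^Tv$ exactly, with no rounding error.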
Currently, the total binary size of $\nabla_v^2F$ seems to be as high as $\tilde{O}(M^2L)$ and not just $L$ (it is not clear whether it is possible to take advantage of the shape of the matrix). Indeed, as soon as $P, Q \in \mathbb{Z}^{M\times M}$ have total binary size $L$, $PQ$ can have a binary size of $\tilde{O}(M^2L)$ and not just $2L$ as for the multiplication of two scalars of binary size $L$.
However, this drawback is completely shared with [12], where each coefficient of the matrix is rounded to $2^{K_2L}$ but with a non-trivial $K_2$.
4 Conclusion
Finally, the global self concordant Perceptron is given by
Algorithm: self concordant Perceptron:
1. initialize $v = \mathbf{1}$
2. while $\neg(AA^Tv > 0)$ and $\lambda(v)^2 > \frac{1}{16}$
   (a) compute $\theta$ / $\theta \le \lambda_F(v) \le 2\theta$
   (b) $v = v - \frac{1}{1+\theta}(\nabla_v^2F)^{-1}(\nabla_vF)$ using integer matrix (see 3.4)
   (c) while $v^TAA^Tv \ge 4M$, $v = \frac{1}{2} \times v$
   (d) $v = \frac{floor(v \times 400M\sqrt{M}\Upsilon)}{400M\sqrt{M}\Upsilon}$
3. while $\neg(AA^Tv > 0)$
   (a) compute $\theta$, a 2 approximation of $\lambda(v)$
   (b) $v = v - \frac{1}{1+2\theta}(\nabla_v^2F)^{-1}(\nabla_vF)$
This algorithm has an arithmetic complexity of $\tilde{O}(ML)$ Newton steps, which is higher by a factor $\sqrt{M}$ than path following. Yet, compared to path following, this algorithm offers a very simple rounding strategy allowing to perform all computations on integers, and guarantees that all variables are properly bounded.
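The overall loop can be sketched in floating point on a small normalized instance. This is a simplified illustration and not the exact algorithm above: it uses the exact Newton decrement instead of a factor-2 approximation, skips the normalization, rounding and integer linear algebra, and starts from $\frac{1}{M}\mathbf{1}$ as in section 2. The names `solve`, `self_concordant_perceptron` and the matrix `A` are assumptions.

```python
import math

def solve(Mx, rhs):
    """Gaussian elimination with partial pivoting (floating point)."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(Mx, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def self_concordant_perceptron(A, max_steps=10000):
    """Damped Newton on F_A until A A^T v > 0 (no rounding, floats only)."""
    M = len(A)
    G = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(M)] for i in range(M)]
    v = [1.0 / M] * M
    for _ in range(max_steps):
        Gv = [sum(G[i][j] * v[j] for j in range(M)) for i in range(M)]
        if all(c > 0 for c in Gv):
            return v                               # A A^T v > 0: solved
        grad = [Gv[i] - 1.0 / v[i] for i in range(M)]
        hess = [[G[i][j] + (1.0 / v[i] ** 2 if i == j else 0.0) for j in range(M)]
                for i in range(M)]
        step = solve(hess, grad)                   # (hessian)^-1 gradient
        lam = math.sqrt(max(0.0, sum(g * s for g, s in zip(grad, step))))
        v = [v[i] - step[i] / (1.0 + lam) for i in range(M)]
    raise RuntimeError("no solution found within max_steps")

A = [(1.0, 0.0), (0.8, 0.6), (-0.96, 0.28)]   # hypothetical instance, unit-norm rows
v = self_concordant_perceptron(A)
x = [sum(v[m] * A[m][n] for m in range(len(A))) for n in range(2)]   # x = A^T v
```

On this instance the starting point violates the third constraint, and the returned `x = A^T v` should satisfy $Ax > 0$.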
References
[1] Andris Ambainis, Yuval Filmus, and François Le Gall. Fast matrix multiplication: limitations of the Coppersmith-Winograd method. In Proceedings of the forty-seventh annual ACM symposium on Theory of Computing, pages 585–593, 2015.
[2] Erling D Anderson, Jacek Gondzio, Csaba Mészáros, and Xiaojie Xu. Implementation of interior-point methods for large scale linear programs. In Interior Point Methods of Mathematical Programming, pages 189–252. Springer, 1996.
[3] Sergei Chubanov. A polynomial algorithm for linear optimization which is strongly polynomial under certain conditions on optimal solutions, 2015.
[4] Sergei Chubanov. A polynomial projection algorithm for linear feasibility problems. Mathematical Programming, 2015.
[5] Michael B Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. In Proceedings of the 51st annual ACM SIGACT symposium on theory of computing, 2019.
[6] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 1995.
[7] George B Dantzig et al. The generalized simplex method for minimizing a linear form under linear inequality restraints. In Pacific Journal of Mathematics, American Journal of Operations Research, 1955.
[8] H Mansouri and C Roos. Simplified O(nL) infeasible interior-point algorithm for linear optimization using full-Newton steps. Optimisation Methods and Software, 22(3):519–530, 2007.
[9] Arkadi Nemirovski. Interior point polynomial time methods in convex programming. Lecture notes, 2004.
[10] Yurii Nesterov and Arkadii Nemirovskii. Interior-point polynomial algorithms in convex programming. SIAM, 1994.
[11] Javier Peña and Negar Soheili. A deterministic rescaled perceptron algorithm. Mathematical Programming, 155(1-2):497–510, 2016.
[12] James Renegar. A polynomial-time algorithm, based on Newton's method, for linear programming. Mathematical Programming, 40(1):59–93, 1988.
[13] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 1958.
Appendix: Basic theoretical foundations of linear programming
Maximal sub determinant of a matrix
An important result for linear programming and linear feasibility is the Hadamard bound on the maximal sub determinant of a matrix: for all $A \in \mathbb{Z}^{N\times N}$ with all entries bounded by $2^B$ (i.e. $\forall i,j \in \{1, ..., N\}$, $|A_{i,j}| \le 2^B$), $Det(A) \le N^N2^{NB}$. By extension, for $A \in \mathbb{Z}^{M\times N}$ with all entries bounded by $2^B$, any submatrix of $A$ has a determinant bounded by $N^N2^{NB}$ (precisely, one could consider $\min(N,M)$ instead of $N$). This maximal sub determinant of $A$ will be written $\Omega(A)$ in the paper, i.e.
$$\Omega(A) = \max_{i_1,...,i_r,j_1,...,j_s} Det(A_{\{i_1,...,i_r\}\times\{j_1,...,j_s\}}).$$
Combining with Cramer's rule, it follows that if $Ax = b$, then $|x_n| \le \Omega(A) \times (\sum_m b_m)$, and if $x_n \ne 0$, then $|x_n| \ge \frac{1}{\Omega(A)}$. This is also true for a vertex of a polytope defined by $Ax \ge b$.
Finally, this Hadamard bound can even be refined: if $L$ is the total binary size of $A$, then $\log(\Omega(A)) \le \tilde{O}(L)$ (precisely $\log(\Omega(A)) \le \tilde{O}(\min(NB, L))$, but, for classical complexity measurement, $\log(\Omega(A)) \le \tilde{O}(L)$).
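For small matrices, $\Omega(A)$ can be computed by brute force and compared with the bound $N^N2^{NB}$. The sketch below is only illustrative (exponential enumeration, Laplace expansion); the helper names `det`, `max_subdeterminant` and the example matrix are assumptions.

```python
from itertools import combinations

def det(Mx):
    """Integer determinant by Laplace expansion (fine for tiny matrices)."""
    n = len(Mx)
    if n == 1:
        return Mx[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in Mx[1:]]
        total += (-1) ** j * Mx[0][j] * det(minor)
    return total

def max_subdeterminant(A):
    """Omega(A): largest absolute determinant over all square submatrices."""
    rows, cols = len(A), len(A[0])
    best = 0
    for r in range(1, min(rows, cols) + 1):
        for I in combinations(range(rows), r):
            for J in combinations(range(cols), r):
                sub = [[A[i][j] for j in J] for i in I]
                best = max(best, abs(det(sub)))
    return best

A = [[3, -1, 2], [0, 4, -4], [1, 2, 3], [-2, 0, 1]]   # hypothetical 4x3 instance
B = 2                                                  # all |entries| <= 2**B
N = len(A[0])
omega = max_subdeterminant(A)
bound = N ** N * 2 ** (N * B)
```

The brute-force value `omega` should never exceed `bound`, which is what makes $\log(\Omega(A))$ a convenient complexity measure.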
Equivalence between linear programming and linear feasibility
This paper provides an algorithm algo0 which returns $v$ such that $AA^Tv > 0$ on an input $A$ assuming $\exists x$ / $Ax \ge \mathbf{1}$ (undefined behaviour otherwise; $v$ is positive but this does not matter). Trivially, it is thus possible to form algo1 which returns $x$ such that $Ax > 0$ on input $A$, assuming such $x$ exists, by returning $A^T algo0(A)$ (undefined behaviour otherwise).
• Thanks to algo1, one can form algo2(A, b) which returns $x$ such that $Ax > b$ assuming such $x$ exists (undefined behaviour otherwise). Indeed, let us consider any $A, b$ such that $\exists x$ / $Ax > b$: finding such $x$ is equivalent to finding a pair $x, t$ such that $Ax - t \times b > 0$ and $t > 0$, because $\frac{x}{t}$ is then a solution of the original problem. Let $A_1$ be the matrix $A$ with $-b$ as an additional column and $(0 \ 1)$ as an additional row. Thus, one can get $(x_1 \ t_1)$ by computing algo1($A_1$) and returning $\frac{x_1}{t_1}$ as output of algo2(A, b).
Importantly, only a constant number of variables/constraints are added, and the binary size is not increased. So the complexity of algo2(A, b) is the same as that of algo1.
• Thanks to algo2, one can form algo3(A, b) which returns $x$ such that $Ax \ge b$ assuming such $x$ exists. Indeed, if $\exists x$ / $Ax \ge b$, then a fortiori $\exists x, t$ such that $Ax + t \times \mathbf{1} > b$, $0 < t < \frac{1}{\Omega(A)}$ ($\Omega(A)$ is the maximal subdeterminant of $A$, see the beginning of the appendix). So, one can call algo2 on $A_2, b_2$ with $A_2$ being $A$ plus 1 column plus a row with 0 and $\Omega(A)$, and $b_2$ being $b$ plus two 1. Thus, algo2($A_2, b_2$) $= x_2, t_2$.
Now, one can consider a greedy improvement of $\min_{x,t \ / \ Ax+t\mathbf{1} \ge b, \ t \ge 0} t$ initialized from $(x_2, t_2)$. Such a greedy improvement can be performed by projecting $(x,t)$ on $\{(x,t) \ / \ Ax + t\mathbf{1} \ge b\}$ while minimizing $t$. One such greedy step can simply be done by looking for $\chi, \tau$ such that $A_S\chi + \tau\mathbf{1}_S = 0$ and $\tau = -1$, with $S$ the saturated rows in $Ax + t\mathbf{1} \ge b$. If no such $\chi, \tau$ exists, the greedy improvement has terminated; otherwise one can do $(x,t) \leftarrow (x + \mu\chi, t + \mu\tau)$ with $\mu$ such that $Ax + t\mathbf{1} \ge b$, $t \ge 0$ still hold. There will be no more than $M$ such greedy purification steps because one row enters the saturated ones at each step.
When this greedy process terminates, this leads to $\hat{x}, \hat{t}$ with $A\hat{x} + \hat{t}\mathbf{1} \ge b$, $0 \le \hat{t} \le t_2 < \frac{1}{\Omega(A)}$, but $\hat{x}, \hat{t}$ is a vertex. So Cramer's rule applies, and so $\hat{t} = \frac{Det(S_t)}{Det(S)}$ with $S$ a sub matrix of $A$ and $S_t$ the Cramer partial submatrix related to $t$. But $\hat{t} \le t_2 < \frac{1}{\Omega(A)}$, so $\hat{t} = 0$, and thus $A\hat{x} \ge b$. So, this projection of $x_2, t_2$ gives $x_3 =$ algo3(A, b).
Importantly, the binary size of $A_2, b_2$ is just twice the binary size of $A, b$ because $\log(\Omega(A)) \le L(A)$, so $L(A_2) = L(A) + \log(\Omega(A)) \le 2L(A)$, and only a constant number of variables/constraints are added. So the complexity of algo3(A, b) is the same as that of algo2(A, b). (Currently, the maximal binary size in algo3 is increased by a factor $N$ compared to algo2; this can have an impact on binary complexity, but for arithmetic complexity, algo3(A, b) is just twice algo2(A, b).)
• Let $A, b$ be any instance, without assumption: solving $Ax \ge b$ (or producing a certificate that no solution exists) is equivalent to solving $\min_{z,t \ / \ Az+t\mathbf{1} \ge b, \ t \ge 0} t$ (there is a solution if the minimum is 0). Yet, this last linear program is structurally feasible ($x = 0$ and a sufficiently large $t$ provide a feasible point) and bounded because $t \ge 0$.
Thus, primal dual theory gives a system $A_{primal-dual}(x \ y) \ge b_{primal-dual}$ whose solution contains the solution of the linear program $\min_{z,t \ / \ Az+t\mathbf{1} \ge b, \ t \ge 0} t$.
Applying algo3($A_{primal-dual}, b_{primal-dual}$) thus provides such $x_{primal-dual}, y_{primal-dual}$, from which one can restore $x_3, t_3$ with either $t_3 = 0$, and so $Ax_3 \ge b$, or $t_3 \ne 0$.
This leads to an algorithm algo4 which is able to find $x$ such that $Ax \ge b$ (or to produce a certificate that no solution exists) without assumption on $A, b$ about the existence or not of such $x$.
Importantly, the number of variables/constraints is only scaled two-fold when computing the primal dual, so, from a theoretical point of view, the complexity does not change between algo4 and algo3.
• Finally, for any $A, b, c$ without any assumption, solving $\min_{x \ / \ Ax \ge b} c^Tx$ can be done with 2 algo4 calls and one algo3 call:
– one to know if the problem is feasible, i.e. algo4(A, b)
– one on the dual to know if it is bounded, i.e. algo4($A_{dual}, b_{dual}$)
– and one call to algo3 on the primal dual to get the optimal solution (if the previous two computations certify that the problem is feasible and bounded).
Again, from a theoretical point of view, the complexity does not change: it only does 3 calls on instances only scaled 2 times. At the end, it returns the optimal solution or a certificate that the problem is not feasible or not bounded.
Thus, algo0, which only returns $v$ such that $AA^Tv > 0$ on input $A$ if such $v$ exists (undefined behaviour otherwise), allows to build with the same complexity algo5(A, b, c), which solves $\min_{x \ / \ Ax \ge b} c^Tx$ or returns a certificate that the problem is either infeasible or unbounded.
Let us stress that the opposite way is trivial: algo5($AA^T$, $\mathbf{1}$, 0) is a correct implementation of algo1(A) for any $A$.
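The first reduction (from $Ax > b$ to a homogeneous instance) can be sketched as follows. The construction $A_1 = \begin{pmatrix} A & -b \\ 0 & 1 \end{pmatrix}$ is an assumption read from the condition $Ax - t \times b > 0$, $t > 0$; the helper names `homogenize`, `dehomogenize`, the instance and the vector `xt` (standing for the output of a hypothetical algo1-like solver) are illustrative only.

```python
from fractions import Fraction

def homogenize(A, b):
    """Build A1 = [[A, -b], [0, 1]] so that A1 (x, t) > 0 encodes
    A x - t b > 0 and t > 0 (assumed form of the reduction)."""
    n = len(A[0])
    A1 = [list(row) + [-bi] for row, bi in zip(A, b)]
    A1.append([0] * n + [1])
    return A1

def dehomogenize(xt):
    """Map a homogeneous solution (x, t) with t > 0 to x / t."""
    *x, t = xt
    return [Fraction(c) / Fraction(t) for c in x]

A = [[1, 0], [0, 1], [-1, -1]]
b = [1, 2, -10]
A1 = homogenize(A, b)
xt = [4, 6, 2]            # hypothetical output of a solver for A1 z > 0
x = dehomogenize(xt)

homogeneous_values = [sum(a * c for a, c in zip(row, xt)) for row in A1]
original_values = [sum(a * c for a, c in zip(row, x)) for row in A]
```

Any `xt` with $A_1 (x, t) > 0$ yields, after division by $t$, a point with $Ax > b$; only one row and one column are added to the instance.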
SVM
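Assuming $\exists x \in \mathbb{Q}^N$ / $Ax > 0$, then $\exists x \in \mathbb{Q}^N$ / $Ax \ge \mathbf{1}$ because $\frac{1}{\min_m A_mx} \times x$ verifies $Ax \ge \mathbf{1}$ (if $Ax > 0$). With $A \in \mathbb{Z}^{M\times N}$, the Perceptron will then converge in $\tilde{O}(x^Tx)$ steps while the self concordant Perceptron converges in $\tilde{O}(M\log(x^Tx))$.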
Now, this claim holds for any $x$. Typically, let us consider $x$ with $Ax \ge \mathbf{1}$ such that there is a maximal number of saturated rows and a maximal number of null coordinates.
The impossibility of increasing the number of constraints implies that $x$ is the solution of a system of equations containing rows from $A$ and rows from $I$. So each coordinate of $x$ is bounded by a maximal determinant from $A$. So, $\log(x^Tx) \le \log(\Omega(A)) \le \tilde{O}(L)$. Precisely, $\log(x^Tx) \le \tilde{O}(\min(L, NB))$, but the $L$ part is the classical way to measure complexity.
Let us stress that the bound could also be expressed by introducing the solution of the support vector machine problem [6], which consists in solving $\min_{x \ / \ Ax \ge \mathbf{1}} x^Tx$.