Modular composition via complex roots


HAL Id: hal-01455731

https://hal.archives-ouvertes.fr/hal-01455731v2

Preprint submitted on 27 Mar 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Joris van der Hoeven, Grégoire Lecerf

To cite this version:

Joris van der Hoeven, Grégoire Lecerf. Modular composition via complex roots. 2017. ⟨hal-01455731v2⟩


Modular composition via complex roots

JORIS VAN DER HOEVEN (a), GRÉGOIRE LECERF (b)

Laboratoire d'informatique de l'École polytechnique, CNRS UMR 7161
École polytechnique
91128 Palaiseau Cedex, France
a. Email: vdhoeven@lix.polytechnique.fr
b. Email: lecerf@lix.polytechnique.fr

Preprint version, March 27, 2017

Modular composition is the problem of computing the composition of two univariate polynomials modulo a third one. For polynomials with coefficients in a finite field, Kedlaya and Umans proved in 2008 that the theoretical complexity for performing this task could be made arbitrarily close to linear. Unfortunately, beyond its major theoretical impact, this result has not led to practically faster implementations yet. In this paper, we explore the particular case when the ground field is the field of computable complex numbers. Ultimately, when the precision becomes sufficiently large, we show that modular compositions may be performed in softly linear time.

1. INTRODUCTION

Let 𝕂 be an effective field, and let f, g, h be polynomials in 𝕂[x]. The problem of modular composition is to compute g ∘ f modulo h. Modular composition is an important problem in complexity theory because of its applications to polynomial factorization [14, 15, 16]. It also occurs very naturally whenever one wishes to perform polynomial computations over 𝕂 inside an algebraic extension of 𝕂. In addition, given two different representations 𝕂[x]/(h(x)) ≅ 𝕂[x̃]/(h̃(x̃)) of an algebraic extension of 𝕂, the implementation of an explicit isomorphism actually boils down to modular composition.

Denote by M(n) the number of operations in 𝕂 required to multiply two polynomials of degree < n in 𝕂[x]. Let f, g and h be polynomials in 𝕂[x] of respective degrees < n, < n and n.

The naive modular composition algorithm takes O(n M(n)) operations in 𝕂. In 1978, Brent and Kung [3] gave an algorithm with cost O(√n M(n) + n^2). It uses the baby-step giant-step technique due to Paterson and Stockmeyer [21], and even yields a sub-quadratic cost O(n^ϖ + √n M(n)) when using fast linear algebra (see [13, p. 185]). The constant ϖ > 1.5 is such that a √n × √n matrix over 𝕂 can be multiplied with another √n × n rectangular matrix in time O(n^ϖ). The best current bound ϖ < 1.6667 is due to Huang and Pan [12, Theorem 10.1].

A major breakthrough has been achieved by Kedlaya and Umans [15, 16] in the case when 𝕂 is the finite field 𝔽_q. For any positive ε > 0, they showed that the composition g ∘ f modulo h could be computed with bit complexity O((n log q)^{1+ε}). Unfortunately, it remains a major open problem to turn this theoretical complexity bound into practically useful implementations.

Quite surprisingly, the existing literature on modular composition does not exploit the simple observation that composition modulo a separable polynomial h ∈ 𝕂[x] that splits over 𝕂 can be reduced to the well known problems of multi-point evaluation and interpolation [6, Chapter 10].

More precisely, assume that h = (x − σ_1) ⋯ (x − σ_n) is separable, which means that gcd(h, h′) = 1. If f, g ∈ 𝕂[x] are of degree < n, then g ∘ f mod h can be computed by evaluating f at σ_1, …, σ_n, by evaluating g at f(σ_1), …, f(σ_n), and by interpolating the evaluations (g ∘ f)(σ_1), …, (g ∘ f)(σ_n) to yield g ∘ f mod h.



Whenever 𝕂 is algebraically closed and a factorization of h is known, the latter observation leads to a softly-optimal algorithm for composition modulo h. More generally, if the computation of a factorization of h has a negligible or acceptable cost, then this approach leads to an efficient method for modular composition. In this paper, we prove a precise complexity result in the case when 𝕂 is the field of computable complex numbers. In a separate paper [11], we also consider the case when 𝕂 is a finite field and h has composite degree; in that case, h can be factored over suitable field extensions, and similar ideas lead to improved complexity bounds.

In the special case of power series composition (i.e. composition modulo h = x^n), our approach is similar in spirit to the analytic algorithm designed by Ritzmann [22]; see also [8]. In order to keep the exposition as simple as possible in this paper, we only study composition modulo separable polynomials. By handling multiplicities with Ritzmann's algorithm, we expect our algorithm to extend to the general case.

The organization of the present paper is as follows. In section 2, we specify the complexity model to be used, and various standard notations. In section 3, we give a detailed version of the modular composition algorithm that we sketched above for a separable modulus that splits over 𝕂. In order to instantiate this algorithm for the field ℂ^com of computable complex numbers, we need additional concepts. In section 4, we recall basic results about ball arithmetic [9]. In section 5, we recall the computation model of straight-line programs [4]. In section 6, we introduce a new ultimate complexity model that is convenient for proving complexity results at a "sufficiently large precision". This model has the advantage that complexity results over an abstract effective field 𝕂 can naturally be turned into ultimate complexity results over ℂ^com. In section 7, we apply this transfer principle to the modular composition algorithm from section 3; we expect our framework to be useful in many other situations.

One disadvantage of ultimate complexity analysis is that it does not provide us with any information about the precision from which the ultimate complexity is reached. In practical applications, the input polynomials f, g and h often admit integer or rational coefficients. In these cases, the required bit precision is expected to be of order n(l + n) in the worst case, where n = deg h and l is the largest bit size of the coefficients: in fact, this precision allows one to compute all the complex roots of h efficiently using algorithms from [18, 19, 24]. This precision should also be sufficient to perform the multi-point polynomial evaluations of g and f by asymptotically fast algorithms. We intend to work out more such detailed bit complexity bounds for this situation in a forthcoming paper.

2. PRELIMINARIES

In the sequel, we consider both the algebraic and bit complexity models for analyzing the costs of our algorithms. The algebraic complexity model expresses the running time in terms of the number of operations in some abstract ground ring or field [4, Chapter 4]. The bit complexity model relies on Turing machines with a sufficient number of tapes [20].

Fundamental algebraic complexity bounds. Let 𝕂 be an effective field. We write M : ℕ → ℝ^> for a function that bounds the total cost of a polynomial product algorithm in terms of the number of operations in 𝕂. In other words, two polynomials of degrees ⩽ n in 𝕂[x] can be multiplied using at most M(n) arithmetic operations in 𝕂. The schoolbook algorithm allows us to take M(n) = O(n^2). The fastest currently known algorithm [5] yields M(n) = O(n log n log log n) = Õ(n). Here, the soft-Oh notation f(n) ∈ Õ(g(n)) means that f(n) = g(n) log^{O(1)} g(n) (we refer the reader to [6, Chapter 25, Section 7] for technical details). In order to simplify the cost analysis of our algorithms we make the customary assumption that M(n_1)/n_1 ⩽ M(n_2)/n_2 for all 0 < n_1 ⩽ n_2. Notice that this assumption implies the super-additivity of M, namely M(n_1) + M(n_2) ⩽ M(n_1 + n_2) for all n_1 ⩾ 0 and n_2 ⩾ 0.


Fundamental bit complexity bounds. For bit complexity analyses, we consider Turing machines with sufficiently many tapes. We write I(n) for a function that bounds the bit cost of an algorithm which multiplies two integers of bit sizes at most n, for the usual binary representation. The best known bound [7] for I(n) is O(n log n 8^{log* n}) = Õ(n). Again, we make the customary assumption that I(n)/n is nondecreasing.

Multipoint evaluation and interpolation. Let 𝕂 again be an effective field. The remainder (resp. quotient) of the Euclidean division of g by h in 𝕂[x] is denoted by g rem h (resp. by g quo h). It may be computed using O(M(n)) operations in 𝕂, if g and h have degrees ⩽ n.

We recall that the gcd of two polynomials of degrees at most n over 𝕂 can be computed using O(M(n) log n) operations in 𝕂 [6, Algorithm 11.4]. Given polynomials f and g_1, …, g_l over 𝕂 with deg f = n and deg g_1 + ⋯ + deg g_l = O(n), all the remainders f rem g_i may be computed simultaneously in cost O(M(n) log l) using a subproduct tree [6, Chapter 10]. The inverse problem, called Chinese remaindering, can be solved with a similar cost O(M(n) log l), assuming that the g_i are pairwise coprime. The fastest known algorithms for these tasks can be found in [1, 2, 10].
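The subproduct tree idea can be sketched as follows in exact rational arithmetic. This illustrates only the recursive structure, not the O(M(n) log l) bound, since naive polynomial arithmetic is used throughout; all function names are ours.

```python
from fractions import Fraction

def poly_mul(a, b):
    # naive product of coefficient lists (low degree first)
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_rem(f, g):
    # remainder of f by g (naive Euclidean division)
    r = list(f)
    dg = len(g) - 1
    while len(r) - 1 >= dg:
        c = r[-1] / g[-1]
        shift = len(r) - len(g)
        for j, gj in enumerate(g):
            r[shift + j] -= c * gj
        r.pop()  # the leading coefficient is now exactly zero
    return r

def remainders(f, moduli):
    # simultaneous f rem g_i: reduce by subproducts while descending the tree
    if len(moduli) == 1:
        return [poly_rem(f, moduli[0])]
    mid = len(moduli) // 2
    left, right = moduli[:mid], moduli[mid:]
    def prod(ms):
        return ms[0] if len(ms) == 1 else poly_mul(prod(ms[:len(ms)//2]), prod(ms[len(ms)//2:]))
    return remainders(poly_rem(f, prod(left)), left) + \
           remainders(poly_rem(f, prod(right)), right)
```

With moduli of the form x − σ_i, the remainders are exactly the values f(σ_i), which is the multi-point evaluation used below.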

3. ABSTRACT MODULAR COMPOSITION IN THE SEPARABLE CASE

For any field 𝕂 and n ∈ ℕ, we denote

𝕂[x]_{<n} ≔ {P ∈ 𝕂[x] : deg P < n}.

In this section, 𝕂 represents an abstract algebraically closed field of constants. Let h = x^n + h_{n−1} x^{n−1} + ⋯ + h_0 ∈ 𝕂[x] be a separable monic polynomial, so h admits n pairwise distinct roots σ_1, …, σ_n in 𝕂. Then we may use the following algorithm for composition modulo h:

Algorithm 1

Input. Polynomials f, g ∈ 𝕂[x]_{<n} and pairwise distinct σ_1, …, σ_n ∈ 𝕂.
Output. g ∘ f rem h, where h = (x − σ_1) ⋯ (x − σ_n).

1. Compute v_1 = f(σ_1), …, v_n = f(σ_n) using fast multi-point evaluation.

2. Compute w_1 = g(v_1), …, w_n = g(v_n) using fast multi-point evaluation.

3. Retrieve ϱ ∈ 𝕂[x]_{<n} with ϱ(σ_1) = w_1, …, ϱ(σ_n) = w_n using fast interpolation.

4. Return ϱ.

THEOREM 1. Algorithm 1 is correct and requires O(M(n) log n) operations in 𝕂.

Proof. By construction, ϱ(σ_i) = (g ∘ f)(σ_i) = (g ∘ f rem h)(σ_i) for i = 1, …, n. Since deg ϱ < n and the σ_i are pairwise distinct, it follows that ϱ = g ∘ f rem h. This proves the correctness of the algorithm. The complexity bound follows from the fact that steps 1, 2 and 3 take O(M(n) log n) operations in 𝕂. □
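The three steps of Algorithm 1 can be sketched in exact rational arithmetic as follows. For simplicity, naive O(n^2) evaluation and Lagrange interpolation replace the fast subproduct tree routines, so only the correctness, not the O(M(n) log n) cost, is illustrated; all names are ours.

```python
from fractions import Fraction

def horner(p, x):
    # evaluate the coefficient list p (low degree first) at x
    acc = Fraction(0)
    for c in reversed(p):
        acc = acc * x + c
    return acc

def mul_linear(p, a):
    # multiply the polynomial p by (x - a)
    out = [Fraction(0)] + list(p)          # x * p
    for k, c in enumerate(p):
        out[k] -= a * c                    # minus a * p
    return out

def interpolate(xs, ys):
    # Lagrange interpolation: the unique polynomial of degree < n through (xs, ys)
    n = len(xs)
    out = [Fraction(0)] * n
    for i in range(n):
        basis, denom = [Fraction(1)], Fraction(1)
        for j in range(n):
            if j != i:
                basis = mul_linear(basis, xs[j])
                denom *= xs[i] - xs[j]
        scale = ys[i] / denom
        for k, c in enumerate(basis):
            out[k] += scale * c
    return out

def compose_mod(f, g, sigmas):
    # Algorithm 1: g o f rem h, with h = prod(x - sigma_i)
    v = [horner(f, s) for s in sigmas]     # step 1: evaluate f at the roots
    w = [horner(g, vi) for vi in v]        # step 2: evaluate g at the f(sigma_i)
    return interpolate(sigmas, w)          # step 3: interpolate back
```

For instance, with f = 1 + x, g = x^2 and roots 1, 2, 3, the result is (1 + x)^2, which already has degree < 3.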

We wish to apply the theorem in the case when 𝕂 = ℂ. Of course, on a Turing machine, we can only approximate complex numbers with arbitrarily high precision, and likewise for the field operations in ℂ. For given numbers x and y, approximations at precision p for x + y, x − y, x × y and x / y (whenever y ≠ 0) can all be computed in time O(I(p)). In view of Theorem 1, it is therefore natural to ask whether p-bit approximations of the coefficients of g ∘ f rem h may be computed in time O(I(p) M(n) log n).

In the remainder of this paper we give a positive answer to a carefully formulated version of this question. Our first task is to make the concept of "approximations at precision p" more precise and to understand the way errors accumulate when performing a sequence of computations at precision p. We rely on "fixed point ball arithmetic" for this matter, as described in the next section. At a second stage, we prove a complexity bound for modular composition that holds for a fixed modulus h with known roots σ_1, …, σ_n and for sufficiently large working precisions p.



The assumption that the roots σ_1, …, σ_n of h are known is actually quite harmless in this context for the following reason: as soon as approximations for σ_1, …, σ_n are known at a sufficiently high precision, the computation of even better approximations can be done fast using Newton's method combined with multi-point evaluation. Since we are only interested in the complexity for "sufficiently large working precisions", the computation of the initial approximations of σ_1, …, σ_n can therefore be regarded as a precomputation of negligible cost.

4. BALL ARITHMETIC AND STRAIGHT-LINE PROGRAMS

4.1. Fixed point numbers

Let a be a real number. We write ⌊a⌋ for the largest integer less than or equal to a, and ⌊a⌉ ≔ ⌊a + 1/2⌋ for the integer closest to a.

Given a precision p ∈ ℕ, we denote by 𝔻_p = ℤ 2^{−p} the set of fixed point numbers with p binary digits after the dot. This set 𝔻_p is clearly stable under addition and subtraction. We can also define an approximate multiplication ×_p on 𝔻_p using x ×_p y = ⌊2^p x y⌉ 2^{−p}, so |x ×_p y − x y| ⩽ 2^{−p−1} for all x, y ∈ 𝔻_p.

For any fixed constant K > 0 and x, y ∈ 𝔻_p ∩ [−K, K], we notice that x + y and x − y can be computed in time O(p), whereas x ×_p y can be computed in time I(p) + O(p). Similarly, one may define an approximate inversion ι_p on 𝔻_p ∖ {0} by ι_p(x) = ⌊2^p x^{−1}⌉ 2^{−p}. For any fixed constant K > 0 and x ∈ (𝔻_p ∖ {0}) ∩ [−K, K], we may compute ι_p(x) in time O(I(p)).
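In code, these fixed point operations reduce to integer mantissa manipulations. The following sketch (names ours) represents x ∈ 𝔻_p by the integer mantissa m with x = m · 2^{−p}:

```python
import math
from fractions import Fraction

def round_nearest(q):
    # floor(q + 1/2), i.e. the rounding bracket used in the text, for a rational q
    return math.floor(q + Fraction(1, 2))

def to_fixed(x, p):
    # nearest element of D_p = Z * 2^{-p}, returned as the integer mantissa
    return round_nearest(Fraction(x) * 2**p)

def mul_p(xm, ym, p):
    # x ×_p y = round(2^p * x * y) * 2^{-p} on mantissas; >> floors, so adding
    # 2^{p-1} first implements round-to-nearest (also for negative products)
    return (xm * ym + (1 << (p - 1))) >> p

def inv_p(xm, p):
    # iota_p(x) = round(2^p * x^{-1}) * 2^{-p} on mantissas (x != 0)
    return round_nearest(Fraction(2 ** (2 * p), xm))
```

Both approximate operations satisfy the stated error bound 2^{−p−1} by construction.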

4.2. Fixed point ball arithmetic

Ball arithmetic is used for providing reliable error bounds for approximate computations. A ball is a set ℬ(c, r) = {z ∈ ℝ : |z − c| ⩽ r} with c ∈ ℝ and r ⩾ 0. From the computational point of view, we represent such balls by their centers c and radii r. We denote by 𝔹_p the set of balls with centers in 𝔻_p and radii in 𝔻_p. Given vectors x = (x_1, …, x_n) ∈ ℝ^n and 𝒙 = (𝒙_1, …, 𝒙_n) = (ℬ(c_1, r_1), …, ℬ(c_n, r_n)) ∈ 𝔹_p^n, we write x ∈ 𝒙 to mean x_1 ∈ 𝒙_1 ∧ ⋯ ∧ x_n ∈ 𝒙_n, and we also set rad(𝒙) ≔ max(r_1, …, r_n).

Let D be an open subset of ℝ^n. We say that 𝑫_p ⊆ 𝔹_p^n is a domain lift of D at precision p if 𝒙 ⊆ D for all 𝒙 ∈ 𝑫_p. The maximal such lift is given by 𝑫_p = {𝒙 ∈ 𝔹_p^n : 𝒙 ⊆ D}. Given a function f : D → ℝ^m, a ball lift of f at precision p is a function 𝒇_p : 𝑫_p → 𝔹_p^m, where 𝑫_p = dom 𝒇_p is a domain lift of D at precision p, that satisfies the inclusion property: for any 𝒙 = (𝒙_1, …, 𝒙_n) ∈ 𝑫_p and x = (x_1, …, x_n) ∈ ℝ^n, we have

x ∈ 𝒙 ⟹ f(x) ∈ 𝒇_p(𝒙).

A ball lift 𝒇 of f is a computable sequence (𝒇_p)_{p∈ℕ} of ball lifts at every precision such that for any sequence (𝒙_p)_{p∈ℕ} with 𝒙_p ∈ dom 𝒇_p, we have

lim_{p→∞} rad(𝒙_p) = 0 ∧ ⋂_{p∈ℕ} 𝒙_p ≠ ∅ ⟹ lim_{p→∞} rad(𝒇_p(𝒙_p)) = 0.

This condition implies the following:

lim_{p→∞} rad(𝒙_p) = 0 ∧ ⋂_{p∈ℕ} 𝒙_p = {x} ⟹ ⋂_{p∈ℕ} 𝒇_p(𝒙_p) = {f(x)}.

We say that 𝒇 is maximal if dom 𝒇_p is the maximal domain lift for each p. Notice that a function f must be continuous in order to admit a maximal ball lift.

The following formulas define maximal ball lifts ⊕_p, ⊖_p and ⊗_p at precision p for the ring operations +, − and ×:

ℬ(a, r) ⊕_p ℬ(b, s) ≔ ℬ(a + b, r + s)
ℬ(a, r) ⊖_p ℬ(b, s) ≔ ℬ(a − b, r + s)
ℬ(a, r) ⊗_p ℬ(b, s) ≔ ℬ(a ×_p b, (|a| + r) ×_p s + |b| ×_p r + 2^{1−p}).


The extra 2^{1−p} in the formula for multiplication is needed in order to counter the effect of rounding errors that might occur in the three multiplications a ×_p b, (|a| + r) ×_p s and |b| ×_p r. For ℬ(a, r) ∈ 𝔹_p with r < |a|, the following formula also defines a maximal ball lift 𝜾_p at precision p for the inversion:

𝜾_p(ℬ(a, r)) ≔ ℬ(ι_p(a), ι_p(|a| − r) − ι_p(|a|) + 2^{1−p}).

For any fixed constant K > 0 and a, r, b, s ∈ 𝔻_p ∩ [−K, K], we notice that ℬ(a, r) ⊕_p ℬ(b, s), ℬ(a, r) ⊖_p ℬ(b, s), ℬ(a, r) ⊗_p ℬ(b, s) and 𝜾_p(ℬ(a, r)) can be computed in time O(I(p)).
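The ball product can be sketched on integer mantissas as follows: round-to-nearest for the center, upward rounding for the radius, plus the 2^{1−p} slack. All names are ours.

```python
from fractions import Fraction

def mul_nearest(am, bm, p):
    # round-to-nearest fixed point product on mantissas
    return (am * bm + (1 << (p - 1))) >> p

def mul_up(am, bm, p):
    # product rounded upward, as required for radius computations
    return -((-(am * bm)) >> p)

def ball_mul(a, r, b, s, p):
    # B(a,r) (x)_p B(b,s): center a x_p b,
    # radius (|a|+r) x_p s + |b| x_p r + 2^{1-p}
    c = mul_nearest(a, b, p)
    rad = mul_up(abs(a) + r, s, p) + mul_up(abs(b), r, p) + 2  # 2 units of 2^{-p}
    return c, rad
```

The inclusion property can be checked directly: any reals enclosed by the input balls have their true product enclosed by the output ball.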

Let 𝒇 be the ball lift of a function f : D → ℝ^m with D ⊆ ℝ^n. Consider a second ball lift 𝒈 of a function g : E → ℝ^l with f(D) ⊆ E ⊆ ℝ^m. Then we may define a ball lift 𝒈 ∘ 𝒇 of the composition g ∘ f : D → ℝ^l as follows. For each precision p, we take (𝒈 ∘ 𝒇)_p = 𝒈_p ∘ (𝒇_p)_{|D_p}, where (𝒇_p)_{|D_p} is the restriction of 𝒇_p to the set D_p = {𝒙 ∈ dom 𝒇_p : 𝒇_p(𝒙) ∈ dom 𝒈_p}.

We shall use ball arithmetic for the computation of complex functions ℂ^n → ℂ^m simply through the consideration of real and imaginary parts. This point of view is sufficient for the asymptotic complexity purposes of the present paper. Of course, it would be more efficient to directly compute with complex balls (i.e. balls with a complex center and a real radius), but this would involve approximate square roots and ensuing technicalities.

4.3. The Lipschitz property

Assume that we are given the ball lift 𝒇 of a function f : D → ℝ^m with D ⊆ ℝ^n. Given a subset U ⊆ D and constants λ ⩾ 0, μ ⩾ 0, we say that the ball lift 𝒇 is (λ, μ)-Lipschitz on U if

∃ p_0 ∈ ℕ, ∃ ϱ > 0, ∀ p ⩾ p_0, ∀ 𝒙 ∈ 𝔹_p^n,
𝒙 ⊆ U ∧ rad(𝒙) ⩽ ϱ ⟹ 𝒙 ∈ dom 𝒇_p ∧ rad(𝒇_p(𝒙)) ⩽ λ rad(𝒙) + μ 2^{−p}.

For instance, the ball lifts ⊕ and ⊖ of addition and subtraction are (2, 0)-Lipschitz on ℝ^2. Similarly, the ball lift ⊗ of multiplication is (3λ, 3)-Lipschitz on U = {(x, y) ∈ ℝ^2 : |x| ⩽ λ, |y| ⩽ λ} (by taking ϱ = λ), whereas the ball lift 𝜾 of ι is (λ, 3)-Lipschitz on U = {x ∈ ℝ : λ^{−1/2} ⩽ |x|}.

Given 𝒇 and λ > 0, μ ⩾ 0 as above, we say that 𝒇 is locally (λ, μ)-Lipschitz on U if 𝒇 is (λ, μ)-Lipschitz on each compact subset of U. We define 𝒇 to be λ-Lipschitz (resp. locally λ-Lipschitz) on U if there exists a constant μ > 0 for which 𝒇 is (λ, μ)-Lipschitz (resp. locally (λ, μ)-Lipschitz). If 𝒇 is locally λ-Lipschitz on U, then it is not hard to see that f is necessarily locally Lipschitz on U, with Lipschitz constant λ. That is,

∀ x ∈ U, ∃ η > 0, ∀ a, b ∈ ℬ(x, η) ∩ U, ‖f(b) − f(a)‖ ⩽ λ ‖b − a‖.

In fact, the requirement that a computable ball lift 𝒇 is λ-Lipschitz implies that we have a means to compute high quality error bounds. We finally define 𝒇 to be Lipschitz (resp. locally Lipschitz) on U if there exists a constant λ > 0 for which 𝒇 is λ-Lipschitz (resp. locally λ-Lipschitz).

LEMMA 2. Let 𝒇 be a locally (λ, μ)-Lipschitz ball lift of f : D → ℝ^m on an open set U. Let 𝒈 be a locally (λ′, μ′)-Lipschitz ball lift of g : E → ℝ^l on an open set V. If f(D) ⊆ E and f(U) ⊆ V, then 𝒈 ∘ 𝒇 is a locally (λ λ′, μ λ′ + μ′)-Lipschitz ball lift of g ∘ f on U.

Proof. Consider a compact subset C ⊆ U. Since this implies f(C) to be a compact subset of f(U) ⊆ V, it follows that there exists an ε > 0 such that f(C) + ℬ(0, ε) ⊆ V. Let p_0 ∈ ℕ, 0 < ϱ < (ε − μ 2^{−p_0})/λ and 0 < ϱ′ be such that for any p ⩾ p_0, 𝒙 ∈ 𝔹_p^n and 𝒚 ∈ 𝔹_p^m, we have

𝒙 ⊆ C ∧ rad(𝒙) ⩽ ϱ ⟹ 𝒙 ∈ dom 𝒇_p ∧ rad(𝒇_p(𝒙)) ⩽ λ rad(𝒙) + μ 2^{−p} < ε
(𝒚 ⊆ f(C) + ℬ(0, ε)) ∧ rad(𝒚) ⩽ ϱ′ ⟹ 𝒚 ∈ dom 𝒈_p ∧ rad(𝒈_p(𝒚)) ⩽ λ′ rad(𝒚) + μ′ 2^{−p}.



Given 𝒙 ∈ 𝔹_p^n with 𝒙 ⊆ C and rad(𝒙) ⩽ ϱ, it follows that 𝒚 ≔ 𝒇_p(𝒙) satisfies rad(𝒚) < ε, whence 𝒚 ⊆ f(C) + ℬ(0, ε). If we also assume that rad(𝒙) ⩽ (ϱ′ − μ 2^{−p})/λ, then it also follows that rad(𝒚) ⩽ ϱ′, whence 𝒚 ∈ dom 𝒈_p and rad(𝒈_p(𝒚)) ⩽ λ′ (λ rad(𝒙) + μ 2^{−p}) + μ′ 2^{−p} = λ λ′ rad(𝒙) + (μ λ′ + μ′) 2^{−p}. In other words, if 𝒙 ⊆ C and rad(𝒙) ⩽ min(ϱ, (ϱ′ − μ 2^{−p})/λ), then 𝒙 ∈ dom(𝒈_p ∘ 𝒇_p) and rad((𝒈_p ∘ 𝒇_p)(𝒙)) ⩽ λ λ′ rad(𝒙) + (μ λ′ + μ′) 2^{−p}. □

5. STRAIGHT-LINE PROGRAMS

A signature is a finite or countable set ℱ of function symbols together with an arity r_f ∈ ℕ for each f ∈ ℱ. A model for ℱ is a set K together with a function f_K : U_{f_K} → K with U_{f_K} ⊆ K^{r_f} for each f ∈ ℱ. If K is a topological space, then U_{f_K} is required to be an open subset of K^{r_f}. Let 𝒱 be a countable and ordered set of variable symbols.

A straight-line program Γ with signature ℱ is a sequence Γ_1, …, Γ_ℓ of instructions of the form

Γ_k ≡ X_k ≔ f_k(Y_{k,1}, …, Y_{k,r_{f_k}}),

where f_k ∈ ℱ and X_k, Y_{k,1}, …, Y_{k,r_{f_k}} ∈ 𝒱, together with a subset 𝒪_Γ ⊆ {X_1, …, X_ℓ} of output variables. Variables that appear for the first time in the sequence in the right-hand side of an instruction are called input variables. We denote by ℐ_Γ the set of input variables. The number ℓ is called the length of Γ.

There exist unique sequences I_1 < ⋯ < I_n and O_1 < ⋯ < O_m with ℐ_Γ = {I_1, …, I_n} and 𝒪_Γ = {O_1, …, O_m}. Given a model K of ℱ we can run Γ for inputs in K, provided that the arguments Y_{k,1}, …, Y_{k,r_{f_k}} are always in the domain of f_k when executing the instruction Γ_k. Let D_{Γ,K} be the set of tuples I = (I_1, …, I_n) ∈ K^n on which Γ can be run. Given I ∈ D_{Γ,K}, let Γ_K(I) ∈ K^m denote the value of (O_1, …, O_m) at the end of the program. Hence Γ gives rise to a function Γ_K : D_{Γ,K} → K^m.

Now assume that (ℝ, (f_ℝ)_{f∈ℱ}) is a model for ℱ and that we are given a ball lift 𝒇 of f_ℝ for each f ∈ ℱ. Then 𝔹_p is also a model for ℱ at each precision p, by taking f_{𝔹_p} = 𝒇_p for each f ∈ ℱ. Consequently, any SLP Γ as above gives rise to both a function Γ_ℝ : D_{Γ,ℝ} → ℝ^m and a ball lift Γ_{𝔹_p} : D_{Γ,𝔹_p} → 𝔹_p^m at each precision p. The sequence (Γ_{𝔹_p})_p thus provides us with a ball lift 𝚪 for Γ_ℝ.

PROPOSITION 3. If the ball lift 𝒇 of f_ℝ is Lipschitz for each f ∈ ℱ, then 𝚪 is again Lipschitz.

Proof. For each model K of ℱ, for each variable v ∈ 𝒱 and each input I = (I_1, …, I_n) ∈ D_{Γ,K}, let v_{K,k}(I) denote the value of v after step k. We may regard v_{K,k} as a function from D_{Γ,K} to K. In particular, we obtain a computable sequence of functions v_{𝔹_p,k} that give rise to a ball lift 𝒗^{(k)} of v_{ℝ,k}. Let us show by induction over k that 𝒗^{(k)} is Lipschitz for every v ∈ 𝒱. This is clear for k = 0, so let k > 0. If v ≠ X_k, then we have 𝒗^{(k)} = 𝒗^{(k−1)}; otherwise, we have

𝒗^{(k)} = 𝒇_k(𝒀_{k,1}^{(k−1)}, …, 𝒀_{k,r_{f_k}}^{(k−1)}).

In both cases, it follows from Lemma 2 that 𝒗^{(k)} is again a Lipschitz ball lift. We conclude by noticing that 𝚪 = (𝑶_1^{(ℓ)}, …, 𝑶_m^{(ℓ)}). □
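The way a single SLP induces a function over every model of its signature can be illustrated with a toy interpreter (names ours). Here plain floats and exact rationals play the role of two different models; a ball model would plug in the lifts ⊕_p, ⊗_p of section 4 for the operations.

```python
from fractions import Fraction

def run_slp(program, inputs, ops):
    # program: list of instructions (dest, op, args); ops: the model's operations
    env = dict(inputs)
    for dest, op, args in program:
        env[dest] = ops[op](*(env[a] for a in args))
    return env

# Gamma computes x^2 + 1 via the two instructions  y := x * x ;  o := y + one
gamma = [("y", "mul", ("x", "x")), ("o", "add", ("y", "one"))]
ops = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}
```

The same program text is run unchanged over either model; only the operation table (and the input values) changes.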

6. COMPUTABLE NUMBERS AND ULTIMATE COMPLEXITY

A real number x ∈ ℝ is said to be computable if there exists an approximation algorithm x̌ that takes p ∈ ℕ on input and produces x̌(p) ∈ 𝔻_p on output with |x − x̌(p)| ⩽ 2^{−p} (we say that x̌(p) is a 2^{−p}-approximation of x). We denote by ℝ^com the field of computable real numbers.

Let T(p) be a nondecreasing function. We say that a computable real number x ∈ ℝ^com has ultimate complexity T(p) if it admits an approximation algorithm x̌ that computes x̌(p) in time T(p + δ) for some fixed constant δ ∈ ℕ. The fact that we allow x̌(p) to be computed in time T(p + δ) and not T(p) is justified by the observation that the position of the "binary dot" is somewhat arbitrary in the approximation process of a computable number.
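For instance, √2 is computable: the following approximation algorithm (our notation) returns the mantissa of an element of 𝔻_p within 2^{−p} of √2, using only integer arithmetic.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_approx(p):
    # mantissa m of m * 2^{-p} in D_p with |sqrt(2) - m * 2^{-p}| <= 2^{-p}:
    # m = floor(sqrt(2 * 4^p)) = floor(2^p * sqrt(2))
    return isqrt(2 << (2 * p))
```

Since m = ⌊2^p √2⌋, the error |√2 − m 2^{−p}| is strictly below 2^{−p}, as required.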


The notion of approximation algorithm generalizes to vectors with real coefficients: given v ∈ (ℝ^com)^n, an approximation algorithm for v as a whole is an algorithm v̌ that takes p ∈ ℕ on input and returns v̌(p) ∈ 𝔻_p^n on output with |v̌(p)_i − v_i| ⩽ 2^{−p} for i = 1, …, n. This definition naturally extends to any other mathematical objects that can be encoded by vectors of real numbers: complex numbers (by their real and imaginary parts), polynomials and matrices (by their vectors of coefficients), etc. The notion of ultimate complexity also extends to any of these objects.

A ball lift 𝒇 is said to be computable if there exists an algorithm for computing 𝒇_p for all p ∈ ℕ. A computable ball lift 𝒇 of a function f : D → ℝ^m with D ⊆ ℝ^n allows us to compute the restriction of f to D ∩ (ℝ^com)^n: given x ∈ D ∩ (ℝ^com)^n with approximation algorithm x̌, by taking 𝒙_p = ℬ(x̌(p), 2^{−p}) ∈ 𝔹_p^n, we have

⋂_{p∈ℕ} 𝒙_p = {x}, ⋂_{p∈ℕ} 𝒇_p(𝒙_p) = {f(x)}, and lim_{p→∞} rad(𝒇_p(𝒙_p)) = 0.

Let F be a nondecreasing function and assume that D is open. We say that 𝒇 has ultimate complexity F(p) if for every compact set C ⊆ D, there exist constants p_0 ∈ ℕ, ϱ > 0 and δ ∈ ℕ such that for any p ⩾ p_0 and 𝒙_p ∈ dom 𝒇_p with 𝒙_p ⊆ C and rad(𝒙_p) ⩽ ϱ, we can compute 𝒇_p(𝒙_p) in time F(p + δ). For instance, ⊕ and ⊖ have ultimate complexity O(p), whereas ⊗ and 𝜾 have ultimate complexity O(I(p)).

PROPOSITION 4. Assume that 𝒇 is locally Lipschitz. If 𝒇 has ultimate complexity F(p) and x ∈ D ∩ (ℝ^com)^n has ultimate complexity T(p), then f(x) has ultimate complexity T(p) + F(p).

Proof. Let x̌ be an approximation algorithm for x of complexity T(p + δ), where δ ∈ ℕ. There exist p_0 ∈ ℕ and a compact ball C around x with C ⊆ dom f and such that 𝒙_p = ℬ(x̌(p), 2^{−p}) ∈ 𝔹_p^n is included in C for all p ⩾ p_0. There also exists a constant δ′ ∈ ℕ such that 𝒇_p(𝒙_p) can be computed in time F(p + δ′) for all p ⩾ p_0. Since 𝒇 is locally Lipschitz, there exists yet another constant δ″ ∈ ℕ such that rad(𝒇_p(𝒙_p)) ⩽ 2^{δ″−p} for p ⩾ p_0. For q = p − δ″ ⩾ max(p_0 − δ″, 0) and δ‴ = max(δ, δ′) + δ″, this shows that we may compute a 2^{−q}-approximation of f(x) in time T(q + δ‴) + F(q + δ‴). □

PROPOSITION 5. Assume that 𝒇 and 𝒈 are two locally Lipschitz ball lifts of f and g that can be composed. If 𝒇 and 𝒈 have respective ultimate complexities F(p) and G(p), then 𝒈 ∘ 𝒇 has ultimate complexity F(p) + G(p).

Proof. In a similar way as in the proof of Lemma 2, the evaluation of (𝒈 ∘ 𝒇)_p(𝒙_p) for 𝒙_p ∈ dom 𝒇_p with 𝒙_p ⊆ C and rad(𝒙_p) ⩽ ϱ boils down to the evaluation of 𝒇_p at 𝒙_p and the evaluation of 𝒈_p at 𝒚_p ≔ 𝒇_p(𝒙_p) ⊆ C′ ≔ f(C) + ℬ(0, ε) with rad(𝒚_p) ⩽ ϱ′. Modulo a further lowering of ϱ and ϱ′ if necessary, these evaluations can be done in time F(p + δ) and G(p + δ′) for suitable δ, δ′ ∈ ℕ and sufficiently large p. □

THEOREM 6. Assume that ℝ is a model for the function symbols ℱ, and that we are given a computable ball lift 𝒇 of f_ℝ for each f ∈ ℱ. For each f ∈ ℱ, assume in addition that 𝒇 is locally Lipschitz, and let F_f be a nondecreasing function such that 𝒇 has ultimate complexity F_f(p). Let Γ = Γ_1, …, Γ_ℓ be an SLP over ℱ whose k-th instruction Γ_k writes X_k ≔ f_k(Y_{k,1}, …, Y_{k,r_{f_k}}). Then, the ball lift 𝚪 of Γ_ℝ has ultimate complexity

F_Γ(p) ≔ F_{f_1}(p) + ⋯ + F_{f_ℓ}(p).

Proof. This is a direct consequence of Proposition 5. □

COROLLARY 7. Let Γ be an SLP of length ℓ over ℱ = {0, 1, +, −, ×, ι} (where 0 and 1 are naturally seen as constant functions of arity zero). Then, there exists a ball lift 𝚪 of Γ_ℝ with ultimate complexity O(I(p) ℓ).

Proof. We use the ball lifts of section 4 for each f ∈ {+, −, ×, ι}: they are locally Lipschitz and computable with ultimate complexity O(I(p)). We may thus apply Theorem 6 to obtain F_Γ(p) = O(I(p) ℓ). □



7. ULTIMATE MODULAR COMPOSITION FOR SEPARABLE MODULI

LEMMA 8. There exists a constant κ > 0 such that the following assertion holds. Let f, g ∈ ℂ^com[x]_{<n}, let σ_1, …, σ_n be pairwise distinct elements of ℂ^com, and let h = (x − σ_1) ⋯ (x − σ_n). Assume that (f_0, …, f_{n−1}, g_0, …, g_{n−1}, σ_1, …, σ_n) has ultimate complexity T(n, p). Then ϱ = g ∘ f rem h has ultimate complexity T(n, p) + κ I(p) M(n) log n.

Proof. The algorithm for fast multi-point evaluation of a polynomial P = ∑_{i=0}^{n−1} P_i x^i ∈ 𝕂[x]_{<n} at ξ_1, …, ξ_n ∈ 𝕂 can be regarded as an SLP over ℱ = {0, 1, +, −, ×, ι} of length O(M(n) log n) that takes (P_0, …, P_{n−1}, ξ_1, …, ξ_n) ∈ 𝕂^{2n} on input and that produces (P(ξ_1), …, P(ξ_n)) ∈ 𝕂^n on output. Similarly, the algorithm for interpolation can be regarded as an SLP over ℱ of length O(M(n) log n) that takes (ξ_1, …, ξ_n, v_1, …, v_n) ∈ 𝕂^{2n} on input and that produces (P_0, …, P_{n−1}) ∈ 𝕂^n on output with v_1 = P(ξ_1), …, v_n = P(ξ_n). Altogether, we may regard the entire Algorithm 1 as an SLP Γ over ℱ of length O(M(n) log n) that takes (f_0, …, f_{n−1}, g_0, …, g_{n−1}, σ_1, …, σ_n) ∈ 𝕂^{3n} on input and that produces (ϱ_0, …, ϱ_{n−1}) ∈ 𝕂^n on output with ϱ = g ∘ f rem h = ∑_{i=0}^{n−1} ϱ_i x^i ∈ 𝕂[x]_{<n}. It follows from Corollary 7 that Γ admits a ball lift 𝚪 of ultimate complexity O(I(p) M(n) log n). The conclusion now follows from Proposition 4. □

According to the above lemma, we notice that the actual time complexity for computing ϱ = g ∘ f rem h is T(n, p + δ) + κ I(p + δ) M(n) log n for some constant δ that depends on n, f, g, and the σ_i.

LEMMA 9. There exists a constant κ > 0 such that the following assertion holds. Let h ∈ ℂ^com[x] be separable and monic of degree n, and denote the roots of h by σ = (σ_1, …, σ_n). If h has ultimate complexity T(n, p), then σ has ultimate complexity T(n, p) + κ I(p) M(n) log n.

Proof. There are many algorithms for the certified computation of the roots of a separable complex polynomial. We may use any of these algorithms as a "fall-back" algorithm in the case that we only need a 2^{−p}-approximation of σ at a low precision p determined by h only.

For general precisions p, we use the following strategy in order to compute a ball 𝝈 ∈ 𝔹_p^n with σ ∈ 𝝈 and rad(𝝈) ⩽ 2^{−αp} for some fixed threshold 1/2 < α < 1. For some suitable p_0 ∈ ℕ and p ⩽ p_0, we use the fall-back algorithm. For p > p_0 and for a second fixed constant 1/2 < β < 1, we first compute a ball enclosure 𝝉 ∈ 𝔹_q^n at the lower precision q = ⌈β p⌉ using a recursive application of the method. We next compute 𝝈 using a ball version of the Newton iteration, as explained below.

If this yields a ball 𝝈 with acceptable radius rad(𝝈) ⩽ 2^{−αp}, then we are done. Otherwise, we resort to our fall-back method. Such calls of the fall-back method only occur if the default threshold precision p_0 was chosen too low. Nevertheless, we will show that there exists a threshold p_1 such that the 𝝈 computed by the Newton iteration always satisfies rad(𝝈) ⩽ 2^{−αp} for p ⩾ p_1.

Let us detail how we perform our ball version of the Newton iteration. Recall that 𝝉 ∈ 𝔹_q^n with σ ∈ 𝝉 and rad(𝝉) ⩽ 2^{−αβp} is given. We also assume that we computed once and for all a 2^{−p}-approximation of h, in the form of a ball polynomial 𝒉_p ∈ 𝔹_p[i][x] of radius 2^{−p} that contains h. Now we evaluate 𝒉_p and 𝒉′_p at each of the points 𝝉_1, …, 𝝉_n using fast multi-point evaluation. Let us denote the results by 𝒗 = 𝒉_p(𝝉) and 𝒘 = 𝒉′_p(𝝉). Let τ, v and w denote the balls with radius zero whose centers are the same as those of 𝝉, 𝒗 and 𝒘. Using vector notation, the Newton iteration now becomes:

𝝈 = (τ ⊖_p 𝜾_p(w) ⊗_p v) ⊕_p (1 ⊖_p 𝜾_p(w) ⊗_p 𝒘) ⊗_p (𝝉 ⊖_p τ).

If σ ∈ 𝝉, then it is well-known [17, 23] that σ ∈ 𝝈. Since rad(𝝉) ⩽ 2^{−αβp}, the fact that multi-point ball evaluation (used for 𝒉_p and 𝒉′_p) is locally Lipschitz implies the existence of a constant δ > 0 with rad(𝒗) ⩽ 2^{δ−αβp} and rad(𝒘) ⩽ 2^{δ−αβp}. Since h′(σ_i) ≠ 0 for i = 1, …, n, there also exists a constant δ′ > 0 with 1 − 𝜾_p(w) 𝒘 ⊆ ℬ(0, 2^{δ′−αβp}). Altogether, this means that there exists a constant δ″ > 0 with rad(𝝈) ⩽ 2^{δ″−2αβp}. Let p_1 = ⌈δ″/(α (2β − 1))⌉. Then for any p ⩾ p_1, the Newton iteration provides us with a 𝝈 with rad(𝝈) ⩽ 2^{−αp}.
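A scalar analogue of this ball Newton step can be sketched with exact interval arithmetic. This is only an illustration under simplifying assumptions (our names; the true algorithm works at fixed point precision p on all roots at once via multi-point evaluation, and we assume here that the center evaluation w of the derivative is positive):

```python
from fractions import Fraction

def horner(p, x):
    # evaluate the coefficient list p (low degree first) at x
    acc = Fraction(0)
    for c in reversed(p):
        acc = acc * x + c
    return acc

def interval_horner(p, lo, hi):
    # crude interval evaluation of p on [lo, hi]
    alo = ahi = Fraction(0)
    for c in reversed(p):
        prods = [alo * lo, alo * hi, ahi * lo, ahi * hi]
        alo, ahi = min(prods) + c, max(prods) + c
    return alo, ahi

def ball_newton_step(h, dh, c, r):
    # One Krawczyk-style step for a simple root inside [c - r, c + r]:
    # sigma' = (c - v/w) + (1 - W/w) * [-r, r], with v = h(c), w = h'(c) > 0,
    # and W an interval enclosure of h' on the ball
    v, w = horner(h, c), horner(dh, c)
    Wlo, Whi = interval_horner(dh, c - r, c + r)
    m = max(abs(1 - Whi / w), abs(1 - Wlo / w))  # max |1 - W/w|
    return c - v / w, m * r                       # new center and radius
```

When the enclosure is tight and the root is simple, the radius shrinks roughly quadratically, which is the behavior the proof exploits.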


Let us now analyze the ultimate complexity C(n, p) of our algorithm. For large p ⩾ p_1, the algorithm essentially performs two multi-point evaluations of ultimate cost κ′ I(p) M(n) log n for some constant κ′ that does not depend on p, and a recursive call at precision ⌈β p⌉. Consequently,

C(n, p) ⩽ κ′ I(p) M(n) log n + C(n, ⌈β p⌉).

We finally obtain another constant κ ⩾ κ′ such that

C(n, p) ⩽ κ I(p) M(n) log n,

by summing up the geometric progression and using the fact that I(p)/p is nondecreasing. The conclusion now follows from Proposition 4. □

Remark 10. A remarkable feature of the above proof is that the precision p_1 at which the Newton iteration can safely be used does not need to be known in advance. In particular, the proof does not require any a priori knowledge about the Lipschitz constants.

THEOREM 11. There exists a constant κ > 0 such that the following assertion holds. Let f, g ∈ ℂ^com[x]_{<n} and let h ∈ ℂ^com[x] be separable and monic of degree n. Assume that (f, g, h) has ultimate complexity T(n, p). Then ϱ = g ∘ f rem h has ultimate complexity T(n, p) + κ I(p) M(n) log n.

Proof. This is an immediate consequence of the combination of the two above lemmas. □

8. CONCLUSION AND FINAL REMARKS

With some more work, we expect that all above bounds of the form O(I(p) M(n) log n) can be lowered to O(I(n p) log n). Notice that I(n p) log n = O(I(p) n log n) for p ⩾ n, when taking I(p) = Θ(p log p 8^{log* p}) [7]. In order to prove this stronger bound using our framework, one might add an auxiliary operation ×_{[n]} for the product of two polynomials of degrees < n to the signature ℱ. Polynomial products of this kind can be implemented for coefficients in 𝔻_p[i] with p ⩾ n using Kronecker substitution. For bounded coefficients, this technique allows for the computation of one such product in time O(I(n p)). By using Theorem 6, a standard complexity analysis should show that multi-point evaluation and interpolation have ultimate complexity O(I(n p) log n).
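Kronecker substitution packs the coefficients of a polynomial into a single large integer, so that one integer product yields the whole polynomial product. A sketch for nonnegative integer coefficients (names ours; the slot width k must exceed the bit size of every coefficient of the product, so that no carries leak between slots):

```python
def kronecker_mul(a, b, k):
    # pack the coefficient lists a, b (low degree first) into k-bit slots
    A = sum(c << (i * k) for i, c in enumerate(a))
    B = sum(c << (i * k) for i, c in enumerate(b))
    C = A * B                                   # one big integer multiplication
    mask = (1 << k) - 1
    # unpack: slot i of C holds the i-th coefficient of the product
    return [(C >> (i * k)) & mask for i in range(len(a) + len(b) - 1)]
```

For example, (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3, recovered from a single integer product.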

By Theorem 11, the actual bit complexity of modular composition is of the form T(n, p + δ) + κ I(p + δ) M(n) log n for some value of δ that depends on f, g, h (hence on n). An interesting problem is to get a better grip on this value δ, which mainly depends on the geometric proximity of the roots of h.

If f, g, h belong to ℚ[x], then T(n, p) = O(n I(p)) and we may wish to bound δ as a function of n and the maximum bit size l of the coefficients of f, g and h. This would involve bit complexity results for root isolation [18, 19, 24], for multi-point evaluation, and for interpolation. The overall complexity should then be compared with the maximal size of the output, namely g ∘ f rem h, which is in general much larger than the input size.

If h is not separable, but if a separable decomposition is known, then the techniques developed in this paper could be combined with Ritzmann's algorithm for the composition of formal power series [22]. If such a separable decomposition is not known, then it is an interesting problem to obtain a general algorithm for modular composition with a similar complexity (but this seems far beyond the scope of this paper).

BIBLIOGRAPHY

[1] D. Bernstein. Scaled remainder trees. Available from https://cr.yp.to/arith/scaledmod-20040820.pdf, 2004.



[2] A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.

[3] R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.

[4] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.

[5] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28(7):693–701, 1991.

[6] J. von zur Gathen and J. Gerhard.Modern computer algebra. Cambridge University Press, New York, 2 edition, 2003.

[7] D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication.J. Complexity, 36:1–30, 2016.

[8] J. van der Hoeven. Fast composition of numeric power series. Technical Report 2008-09, Université Paris-Sud, Orsay, France, 2008.

[9] J. van der Hoeven. Ball arithmetic. Technical report, CNRS & École polytechnique, 2011. https://hal.archives- ouvertes.fr/hal-00432152/.

[10] J. van der Hoeven. Faster Chinese remaindering. Technical report, CNRS & École polytechnique, 2016. http://

hal.archives-ouvertes.fr/hal-01403810.

[11] J. van der Hoeven and G. Lecerf. Modular composition via factorization. Technical report, CNRS & École polytechnique, 2017.http://hal.archives-ouvertes.fr/hal-01457074.

[12] Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.

[13] E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.

[14] E. Kaltofen and V. Shoup. Subquadratic-time factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.

[15] K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. InFOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.

[16] K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.

[17] R. Krawczyk. Newton-Algorithmen zur Bestimmung von Nullstellen mit Fehler-schranken. Computing, 4:187–201, 1969.

[18] C. A. Neff and J. H. Reif. An efficient algorithm for the complex roots problem. J. Complexity, 12(2):81–115, 1996.

[19] V. Y. Pan. Univariate polynomials: nearly optimal algorithms for numerical factorization and root-finding. J.

Symbolic Comput., 33(5):701–733, 2002.

[20] C. H. Papadimitriou.Computational Complexity. Addison-Wesley, 1994.

[21] M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polyno- mials.SIAM J.Comput., 2(1):60–66, 1973.

[22] P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients.Theoret.

Comput. Sci., 44:1–16, 1986.

[23] S. M. Rump.Kleine Fehlerschranken bei Matrixproblemen.PhD thesis, Universität Karlsruhe, 1980.

[24] A. Schönhage. The fundamental theorem of algebra in terms of computational complexity. Technical report, Preliminary Report of Mathematisches Institut der Universität Tübingen, Germany, 1982.
