• Aucun résultat trouvé

Structured FFT and TFT: symmetric and lattice polynomials∗

N/A
N/A
Protected

Academic year: 2022

Partager "Structured FFT and TFT: symmetric and lattice polynomials∗"

Copied!
8
0
0

Texte intégral

(1)

Structured FFT and TFT:

symmetric and lattice polynomials

Joris van der Hoeven

Laboratoire d’informatique UMR 7161 CNRS École polytechnique 91128 Palaiseau Cedex, France

[email protected]

Romain Lebreton

LIRMM UMR 5506 CNRS Université de Montpellier II

Montpellier, France

[email protected]

Éric Schost

Computer Science Department Western University

London, Ontario Canada

[email protected]

ABSTRACT

In this paper, we consider the problem of efficient computa- tions with structured polynomials. We provide complexity results for computing Fourier Transform and Truncated Fourier Transform of symmetric polynomials, and for mul- tiplying polynomials supported on a lattice.

1. INTRODUCTION

Fast computations with multivariate polynomials and power series have been of fundamental importance since the early ages of computer algebra. The representation is an important issue which conditions the performance in an intrinsic way; see [24,33,8] for some historical references.

It is customary to distinguish three main types of repre- sentations: dense, sparse, and functional. Adense represen- tation is made of a compact description of the support of the polynomial and the sequence of its coefficients. The main example concerns block supports – it suffices to store the coordinates of two opposite vertices. In a dense representa- tion all the coefficients of the considered support are stored, even if they are zero. If a polynomial has only a few non-zero terms in its bounding block, we shall prefer to use asparse representation which stores only the sequence of the non- zero terms as pairs of monomials and coefficients. Finally, a functional representationstores a function that can produce values of the polynomials at any given point. This can be a pure blackbox (which means that its internal structure is not supposed to be known) or a specific data structure such asstraight-line programs (see Chapter 4 of [4], for instance, and [11] for a concrete library for the manipulation of poly- nomials with a functional representation).

For dense representations with block supports, it is clas- sical that the algorithms used for the univariate case can be naturally extended: the naive algorithm, Karatsuba’s algorithm, and even Fast Fourier Transforms [7,32,6, 30, 17] can be applied recursively in each variable, with good performance. Another classical approach is the Kronecker

*This work has been partly supported by theDigiteo 2009-36HD grant of the Région Ile-de-France, the ANR grant HPAC (ANR-11- BS02-013), NSERC and the CRC program.

Permission to make digital or hard copies of all or part of this work for per- sonal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ISSAC’13, June 26–29, 2013, Boston, Massachusetts, USA.

Copyright 2013 ACM 978-1-4503-2059-7/13/06 ...$15.00.

substitution which reduces the multivariate product to one variable only; for all these questions, we refer the reader to classical books such as [29, 14]. When the number of variables is fixed and the partial degrees tend to infinity, these techniques lead to softly linear costs.

After the discovery of sparse interpolation [2, 25, 26, 27, 12], probabilistic algorithms with a quasi-linear com- plexity have been developed for sparse polynomial multipli- cation [5]. It has recently be shown that such asymptotically fast algorithms may indeed become more efficient than naive sparse multiplication [20].

In practice however, it frequently happens that multi- variate polynomials with a dense flavor do not admit a block support. For instance, it is common to consider polynomials of a given total degree. In a recent series of works [16,18, 19, 22, 21], we have studied the complexity of polynomial multiplication in this “semi-dense” setting; see also [10,31].

In the case when the supports of the polynomials are ini- tial segments for the partial ordering onNn, the truncated Fourier transform is a useful device for the design of effi- cient algorithms.

Besides polynomials with supports of a special kind, we may also consider what will call “structured polynomials”.

By analogy with linear algebra, such polynomials carry a special structure which might be exploited for the design of more efficient algorithms. In this paper, we turn our attention to a first important example of this kind: polyno- mials which are invariant under the action of certain matrix groups. We consider only two special cases: finite subgroups of Sn and finite groups of diagonal matrices. These cases are already sufficient to address questions raisede.g.in celes- tial mechanics [13]; it is hoped that more general groups can be dealt with using similar ideas.

In the limited scope of this paper, our main objective is to prove complexity results that demonstrate the savings induced by a proper use of the symmetries. Our complexity analyses take into account the number of arithmetic oper- ations in the base field, and as often, we consider that operations with groups and lattices take a constant number of operations. A serious implementation of our algorithms would require an improved study of these aspects.

Of course, there already exists an abundant body of work on some of these questions. Crystallographic FFT algo- rithms date back to [34], with contributions as recent as [28], but are dedicated to crystallographic symmetries. A more general framework due to [1] was recently revisited under the point of view of high-performance code genera- tion [23]; our treatment of permutation groups is in a similar spirit, but to our understanding, these previous papers do

(2)

not prove results such as those we give below (and they only consider the FFT, not its truncated version).

Also our results on diagonal groups, which fit in the gen- eral context of FFTs over lattices, use similar techniques as in a series of papers initiated by [15] and continued as recently as [35,3], but the actual results we prove are not in those references.

2. THE CLASSICAL FFT 2.1 Notation

We will work over an effective base field (or ring)Kwith sufficiently many roots of unity; the main objects are poly- nomialsK[x]8K[x1,, xn]innvariables overK. For any k= (k1,,kn)Nnand P K[x], we denote by xkthe monomialx1k1

xnknand by Pkthe coefficient of P in xk. The support supp(P)of a polynomialPK[x]is the set of exponentskNnsuch thatPk0.

For any subset S ofNn, we define K[x]S as the polyno- mials with support included inS. As an important special case, whenS=Ndn8{0,, d1}n, we denote byK[x]d8

K[x]Sthe set of polynomials with partial degree less thand in all variables.

LetωKbe a primitived-th root of unity (in Section4, it will be convenient to write this roote2pi/d). For anyk= (k1,,kn)Nn, we define ωk= (ωk1,, ωkn). One of our aims is to compute efficiently the map

FFTω: (

K[x]d KNdn P (Pi))i∈Ndn

when P is a “structured polynomial”. Here, KNdn denotes the set of vectors with entries inKindexed byNdn; in other words,vKNdnimplies thatviKfor alliNdn. Likewise, KNdn×Ndndenotes the set of matrices with indices inNdn, that is,MKNdn×Ndnimplies thatMi,jKfor alli,jNdn.

Most of the time, we will taked= 2withN, although we will also need more generaldof mixed radicesd=p1 p. We denote byjthe inner product of two vectors inNn. We also letek8(0,k−1 ,0,1,0,,0), for16k6n, be thek- th element of the canonical basis ofKn. At last, forℓ∈Nand kN2we denote by[k]the bit reversal ofkin lengthand we extend this notation to vectors by[k]s= ([k1]s,,[kn]s).

2.2 The classical multivariate FFT

Let us first consider the in-place computation of a fulln- dimensional FFT of lengthd= 2in each variable. We first recall the notations and operations of the decimation in time variant of the FFT, and we refer the reader to [9] for more details. In what follows,ωis ad-th primitive root of unity.

We start with the FFT of a univariate polynomial P K[x]d. Decimation in time amounts to decomposingP into its even and odd parts, and proceeding recursively, by means ofdecimations applied to the variablex. For06k < d, we will writeck0=Pkfor the input and denote bycs= (cks)06k<d

the result aftersdecimation steps, fors∈ {1,, ℓ}.

At stages, the decimation is computed using butterflies of spanδ82ℓ−s. If iN2sis even and jbelongs toNδthen these butterflies are given by

ciδ+js c(i+1)δ+s j

!

= 1 ω[i]sδ 1 −ω[i]sδ

! ciδ+js−1 c(i+1)δ+js−1

! .

Putting the coefficients of all these linear relations in a matrix BsKNd×Nd, we getcs=Bscs−1. The matrixBs is sparse, with at most two non-zero coefficients on each row and each column; up to permutation, it is block-diagonal, with blocks of size 2. After stages, we get the evalua- tions of P in the bit reversed order:ck=P[k]).

We now adapt this notation to the multivariate case. The computation is still divided instages, each stage doing one decimation in every variable x1,, xn. Therefore, we will denote by ck0

8 Pkthe coefficients of the input forkNd and by cks,tthe coefficients obtained during the s-th stage, after the decimations inx1,, xtare done, withs∈ {1,, ℓ}

and t∈ {0,, n}, so that cks−1,n= cks,0. We abbreviate cks−18cks,0for the coefficients afters1stages.

The intermediate coefficientsckscan be seen as evaluations of intermediate polynomials: for everys∈ {0,, ℓ},iN2ns

and jNδn, one has

ciδ+js = Pjs((ωδ)[i]s) (1) where Pjs=P

i∈N2sn Piδ+jxiare obtained through an s- fold decimation ofP. Equivalently, the coefficientsckssatisfy

ciδ+js = X

i∈N2sn

Piδ+jω[i]s·iδ. (2) Thus,cj=P[j])yields FFTω(P)in bit reversed order.

For the concrete computation of the coefficients cks at stages from the coefficients cks−1 at stage s1, we usen so called “elementary transforms” with respect to each of the variables x1,, xn. For any t ∈ {1,, n}, the coef- ficientscks,tare obtained from the coefficientscks,t−1through butterflies of span δ = 2ℓ−s with respect to the variable xt; this can be rewritten by means of the formula

ciδ+js,t c(i+e

t)δ+j s,t

!

= 1 ω[it]sδ 1 −ω[it]sδ

! ciδ+s,t−1j c(i+e

t)δ+j s,t−1

! (3) for any iN2ns with even t-th coordinate it and j Nδn. Equation (3) can be rewritten more compactly in matrix form cs,t = Bs,t cs,t−1 where Bs,t is a sparse matrix in KNdn×Ndnwhich is naturally indexed by pairs of multi-indices in Ndn. Setting Bs=Bs,nBs,1KNdn×Ndn,we also obtain a short formula forcsas a function of cs−1:

cs = Bscs−1 (4)

Remark that the matricesBs,1,,Bs,n commute pairwise.

Notice also that each of the rows and columns ofBs,thas at most 2 non zero entries and consequently those of Bs contains at most 2n non zero entries. For this reason, we can apply the matrices Bs,t (resp. Bs) within O(|Ndn|) = O(dn)(resp. O(n dn)) operations in K. Finally, the fulln- dimensional FFT of lengthd= 2costsF(d, n)83/2ℓ n dn= 3/2dnlog(dn)operations inK(see [14, Section 8.2]).

Example 1. Let us make these matrices explicit for poly- nomials in n= 2 variables and degree less than d= 4, so that = 2. We start with the first decimation inx1whose butterflies of spanδare captured byBs,twithδ= 2ℓ−s= 21, s= 1andt= 1. It takes as inputck0=ck1,08Pkand outputs ck1,1for kN42={0,,3}2. For any j1N2={0,1}and k2∈ {0,,3}, we let

c(j1,11,k2) c(2+j1,1 1,k2)

!

=

1 1 1 −1

c(j1,01,k2) c(2+j1,0 1,k2)

! .

(3)

So, B1,1 is a 16 by 16 matrix, which is made of diagonal blocks of11 −11in a suitable basis. The decimation inx2

during stage1is similar : c(k1,21,j2) c(k1,21,2+j2)

!

=

1 1 1 −1

c(k1,11,j2) c(k1,11,2+j2)

!

for j2 N2 and k1 N4. Consequently, B1,2 is made of diagonal blocks11 −11 (in another basis thanB1,1). Their productB18B1,2B1,1corresponds to the operations

c(j11,j2)

c(2+j1 1,j2)

c(j11,2+j2)

c(2+j1 1,2+j2)

=

1 1 1 1

1 −1 1 −1 1 1 −1 −1 1 −1 −1 1

c(j01,j2)

c(2+j0 1,j2)

c(j01,2+j2)

c(2+j0 1,2+j2)

forj1,j2N2. ThusB1is made of such 4 by 4 matrices on the diagonal (in yet another basis). Note that in general, the matricesBs,tandBsare still made of diagonal blocks, but these blocks vary along the diagonal.

We can sum it all up in the two following algorithms.

Algorithm Butterfly

Input:n, degreed, stages, index(2i,j)of the butterfly with iN2ns−1, j Nδn, root of unity ω and coefficients cKN2nfor the butterfly (cb8c(2is−1+b)δ+j forbN2n).

Output: the output of the butterflydKN2n

Fort= 1,, ndo //decimation inxt

ForbN2n={0,1}nsuch thatbt= 0do

r=ω[2it]sδcb+et //butterfly in one variable db=cb+r

db+et=cbr Return db

Algorithm FFT

Input: n, degreed,ωand coefficientsc0KNdnof P Output: coefficientscKNdnin bit reversed order

Fors= 1,, ℓdo //stages

δ8d/2s

ForiN2ns−1,jN2nℓ−sdo //pick a butterfly ForbN2ndo cb8c(2is−1+b)δ+j

db=Butterfly(n, d, s,i,j,cb) ForbN2ndo c(2is +b)δ+j

8db Return c

In the last section, we will also use the ring isomorphism K[x]d/(x1d1,, xnd1) [K[x]δ/(x1δ1,, xnδ1)]2ns

P Qs= (Qis(x))i∈N2sn

with Qis(x)8 P

j∈Nδn

P

i∈N2sn Piδ+jωi·iδ

ix)j for i N2ns, and i x)j = (ωi1 x1)j1 in xn)jn. These polynomials generalize the decomposition of a univariate polynomialsP intoPmod(xd/21)andPmod(xd/2+ 1).

We could obtain them through a decimation in frequency, but it is enough to remark that we can reconstruct them from the coefficientscsthanks to the formula

Qis(x) = X

j∈Nδn

c[i]s sδ+jix)j. (5)

3. THE SYMMETRIC FFT

In this section, we letGSn be a permutation group.

The groupGacts on the polynomialsK[x]via the map ϕg:

( K[x] K[x]

P(x1,, xn) Pg(x)8P(xg(1),, xg(n)). We denote byK[x]G8stabGK[x] the set of polynomials invariant under the action of G. Our main result here is that one can save a factor (roughly) |G|when computing the FFT of an invariant polynomial. Our approach is in the same spirit as the one in [1], where similar statements on the savings induced by (very general) symmetries can be found.

However, we are not aware of published results similar to Theorem7below.

3.1 The bivariate case

Let P K[x1, x2] be a symmetric bivariate polynomial of partial degrees less than d= 8, so that = 3; let also ωKbe a primitive eighth root of unity. We detail on this easy example the methods used to exploit the symmetries to decrease the cost of the FFT.

The coefficients of P are placed on a 8 × 8 grid. The bivariate classical FFT consists in the application of butter- flies of sizeδ82ℓ−sforsfrom1to3, as in Figure1. When some butterflies overlap, we draw only the shades of all but one of them. The result of stages= 3is the set of evaluations (P(ωi1, ωi2))i∈N8

2in bit reversed order.

Figure 1. Classical bivariate FFT

We will remark below that each stagespreserves the sym- metry of the input. In particular, since our polynomial P is symmetric, the coefficients ck1 at stage 1are symmetric too, so we only need to compute the coefficientsc(k1 1,k2)for (k1,k2)in thefundamental domain(sometimes called asym- metric unit)06k26k1<8. We choose to compute at stage s only the butterflies which involves at least an element of F; the set of indices that are included in a butterfly of span δ= 2ℓ−sthat meetsF will be called the extensionFδof F.

Figure 2. Symmetric bivariate FFT

Every butterfly of stage1meets the fundamental domain, so we do not save anything there. However, we save 4out of 16 butterflies at stage2and6out of 16 butterflies at the third stage. Asymptotically in d, we gain a factor two in the number of arithmetic operations inKcompared to the classical FFT; this corresponds to the order of the group.

(4)

3.2 Notation

The groupGalso acts onNnwithg(i) = (ig(1),,ig(n)) for any iNn. This action is consistent with the action on polynomials sinceϕg(xk) =xg−1(k). BecauseGacts on polynomials, it acts on the vector of its coefficients. More generally, ifI⊆NnandvKG(I), we denote byvgKIthe vector defined byvig=vg(i)for alliI. IfI⊆Nnis stable by G,i.e. G(I)I, we define the set of invariant vectors KI ,GbyKI ,G8{vKI:∀gG,vg=v}.

For any two setsI , J satisfyingJ⊆INn, we define the restriction vJKJ of vKI by (vJ)j=vj for all jJ.

Recall the definition of the lexicographical order6lex : for alli,jNn,i<lexj if there existsk∈ {0,, n1}such thatit=jt for any16t6k and ik+1<jk+1. Finally for any subset S Nn stable by G, we define a fundamental domainSGof the action of GonS bySG8{iS:∀gG, i>lexg(i)}together with the projectionπG:NnNnsuch asG(i)}8G({i})(Nn)G.

3.3 Fundamental domains

LetF=F[d]8(Ndn)Gbe the fundamental domain associ- ated to the action ofGonNdn. AnyG-symmetric vectorc KNdncan be reconstructed from its restrictioncFKF using the formulaci= (cF)πG(i). As it turns out, the in-place FFT algorithm from Section2.2admits the important property that the coefficients at all stage are stillG-symmetric.

Lemma2. LetPK[x]dand the vectorsc0,,cKNdn be as in Section 2.2. Then ifPK[x]dG,c0,,cKNdn,G. Proof. GivenPK[x]dG,kNdnand gG, we clearly havecg(k)0 =ck0. For anys∈ {0,, ℓ},i,iN2ns, jN2nℓ−s

and g G Sn, we also notice that g(i 2ℓ−s + j) = g(i) 2ℓ−s+ g(j),[g(i)]s=g([i]s) and g(i)·g(i) =i·i. Hence, using Equation (2), we get

csg(i2ℓ−s+j) = csg(i)2ℓ−s+g(j)

= X

i∈N2sn

Pi2ℓ−s+g(j)ω[g(i)]s·i2ℓ−s

= X

i∈N2sn

Pg(i)2ℓ−s+g(j)ω[g(i)]s·g(i)2ℓ−s

= X

i∈N2sn

Pg(i2ℓ−s+j)ωg([i]s)·g(i)2ℓ−s

= X

i∈N2sn

Pi2ℓ−s+jω[i]s·i2ℓ−s

= ci2s ℓ−s+j.

Thus,csg(k)=cks for allkNdn, whencec0,,cKNdn,G. This lemma implies that for the computation of the FFT of aG-symmetric polynomialP, it suffices to compute the projectionscF0,,cF.

In order to apply formula (4) for the computation ofcFs as a function of cFs−1, it is not necessary to completely recon- structcs−1, due to the sparsity of the matrixBs. Instead, we define theδ-expansionof the setFbyF={kǫδ:kF , ǫ∈N2n}, whereijstands for the “bitwise exclusive or” ofi andj. For anykNdn, the set{kǫδ:ǫ∈N2n}describes the vertices of the butterfly of spanδthat includes the pointk.

Thus,Fδis indeed the set of indices ofcs−1that are involved in the computation of csvia the formulacs=Bscs−1.

Lemma 3. For any δ ∈ {1, 2, 4, , 2ℓ−1}, let F[δ] = {kǫ:kF ,ǫNδn},so that FδF[2δ]. Then we have F[δ]=δ F[d/δ]+Nδn.

Proof. Assume that iF[d/δ]. Then clearly, δiF, whence δi+NδnF[δ]. Conversely, if iF[δ], then there exists a jF with j=iǫand ǫNδn. Letk=⌊j/δ⌋.

For any gG, we have g(k) =⌊g(j)/δ6lex⌊j/δ=k.

Consequently,⌊j/δ⌋ ∈F[d/δ], whence iδ⌊j/δ⌋+Nδn. For the proof of the next lemma, for(j , k)∈ {1,, n}2, definej ,k={iNn:ij=ik}andΥ =Ndn\S

kjj ,k. Lemma 4. There exists a constant C (depending on n) such that

|F[d]| −|G|dn

6C dn−1.

Proof. With the notation above, we have|∆j ,kNdn|= dn−1. TakingC=n2, it follows that

S

kjj ,k

Ndn 6 C dn−1. On the other hand, the orbit of any element inΥ underGcontains exactly|G|elements and one element inF. In other words,|FΥ|=|Υ|/|G|and so06|F| − |Υ|/|G|6 C dn−1. Finally, 06(dn− |Υ|)/|G|6C/|G|dn−1 which implies−C/|G|dn−16|F| −dn/|G|6C dn−1.

3.4 The symmetric FFT

LetcFs−1δ be the restriction of cs−1 toFδ and notice that KFδ is stable underBs,i.e. Bs(KFδ)KFδ. Therefore the restrictionBKsof the mapBs:KNdnKNdnto vectors inKFδ is aK-algebra morphism fromKFδtoKFδ. By construction, we now havecFsδ=BKscFs−1δ .This allows us to computecFs

as a function of cFs−1using

cFs = (BKsξδ(cFs−1))F (6) where ξδ(cFs−1)denotes theδ-expansioncFs−1δ of cFs−1.

Remark 5.We could have saved a few more operations by only computing the coefficientscFs. To do so, we would have performed just the operations of a butterfly corresponding to vertices inside F. However, the potential gain of com- plexity is only linear in the numbers of monomialsdn.

The formula (6) yields a straightforward way to compute the direct FFT of aG-symmetric polynomial:

Algorithm Symmetric-FFT

Input:n,dand coefficientscF0 KNdnof PK[x]dGinF Output:cFKNdnin bit reversed order

Fors= 1,, ℓdo δ8d/2s

ForiN2ns−1,jN2nℓ−ss.t. 2iδ+jFδdo ForbN2ndocb8cπG((2i+b)δ+j)

s−1

db=Butterfly(n, d, s,i,j,cb) ForbN2ndocπsG((2i+b)δ+j)

8db

Returnc

Remark 6.The main challenge for an actual implementa- tion of our algorithms consists in finding a way to iterate on sets likeFδwithout too much overhead. We give a hint on how to iterate overF in Lemma8(see also [23] for similar considerations).

Références

Documents relatifs

1. f is an idempotent w.l.p. As the definition of the w.l.p. func- tions almost coincide with that of the Sugeno integral, many properties of the Sugeno inte- gral can be applied

Our results generalize both the case of independent arguments, which was studied in Marichal [6], and the special situation where no constant lifetimes are considered (the

This note is organized as follows. In Section 2 we recall the basic material related to lattice polynomial functions and order statistics. In Section 3 we provide the announced

Since weighted lattice polynomial functions include Sugeno integrals, lat- tice polynomial functions, and order statistics, our results encompass the corre- sponding formulas for

Such standardizations have been used recently to give F -expansions of various symmetric functions including plethysms of Schur functions [14], the modified Macdonald polynomials

In this case, equation (4.4) reduces to a much smaller computation. For example, when composing atomic transforms for the inverse TFT, returning the first value of B 2 is actu-

We gather here a few classical facts about symmetric functions that will be used through- out the rest of this paper, and refer to [12] for a systematic exposition. Aside from

Eventually if the algebra admits several bases as algebra or vector spaces, we need the corresponding algorithms allowing to expand on a new basis a symmetric polynomial given on an