Structured FFT and TFT:
symmetric and lattice polynomials
∗Joris van der Hoeven
Laboratoire d’informatique UMR 7161 CNRS École polytechnique 91128 Palaiseau Cedex, France
Romain Lebreton
LIRMM UMR 5506 CNRS Université de Montpellier II
Montpellier, France
Éric Schost
Computer Science Department Western University
London, Ontario Canada
ABSTRACT
In this paper, we consider the problem of efficient computations with structured polynomials. We provide complexity results for computing the Fourier Transform and the Truncated Fourier Transform of symmetric polynomials, and for multiplying polynomials supported on a lattice.
1. INTRODUCTION
Fast computations with multivariate polynomials and power series have been of fundamental importance since the early ages of computer algebra. The representation is an important issue which conditions the performance in an intrinsic way; see [24,33,8] for some historical references.
It is customary to distinguish three main types of representations: dense, sparse, and functional. A dense representation is made of a compact description of the support of the polynomial and the sequence of its coefficients. The main example concerns block supports: it suffices to store the coordinates of two opposite vertices. In a dense representation all the coefficients of the considered support are stored, even if they are zero. If a polynomial has only a few non-zero terms in its bounding block, we shall prefer to use a sparse representation which stores only the sequence of the non-zero terms as pairs of monomials and coefficients. Finally, a functional representation stores a function that can produce values of the polynomial at any given point. This can be a pure blackbox (which means that its internal structure is not supposed to be known) or a specific data structure such as straight-line programs (see Chapter 4 of [4], for instance, and [11] for a concrete library for the manipulation of polynomials with a functional representation).
∗This work has been partly supported by the Digiteo 2009-36HD grant of the Région Ile-de-France, the ANR grant HPAC (ANR-11-BS02-013), NSERC and the CRC program.
ISSAC’13, June 26–29, 2013, Boston, Massachusetts, USA.
Copyright 2013 ACM 978-1-4503-2059-7/13/06 ...$15.00.
For dense representations with block supports, it is classical that the algorithms used for the univariate case can be naturally extended: the naive algorithm, Karatsuba’s algorithm, and even Fast Fourier Transforms [7, 32, 6, 30, 17] can be applied recursively in each variable, with good performance. Another classical approach is the Kronecker substitution, which reduces the multivariate product to one variable only; for all these questions, we refer the reader to classical books such as [29, 14]. When the number of variables is fixed and the partial degrees tend to infinity, these techniques lead to softly linear costs.
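As an illustration of the Kronecker substitution mentioned above, here is a minimal Python sketch for the bivariate case. The function name `kronecker_multiply`, the dictionary encoding of polynomials and the naive univariate product are assumptions made for this example only; an actual implementation would plug in a fast univariate multiplication.

```python
def kronecker_multiply(p, q, d1):
    """Multiply bivariate polynomials given as dicts {(e1, e2): coeff}.

    Both inputs are assumed to have partial degree in x1 less than d1.
    The substitution x2 -> x1**stride maps the product to one variable.
    """
    stride = 2 * d1 - 1  # large enough to keep monomials of the product apart

    def pack(poly):
        out = {}
        for (e1, e2), c in poly.items():
            e = e1 + stride * e2
            out[e] = out.get(e, 0) + c
        return out

    up, uq = pack(p), pack(q)
    prod = {}
    for ea, ca in up.items():  # naive univariate product, for illustration only
        for eb, cb in uq.items():
            prod[ea + eb] = prod.get(ea + eb, 0) + ca * cb
    # Undo the substitution: exponent e encodes (e mod stride, e div stride).
    return {(e % stride, e // stride): c for e, c in prod.items() if c != 0}


# (1 + x1 + x2) * (1 + 2*x1*x2)
p = {(0, 0): 1, (1, 0): 1, (0, 1): 1}
q = {(0, 0): 1, (1, 1): 2}
result = kronecker_multiply(p, q, d1=2)
```

The stride `2 * d1 - 1` is the smallest choice that prevents exponents of the product in `x1` from overlapping after packing.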
After the discovery of sparse interpolation [2, 25, 26, 27, 12], probabilistic algorithms with a quasi-linear complexity have been developed for sparse polynomial multiplication [5]. It has recently been shown that such asymptotically fast algorithms may indeed become more efficient than naive sparse multiplication [20].
In practice however, it frequently happens that multivariate polynomials with a dense flavor do not admit a block support. For instance, it is common to consider polynomials of a given total degree. In a recent series of works [16, 18, 19, 22, 21], we have studied the complexity of polynomial multiplication in this “semi-dense” setting; see also [10, 31].
In the case when the supports of the polynomials are initial segments for the partial ordering on $\mathbb{N}^n$, the truncated Fourier transform is a useful device for the design of efficient algorithms.
Besides polynomials with supports of a special kind, we may also consider what we will call “structured polynomials”.
By analogy with linear algebra, such polynomials carry a special structure which might be exploited for the design of more efficient algorithms. In this paper, we turn our attention to a first important example of this kind: polynomials which are invariant under the action of certain matrix groups. We consider only two special cases: finite subgroups of $S_n$ and finite groups of diagonal matrices. These cases are already sufficient to address questions raised e.g. in celestial mechanics [13]; it is hoped that more general groups can be dealt with using similar ideas.
In the limited scope of this paper, our main objective is to prove complexity results that demonstrate the savings induced by a proper use of the symmetries. Our complexity analyses take into account the number of arithmetic operations in the base field and, as is often done, we consider that operations with groups and lattices take a constant number of operations. A serious implementation of our algorithms would require an improved study of these aspects.
Of course, there already exists an abundant body of work on some of these questions. Crystallographic FFT algorithms date back to [34], with contributions as recent as [28], but are dedicated to crystallographic symmetries. A more general framework due to [1] was recently revisited under the point of view of high-performance code generation [23]; our treatment of permutation groups is in a similar spirit, but to our understanding, these previous papers do not prove results such as those we give below (and they only consider the FFT, not its truncated version).
Also, our results on diagonal groups, which fit in the general context of FFTs over lattices, use similar techniques as in a series of papers initiated by [15] and continued as recently as [35, 3], but the actual results we prove are not in those references.
2. THE CLASSICAL FFT
2.1 Notation
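We will work over an effective base field (or ring) $\mathbb{K}$ with sufficiently many roots of unity; the main objects are polynomials $\mathbb{K}[x] := \mathbb{K}[x_1, \ldots, x_n]$ in $n$ variables over $\mathbb{K}$. For any $k = (k_1, \ldots, k_n) \in \mathbb{N}^n$ and $P \in \mathbb{K}[x]$, we denote by $x^k$ the monomial $x_1^{k_1} \cdots x_n^{k_n}$ and by $P_k$ the coefficient of $x^k$ in $P$. The support $\operatorname{supp}(P)$ of a polynomial $P \in \mathbb{K}[x]$ is the set of exponents $k \in \mathbb{N}^n$ such that $P_k \neq 0$.
For any subset $S$ of $\mathbb{N}^n$, we define $\mathbb{K}[x]_S$ as the set of polynomials with support included in $S$. As an important special case, when $S = \mathbb{N}_d^n := \{0, \ldots, d-1\}^n$, we denote by $\mathbb{K}[x]_d := \mathbb{K}[x]_S$ the set of polynomials with partial degree less than $d$ in all variables.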
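Let $\omega \in \mathbb{K}$ be a primitive $d$-th root of unity (in Section 4, it will be convenient to write this root $e^{2\pi\mathrm{i}/d}$). For any $k = (k_1, \ldots, k_n) \in \mathbb{N}^n$, we define $\omega^k = (\omega^{k_1}, \ldots, \omega^{k_n})$. One of our aims is to compute efficiently the map
$$\mathrm{FFT}_\omega : \mathbb{K}[x]_d \to \mathbb{K}^{\mathbb{N}_d^n}, \qquad P \mapsto (P(\omega^i))_{i \in \mathbb{N}_d^n}$$
when $P$ is a “structured polynomial”. Here, $\mathbb{K}^{\mathbb{N}_d^n}$ denotes the set of vectors with entries in $\mathbb{K}$ indexed by $\mathbb{N}_d^n$; in other words, $v \in \mathbb{K}^{\mathbb{N}_d^n}$ means that $v_i \in \mathbb{K}$ for all $i \in \mathbb{N}_d^n$. Likewise, $\mathbb{K}^{\mathbb{N}_d^n \times \mathbb{N}_d^n}$ denotes the set of matrices with indices in $\mathbb{N}_d^n$, that is, $M \in \mathbb{K}^{\mathbb{N}_d^n \times \mathbb{N}_d^n}$ means that $M_{i,j} \in \mathbb{K}$ for all $i, j \in \mathbb{N}_d^n$.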
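Most of the time, we will take $d = 2^\ell$ with $\ell \in \mathbb{N}$, although we will also need more general $d$ of mixed radices $d = p_1 \cdots p_\ell$. We denote by $i \cdot j$ the inner product of two vectors in $\mathbb{N}^n$. We also let $e_k := (0, \ldots, 0, 1, 0, \ldots, 0)$, with $k-1$ zeros before the entry $1$, for $1 \le k \le n$, be the $k$-th element of the canonical basis of $\mathbb{K}^n$. At last, for $\ell \in \mathbb{N}$ and $k \in \mathbb{N}_{2^\ell}$, we denote by $[k]_\ell$ the bit reversal of $k$ in length $\ell$ and we extend this notation to vectors by $[k]_s = ([k_1]_s, \ldots, [k_n]_s)$.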
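The bit reversal $[k]_\ell$ and its componentwise extension to vectors can be sketched in Python as follows. The function names `bit_reverse` and `bit_reverse_vec` are hypothetical and chosen only for this illustration.

```python
def bit_reverse(k, length):
    """Return [k]_length: the `length` low-order bits of k, in reversed order."""
    r = 0
    for _ in range(length):
        r = (r << 1) | (k & 1)  # append the lowest bit of k to r
        k >>= 1
    return r


def bit_reverse_vec(k, length):
    """Componentwise bit reversal of a multi-index k = (k_1, ..., k_n)."""
    return tuple(bit_reverse(kt, length) for kt in k)


print(bit_reverse(1, 3))            # 001 -> 100, prints 4
print(bit_reverse_vec((6, 3), 3))   # (110, 011) -> (011, 110), prints (3, 6)
```

Bit reversal in a fixed length is an involution, which is why the same map converts between natural and bit-reversed order in both directions.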
2.2 The classical multivariate FFT
Let us first consider the in-place computation of a full $n$-dimensional FFT of length $d = 2^\ell$ in each variable. We first recall the notations and operations of the decimation in time variant of the FFT, and we refer the reader to [9] for more details. In what follows, $\omega$ is a primitive $d$-th root of unity.
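We start with the FFT of a univariate polynomial $P \in \mathbb{K}[x]_d$. Decimation in time amounts to decomposing $P$ into its even and odd parts, and proceeding recursively, by means of $\ell$ decimations applied to the variable $x$. For $0 \le k < d$, we will write $c^0_k = P_k$ for the input and denote by $c^s = (c^s_k)_{0 \le k < d}$ the result after $s$ decimation steps, for $s \in \{1, \ldots, \ell\}$.
At stage $s$, the decimation is computed using butterflies of span $\delta := 2^{\ell-s}$. If $i \in \mathbb{N}_{2^s}$ is even and $j$ belongs to $\mathbb{N}_\delta$, then these butterflies are given by
$$\begin{pmatrix} c^s_{i\delta+j} \\ c^s_{(i+1)\delta+j} \end{pmatrix} = \begin{pmatrix} 1 & \omega^{[i]_s \delta} \\ 1 & -\omega^{[i]_s \delta} \end{pmatrix} \begin{pmatrix} c^{s-1}_{i\delta+j} \\ c^{s-1}_{(i+1)\delta+j} \end{pmatrix}.$$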
Putting the coefficients of all these linear relations in a matrix $B_s \in \mathbb{K}^{\mathbb{N}_d \times \mathbb{N}_d}$, we get $c^s = B_s c^{s-1}$. The matrix $B_s$ is sparse, with at most two non-zero coefficients on each row and each column; up to permutation, it is block-diagonal, with blocks of size 2. After $\ell$ stages, we get the evaluations of $P$ in bit reversed order: $c^\ell_k = P(\omega^{[k]_\ell})$.
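The univariate butterflies above can be turned into a short Python sketch. To keep arithmetic exact, this example assumes that $\mathbb{K}$ is the prime field $\mathbb{Z}/p\mathbb{Z}$; the function names and the choice $p = 17$, $\omega = 2$ (a primitive 8th root of unity modulo 17) are assumptions made for the illustration.

```python
def bit_reverse(k, length):
    r = 0
    for _ in range(length):
        r, k = (r << 1) | (k & 1), k >> 1
    return r


def fft_dit(coeffs, omega, p):
    """Decimation-in-time FFT over Z/pZ, following the butterflies of stage s.

    coeffs[k] = P_k with len(coeffs) = d = 2**l, and omega is assumed to be a
    primitive d-th root of unity mod p. Returns c^l, where
    c^l[k] = P(omega**[k]_l), i.e. the evaluations in bit reversed order.
    """
    c = list(coeffs)
    d = len(c)
    l = d.bit_length() - 1
    for s in range(1, l + 1):
        delta = d >> s                      # span of the butterflies
        for i in range(0, 1 << s, 2):       # even i in N_{2^s}
            w = pow(omega, bit_reverse(i, s) * delta, p)
            for j in range(delta):
                a, b = c[i * delta + j], c[(i + 1) * delta + j]
                r = w * b % p
                c[i * delta + j] = (a + r) % p
                c[(i + 1) * delta + j] = (a - r) % p
    return c


values = fft_dit([3, 1, 4, 1, 5, 9, 2, 6], omega=2, p=17)
```

Entry `values[k]` should agree with a direct evaluation of the polynomial at `omega**bit_reverse(k, 3)` modulo 17.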
We now adapt this notation to the multivariate case. The computation is still divided in $\ell$ stages, each stage doing one decimation in every variable $x_1, \ldots, x_n$. Therefore, we will denote by $c^0_k := P_k$ the coefficients of the input for $k \in \mathbb{N}_d^n$ and by $c^{s,t}_k$ the coefficients obtained during the $s$-th stage, after the decimations in $x_1, \ldots, x_t$ are done, with $s \in \{1, \ldots, \ell\}$ and $t \in \{0, \ldots, n\}$, so that $c^{s-1,n}_k = c^{s,0}_k$. We abbreviate $c^{s-1}_k := c^{s,0}_k$ for the coefficients after $s-1$ stages.
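The intermediate coefficients $c^s_k$ can be seen as evaluations of intermediate polynomials: for every $s \in \{0, \ldots, \ell\}$, $i \in \mathbb{N}_{2^s}^n$ and $j \in \mathbb{N}_\delta^n$, one has
$$c^s_{i\delta+j} = P^s_j\big((\omega^\delta)^{[i]_s}\big) \tag{1}$$
where the polynomials $P^s_j = \sum_{i' \in \mathbb{N}_{2^s}^n} P_{i'\delta+j}\, x^{i'}$ are obtained through an $s$-fold decimation of $P$. Equivalently, the coefficients $c^s_k$ satisfy
$$c^s_{i\delta+j} = \sum_{i' \in \mathbb{N}_{2^s}^n} P_{i'\delta+j}\, \omega^{[i]_s \cdot i' \delta}. \tag{2}$$
Thus, $c^\ell_j = P(\omega^{[j]_\ell})$ yields $\mathrm{FFT}_\omega(P)$ in bit reversed order.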
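For the concrete computation of the coefficients $c^s_k$ at stage $s$ from the coefficients $c^{s-1}_k$ at stage $s-1$, we use $n$ so-called “elementary transforms” with respect to each of the variables $x_1, \ldots, x_n$. For any $t \in \{1, \ldots, n\}$, the coefficients $c^{s,t}_k$ are obtained from the coefficients $c^{s,t-1}_k$ through butterflies of span $\delta = 2^{\ell-s}$ with respect to the variable $x_t$; this can be rewritten by means of the formula
$$\begin{pmatrix} c^{s,t}_{i\delta+j} \\ c^{s,t}_{(i+e_t)\delta+j} \end{pmatrix} = \begin{pmatrix} 1 & \omega^{[i_t]_s \delta} \\ 1 & -\omega^{[i_t]_s \delta} \end{pmatrix} \begin{pmatrix} c^{s,t-1}_{i\delta+j} \\ c^{s,t-1}_{(i+e_t)\delta+j} \end{pmatrix} \tag{3}$$
for any $i \in \mathbb{N}_{2^s}^n$ with even $t$-th coordinate $i_t$ and $j \in \mathbb{N}_\delta^n$. Equation (3) can be rewritten more compactly in matrix form $c^{s,t} = B_{s,t}\, c^{s,t-1}$, where $B_{s,t}$ is a sparse matrix in $\mathbb{K}^{\mathbb{N}_d^n \times \mathbb{N}_d^n}$ which is naturally indexed by pairs of multi-indices in $\mathbb{N}_d^n$. Setting $B_s = B_{s,n} \cdots B_{s,1} \in \mathbb{K}^{\mathbb{N}_d^n \times \mathbb{N}_d^n}$, we also obtain a short formula for $c^s$ as a function of $c^{s-1}$:
$$c^s = B_s\, c^{s-1}. \tag{4}$$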
Remark that the matrices $B_{s,1}, \ldots, B_{s,n}$ commute pairwise.
Notice also that each of the rows and columns of $B_{s,t}$ has at most 2 non-zero entries and consequently those of $B_s$ contain at most $2^n$ non-zero entries. For this reason, we can apply the matrices $B_{s,t}$ (resp. $B_s$) within $O(|\mathbb{N}_d^n|) = O(d^n)$ (resp. $O(n d^n)$) operations in $\mathbb{K}$. Finally, the full $n$-dimensional FFT of length $d = 2^\ell$ costs $F(d, n) := \tfrac{3}{2}\, \ell\, n\, d^n = \tfrac{3}{2}\, d^n \log(d^n)$ operations in $\mathbb{K}$, where $\log$ denotes the logarithm in base 2 (see [14, Section 8.2]).
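The cost formula can be recovered by a simple count, sketched below in Python. The accounting is an assumption of this example: each one-variable butterfly is charged three operations (one multiplication by a power of $\omega$, one addition and one subtraction), which reproduces the stated value of $F(d, n)$. The function names are hypothetical.

```python
def fft_operation_count(d, n):
    """Count operations of the stage-by-stage n-dimensional FFT, d = 2**l."""
    l = d.bit_length() - 1
    ops = 0
    for s in range(1, l + 1):          # l stages
        for t in range(n):             # one elementary transform per variable
            butterflies = d ** n // 2  # each butterfly pairs two of the d**n entries
            ops += 3 * butterflies     # assumed: 1 multiplication, 1 addition, 1 subtraction
    return ops


def fft_cost_formula(d, n):
    """F(d, n) = 3/2 * l * n * d**n with l = log2(d)."""
    l = d.bit_length() - 1
    return 3 * l * n * d ** n // 2


print(fft_operation_count(8, 2), fft_cost_formula(8, 2))  # prints 576 576
```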
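Example 1. Let us make these matrices explicit for polynomials in $n = 2$ variables and degree less than $d = 4$, so that $\ell = 2$. We start with the first decimation in $x_1$, whose butterflies of span $\delta$ are captured by $B_{s,t}$ with $\delta = 2^{\ell-s} = 2^1$, $s = 1$ and $t = 1$. It takes as input $c^0_k = c^{1,0}_k := P_k$ and outputs $c^{1,1}_k$ for $k \in \mathbb{N}_4^2 = \{0, \ldots, 3\}^2$. For any $j_1 \in \mathbb{N}_2 = \{0, 1\}$ and $k_2 \in \{0, \ldots, 3\}$, we let
$$\begin{pmatrix} c^{1,1}_{(j_1,k_2)} \\ c^{1,1}_{(2+j_1,k_2)} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} c^{1,0}_{(j_1,k_2)} \\ c^{1,0}_{(2+j_1,k_2)} \end{pmatrix}.$$
So $B_{1,1}$ is a 16 by 16 matrix, which is made of diagonal blocks $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ in a suitable basis. The decimation in $x_2$ during stage 1 is similar:
$$\begin{pmatrix} c^{1,2}_{(k_1,j_2)} \\ c^{1,2}_{(k_1,2+j_2)} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} c^{1,1}_{(k_1,j_2)} \\ c^{1,1}_{(k_1,2+j_2)} \end{pmatrix}$$
for $j_2 \in \mathbb{N}_2$ and $k_1 \in \mathbb{N}_4$. Consequently, $B_{1,2}$ is made of diagonal blocks $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ (in another basis than $B_{1,1}$). Their product $B_1 := B_{1,2} B_{1,1}$ corresponds to the operations
$$\begin{pmatrix} c^1_{(j_1,j_2)} \\ c^1_{(2+j_1,j_2)} \\ c^1_{(j_1,2+j_2)} \\ c^1_{(2+j_1,2+j_2)} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} c^0_{(j_1,j_2)} \\ c^0_{(2+j_1,j_2)} \\ c^0_{(j_1,2+j_2)} \\ c^0_{(2+j_1,2+j_2)} \end{pmatrix}$$
for $j_1, j_2 \in \mathbb{N}_2$. Thus $B_1$ is made of such 4 by 4 matrices on the diagonal (in yet another basis). Note that in general, the matrices $B_{s,t}$ and $B_s$ are still made of diagonal blocks, but these blocks vary along the diagonal.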
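The 4 by 4 block of Example 1 is the Kronecker product of the two 2 by 2 butterfly matrices, one for $x_2$ and one for $x_1$. The following Python lines check this on plain lists; the helper `kron` is a hypothetical name introduced for the illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


H = [[1, 1], [1, -1]]      # butterfly in one variable at stage 1 (twiddle factor 1)
B1_block = kron(H, H)      # outer factor acts on x2, inner factor on x1

for row in B1_block:
    print(row)
# [1, 1, 1, 1]
# [1, -1, 1, -1]
# [1, 1, -1, -1]
# [1, -1, -1, 1]
```

With the ordering of indices used in the example, the first coordinate varies fastest, which is why the factor for $x_1$ is the inner one.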
We can sum it all up in the two following algorithms.
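Algorithm Butterfly
Input: $n$, degree $d$, stage $s$, index $(2i', j)$ of the butterfly with $i' \in \mathbb{N}_{2^{s-1}}^n$, $j \in \mathbb{N}_\delta^n$, root of unity $\omega$ and coefficients $c \in \mathbb{K}^{\mathbb{N}_2^n}$ for the butterfly ($c_b := c^{s-1}_{(2i'+b)\delta+j}$ for $b \in \mathbb{N}_2^n$).
Output: the output $d \in \mathbb{K}^{\mathbb{N}_2^n}$ of the butterfly
$d := c$
For $t = 1, \ldots, n$ do // decimation in $x_t$
    For $b \in \mathbb{N}_2^n = \{0, 1\}^n$ such that $b_t = 0$ do
        $r := \omega^{[2i'_t]_s \delta}\, d_{b+e_t}$ // butterfly in one variable
        $(d_b, d_{b+e_t}) := (d_b + r, d_b - r)$
Return $d$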
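Algorithm FFT
Input: $n$, degree $d$, $\omega$ and coefficients $c^0 \in \mathbb{K}^{\mathbb{N}_d^n}$ of $P$
Output: coefficients $c^\ell \in \mathbb{K}^{\mathbb{N}_d^n}$ in bit reversed order
For $s = 1, \ldots, \ell$ do // stage $s$
    $\delta := d/2^s$
    For $i' \in \mathbb{N}_{2^{s-1}}^n$, $j \in \mathbb{N}_{2^{\ell-s}}^n$ do // pick a butterfly
        For $b \in \mathbb{N}_2^n$ do $c_b := c^{s-1}_{(2i'+b)\delta+j}$
        $d := \mathrm{Butterfly}(n, d, s, i', j, \omega, c)$
        For $b \in \mathbb{N}_2^n$ do $c^s_{(2i'+b)\delta+j} := d_b$
Return $c^\ell$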
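A minimal Python sketch of the multivariate FFT is given below. It is not a transcription of the two algorithms above: instead of grouping the $2^n$ vertices of a butterfly, it applies the elementary transforms of Equation (3) one variable at a time, which yields the same stage outputs. The base ring is assumed to be $\mathbb{Z}/p\mathbb{Z}$ and coefficients are stored in a dictionary indexed by multi-indices; all names are hypothetical.

```python
from itertools import product


def bit_reverse(k, length):
    r = 0
    for _ in range(length):
        r, k = (r << 1) | (k & 1), k >> 1
    return r


def fft_multivariate(c0, n, d, omega, p):
    """n-dimensional FFT of length d = 2**l over Z/pZ.

    c0 maps each k in {0..d-1}^n to P_k. Returns c^l as a dict with
    c^l[k] = P(omega**[k_1]_l, ..., omega**[k_n]_l).
    """
    c = dict(c0)
    l = d.bit_length() - 1
    for s in range(1, l + 1):
        delta = d >> s
        for t in range(n):                          # elementary transform in x_{t+1}
            for k in product(range(d), repeat=n):
                i_t = k[t] // delta
                if i_t % 2:                         # treated together with its even partner
                    continue
                k2 = k[:t] + (k[t] + delta,) + k[t + 1:]
                r = pow(omega, bit_reverse(i_t, s) * delta, p) * c[k2] % p
                c[k], c[k2] = (c[k] + r) % p, (c[k] - r) % p
    return c


# Example: n = 2, d = 4 over Z/17Z, where 4 is a primitive 4th root of unity.
n, d, p, omega = 2, 4, 17, 4
c0 = {k: (3 * k[0] + 5 * k[1] + 1) % p for k in product(range(d), repeat=n)}
c_final = fft_multivariate(c0, n, d, omega, p)
```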
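In the last section, we will also use the ring isomorphism
$$\mathbb{K}[x]_d/(x_1^d - 1, \ldots, x_n^d - 1) \to \big[\mathbb{K}[x]_\delta/(x_1^\delta - 1, \ldots, x_n^\delta - 1)\big]^{2^{ns}}, \qquad P \mapsto Q^s = (Q^s_i(x))_{i \in \mathbb{N}_{2^s}^n}$$
with
$$Q^s_i(x) := \sum_{j \in \mathbb{N}_\delta^n} \Big( \sum_{i' \in \mathbb{N}_{2^s}^n} P_{i'\delta+j}\, \omega^{i \cdot i' \delta} \Big) (\omega^i x)^j$$
for $i \in \mathbb{N}_{2^s}^n$, and $(\omega^i x)^j = (\omega^{i_1} x_1)^{j_1} \cdots (\omega^{i_n} x_n)^{j_n}$. These polynomials generalize the decomposition of a univariate polynomial $P$ into $P \bmod (x^{d/2} - 1)$ and $P \bmod (x^{d/2} + 1)$.
We could obtain them through a decimation in frequency, but it is enough to remark that we can reconstruct them from the coefficients $c^s$ thanks to the formula
$$Q^s_i(x) = \sum_{j \in \mathbb{N}_\delta^n} c^s_{[i]_s \delta + j}\, (\omega^i x)^j. \tag{5}$$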
3. THE SYMMETRIC FFT
In this section, we let $G \subseteq S_n$ be a permutation group.
The group $G$ acts on the polynomials $\mathbb{K}[x]$ via the map
$$\varphi_g : \mathbb{K}[x] \to \mathbb{K}[x], \qquad P(x_1, \ldots, x_n) \mapsto P^g(x) := P(x_{g(1)}, \ldots, x_{g(n)}).$$
We denote by $\mathbb{K}[x]^G := \operatorname{stab}_G \mathbb{K}[x]$ the set of polynomials invariant under the action of $G$. Our main result here is that one can save a factor (roughly) $|G|$ when computing the FFT of an invariant polynomial. Our approach is in the same spirit as the one in [1], where similar statements on the savings induced by (very general) symmetries can be found.
However, we are not aware of published results similar to Theorem 7 below.
3.1 The bivariate case
Let $P \in \mathbb{K}[x_1, x_2]$ be a symmetric bivariate polynomial of partial degrees less than $d = 8$, so that $\ell = 3$; let also $\omega \in \mathbb{K}$ be a primitive eighth root of unity. We detail on this easy example the methods used to exploit the symmetries to decrease the cost of the FFT.
The coefficients of $P$ are placed on an $8 \times 8$ grid. The bivariate classical FFT consists in the application of butterflies of size $\delta := 2^{\ell-s}$ for $s$ from 1 to 3, as in Figure 1. When some butterflies overlap, we draw only the shades of all but one of them. The result of stage $s = 3$ is the set of evaluations $(P(\omega^{i_1}, \omega^{i_2}))_{i \in \mathbb{N}_8^2}$ in bit reversed order.
Figure 1. Classical bivariate FFT
We will remark below that each stage $s$ preserves the symmetry of the input. In particular, since our polynomial $P$ is symmetric, the coefficients $c^1_k$ at stage 1 are symmetric too, so we only need to compute the coefficients $c^1_{(k_1,k_2)}$ for $(k_1, k_2)$ in the fundamental domain $F$ (sometimes called asymmetric unit) defined by $0 \le k_2 \le k_1 < 8$. We choose to compute at stage $s$ only the butterflies which involve at least one element of $F$; the set of indices that are included in a butterfly of span $\delta = 2^{\ell-s}$ that meets $F$ will be called the extension $F_\delta$ of $F$.
Figure 2. Symmetric bivariate FFT
Every butterfly of stage 1 meets the fundamental domain, so we do not save anything there. However, we save 4 out of 16 butterflies at stage 2 and 6 out of 16 butterflies at the third stage. Asymptotically in $d$, we gain a factor two in the number of arithmetic operations in $\mathbb{K}$ compared to the classical FFT; this corresponds to the order of the group.
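The butterfly counts of this example can be checked by brute force. The Python sketch below enumerates the bivariate butterflies of each stage and keeps those with at least one vertex $(k_1, k_2)$ satisfying $k_2 \le k_1$; the function name is hypothetical.

```python
from itertools import product


def butterflies_meeting_domain(d, s):
    """Number of stage-s bivariate butterflies meeting {(k1, k2): k2 <= k1 < d}."""
    delta = d >> s
    count = 0
    for i1, i2 in product(range(1 << (s - 1)), repeat=2):
        for j1, j2 in product(range(delta), repeat=2):
            vertices = [((2 * i1 + b1) * delta + j1, (2 * i2 + b2) * delta + j2)
                        for b1 in (0, 1) for b2 in (0, 1)]
            if any(k2 <= k1 for k1, k2 in vertices):
                count += 1
    return count


print([butterflies_meeting_domain(8, s) for s in (1, 2, 3)])  # prints [16, 12, 10]
```

The three values correspond to 0, 4 and 6 butterflies saved out of 16 at stages 1, 2 and 3.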
3.2 Notation
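The group $G$ also acts on $\mathbb{N}^n$ with $g(i) = (i_{g(1)}, \ldots, i_{g(n)})$ for any $i \in \mathbb{N}^n$. This action is consistent with the action on polynomials since $\varphi_g(x^k) = x^{g^{-1}(k)}$. Because $G$ acts on polynomials, it acts on the vectors of their coefficients. More generally, if $I \subseteq \mathbb{N}^n$ and $v \in \mathbb{K}^{G(I)}$, we denote by $v^g \in \mathbb{K}^I$ the vector defined by $v^g_i = v_{g(i)}$ for all $i \in I$. If $I \subseteq \mathbb{N}^n$ is stable by $G$, i.e. $G(I) \subseteq I$, we define the set of invariant vectors $\mathbb{K}^{I,G}$ by $\mathbb{K}^{I,G} := \{v \in \mathbb{K}^I : \forall g \in G,\ v^g = v\}$.
For any two sets $I, J$ satisfying $J \subseteq I \subseteq \mathbb{N}^n$, we define the restriction $v_J \in \mathbb{K}^J$ of $v \in \mathbb{K}^I$ by $(v_J)_j = v_j$ for all $j \in J$.
Recall the definition of the lexicographical order $\le_{\mathrm{lex}}$: for all $i, j \in \mathbb{N}^n$, $i <_{\mathrm{lex}} j$ if there exists $k \in \{0, \ldots, n-1\}$ such that $i_t = j_t$ for any $1 \le t \le k$ and $i_{k+1} < j_{k+1}$. Finally, for any subset $S \subseteq \mathbb{N}^n$ stable by $G$, we define a fundamental domain $S^G$ of the action of $G$ on $S$ by $S^G := \{i \in S : \forall g \in G,\ i \ge_{\mathrm{lex}} g(i)\}$, together with the projection $\pi^G : \mathbb{N}^n \to \mathbb{N}^n$ such that $\{\pi^G(i)\} := G(\{i\}) \cap (\mathbb{N}^n)^G$.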
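The action of $G$ on multi-indices, the fundamental domain $S^G$ and the projection $\pi^G$ admit a direct (and deliberately naive) Python sketch. Permutations are assumed to be encoded as tuples of 0-based images, and `G` is assumed to list all elements of a group, including the identity; the names `act`, `project` and `fundamental_domain` are hypothetical.

```python
from itertools import product


def act(g, i):
    """g(i) = (i_{g(1)}, ..., i_{g(n)}), with g given as a tuple of 0-based images."""
    return tuple(i[gt] for gt in g)


def project(G, i):
    """pi^G(i): the lexicographically largest element of the orbit of i."""
    return max(act(g, i) for g in G)   # Python compares tuples lexicographically


def fundamental_domain(G, S):
    """S^G: the elements of S that are >=_lex all their images under G."""
    return {i for i in S if all(i >= act(g, i) for g in G)}


S2 = [(0, 1), (1, 0)]                       # the symmetric group on two letters
grid = set(product(range(8), repeat=2))
F = fundamental_domain(S2, grid)
print(len(F), project(S2, (2, 5)))          # prints 36 (5, 2)
```

This brute-force version costs $O(|G|)$ per index, so it only serves to make the definitions concrete.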
3.3 Fundamental domains
Let $F = F^{[d]} := (\mathbb{N}_d^n)^G$ be the fundamental domain associated to the action of $G$ on $\mathbb{N}_d^n$. Any $G$-symmetric vector $c \in \mathbb{K}^{\mathbb{N}_d^n}$ can be reconstructed from its restriction $c_F \in \mathbb{K}^F$ using the formula $c_i = (c_F)_{\pi^G(i)}$. As it turns out, the in-place FFT algorithm from Section 2.2 admits the important property that the coefficients at all stages are still $G$-symmetric.
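Lemma 2. Let $P \in \mathbb{K}[x]_d$ and the vectors $c^0, \ldots, c^\ell \in \mathbb{K}^{\mathbb{N}_d^n}$ be as in Section 2.2. If $P \in \mathbb{K}[x]_d^G$, then $c^0, \ldots, c^\ell \in \mathbb{K}^{\mathbb{N}_d^n, G}$.
Proof. Given $P \in \mathbb{K}[x]_d^G$, $k \in \mathbb{N}_d^n$ and $g \in G$, we clearly have $c^0_{g(k)} = c^0_k$. For any $s \in \{0, \ldots, \ell\}$, $i, i' \in \mathbb{N}_{2^s}^n$, $j \in \mathbb{N}_{2^{\ell-s}}^n$ and $g \in G \subseteq S_n$, we also notice that $g(i\, 2^{\ell-s} + j) = g(i)\, 2^{\ell-s} + g(j)$, $[g(i)]_s = g([i]_s)$ and $g(i) \cdot g(i') = i \cdot i'$. Hence, using Equation (2), we get
$$\begin{aligned}
c^s_{g(i 2^{\ell-s} + j)} &= c^s_{g(i) 2^{\ell-s} + g(j)} \\
&= \sum_{i' \in \mathbb{N}_{2^s}^n} P_{i' 2^{\ell-s} + g(j)}\, \omega^{[g(i)]_s \cdot i' 2^{\ell-s}} \\
&= \sum_{i' \in \mathbb{N}_{2^s}^n} P_{g(i') 2^{\ell-s} + g(j)}\, \omega^{[g(i)]_s \cdot g(i') 2^{\ell-s}} \\
&= \sum_{i' \in \mathbb{N}_{2^s}^n} P_{g(i' 2^{\ell-s} + j)}\, \omega^{g([i]_s) \cdot g(i') 2^{\ell-s}} \\
&= \sum_{i' \in \mathbb{N}_{2^s}^n} P_{i' 2^{\ell-s} + j}\, \omega^{[i]_s \cdot i' 2^{\ell-s}} \\
&= c^s_{i 2^{\ell-s} + j}.
\end{aligned}$$
Thus, $c^s_{g(k)} = c^s_k$ for all $k \in \mathbb{N}_d^n$, whence $c^0, \ldots, c^\ell \in \mathbb{K}^{\mathbb{N}_d^n, G}$.
This lemma implies that for the computation of the FFT of a $G$-symmetric polynomial $P$, it suffices to compute the projections $c^0_F, \ldots, c^\ell_F$.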
In order to apply formula (4) for the computation of $c^s_F$ as a function of $c^{s-1}_F$, it is not necessary to completely reconstruct $c^{s-1}$, due to the sparsity of the matrix $B_s$. Instead, we define the $\delta$-expansion of the set $F$ by $F_\delta = \{k ⊞ \epsilon\delta : k \in F,\ \epsilon \in \mathbb{N}_2^n\}$, where $i ⊞ j$ stands for the “bitwise exclusive or” of $i$ and $j$. For any $k \in \mathbb{N}_d^n$, the set $\{k ⊞ \epsilon\delta : \epsilon \in \mathbb{N}_2^n\}$ describes the vertices of the butterfly of span $\delta$ that includes the point $k$.
Thus, $F_\delta$ is indeed the set of indices of $c^{s-1}$ that are involved in the computation of $c^s_F$ via the formula $c^s = B_s c^{s-1}$.
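Lemma 3. For any $\delta \in \{1, 2, 4, \ldots, 2^{\ell-1}\}$, let $F_{[\delta]} = \{k ⊞ \epsilon : k \in F,\ \epsilon \in \mathbb{N}_\delta^n\}$, so that $F_\delta \subseteq F_{[2\delta]}$. Then we have $F_{[\delta]} = \delta F^{[d/\delta]} + \mathbb{N}_\delta^n$.
Proof. Assume that $i \in F^{[d/\delta]}$. Then clearly, $\delta i \in F$, whence $\delta i + \mathbb{N}_\delta^n \subseteq F_{[\delta]}$. Conversely, if $i \in F_{[\delta]}$, then there exists a $j \in F$ with $j = i ⊞ \epsilon$ and $\epsilon \in \mathbb{N}_\delta^n$. Let $k = \lfloor j/\delta \rfloor$.
For any $g \in G$, we have $g(k) = \lfloor g(j)/\delta \rfloor \le_{\mathrm{lex}} \lfloor j/\delta \rfloor = k$.
Consequently, $\lfloor j/\delta \rfloor \in F^{[d/\delta]}$, whence $i \in \delta \lfloor j/\delta \rfloor + \mathbb{N}_\delta^n$.
For the proof of the next lemma, for $(j, k) \in \{1, \ldots, n\}^2$, define $\Delta_{j,k} = \{i \in \mathbb{N}^n : i_j = i_k\}$ and $\Upsilon = \mathbb{N}_d^n \setminus \bigcup_{k \neq j} \Delta_{j,k}$.
Lemma 4. There exists a constant $C$ (depending on $n$) such that
$$\left|\, |F^{[d]}| - \frac{d^n}{|G|} \,\right| \le C d^{n-1}.$$
Proof. With the notation above, we have $|\Delta_{j,k} \cap \mathbb{N}_d^n| = d^{n-1}$ for $j \neq k$. Taking $C = n^2$, it follows that
$$\Big| \bigcup_{k \neq j} \Delta_{j,k} \cap \mathbb{N}_d^n \Big| \le C d^{n-1}.$$
On the other hand, the orbit of any element in $\Upsilon$ under $G$ contains exactly $|G|$ elements and one element in $F$. In other words, $|F \cap \Upsilon| = |\Upsilon|/|G|$ and so $0 \le |F| - |\Upsilon|/|G| \le C d^{n-1}$. Finally, $0 \le (d^n - |\Upsilon|)/|G| \le (C/|G|)\, d^{n-1}$, which implies $-(C/|G|)\, d^{n-1} \le |F| - d^n/|G| \le C d^{n-1}$.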
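Lemma 4 can be checked numerically in the special case $G = S_n$, where the fundamental domain consists of the weakly decreasing tuples. The Python sketch below counts them by brute force and compares with the bound for $C = n^2$; the choice of group and the function names are assumptions of this example.

```python
from itertools import product
from math import comb, factorial


def domain_size(d, n):
    """|F| for G = S_n acting on {0..d-1}^n: tuples with k_1 >= k_2 >= ... >= k_n."""
    return sum(1 for k in product(range(d), repeat=n)
               if all(k[t] >= k[t + 1] for t in range(n - 1)))


def lemma4_holds(d, n):
    """Check | |F| - d^n/|G| | <= n^2 d^(n-1), after multiplying through by |G| = n!."""
    gap = abs(factorial(n) * domain_size(d, n) - d ** n)
    return gap <= factorial(n) * n * n * d ** (n - 1)


print(domain_size(8, 2), comb(9, 2))   # prints 36 36
print(lemma4_holds(8, 2), lemma4_holds(4, 3))
```

For the full symmetric group the count is the number of multisets of size $n$ drawn from $d$ values, namely $\binom{d+n-1}{n}$, which the first printed line illustrates.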
3.4 The symmetric FFT
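Let $c^{s-1}_{F_\delta}$ be the restriction of $c^{s-1}$ to $F_\delta$ and notice that $\mathbb{K}^{F_\delta}$ is stable under $B_s$, i.e. $B_s(\mathbb{K}^{F_\delta}) \subseteq \mathbb{K}^{F_\delta}$. Therefore the restriction $B_s^{\mathbb{K}^{F_\delta}}$ of the map $B_s : \mathbb{K}^{\mathbb{N}_d^n} \to \mathbb{K}^{\mathbb{N}_d^n}$ to vectors in $\mathbb{K}^{F_\delta}$ is a $\mathbb{K}$-linear map from $\mathbb{K}^{F_\delta}$ to $\mathbb{K}^{F_\delta}$. By construction, we now have $c^s_{F_\delta} = B_s^{\mathbb{K}^{F_\delta}}\, c^{s-1}_{F_\delta}$. This allows us to compute $c^s_F$ as a function of $c^{s-1}_F$ using
$$c^s_F = \big( B_s^{\mathbb{K}^{F_\delta}}\, \xi_\delta(c^{s-1}_F) \big)_F \tag{6}$$
where $\xi_\delta(c^{s-1}_F)$ denotes the $\delta$-expansion $c^{s-1}_{F_\delta}$ of $c^{s-1}_F$.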
Remark 5. We could have saved a few more operations by only computing the coefficients $c^s_F$. To do so, we would have performed just the operations of a butterfly corresponding to vertices inside $F$. However, the potential gain in complexity is only linear in the number of monomials $d^n$.
The formula (6) yields a straightforward way to compute the direct FFT of a $G$-symmetric polynomial:
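Algorithm Symmetric-FFT
Input: $n$, $d$, $\omega$ and the coefficients $c^0_F \in \mathbb{K}^F$ of $P \in \mathbb{K}[x]_d^G$ in $F$
Output: $c^\ell_F \in \mathbb{K}^F$ in bit reversed order
For $s = 1, \ldots, \ell$ do
    $\delta := d/2^s$
    For $i' \in \mathbb{N}_{2^{s-1}}^n$, $j \in \mathbb{N}_{2^{\ell-s}}^n$ s.t. $2i'\delta + j \in F_\delta$ do
        For $b \in \mathbb{N}_2^n$ do $c_b := c^{s-1}_{\pi^G((2i'+b)\delta+j)}$
        $d := \mathrm{Butterfly}(n, d, s, i', j, \omega, c)$
        For $b \in \mathbb{N}_2^n$ do $c^s_{\pi^G((2i'+b)\delta+j)} := d_b$
Return $c^\ell_F$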
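The following Python sketch mirrors Symmetric-FFT under simplifying assumptions: the base ring is $\mathbb{Z}/p\mathbb{Z}$, the group is given as an explicit list of permutations (0-based tuples, identity included), membership in $F$ and the projection $\pi^G$ are computed naively, and each stage writes into a fresh dictionary rather than in place. All function names are hypothetical.

```python
from itertools import product


def bit_reverse(k, length):
    r = 0
    for _ in range(length):
        r, k = (r << 1) | (k & 1), k >> 1
    return r


def project(G, i):
    """pi^G(i): lexicographically largest element of the orbit of i."""
    return max(tuple(i[gt] for gt in g) for g in G)


def butterfly(n, s, delta, i_even, c, omega, p):
    """n-dimensional butterfly; i_even = 2*i' and c maps b in {0,1}^n to a value."""
    out = dict(c)
    for t in range(n):                                   # decimation in x_{t+1}
        w = pow(omega, bit_reverse(i_even[t], s) * delta, p)
        for b in product((0, 1), repeat=n):
            if b[t] == 0:
                b1 = b[:t] + (1,) + b[t + 1:]
                r = w * out[b1] % p
                out[b], out[b1] = (out[b] + r) % p, (out[b] - r) % p
    return out


def symmetric_fft(cF, G, n, d, omega, p):
    """FFT of a G-symmetric polynomial, storing values on F only."""
    l = d.bit_length() - 1
    F = set(cF)
    bits = list(product((0, 1), repeat=n))
    for s in range(1, l + 1):
        delta = d >> s
        nxt = {}
        for ih in product(range(1 << (s - 1)), repeat=n):
            for j in product(range(delta), repeat=n):
                verts = {b: tuple((2 * ih[t] + b[t]) * delta + j[t] for t in range(n))
                         for b in bits}
                if not any(v in F for v in verts.values()):
                    continue                             # butterfly does not meet F
                out = butterfly(n, s, delta, tuple(2 * x for x in ih),
                                {b: cF[project(G, v)] for b, v in verts.items()},
                                omega, p)
                for b, v in verts.items():
                    nxt[project(G, v)] = out[b]
        cF = nxt
    return cF
```

A possible use, for a symmetric bivariate polynomial of partial degree less than 4 over $\mathbb{Z}/17\mathbb{Z}$ (where 4 is a primitive 4th root of unity), is `symmetric_fft(cF, [(0, 1), (1, 0)], 2, 4, 4, 17)` with `cF` holding the coefficients indexed by $k_2 \le k_1$. The naive tests `v in F` and `project` are exactly the overhead that an efficient iteration over $F_\delta$ would have to avoid.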
Remark 6. The main challenge for an actual implementation of our algorithms consists in finding a way to iterate on sets like $F_\delta$ without too much overhead. We give a hint on how to iterate over $F$ in Lemma 8 (see also [23] for similar considerations).