
(1)

Memory-efficient polynomial arithmetic

Pascal Giorgi¹, Bruno Grenet¹, Daniel S. Roche²

¹ LIRMM, Université de Montpellier    ² CS Department, US Naval Academy

(2)

Multiplication of polynomials

Input. $F = \sum_{i=0}^{n-1} F[i]\,X^i$ and $G = \sum_{j=0}^{n-1} G[j]\,X^j$
Output. $H = F \times G = \sum_{k=0}^{2n-2} H[k]\,X^k$

For i = 0 to n-1:
    For j = 0 to n-1:
        H[i+j] += F[i]*G[j]

• Karatsuba's algorithm:
  $(f_0 + X^{n/2} f_1)\cdot(g_0 + X^{n/2} g_1) = f_0 g_0 + \big((f_0+f_1)(g_0+g_1) - f_0 g_0 - f_1 g_1\big)\,X^{n/2} + f_1 g_1\,X^n$
• Toom-Cook algorithm: split $F$ and $G$ in three or more parts
• FFT-based algorithms:
  $(F,G) \xrightarrow{\text{eval.}} \big(F(\omega^i), G(\omega^i)\big)_i \xrightarrow{\text{mult.}} \big(FG(\omega^i)\big)_i \xrightarrow{\text{interp.}} FG$
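To make these algorithms concrete, here is a minimal Python sketch of the schoolbook product and of one splitting step of Karatsuba's algorithm, following the formulas on this slide; the function names and the list-of-coefficients representation are illustrative choices rather than anything prescribed by the slides.

```python
def naive_product(F, G):
    """Schoolbook product of two length-n coefficient lists (low degree first)."""
    n = len(F)
    H = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            H[i + j] += F[i] * G[j]
    return H

def karatsuba_step(F, G):
    """One level of Karatsuba splitting (n even for simplicity):
    F = f0 + X^(n/2) f1, G = g0 + X^(n/2) g1."""
    n = len(F)
    h = n // 2
    f0, f1, g0, g1 = F[:h], F[h:], G[:h], G[h:]
    p0 = naive_product(f0, g0)                                   # f0*g0
    p2 = naive_product(f1, g1)                                   # f1*g1
    p1 = naive_product([a + b for a, b in zip(f0, f1)],
                       [a + b for a, b in zip(g0, g1)])          # (f0+f1)*(g0+g1)
    H = [0] * (2 * n - 1)
    for k in range(2 * h - 1):
        H[k] += p0[k]                                            # f0*g0
        H[k + n] += p2[k]                                        # X^n * f1*g1
        H[k + h] += p1[k] - p0[k] - p2[k]                        # X^(n/2) * middle term
    return H

# Example: (1 + 2X + 3X^2)(4 + 2X + X^2) = 4 + 10X + 17X^2 + 8X^3 + 3X^4
assert naive_product([1, 2, 3], [4, 2, 1]) == [4, 10, 17, 8, 3]
assert karatsuba_step([1, 2, 3, 0], [4, 2, 1, 0]) == naive_product([1, 2, 3, 0], [4, 2, 1, 0])
```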

(7)

Time complexity of polynomial arithmetic

• Multiplication: $M(n)$
  • Naive: $O(n^2)$
  • Karatsuba: $O(n^{\log_2 3}) = O(n^{1.585})$   (Karatsuba, 1962)
  • Toom-3: $O(n^{\log_3 5}) = O(n^{1.465})$   (Toom, 1963; Cook, 1966)
  • FFT-based:
    $O(n\log n)$ with a $2n$-th root of unity   (Cooley, Tukey, 1965)
    $O(n\log n\log\log n)$   (Schönhage, Strassen, 1971)

• Other tasks:
  • Euclidean division: $O(M(n))$
  • GCD: $O(M(n)\log n)$
  • Evaluation & interpolation: $O(M(n)\log n)$
  • ...

What about space complexity?

(10)

Space complexity of polynomial arithmetic

First thought: count the extra memory, apart from input/output
- Naive algorithm: $O(1)$
- Karatsuba, Toom-3, FFT: $O(n)$
- Other tasks: often $O(n)$, sometimes $O(n\log n)$

However, we need to make the complexity model precise!

(12)

Space-complexity models

Algebraic-RAM machine:
  • standard registers of size $O(\log n)$
  • algebraic registers containing one coefficient

• Read-only input / write-only output
  • (close to) classical complexity theory
  • lower bound $\Omega(n^2)$ on time × space for multiplication

✓ Read-only input / read-write output
  • reasonable from a programmer's viewpoint

• Read-write input and output
  • too permissive in general
  • variant: inputs must be restored at the end

(17)

Previous results

Karatsuba's algorithm:
$(f_0 + X^{n/2} f_1)\cdot(g_0 + X^{n/2} g_1) = f_0 g_0 + \big((f_0+f_1)(g_0+g_1) - f_0 g_0 - f_1 g_1\big)\,X^{n/2} + f_1 g_1\,X^n$

• With some intuition: space of $2n$
• Thomé (2002): space of $n + O(\log n)$
  → careful use of the output + $n$ temporary registers + $O(\log n)$ stack
• Roche (2009): space of only $O(\log n)$
  → half-additive version ($h \leftarrow h_\ell + fg$ where $\deg(h_\ell) < n$)

(20)

Previous results

FFT-based algorithms:
$(F,G) \xrightarrow{\text{eval.}} \big(F(\omega^i), G(\omega^i)\big)_i \xrightarrow{\text{mult.}} \big(FG(\omega^i)\big)_i \xrightarrow{\text{interp.}} FG$

Space of $2n$: the FFT is in-place (overwriting) but the number of evaluation points is $\approx 2n$

• Roche (2009): space of $O(1)$ when $n = 2^k$ and $\omega^{2n} = 1$
  → compute half of the result + recurse
• Harvey, Roche (2010): space of $O(1)$ when $\omega^{2n} = 1$
  → same with the TFT of van der Hoeven (2004)

(23)

Previous results

Summary of complexities

Algorithm               Time complexity                    Space complexity
naive                   $2n^2 + 2n - 1$                    $O(1)$
Karatsuba ('62)         $< 6.5\,n^{\log 3}$                $\le 2n + 5\log(n)$
Karatsuba (Thomé '02)   $< 7\,n^{\log 3}$                  $n + 5\log(n)$
Karatsuba (Roche '09)   $< 10\,n^{\log 3}$                 $\le 5\log(n)$
Toom-3 ('63)            $< \tfrac{73}{4}\,n^{\log_3 5}$    $\le 2n + 5\log_3(n)$
FFT (CT '65)            $9n\log(2n) + O(n)$                $2n$
FFT (Roche '09)         $11n\log(2n) + O(n)$               $O(1)$
TFT (HR '10)            $O(n\log(n))$                      $O(1)$

(24)

Our problem

Can every polynomial multiplication algorithm be performed without extra memory?

• $O(1)$-space Karatsuba's algorithm?
• What about the Toom-Cook algorithm?
• What about other products (short and middle)?

Results:
• Yes!
• Almost (for other products)

(28)

Outline

Polynomial products and linear maps

Space-preserving reductions

In-place algorithms from out-of-place algorithms

(29)

Polynomial products and linear maps

(30)

Short product

[Figure: low and high short products; for two size-$n$ operands, the low short product keeps $n$ coefficients and the high short product keeps $n-1$ coefficients of the full product.]

• Low short product: product of truncated power series
• Useful in other algorithms
• Time complexity: $M(n)$
• Space complexity: $O(n)$
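For concreteness, here is a quadratic Python sketch of both truncated products. The index conventions (low part = coefficients $0,\dots,n-1$ of the full product, high part = coefficients $n,\dots,2n-2$) are my reading of the figure and of the size table given later in the talk.

```python
def short_product_low(F, G):
    """Low short product SPlo: the n low-order coefficients of F*G, i.e. F*G mod X^n."""
    n = len(F)
    H = [0] * n
    for i in range(n):
        for j in range(n - i):
            H[i + j] += F[i] * G[j]
    return H

def short_product_high(F, G):
    """High short product SPhi: the n-1 coefficients of X^n, ..., X^(2n-2) in F*G.
    Note that only F[1:] and G[1:] can contribute to these coefficients."""
    n = len(F)
    H = [0] * (n - 1)
    for i in range(1, n):
        for j in range(n - i, n):
            H[i + j - n] += F[i] * G[j]
    return H

# (1 + 2X)(4 + 2X) = 4 + 10X + 4X^2
assert short_product_low([1, 2], [4, 2]) == [4, 10]
assert short_product_high([1, 2], [4, 2]) == [4]
```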

(33)

Middle product

[Figure: the middle product keeps the $n$ central coefficients of the product of a size-$(2n-1)$ operand by a size-$n$ operand.]

• Useful for Newton iteration:
  $G \leftarrow G + G(1 - GF) \bmod X^{2n}$, with $GF = 1 + X^n H$
• division, square root, ...
• Time complexity: $M(n)$, via Tellegen's transposition
• Space complexity: $O(n)$
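A direct quadratic Python sketch of the middle product. The output convention used here (coefficients $X^{n-1},\dots,X^{2n-2}$ of $F\cdot G$, with $F$ of size $2n-1$ and $G$ of size $n$) is my reading of the figure; the slides only give the shapes.

```python
def middle_product(F, G):
    """Middle product MP: the n central coefficients (X^(n-1), ..., X^(2n-2)) of F*G,
    where F has 2n-1 coefficients and G has n coefficients."""
    n = len(G)
    assert len(F) == 2 * n - 1
    H = [0] * n
    for j in range(n):
        for i in range(n - 1 - j, 2 * n - 1 - j):
            H[i + j - (n - 1)] += F[i] * G[j]
    return H

# F = 1 + 2X + 3X^2 (size 2n-1 = 3), G = 4 + 2X (size n = 2):
# F*G = 4 + 10X + 16X^2 + 6X^3, middle coefficients (X^1, X^2) = [10, 16]
assert middle_product([1, 2, 3], [4, 2]) == [10, 16]
```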

(36)

Multiplications as linear maps

Example:
$f = 3X^2 + 2X + 1$, $g = X^2 + 2X + 4$
$fg = 3X^4 + 8X^3 + 17X^2 + 10X + 4$

$\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 2 & 1 \\ 0 & 3 & 2 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 10 \\ 17 \\ 8 \\ 3 \end{pmatrix}$
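A quick Python check of this example: multiplication by $f$ is the linear map given by a $(2n-1)\times n$ Toeplitz matrix built from the coefficients of $f$ (low degree first), applied to the coefficient vector of $g$.

```python
f = [1, 2, 3]        # f = 1 + 2X + 3X^2 (low-degree coefficient first)
g = [4, 2, 1]        # g = 4 + 2X + X^2
rows = 2 * len(f) - 1
# Toeplitz matrix of "multiplication by f": entry (i, j) is f[i-j].
M = [[f[i - j] if 0 <= i - j < len(f) else 0 for j in range(len(g))]
     for i in range(rows)]
fg = [sum(M[i][j] * g[j] for j in range(len(g))) for i in range(rows)]
assert fg == [4, 10, 17, 8, 3]   # 4 + 10X + 17X^2 + 8X^3 + 3X^4, as on the slide
```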

(38)

Multiplications as linear maps

Full product:
[Figure: matrix view of the full product, two size-$n$ inputs, output of size $2n-1$]

(39)

Multiplications as linear maps

Short products:
[Figure: matrix views of the low and high short products, outputs of sizes $n$ and $n-1$]

(40)

Multiplications as linear maps

Middle product:
[Figure: matrix view of the middle product]

(42)

Multiplications as linear maps

For simplicity in the presentation we assume:
[Figure: the four products FP, SPhi, SPlo and MP with the shapes pictured above]

(43)

Space-preserving reductions

(44)

Relative difficulties of products

• Without space restrictions:
  • SP ≤ FP and FP ≤ SPlo + SPhi
  • MP ≤ FP (transposition)
  • MP ≤ SPlo + SPhi + (n−1) additions

• Sizes of inputs and outputs:
  • FP:   (n, n) → 2n−1
  • SPlo: (n, n) → n
  • SPhi: (n−1, n−1) → n−1
  • MP:   (2n−1, n) → n

✗ These reductions are unusable in space-restricted settings!
✓ We provide space/time-preserving reductions
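As an illustration of the last reduction above, here is a Python sketch of MP ≤ SPlo + SPhi + (n−1) additions, written with the two short products as oracle parameters. The way $F$ is split below is my own reconstruction of the bound from the stated sizes, not necessarily the authors' proof; the example at the end reuses the short-product sketches given earlier.

```python
def middle_via_short(F, G, sp_lo, sp_hi):
    """MP(n) from SPlo(n), SPhi(n) and n-1 additions.
    Shapes as in the table above: F has 2n-1 coefficients, G has n.
    sp_lo(f, g) must return f*g mod X^n and sp_hi(f, g) the coefficients of
    X^n, ..., X^(2n-2) of f*g, for two size-n inputs.
    Split F = a + X^(n-1) b with len(a) = n-1, len(b) = n: the middle coefficients
    X^(n-1), ..., X^(2n-2) of F*G are SPlo(b, G), plus the high part of a*G
    (obtained as SPhi([0] + a, G)) added onto their first n-1 positions."""
    n = len(G)
    a, b = F[:n - 1], F[n - 1:]
    out = sp_lo(b, G)                  # n coefficients
    hi = sp_hi([0] + a, G)             # n-1 coefficients
    for k in range(n - 1):             # the n-1 extra additions
        out[k] += hi[k]
    return out

# With the short-product sketches given earlier as oracles:
# F = 1 + 2X + 3X^2, G = 4 + 2X, F*G = 4 + 10X + 16X^2 + 6X^3.
assert middle_via_short([1, 2, 3], [4, 2],
                        short_product_low, short_product_high) == [10, 16]
```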

(48)

A relevant notion of reduction

Definitions
• TISP(t(n), s(n)): computable in time $t(n)$ and space $s(n)$
• $A \le_c B$: $A$ is computable with oracle $B$, such that if $B \in \mathrm{TISP}(t(n), s(n))$ then
  $A \in \mathrm{TISP}\big(c\,t(n) + o(t(n)),\ s(n) + O(1)\big)$
• $A \equiv_c B$: $A \le_c B$ and $B \le_c A$

Example
$A \equiv_1 B$ means that $A$ and $B$ are equivalent for both time and space

(50)

First results in a nutshell

Theorem
[Diagram: space/time-preserving reductions among FP, SPhi, SPlo and MP, with constants 2, ≤1 and 1]

(51)

Visual proof

Use of fake padding (in the input, not in the output!)

• SPlo(n) ≤ MP(n);  SPhi(n) ≤ MP(n−1)
  [Figure: SPhi and SPlo obtained as middle products of zero-padded inputs]

• FP(n) ≤ SPhi(n) + SPlo(n) ≤ MP(n) + MP(n−1)
  [Figure: the full product assembled from its low and high short products]
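A small Python sketch of these two padding tricks, reusing the middle_product sketch given after the middle-product slide. The amount and the side of the zero padding are inferred from the size table (FP/SPlo/SPhi/MP) earlier in the talk.

```python
def short_low_via_mp(f, g):
    """SPlo(n) <= MP(n): prepend n-1 fake zeros to f, so that the middle slice
    of (0^(n-1) || f) * g is exactly f*g mod X^n."""
    n = len(f)
    return middle_product([0] * (n - 1) + f, g)

def short_high_via_mp(f, g):
    """SPhi(n) <= MP(n-1): the high part of f*g only involves f[1:] and g[1:];
    append n-2 fake zeros to f[1:] and take the middle slice of that product."""
    n = len(f)
    return middle_product(f[1:] + [0] * (n - 2), g[1:])

# f = 1 + 2X, g = 4 + 2X; f*g = 4 + 10X + 4X^2.
assert short_low_via_mp([1, 2], [4, 2]) == [4, 10]
assert short_high_via_mp([1, 2], [4, 2]) == [4]
```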

(53)

Half-additive full product: $h \leftarrow h + f \cdot g$

[Figure: FP+lo (resp. FP+hi) accumulates $f \times g$ onto an output buffer of size $2n-1$ whose low (resp. high) $n-1$ cells already hold coefficients of $h$.]

Remark: FP+lo ≡₁ FP+hi, using reversal polynomials
Theorem: FP+ ≤₂ SP and SP ≤_{3/2} FP+

(57)

From SP to FP+

[Figure: FP+lo realized by one low short product, one high short product and $n-1$ additions into the output buffer.]

FP+lo(n) ≤ SPlo(n) + SPhi(n) + n − 1
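One plausible schedule realizing this bound, in Python (my own reconstruction from the figure, not necessarily the authors' proof): the free high half of the output buffer first receives the low short product, the n−1 additions fold it onto the pre-filled low part, and the high short product then overwrites the scratch area. The two short products are oracle parameters; the example reuses the short-product sketches from earlier.

```python
def fp_add_low(h, f, g, sp_lo, sp_hi):
    """Half-additive full product FP+lo: h has 2n-1 cells, h[0 .. n-2] holds h_lo,
    the remaining n cells are free; on return h holds h_lo + f*g.
    Cost: one SPlo(n) call, one SPhi(n) call and n-1 extra additions."""
    n = len(f)
    h[n - 1:] = sp_lo(f, g)        # h[n-1+k] = (f*g)[k] for k = 0, ..., n-1
    for k in range(n - 1):         # fold the low part onto h_lo: n-1 additions
        h[k] += h[n - 1 + k]
    h[n - 1] = h[2 * n - 2]        # move (f*g)[n-1] into its final place
    h[n:] = sp_hi(f, g)            # (f*g)[n], ..., (f*g)[2n-2]

# h_lo = [5], f = 1 + 2X, g = 4 + 2X, f*g = 4 + 10X + 4X^2:
h = [5, 0, 0]
fp_add_low(h, [1, 2], [4, 2], short_product_low, short_product_high)
assert h == [9, 10, 4]
```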

(62)

From FP+ to SP

$(f_0 + X^{\lceil n/2\rceil} f_1)\cdot(g_0 + X^{\lceil n/2\rceil} g_1) = f_0 g_0 + X^{\lceil n/2\rceil}(f_0 g_1 + f_1 g_0) \bmod X^n$

[Figure: the low short product assembled from one full product on the low halves and two half-additive full products.]

SPlo(n) ≤ FP(⌊n/2⌋) + FP+lo(⌊n/2⌋) + FP+hi(⌈n/2⌉)

(70)

Converse directions?

• From FP to SP:
  • problem with the output size
  • without space restriction: is SP(n) ≃ FP(n/2)?

• From SP to MP:
  • partial result:
    • up to a log(n) increase in time complexity
    • techniques from the next part
  • without space restriction:
    • FP to MP through Tellegen's transposition principle

(72)

Summary of results so far

[Diagram: reductions between the in-place (i) and out-of-place (o) variants of SP, MP and FP, with constants 1, 3/2, 2 and m/n]

(73)

In-place algorithms from out-of-place algorithms

(74)

Framework

• In-place algorithms parametrized by an out-of-place algorithm
  • out-of-place: uses $cn$ extra space
  • the constant $c$ is known to the algorithm
• Goal:
  • space complexity: $O(1)$
  • time complexity: as close as possible to that of the out-of-place algorithm
• Technique:
  • oracle calls in smaller size
  • fake padding
  • tail recursive calls

Similar approach for matrix multiplication: Boyer, Dumas, Pernet, Zhou (2009)

(78)

Tail recursion and fake padding

• Tail recursion:
  • only one recursive call, as the last (or first) instruction
  • no recursion stack needed, which avoids $O(\log n)$ extra space

• Fake padding:
  • pretend to pad the inputs with zeroes
  • make the data structure responsible for it
  • $O(1)$ increase in memory
  • cf. strides in dense linear algebra
  • OK for inputs, not for outputs!
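As a concrete picture of fake padding, here is a tiny Python sketch of a read-only view that pretends its underlying array is surrounded by zeroes; the class name and interface are illustrative only, the point being that the padding costs $O(1)$ memory and no copy.

```python
class PaddedView:
    """Read-only view of a coefficient array that pretends to be padded with zeros.
    The data structure, not a copy, is responsible for the padding (cf. strides),
    so the overhead is O(1) memory."""
    def __init__(self, data, left_zeros=0, right_zeros=0):
        self.data = data
        self.left = left_zeros
        self.length = left_zeros + len(data) + right_zeros

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        j = i - self.left
        return self.data[j] if 0 <= j < len(self.data) else 0

# Present f = [1, 2, 3] as if it were preceded by two zero coefficients:
f_view = PaddedView([1, 2, 3], left_zeros=2)
assert [f_view[i] for i in range(len(f_view))] == [0, 0, 1, 2, 3]
```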

(80)

Our results

• In-place full product (half-additive) in time $(2c+7)\,M(n)$
• In-place short product in time $(2c+5)\,M(n)$
• In-place middle product in time $O(M(n)\log n)$

(81)

In-place FP+ from out-of-place FP

$(f_0 + X^k \hat f)\cdot(g_0 + X^k \hat g) = f_0 g_0 + X^k(f_0 \hat g + \hat f g_0) + X^{2k}\hat f \hat g$

[Figure: the inputs are cut into $\lceil n/k\rceil$ chunks of size $k$; each chunk product (size $2k-1$, computed by the out-of-place oracle with $ck$ extra cells) is accumulated into the output, and the algorithm tail-recurses on the remaining $n-k$ coefficients of $\hat f \hat g$.]
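The following Python sketch shows only the chunk-and-tail-recurse structure behind this decomposition, with the chunk size taken from the analysis on the next slide. It is a functional illustration rather than a memory-faithful implementation: the actual in-place algorithm additionally schedules the oracle's $ck$ work cells inside the still-free part of the output. The oracle can be any out-of-place full product, for instance the naive_product sketch from the first slide.

```python
def chunked_full_product(h, f, g, oracle_fp, c):
    """Accumulate f*g into h (2n-1 cells) by multiplying the low k-chunks f0, g0
    against the rest via an out-of-place oracle, then tail-recursing on the
    remaining high parts f-hat, g-hat at offset 2k."""
    def add_at(pos, prod):
        for i, v in enumerate(prod):
            h[pos + i] += v

    def pad(chunk, k):
        return chunk + [0] * (k - len(chunk))

    n, off = len(f), 0
    while n > 0:
        k = max(1, (n + 1) // (c + 3))          # chunk size from the analysis slide
        f0, g0 = f[:k], g[:k]
        add_at(off, oracle_fp(f0, g0))          # f0*g0
        for j in range(k, n, k):                # f0*(chunks of g-hat), (chunks of f-hat)*g0
            add_at(off + j, oracle_fp(f0, pad(g[j:j + k], k)))
            add_at(off + j, oracle_fp(pad(f[j:j + k], k), g0))
        f, g, n, off = f[k:], g[k:], n - k, off + 2 * k   # tail recursion on f-hat*g-hat

# Example, with the naive_product sketch as the out-of-place oracle:
h = [0] * 5
chunked_full_product(h, [1, 2, 3], [4, 2, 1], naive_product, c=2)
assert h == [4, 10, 17, 8, 3]
```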

(87)

Analysis

[Figure: same chunk decomposition, with chunks of size $k$, $\lceil n/k\rceil$ chunk products of size $2k-1$, the oracle's $ck$ work cells, and the $n-k$ output cells left for the recursive call.]

$ck + 2k - 1 \le n - k \;\Longrightarrow\; k \le \dfrac{n+1}{c+3}$

$T(n) = (2\lceil n/k\rceil - 1)\,(M(k) + 2k - 1) + T(n-k)$

$T(n) \le (2c+7)\,M(n) + o(M(n))$
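One way to read the constraint (my interpretation of the three terms, based on the figure labels): an oracle call on size-$k$ chunks needs $ck$ cells of work space plus $2k-1$ cells for its output, and both must fit in the $n-k$ output cells reserved for the tail recursive call. The bound on $k$ then follows by direct algebra:

```latex
\underbrace{ck}_{\text{oracle work space}}
+ \underbrace{2k-1}_{\text{one chunk product}}
\;\le\;
\underbrace{n-k}_{\text{free output cells}}
\iff (c+3)\,k \le n+1
\iff k \le \frac{n+1}{c+3}.
```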

(90)

In-place short product

[Figure: the low short product computed from $\lceil n/k\rceil$ chunk products of size $k$ (and $\lceil n/k\rceil - 1$ of size $k-1$), then a tail recursive call in size $n-k$.]

$k \le n/(c+2)$

$T(n) = \lceil n/k\rceil\,M(k) + (\lceil n/k\rceil - 1)\,M(k-1) + 2k\,(\lceil n/k\rceil - 1) + T(n-k)$

$T(n) \le (2c+5)\,M(n) + o(M(n))$

(97)

In-place middle product

[Figure: the middle product computed from $\lceil n/k\rceil$ chunks of $f$ of size $k$, each multiplied against the full $g$, then a recursive call with $m-k$ in place of $m$.]

• Recursive call on chunks of $f$ ... but with the full $g$!

$T(n,m) = \lceil n/k\rceil\,M(k) + T(n, m-k)$

$T(n,n) \le \begin{cases} M(n)\,\log_{\frac{c+2}{c+1}}(n) + o(M(n)\log n) & \text{if } M(n) \text{ is quasi-linear} \\ O(M(n)) & \text{otherwise} \end{cases}$
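To see where the base $\tfrac{c+2}{c+1}$ of the logarithm comes from, here is a heuristic unrolling of the recurrence, under my assumption (by analogy with the short-product slide) that each step uses chunks of size $k \approx m/(c+2)$ and that $M$ is superlinear:

```latex
m_{i+1} = m_i - k_i \approx m_i\,\tfrac{c+1}{c+2}
\;\Longrightarrow\;
\text{number of steps} \approx \log_{\frac{c+2}{c+1}}(n),
\qquad
\Big\lceil \tfrac{n}{k}\Big\rceil M(k) \approx \tfrac{n}{k}\,M(k) \le M(n)
\quad\text{since } \tfrac{M(k)}{k} \le \tfrac{M(n)}{n},
```

which multiplies out to the leading term $M(n)\,\log_{\frac{c+2}{c+1}}(n)$ of the bound above.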
