• Aucun résultat trouvé

Memory-efficient polynomial arithmetic

N/A
N/A
Protected

Academic year: 2022

Partager "Memory-efficient polynomial arithmetic"

Copied!
109
0
0

Texte intégral

(1)

Memory-efficient polynomial arithmetic

Pascal Giorgi1 Bruno Grenet1 Daniel S. Roche2

1 LIRMM, Université de Montpellier 2 CS Department, US Naval Academy

1

(2)

Multiplication of polynomials

Input.F =Pn−1i=0 F[i]Xi and G=Pn−1j=0 G[j]Xj

Output.H =F ×G =P2n−2k=0 H[k]Xk

For i = 0 to n-1: For j = 0 to n-1:

H[i+j] += F[i]*G[j]

• Karatsuba’s algorithm:f0+Xn2f1·g0+Xn2g1

=f0g0+ ((f0+f1)(g0+g1)−f0g0f1g1)Xn2 +f1g1Xn

• Toom-Cook algorithm: split F and G in three or more parts

• FFT-based algorithms:

(F,G)−−→eval. (F(ωi),G(ωi))i mult.

−−−→FG(ωi)i interp.

−−−−→FG

(3)

Multiplication of polynomials

Input.F =Pn−1i=0 F[i]Xi and G=Pn−1j=0 G[j]Xj

Output.H =F ×G =P2n−2k=0 H[k]Xk For i = 0 to n-1:

For j = 0 to n-1:

H[i+j] += F[i]*G[j]

• Karatsuba’s algorithm:f0+Xn2f1·g0+Xn2g1

=f0g0+ ((f0+f1)(g0+g1)−f0g0f1g1)Xn2 +f1g1Xn

• Toom-Cook algorithm: split F and G in three or more parts

• FFT-based algorithms:

(F,G)−−→eval. (F(ωi),G(ωi))i mult.

−−−→FG(ωi)i interp.

−−−−→FG

2

(4)

Multiplication of polynomials

Input.F =Pn−1i=0 F[i]Xi and G=Pn−1j=0 G[j]Xj

Output.H =F ×G =P2n−2k=0 H[k]Xk For i = 0 to n-1:

For j = 0 to n-1:

H[i+j] += F[i]*G[j]

• Karatsuba’s algorithm:f0+Xn2f1·g0+Xn2g1

=f0g0+ ((f0+f1)(g0+g1)−f0g0f1g1)Xn2 +f1g1Xn

• Toom-Cook algorithm: split F and G in three or more parts

• FFT-based algorithms:

(F,G)−−→eval. (F(ωi),G(ωi))i mult.

−−−→FG(ωi)i interp.

−−−−→FG

(5)

Multiplication of polynomials

Input.F =Pn−1i=0 F[i]Xi and G=Pn−1j=0 G[j]Xj

Output.H =F ×G =P2n−2k=0 H[k]Xk For i = 0 to n-1:

For j = 0 to n-1:

H[i+j] += F[i]*G[j]

• Karatsuba’s algorithm:f0+Xn2f1·g0+Xn2g1

=f0g0+ ((f0+f1)(g0+g1)−f0g0f1g1)Xn2 +f1g1Xn

• Toom-Cook algorithm: splitF and G in three or more parts

• FFT-based algorithms:

(F,G)−−→eval. (F(ωi),G(ωi))i mult.

−−−→FG(ωi)i interp.

−−−−→FG

2

(6)

Multiplication of polynomials

Input.F =Pn−1i=0 F[i]Xi and G=Pn−1j=0 G[j]Xj

Output.H =F ×G =P2n−2k=0 H[k]Xk For i = 0 to n-1:

For j = 0 to n-1:

H[i+j] += F[i]*G[j]

• Karatsuba’s algorithm:f0+Xn2f1·g0+Xn2g1

=f0g0+ ((f0+f1)(g0+g1)−f0g0f1g1)Xn2 +f1g1Xn

• Toom-Cook algorithm: splitF and G in three or more parts

• FFT-based algorithms:

eval. mult. interp.

(7)

Time complexity of polynomial arithmetic

• Multiplication: M(n)

• Naïve:O(n2)

• Karatsuba:O(nlog23) =O(n1.585) Karatsuba (1962)

• Toom-3:O(nlog35) =O(n1.465) Toom (1963), Cook (1966)

• FFT-based:

O(nlogn) with 2n-th root of unity Cooley, Tukey (1965)

O(nlognlog logn) Schönhage, Strassen (1971)

• Other tasks:

• Euclidean division:O(M(n))

• GCD:O(M(n) logn)

• Evaluation & interpolation:O(M(n) logn)

• . . .

What about space complexity?

3

(8)

Time complexity of polynomial arithmetic

• Multiplication: M(n)

• Naïve:O(n2)

• Karatsuba:O(nlog23) =O(n1.585) Karatsuba (1962)

• Toom-3:O(nlog35) =O(n1.465) Toom (1963), Cook (1966)

• FFT-based:

O(nlogn) with 2n-th root of unity Cooley, Tukey (1965)

O(nlognlog logn) Schönhage, Strassen (1971)

• Other tasks:

• Euclidean division:O(M(n))

• GCD:O(M(n) logn)

• Evaluation & interpolation:O(M(n) logn)

• . . .

What about space complexity?

(9)

Time complexity of polynomial arithmetic

• Multiplication: M(n)

• Naïve:O(n2)

• Karatsuba:O(nlog23) =O(n1.585) Karatsuba (1962)

• Toom-3:O(nlog35) =O(n1.465) Toom (1963), Cook (1966)

• FFT-based:

O(nlogn) with 2n-th root of unity Cooley, Tukey (1965)

O(nlognlog logn) Schönhage, Strassen (1971)

• Other tasks:

• Euclidean division:O(M(n))

• GCD:O(M(n) logn)

• Evaluation & interpolation:O(M(n) logn)

• . . .

What about space complexity?

3

(10)

Space complexity of polynomial arithmetic

First thought:count extra memory apart from input/output - Naive algorithm:O(1)

- Karatsuba, Toom-3, FFT:O(n)

- Other tasks: oftenO(n), sometime O(nlogn)

However, need to precise the complexity model !!!

(11)

Space complexity of polynomial arithmetic

First thought:count extra memory apart from input/output - Naive algorithm:O(1)

- Karatsuba, Toom-3, FFT:O(n)

- Other tasks: oftenO(n), sometime O(nlogn) However, need to precise the complexity model !!!

4

(12)

Space-complexity models

Algebraic-RAM machine:

Standard registers of sizeO(logn)

Algebraicregisters containing one coefficient

• Read-only input / write-only output

• (Close to) classical complexity theory

• Lower bound Ω(n2) on time × space for multiplication

X

Read-only input / read-write output

Reasonablefrom a programmer’s viewpoint

• Read-write input and output

• Too permissive in general

• Variant: inputs must be restored at the end

(13)

Space-complexity models

Algebraic-RAM machine:

Standard registers of sizeO(logn)

Algebraicregisters containing one coefficient

• Read-only input / write-only output

• (Close to) classical complexity theory

• Lower bound Ω(n2) on time × space for multiplication

X

Read-only input / read-write output

Reasonablefrom a programmer’s viewpoint

• Read-write input and output

• Too permissive in general

• Variant: inputs must be restored at the end

5

(14)

Space-complexity models

Algebraic-RAM machine:

Standard registers of sizeO(logn)

Algebraicregisters containing one coefficient

• Read-only input / write-only output

• (Close to) classical complexity theory

• Lower bound Ω(n2) on time × space for multiplication

X

Read-only input / read-write output

Reasonablefrom a programmer’s viewpoint

• Read-write input and output

• Too permissive in general

• Variant: inputs must be restored at the end

(15)

Space-complexity models

Algebraic-RAM machine:

Standard registers of sizeO(logn)

Algebraicregisters containing one coefficient

• Read-only input / write-only output

• (Close to) classical complexity theory

• Lower bound Ω(n2) on time × space for multiplication

X

Read-only input / read-write output

Reasonablefrom a programmer’s viewpoint

• Read-write input and output

• Too permissive in general

• Variant: inputs must be restored at the end

5

(16)

Space-complexity models

Algebraic-RAM machine:

Standard registers of sizeO(logn)

Algebraicregisters containing one coefficient

• Read-only input / write-only output

• (Close to) classical complexity theory

• Lower bound Ω(n2) on time × space for multiplication X • Read-only input / read-write output

Reasonablefrom a programmer’s viewpoint

• Read-write input and output

• Too permissive in general

(17)

Previous results

Karatsuba’s algorithm:

f0+Xn2f1

·

g0+Xn2g1

=f0g0+((f0+f1)(g0+g1)−f0g0−f1g1)Xn2+f1g1Xn

with some intuition space of 2n

• Thomé (2002) : space ofn+O(logn)

→ careful use output + n temp. registers + O(logn) stack

• Roche (2009): space of only O(logn)

half-additive version (h←h`+fg where deg(h`)<n)

6

(18)

Previous results

Karatsuba’s algorithm:

f0+Xn2f1

·

g0+Xn2g1

=f0g0+((f0+f1)(g0+g1)−f0g0−f1g1)Xn2+f1g1Xn

with some intuition space of 2n

• Thomé (2002) : space ofn+O(logn)

→ careful use output + n temp. registers + O(logn) stack

• Roche (2009): space of only O(logn)

half-additive version (h←h`+fg where deg(h`)<n)

(19)

Previous results

Karatsuba’s algorithm:

f0+Xn2f1

·

g0+Xn2g1

=f0g0+((f0+f1)(g0+g1)−f0g0−f1g1)Xn2+f1g1Xn

with some intuition space of 2n

• Thomé (2002) : space ofn+O(logn)

→ careful use output + n temp. registers + O(logn) stack

• Roche (2009): space of only O(logn)

half-additive version (h←h`+fg where deg(h`)<n)

6

(20)

Previous results

FFT-based algorithms:

(F,G)→(F(ωi),Gi))iFG(ωi)iFG

space of 2n : FFT is in-place (overwriting) but # points≈2n

• Roche (2009): space of O(1) whenn= 2k andω2n= 1

→ compute half of the result + recurse

• Harvey-Roche (2010): space ofO(1) whenω2n= 1

→ same with TFT v.d. Hoeven (2004)

(21)

Previous results

FFT-based algorithms:

(F,G)→(F(ωi),Gi))iFG(ωi)iFG space of 2n : FFT is in-place (overwriting) but # points≈2n

• Roche (2009): space of O(1) whenn= 2k andω2n= 1

→ compute half of the result + recurse

• Harvey-Roche (2010): space ofO(1) whenω2n= 1

→ same with TFT v.d. Hoeven (2004)

7

(22)

Previous results

FFT-based algorithms:

(F,G)→(F(ωi),Gi))iFG(ωi)iFG space of 2n : FFT is in-place (overwriting) but # points≈2n

• Roche (2009): space of O(1) whenn= 2k andω2n= 1

→ compute half of the result + recurse

• Harvey-Roche (2010): space ofO(1) whenω2n= 1

→ same with TFT v.d. Hoeven (2004)

(23)

Previous results

Summary of complexities

Algorithms Time complexity Space complexity

naive 2n2+ 2n−1 O(1)

Karatsuba (’62) <6.5nlog(3) ≤2n+ 5 log(n) Karatsuba (Thomé’02) <7nlog(3)n+ 5 log(n)

Karatsuba(Roche’09) <10nlog(3) ≤5 log(n) Toom-3 (’63) < 734 nlog3(5) ≤2n+ 5 log3(n)

FFT(CT’65) 9nlog(2n) +O(n) 2n

FFT(Roche’09) 11nlog(2n) +O(n) O(1)

TFT(HR’10) O(nlog(n)) O(1)

8

(24)

Our problematic

Can every polynomial multiplication algorithm be performed without extra memory?

O(1)-space Karatsuba’s algorithm?

• What about Toom-Cook algorithm?

• What about other products (short and middle)? Results:

• Yes!

• Almost (for other products)

(25)

Our problematic

Can every polynomial multiplication algorithm be performed without extra memory?

O(1)-space Karatsuba’s algorithm?

• What about Toom-Cook algorithm?

• What about other products (short and middle)? Results:

• Yes!

• Almost (for other products)

9

(26)

Our problematic

Can every polynomial multiplication algorithm be performed without extra memory?

O(1)-space Karatsuba’s algorithm?

• What about Toom-Cook algorithm?

• What about other products (short and middle)?

Results:

• Yes!

• Almost (for other products)

(27)

Our problematic

Can every polynomial multiplication algorithm be performed without extra memory?

O(1)-space Karatsuba’s algorithm?

• What about Toom-Cook algorithm?

• What about other products (short and middle)?

Results:

• Yes!

• Almost (for other products)

9

(28)

Outline

Polynomial products and linear maps

Space-preserving reductions

In-place algorithms from out-of-place algorithms

(29)

Polynomial products and linear maps

(30)

Short product

=

×

n n−1

• Low short product: product of truncated power series

• Useful in other algorithms

• Time complexity: M(n)

• Space complexity: O(n)

(31)

Short product

×

=

n n−1

low short product high short product

• Low short product: product of truncated power series

• Useful in other algorithms

• Time complexity: M(n)

• Space complexity: O(n)

11

(32)

Short product

×

=

n n−1

low short product high short product

• Low short product: product of truncated power series

• Useful in other algorithms

• Time complexity: M(n)

• Space complexity: O(n)

(33)

Middle product

=

× 2n−1

3n−2

=

× 2n−1

• Useful for Newton iteration

GG(1GF) modX2nwithGF = 1 +XnH

• division, square root, . . .

• Time complexity: M(n) → Tellegen’s transposition

• Space complexity: O(n)

12

(34)

Middle product

=

×

n n−1

n−1

2n−1

middle product

• Useful for Newton iteration

GG(1GF) modX2nwithGF = 1 +XnH

• division, square root, . . .

• Time complexity: M(n) → Tellegen’s transposition

• Space complexity: O(n)

(35)

Middle product

=

×

n n−1

n−1

2n−1

middle product

• Useful for Newton iteration

GG(1GF) modX2nwithGF = 1 +XnH

• division, square root, . . .

• Time complexity: M(n) → Tellegen’s transposition

• Space complexity: O(n)

12

(36)

Multiplications as linear maps

Example:

f = 3X2+ 2X+ 1 g =X2+ 2X + 4

fg = 3X4+ 8X3+ 17X2+ 10X + 4

1 2 1 3 2 1

3 2 3

4 2 1

=

4 10 17 8 3

(37)

Multiplications as linear maps

Example:

f = 3X2+ 2X+ 1 g =X2+ 2X + 4

fg = 3X4+ 8X3+ 17X2+ 10X + 4

1 2 1 3 2 1

3 2 3

4 2 1

=

4 10 17 8 3

13

(38)

Multiplications as linear maps Full product:

=

× n

2n−1

(39)

Multiplications as linear maps Short products:

=

× n

n

n−1

14

(40)

Multiplications as linear maps Middle product:

=

× 3n3n−−11

(41)

Multiplications as linear maps Middle product:

=

×

n

14

(42)

Multiplications as linear maps

For simplicity in the presentation we assume

FP

SPhi

MP SPlo

(43)

Space-preserving reductions

(44)

Relative difficulties of products

• Without space restrictions:

• SPFP and FPSPlo+ SPhi

• MPFP (transposition)

• MPSPlo+ SPhi+ (n1) additions

• Size of inputs and outputs:

• FP : (n,n)2n1

• SPlo: (n,n)n

• SPhi: (n1,n1)n1

• MP : (2n1,n)n

7Reductions unusable in space-restricted settings! 3We provide space/time preserving reductions

(45)

Relative difficulties of products

• Without space restrictions:

• SPFP and FPSPlo+ SPhi

• MPFP (transposition)

• MPSPlo+ SPhi+ (n1) additions

• Size of inputs and outputs:

• FP : (n,n)2n1

• SPlo: (n,n)n

• SPhi: (n1,n1)n1

• MP : (2n1,n)n

7Reductions unusable in space-restricted settings! 3We provide space/time preserving reductions

15

(46)

Relative difficulties of products

• Without space restrictions:

• SPFP and FPSPlo+ SPhi

• MPFP (transposition)

• MPSPlo+ SPhi+ (n1) additions

• Size of inputs and outputs:

• FP : (n,n)2n1

• SPlo: (n,n)n

• SPhi: (n1,n1)n1

• MP : (2n1,n)n

7Reductions unusable in space-restricted settings!

3We provide space/time preserving reductions

(47)

Relative difficulties of products

• Without space restrictions:

• SPFP and FPSPlo+ SPhi

• MPFP (transposition)

• MPSPlo+ SPhi+ (n1) additions

• Size of inputs and outputs:

• FP : (n,n)2n1

• SPlo: (n,n)n

• SPhi: (n1,n1)n1

• MP : (2n1,n)n

7Reductions unusable in space-restricted settings!

3We provide space/time preserving reductions

15

(48)

A relevant notion of reduction

Definitions

• TISP(t(n),s(n)): computable in time t(n) and space s(n)

Ac B:A is computable with oracleB if B∈TISP(t(n),s(n)) then

A∈TISP(c t(n) +o(t(n)),s(n) +O(1))

Ac B:Ac B andBc A

Example

A1B meansAand B are equivalent for both time and space

(49)

A relevant notion of reduction

Definitions

• TISP(t(n),s(n)): computable in time t(n) and space s(n)

Ac B:A is computable with oracleB if B∈TISP(t(n),s(n)) then

A∈TISP(c t(n) +o(t(n)),s(n) +O(1))

Ac B:Ac B andBc A Example

A1B meansAand B are equivalent for both time and space

16

(50)

First results in a nutshell

Theorem

FP

SPhi

2 ≡ ≤1 MP

1

SPlo

(51)

Visual proof

Use offake padding (in input,notin output!)

• SPlo(n)≤MP(n); SPhi(n)≤MP(n−1) SPhi SPlo

= =

• FP(n)≤SPhi(n) + SPlo(n)≤MP(n) + MP(n−1)

FP = =

18

(52)

Visual proof

Use offake padding (in input,notin output!)

• SPlo(n)≤MP(n); SPhi(n)≤MP(n−1) SPhi SPlo

= =

• FP(n)≤SPhi(n) + SPlo(n)≤MP(n) + MP(n−1)

FP = =

(53)

Half-additive full product: hh+f ·g

f × g

+=

n−1

n h

n FP+lo:

RemarkFP+lo1FP+hi using reversal polynomials TheoremFP+2 SP and SP≤3/2FP+

19

(54)

Half-additive full product: hh+f ·g

f × g

+=

n−1 n

h

n FP+hi:

RemarkFP+lo1FP+hi using reversal polynomials TheoremFP+2 SP and SP≤3/2FP+

(55)

Half-additive full product: hh+f ·g

f × g

+=

n−1 n

h

n FP+hi:

RemarkFP+lo1FP+hi using reversal polynomials

TheoremFP+2 SP and SP≤3/2FP+

19

(56)

Half-additive full product: hh+f ·g

f × g

+=

n−1 n

h

n FP+hi:

RemarkFP+lo1FP+hi using reversal polynomials

Theorem + +

(57)

From SP to FP+

× +=

× +=

FP+lo(n)≤SPlo(n) + SPhi(n) +n−1

20

(58)

From SP to FP+

× +=

× +=

FP+lo(n)≤SPlo(n) + SPhi(n) +n−1

(59)

From SP to FP+

× +=

× +=

FP+lo(n)≤SPlo(n) + SPhi(n) +n−1

20

(60)

From SP to FP+

× +=

× +=

FP+lo(n)≤SPlo(n) + SPhi(n) +n−1

(61)

From SP to FP+

× +=

× +=

FP+lo(n)≤SPlo(n) + SPhi(n) +n−1

20

(62)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

× =

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

(63)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

=

×

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

21

(64)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

=

×

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

(65)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

=

×

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

21

(66)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

=

×

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

(67)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

=

×

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

21

(68)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

=

×

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

(69)

From FP+ to SP

f0+Xdn/2ef1·g0+Xdn/2eg1=f0g0+Xdn/2e(f0g1+f1g0) modXn

=

×

SPlo(n)≤FP(bn/2c) + FP+lo(bn/2c) + FP+hi(dn/2e)

21

(70)

Converse directions?

• From FP to SP:

• problem with the output size

• without space restriction: is SP(n)'FP(n/2)?

• From SP to MP:

• partial result:

• up to log(n) increase in time complexity

• techniques from next part

• without space restriction

• FP to MP through Tellegen’s transposition principle

(71)

Converse directions?

• From FP to SP:

• problem with the output size

• without space restriction: is SP(n)'FP(n/2)?

• From SP to MP:

• partial result:

• up to log(n) increase in time complexity

• techniques from next part

• without space restriction

• FP to MP through Tellegen’s transposition principle

22

(72)

Summary of results so far iSP

iMP oMP

oFP iFP

oSP

3/2 2

1

1 2

m/n 1

≤1

(73)

In-place algorithms from

out-of-place algorithms

(74)

Framework

• In-place algorithms parametrized by out-of-place algorithm

• Out-of-place: usescnextra space

• Constantc known to the algorithm

• Goal:

• Space complexity:O(1)

• Time complexity: closest to the out-of-place algorithm

• Technique:

• Oracle calls in smaller size

Fakepadding

Tailrecursive call

Similar approach for matrix mul. :Boyer, Dumas, Pernet, Zhou (2009)

(75)

Framework

• In-place algorithms parametrized by out-of-place algorithm

• Out-of-place: usescnextra space

• Constantc known to the algorithm

• Goal:

• Space complexity:O(1)

• Time complexity: closest to the out-of-place algorithm

• Technique:

• Oracle calls in smaller size

Fakepadding

Tailrecursive call

Similar approach for matrix mul. :Boyer, Dumas, Pernet, Zhou (2009)

24

(76)

Framework

• In-place algorithms parametrized by out-of-place algorithm

• Out-of-place: usescnextra space

• Constantc known to the algorithm

• Goal:

• Space complexity:O(1)

• Time complexity: closest to the out-of-place algorithm

• Technique:

• Oracle calls in smaller size

Fakepadding

Tailrecursive call

Similar approach for matrix mul. :Boyer, Dumas, Pernet, Zhou (2009)

(77)

Framework

• In-place algorithms parametrized by out-of-place algorithm

• Out-of-place: usescnextra space

• Constantc known to the algorithm

• Goal:

• Space complexity:O(1)

• Time complexity: closest to the out-of-place algorithm

• Technique:

• Oracle calls in smaller size

Fakepadding

Tailrecursive call

Similar approach for matrix mul. :Boyer, Dumas, Pernet, Zhou (2009)

24

(78)

Tail recursion and fake padding

• Tail recursion:

• Only one recursive call + last (or first) instruction

• No need of recursive stack avoidO(logn) extra space

Fakepadding:

• Pretend to pad inputs with zeroes

• Make the data structure responsible for it

O(1) increase in memory

Cf.strides in dense linear algebra

• OK in inputs, not in outputs!

(79)

Tail recursion and fake padding

• Tail recursion:

• Only one recursive call + last (or first) instruction

• No need of recursive stack avoidO(logn) extra space

Fakepadding:

• Pretend to pad inputs with zeroes

• Make the data structure responsible for it

O(1) increase in memory

Cf.strides in dense linear algebra

• OK in inputs, not in outputs!

25

(80)

Our results

• In-place full product (half additive) in time (2c+ 7)M(n)

• In-place short product in time(2c+ 5)M(n)

• In-place middle product in time O(M(n) logn)

(81)

In-place FP+ from out-of-place FP

(f0+Xkˆf)·(g0+Xkgˆ) =f0g0+Xk(f0gˆ+ ˆf g0) +X2kˆfgˆ

×

× =

27

(82)

In-place FP+ from out-of-place FP

(f0+Xkˆf)·(g0+Xkgˆ) =f0g0+Xk(f0gˆ+ ˆf g0) +X2kˆfgˆ

×

× =

k

k

2k−1

ck

(83)

In-place FP+ from out-of-place FP

(f0+Xkˆf)·(g0+Xkgˆ) =f0g0+Xk(f0gˆ+ ˆf g0) +X2kˆfgˆ

=

= k

k

27

(84)

In-place FP+ from out-of-place FP

(f0+Xkˆf)·(g0+Xkgˆ) =f0g0+Xk(f0gˆ+ ˆf g0) +X2kˆfgˆ

×

× =

k

dn/ke

(85)

In-place FP+ from out-of-place FP

(f0+Xkˆf)·(g0+Xkgˆ) =f0g0+Xk(f0gˆ+ˆf g0) +X2kˆfgˆ

×

× =

× k

dn/ke dn/ke −1

27

(86)

In-place FP+ from out-of-place FP

(f0+Xkˆf)·(g0+Xkgˆ) =f0g0+Xk(f0gˆ+ˆf g0) +X2kˆfgˆ

×

× =

×

× k

dn/ke dn/ke −1

nk

nk−1

nk

(87)

Analysis

×

× =

×

× k

dn/ke dn/ke −1

nk

nk−1

nk

ck + 2k−1≤nk =⇒ kn+1c+3

T(n) = (2dn/ke −1)(M(k) + 2k−1) +T(n−k)

T(n)≤(2c+ 7)M(n) +o(M(n))

28

(88)

Analysis

×

× =

×

× k

dn/ke dn/ke −1

nk

nk−1

nk

ck+ 2k−1≤nk =⇒ kn+1c+3

T(n) = (2dn/ke −1)(M(k) + 2k−1) +T(n−k)

T(n)≤(2c+ 7)M(n) +o(M(n))

(89)

Analysis

×

× =

×

× k

dn/ke dn/ke −1

nk

nk−1

nk

ck+ 2k−1≤nk =⇒ kn+1c+3

T(n) = (2dn/ke −1)(M(k) + 2k−1) +T(n−k)

T(n)≤(2c+ 7)M(n) +o(M(n))

28

(90)

In-place short product

× =

kn/(c+ 2)

T(n) =dn/keM(k)+(dn/ke−1)M(k−1)+2k(dn/ke−1)+T(n−k)

T(n)≤(2c+ 5)M(n) +o(M(n))

(91)

In-place short product

× =

k

k

ck

k

kn/(c+ 2)

T(n) =dn/keM(k)+(dn/ke−1)M(k−1)+2k(dn/ke−1)+T(n−k)

T(n)≤(2c+ 5)M(n) +o(M(n))

29

(92)

In-place short product

× =

k k

kn/(c+ 2)

T(n) =dn/keM(k)+(dn/ke−1)M(k−1)+2k(dn/ke−1)+T(n−k)

T(n)≤(2c+ 5)M(n) +o(M(n))

(93)

In-place short product

× =

k k

dn/ke

kn/(c+ 2)

T(n) =dn/keM(k)+(dn/ke−1)M(k−1)+2k(dn/ke−1)+T(n−k)

T(n)≤(2c+ 5)M(n) +o(M(n))

29

(94)

In-place short product

× =

k k

dn/ke

kn/(c+ 2)

T(n) =dn/keM(k)+(dn/ke−1)M(k−1)+2k(dn/ke−1)+T(n−k)

T(n)≤(2c+ 5)M(n) +o(M(n))

(95)

In-place short product

× =

k k

dn/ke

kn/(c+ 2)

T(n) =dn/keM(k)+(dn/ke−1)M(k−1)+2k(dn/ke−1)+T(n−k)

T(n)≤(2c+ 5)M(n) +o(M(n))

29

(96)

In-place short product

× =

k k

dn/ke

kn/(c+ 2)

T(n) =dn/keM(k)+(dn/ke−1)M(k−1)+2k(dn/ke−1)+T(n−k)

T(n)≤(2c+ 5)M(n) +o(M(n))

(97)

In-place middle product

=

×

• Recursive call on chunks off. . . but withfull g!

T(n,m) =dn/keM(k) +T(n,mk)

T(n,n)

M(n) logc+2

c+1(n) +o(M(n) logn) if M(n) is quasi-linear

O(M(n)) otherwise

30

(98)

In-place middle product

=

× k

dn/ke

• Recursive call on chunks off. . . but withfull g!

T(n,m) =dn/keM(k) +T(n,mk)

T(n,n)

M(n) logc+2

c+1(n) +o(M(n) logn) if M(n) is quasi-linear

O(M(n)) otherwise

(99)

In-place middle product

=

× k

dn/ke nk

n

• Recursive call on chunks off. . . but withfull g!

T(n,m) =dn/keM(k) +T(n,mk)

T(n,n)

M(n) logc+2

c+1(n) +o(M(n) logn) if M(n) is quasi-linear

O(M(n)) otherwise

30

Références

Documents relatifs

The abstract solver starts from the empty state and from a given formula (in this case a logic program Π) and applies transition rules until it reaches either a FailState state, or

In this paper, we give a sharp lower bound for the first (nonzero) Neumann eigenvalue of Finsler-Laplacian in Finsler manifolds in terms of diameter, dimension, weighted

It remarkably replaced the class of Riesz transforms by more general classes of kernel operators obeying a certain point separation criterion for their Fourier multiplier symbols..

It is based on a product formula of an unfamiliar type (see (2.9) below) which involves two different sets of orthogonal functions which may not consist of

Under the assumption that the special graph Γ has no symmetry and that its marked points are not periodic, we conversely prove that the set of polynomials such that Γ(P ) = Γ defines

Réversible automata form a natural class of automata with links to artificial intelligence [1], and to biprefix codes [4], Moreover réversible automata are used to study

In particular we generate low-level algebraic proofs needed to validate the results of ideal membership testing used in arithmetic circuit verification by translat- ing proofs

Ilias, Immersions minimales, premiere valeur propre du Lapla- cien et volume conforme, Mathematische Annalen, 275(1986), 257-267; Une ine- galit´e du type ”Reilly” pour