HAL Id: hal-02400746
https://hal.inria.fr/hal-02400746
Submitted on 9 Dec 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Adjoint computation and Backpropagation
Guillaume Pallez
To cite this version:
Guillaume Pallez. Adjoint computation and Backpropagation. Meeting of the Royal Society – Numerical algorithms for high-performance computational science, Apr 2019, London, United Kingdom. hal-02400746
Adjoint computation and Backpropagation
Guillaume Pallez (Aupy)
Inria & University of Bordeaux
Who’s who
Automatic Differentiation
Paul Hovland (Argonne)
Navjot Kukreja (Imperial College)
Krishna Narayanan (Argonne)
Machine Learning (I)
Alexis Joly (Inria)
Machine Learning (II)
Alena Shilova (Inria)
Scheduling
Guillaume Pallez (Inria)
Olivier Beaumont (Inria)
Adjoint Computation and
Backpropagation
Ice-sheet model (I)
“In climate modelling, Ice-sheet models use numerical methods to simulate the
evolution, dynamics and thermodynamics of ice sheets.” (wikipedia)
Simpler Version:
proc Model Algorithm(u0, y)
begin
  Do stuff;
  for i = 0 to n do ui+1 = fi(ui);
  Do stuff;
  /* F(u0) = fn ◦ fn−1 ◦ … ◦ f0(u0) */
  Compute ∇F(u0)y;
end
Ice-sheet model (II)
A quick reminder about the gradient:
F(u0) = fn ◦ fn−1 ◦ … ◦ f1 ◦ f0(u0)
∇F(u0)y = Jf0(u0)T · ∇(fn ◦ … ◦ f1)(u1) · y
        = Jf0(u0)T · Jf1(u1)T · … · Jfn−1(un−1)T · Jfn(un)T · y
where JfT is the transposed Jacobian matrix of f, and ui+1 = fi(ui) = fi(fi−1 ◦ … ◦ f0(u0)).
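The transposed-Jacobian chain above can be checked numerically on a toy composition of linear maps (hypothetical stand-ins for the fi; with linear steps the Jacobians are just the matrices themselves):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain of linear maps f_i(u) = A_i @ u, so J f_i(u_i) = A_i.
n, d = 4, 3
As = [rng.standard_normal((d, d)) for _ in range(n + 1)]

u0 = rng.standard_normal(d)
y = rng.standard_normal(d)

# Forward sweep: u_{i+1} = f_i(u_i)
us = [u0]
for A in As:
    us.append(A @ us[-1])

# Reverse sweep: v <- J f_i(u_i)^T v, for i = n down to 0
v = y.copy()
for A in reversed(As):
    v = A.T @ v

# For linear maps, F(u0) = A_n ... A_0 u0, so grad F(u0) y = (A_n ... A_0)^T y.
M = np.eye(d)
for A in As:
    M = A @ M
assert np.allclose(v, M.T @ y)
```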
A better solution?
∇F(u0)y = Jf0(u0)T · Jf1(u1)T · … · Jfn−1(un−1)T · Jfn(un)T · y
proc Algo A(u0, y)
begin
  Do stuff;
  for i = 0 to n do ui+1 = fi(ui);
  Do stuff;
  Compute ∇F(u0)y;
end
→ What is the problem with Algo A?
proc Algo B(u0, y)
begin
  Do stuff;
  for i = 0 to n do
    ui+1 = fi(ui);
    Do stuff;
    vi+1 = vi · Jfi+1(ui+1)T;
  end
end
→ What is the problem with Algo B?
Algo B accumulates the product left to right, ∇F(u0)y = (…(Jf0T · Jf1T) · … · JfnT) · y: n MatMat ops.
Algo A applies the transposed Jacobians to the vector right to left, ∇F(u0)y = Jf0T · (… · (JfnT · y)): n MatVec ops.
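The trade-off between the two algorithms can be made concrete: accumulating the product as matrices (Algo B's v updates) costs n matrix–matrix products, while applying the transposed Jacobians to the vector right to left (Algo A) costs only n matrix–vector products. A sketch with random stand-in Jacobians, assuming square d×d maps:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 64, 10
Js = [rng.standard_normal((d, d)) for _ in range(n + 1)]  # stand-ins for J f_i(u_i)
y = rng.standard_normal(d)

# Algo B style: accumulate V = J f_0^T J f_1^T ... as a matrix (n mat-mat products)
V = Js[0].T
for J in Js[1:]:
    V = V @ J.T
grad_matmat = V @ y

# Algo A style: apply transposed Jacobians to the vector, right to left (n mat-vec)
v = y.copy()
for J in reversed(Js):
    v = J.T @ v
grad_matvec = v

assert np.allclose(grad_matmat, grad_matvec)
# Rough flop counts: mat-mat ~ n * 2d^3, mat-vec ~ n * 2d^2 -> a factor d cheaper.
```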
Adjoint computation
Fi(xi) = xi+1, for i < l
F̄i(xi, x̄i+1) = x̄i, for i ≤ l
[Figure: the forward chain F0, F1, …, Fl−1 produces x1, …, xl; the backward chain F̄l−1, …, F̄1, F̄0 then consumes (xi, x̄i+1) and produces x̄l−1, …, x̄0.]
Relation to IA? (I)
GoogleNet graph:
Relation to IA? (II)
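The recurrence Fi(xi) = xi+1, F̄i(xi, x̄i+1) = x̄i can be sketched as a store-all sweep on a toy scalar chain (the step function below is hypothetical; any differentiable Fi works the same way):

```python
# Store-all adjoint sweep for the chain F_i(x_i) = x_{i+1},
# F̄_i(x_i, x̄_{i+1}) = x̄_i.  Toy scalar step x_{i+1} = x_i**2 + 1.
l = 5

def F(i, x):           # forward step
    return x * x + 1.0

def Fbar(i, x, xbar):  # adjoint step: multiply by dF_i/dx evaluated at x_i
    return 2.0 * x * xbar

# Forward phase: store every x_i ("store all" strategy)
xs = [0.5]
for i in range(l):
    xs.append(F(i, xs[i]))

# Backward phase: consume the stored x_i from the end
xbar = 1.0  # seed x̄_l
for i in reversed(range(l)):
    xbar = Fbar(i, xs[i], xbar)

# xbar now holds dx_l/dx_0 = prod_i 2 * x_i
expected = 1.0
for i in range(l):
    expected *= 2.0 * xs[i]
assert abs(xbar - expected) < 1e-6 * abs(expected)
```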
Derivatives in machine learning
“Backprop” and gradient descent are at the core of all recent advances.
Computer vision: NVIDIA DRIVE PX 2 segmentation; top-5 error rate for ImageNet (NVIDIA devblog); Faster R-CNN (Ren et al., 2015).
Speech recognition & synthesis: word error rates (Huang et al., 2014); Google Neural Machine Translation System (GNMT).
Model of execution
[Figure: forward chain F0 … F4 and backward chain F̄0 … F̄4, with the peak memory shown as the execution progresses.]
- Memory to store the output of computations (xi or x̄i). Initial state: contains x0.
- Cost to write: wm = 0; cost to read: rm = 0.
Example of execution
Strategy      | Time | Space
Store all     | $    | $$$
Store “none”  | $$$  | $
No reuse      | $$   | $$
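A toy accounting of the two extreme strategies from the table, with unit step costs and free memory traffic (wm = rm = 0); the exact space bookkeeping is a simplifying assumption for illustration:

```python
def store_all(l):
    # One forward sweep storing x_0 .. x_{l-1}, then one backward sweep.
    return 2 * l, l          # (time, space)

def store_none(l):
    # Keep only x_0; replay the prefix before each backward step.
    forwards = l + sum(range(l))  # initial sweep + recomputations
    return forwards + l, 2        # (time, space)

l = 10
t_all, s_all = store_all(l)
t_none, s_none = store_none(l)
# Store all: cheap in time, expensive in space; store "none": the opposite.
assert t_all < t_none and s_none < s_all
```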
Problem formulation
We want to minimize the makespan of the adjoint computation (AC) graph:
- AC graph: size l; step costs uf (forward), ub (backward).
- Memory: capacity cm, wm = rm = 0, Mini = {x0}.
- Storage level k: capacity ck, write cost wk, read cost rk, Sini = ∅.
[Figure: the chain F0, F1, …, Fl−1 followed by F̄0, F̄1, …, F̄l−1.]
Existing work
Question: how do we organize the reverse execution of intermediate steps?
What do we store, what do we recompute?
- Store all: memory expensive
- Recompute all: compute expensive
- Intermediate status?
Bounded memory
Griewank and Walther, 2000: Revolve(l, cm), an optimal algorithm with cm memory slots.
Storage hierarchy
A., Herrmann, Hovland, Robert, 2015: optimal algorithm for two levels of storage: cheap bounded memory and costly unbounded disks.
A., Herrmann, 2019: library of optimal schedules for any number of storage levels (https://gitlab.inria.fr/adjoint-computation).
Forward phase
[Figure: forward sweep F0 … Fl−1, with Revolve(m, cm) applied recursively.]
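The optimal time behind Revolve satisfies a simple split recurrence. The memoized sketch below counts only forward-step executions under a unit-cost model (an illustration of the recurrence, not Griewank and Walther's actual implementation; the base cases follow one common convention):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t(l, s):
    """Minimal forward-step count to reverse a chain of length l with s
    checkpoint slots (unit-cost toy model)."""
    if l == 0:
        return 0
    if l == 1:
        return 1
    if s == 1:
        return l * (l + 1) // 2   # replay the prefix before every backward step
    # Store x_j and split: j forwards, reverse the tail with s-1 free slots,
    # then reverse the prefix of length j with all s slots again.
    return min(j + t(l - j, s - 1) + t(j, s) for j in range(1, l))

assert t(4, 1) == 10
assert t(3, 2) == 5
# More memory never hurts: the count is non-increasing in s.
assert t(20, 4) <= t(20, 3) <= t(20, 2)
```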
What directions for AI?
Then what? Are we done? Just let AD people and ML people talk together!
Cut the middle-(scheduling)-people!
What directions for AI?
While the core of the algorithms remains similar, the problems are different:
- Shallower graphs (O(100–1000) levels).
- Cost functions (time/memory) are not necessarily uniform.
- Graphs with more structure than chains.
- Multi-learners/hyperparameter tuning (independent graphs executed simultaneously), shared memory?
- Etc.
Dir. for AI: Graph structure
Remember this Google network:
Nice try, we are not going to
Dir. for AI: Graph structure II
Source: Rao et al., A Deep Siamese Neural Network (...), 2016
Source: Surís et al., Cross-Modal Embeddings for Video and Audio Retrieval, 2018
Dir. for AI: Graph structure II
Pitchfork graphs (aka join graphs):
Theorem (A., Beaumont, Herrmann, Shilova, 2019)
Given a bounded memory and a pitchfork with a bounded number of “teeth”, we can find in polynomial time the solution that backpropagates it in minimal time.
A Grasp of the proof? (I)
Three-phase algorithm:
1. Forward phase: traverse all branches; write some intermediate data.
2. Turn: backpropagate the handle of the pitchfork.
3. Backward phase: iteratively, read some checkpointed data from one of the branches and backpropagate a subset of the graph (possibly writing additional intermediate data).
A Grasp of the proof? (II)
It relies on key properties of the backward phase:
- Stability of execution
- Checkpoint persistence
which give us a multi-phase approach.
Lemma (Stability 1)
If Fi is “backpropagated”, then there are
Lemma (Checkpoint persistence)
If xi is stored, until Fi is “backpropagated”, there are no Fj for
Lemma (Stability 2)
If xi is read, then there are no Fj on
In this case, for a given forward phase, we get a multi-phase backward phase:
Forward phase
[Figure: F0 F1 F2 … Fl−1]
- Where do we schedule the checkpoints in the forward phase?
- In which order do we execute the
Is it worth it?
- From a scheduling perspective: yes! (new fun problems)
- From an adjoint perspective: yes! With a memory of size O(M):
  - Store All can execute a graph of size O(M) in time O(M);
  - Revolve can execute a graph of size O(e^M) in time O(M e^M)!
  - H-Revolve improves performance by an order of magnitude.
- From a machine-learning perspective: deeper networks!