• Aucun résultat trouvé

Building the Tangent and Adjoint codes of the Ocean General Circulation Model OPA with the Automatic Differentiation tool TAPENADE

N/A
N/A
Protected

Academic year: 2023

Partager "Building the Tangent and Adjoint codes of the Ocean General Circulation Model OPA with the Automatic Differentiation tool TAPENADE"

Copied!
32
0
0

Texte intégral

(1)

HAL Id: inria-00192415

https://hal.inria.fr/inria-00192415v2

Submitted on 29 Nov 2007

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub-

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non,

General Circulation Model OPA with the Automatic Differentiation tool TAPENADE

Moulay Hicham Tber, Laurent Hascoet, Arthur Vidard, Benjamin Dauvergne

To cite this version:

Moulay Hicham Tber, Laurent Hascoet, Arthur Vidard, Benjamin Dauvergne. Building the Tangent

and Adjoint codes of the Ocean General Circulation Model OPA with the Automatic Differentiation

tool TAPENADE. [Research Report] RR-6372, INRIA. 2007, pp.28. �inria-00192415v2�

(2)

inria-00192415, version 2 - 29 Nov 2007 a p p o r t

Thème NUM

Building the Tangent and Adjoint codes of the Ocean General Circulation Model OPA with the Automatic

Differentiation tool TAPENADE

M.H. Tber — L. Hascoët — A. Vidard — B. Dauvergne

N° 6372

Novembre 2007

(3)
(4)

Automati Dierentiation tool TAPENADE

M.H. Tber

, L. Hasoët

, A. Vidard

, B. Dauvergne

Thème NUM Systèmes numériques

Projet Tropis

Rapport de reherhe n°6372 Novembre 2007 28 pages

Abstrat: The oean general irulation model OPA is developed by the

LODYC team at Paris VI university. OPA has reently undergone a major

rewriting, migrating to FORTRAN95, and its adjoint ode needs to be re-

built. For earlier versions, the adjoint of OPA was written by hand at a high

development ost. We use the Automati Dierentiation tool TAPENADE

to build mehanialy the tangent and adjoint odes of OPA. We validate the

dierentiated odes by omparison with divided dierenes, and alsowith an

idential twin experiment. We apply state-of-the-art methods toimprove the

performane of the adjoint ode. In partiular we implement the Griewank

and Walther's binomial hekpointing algorithm whih gives us an optimal

trade-o between time and memory onsumption. We apply a spei strat-

egytodierentiatetheiterativelinearsolverthatomesfromtheimpliittime

stepping sheme.

Key-words: OPA, generalirulation model, TAPENADE, Automati Dif-

ferentiation,reverse mode, Adjoint Code, Chekpointing

INRIASophiaAntipolis,Frane

INRIAGrenoble,Frane

(5)

modèle de irulation générale oéanique OPA

par l'outil de Diérentiation Automatique

TAPENADE

Résumé : Le modèle de irulation générale oéanique OPA est développé

par l'équipe LODYC de l'université Paris VI. La nouvelle version 9 d'OPA

onstitueuneévolutionmajeure,aveenpartiulierunemigrationversFORTRAN95.

LesodesLinéaireTangentetAdjointd'OPA,quiauparavantétaientéritsàla

main,doiventdon être redéveloppés. Nousutilisonsl'outil de Diérentiation

AutomatiqueTAPENADEpouronstruirelesodesTangentetAdjointd'OPA

9. Nous validons les dérivées obtenues par omparaison ave les Diérenes

Divisées etsur deux appliationstest inluant des expérienes jumelles. Nous

utilisonsleshéma de hekpointing réursifbinomialdeGriewanketWalther

pour améliorerlesperformanes du ode adjoint. Nousutilisons une stratégie

spéique pour diérentier le solveur linéaire itératif provenant du shéma

impliited'avanementen temps. Nosrésultatsmontrentunoûtraisonnable,

tanten onsommationmémoireque pour le temps d'exéution de l'adjoint.

Mots-lés : OPA, Cirulation Oéanique, TAPENADE, Diérentiation

Automatique, mode inverse, Code Adjoint, Chekpointing

(6)

1 Introdution

The development of tangent and adjoint models is an important step in ad-

dressingsensitivityanalysisandvariationaldataassimilationproblemsinOe-

anography. Sensitivity analysis is the study of how modeloutput varies with

hanges in model inputs. The sensitivity information given by the adjoint

modelis used diretly to gain anunderstanding of the physial proesses. In

data assimilation, one onsiders a ost funtion whih is a measure of the

model-datamist. The adjointsensitivities are used to build the gradient for

desent algorithms. Similarly the tangent modelis used inthe ontextof the

inrementalalgorithms[3℄to linearizethe ost funtion around abakground

ontrol. For the previous version 8 of the Oean General Cirulation Model

OPA[17℄,Weaveretal[22℄developed thenumerialtangentandadjointodes

by handusing lassialtehniques [5,19℄. Sine then,the OPA modelhas un-

dergone a major update. Partiularly the new versions are fully rewritten in

FORTRAN95. In this paper, we report on the development of tangent and

adjoint odes of OPA using the Automati Dierentiation (AD) tool TAPE-

NADE [12℄. A briefdesription of the OPA modeland the ongurationused

in this work is given in the next setion. In setion 3 we present the prini-

ples of AD and how they are reeted into the funtionalities of the AD tool

TAPENADE. Insetion4wefous onthe mostinterestingdiulties thatwe

enountered in the appliation of AD to suh a large ode. Setion 5 shows

some experiments that validate our derivatives and presents two illustrative

appliations, fousing on omputational aspets rather than impliations for

oeanography. An outlook of further work is given inthe onlusion.

2 The Oean General Cirulation Model OPA

Developed bythe LODYCteamatParisVIuniversity,OPAisaexibleoean

irulation model that an be used either in a regional or in a global oean

onguration. OPA is the oean model omponent of NEMO (Nuleus For

European Modelling of the Oean) and is widely used in the sienti om-

munity. Moreover it is beoming a major ator in operational oeanography

(Merator, ECMWF, UK-Met oe) Itsformulationisbased on the so-alled

primitiveequationsforthetemporalevolutionof oeanveloityurrents,tem-

(7)

perature and salinity in its three horizontal and vertial dimensions. These

equationsarederived fromNavier-Stokesequationsoupledwith astate equa-

tion for water density and heat equation, under Boussinesq and hydrostati

approximations.

Let us introdue the following variables:

U

the veloity vetor,

U = U h + w k

(the subsript

h

denotes the loal horizontal vetor),

T

the poten-

tialtemperature,

S

the salinity,

p

the pressure and

ρ

the in-situdensity. The

vetor invariant form of the primitive equations in an orthogonal set of unit

vetors linked to the earth are writtenas follows

 

 

 

 

 

 

 

 

 

 

 

 

 

 

∂ U h

∂t = −

(∇ × U ) × U + 1

2 ∇ ( U ) 2

h

− f k × U h − 1 ρ 0

∇ h p + D U

∂p

∂z = −ρg

∇ · U = 0

∂T

∂t = −∇ · (T U ) + D T

∂S

∂t = −∇ · (S U ) + D S ρ = ρ (T, S, p)

where

isthegeneralizedderivativevetoroperator,

t

thetime,

z

the vertial

oordinate,

ρ 0

areferenedensity,

f

theCoriolisaeleration,and

g

thegravity

aeleration.

D U , D T

and

D S

aretheparametrizationofsmallsalephysisfor momentum, temperature and salinity, inluding surfae foring terms. A full

desription of the modelbasis, disretization, physial and numerialdetails

an be found in [17℄.

Through this paper, OPA is used in its global free surfae onguration

ORCA-2. In this onguration the model uses a rotated grid with poles on

NorthAmeriaandAsiainordertoavoidthesingularityproblemontheNorth

Pole. The spae resolution is roughly equivalent toa geographial mesh of 2°

by1.3°withameridionalresolutionof0.5°neartheEquator(seegure1). The

Vertial domain, spreading from the surfae to a depth of 5000m, is meshed

using 31 levels with levels 1 to10 in the top 100 meters. The time step is 96

minutes sothat there are 15time steps per day. The modelis fored by heat,

freshwater, and momentum uxes from the atmosphere and/or the sea-ie.

(8)

Figure1: ORCA2 Mesh

The solar radiation penetrates the upper layers of the oean. Zero uxes of

heatand saltareappliedthrough the bottom. On the lateralsolid boundaries

ano-slip ondition isalsoapplied. Initializationof the modelfor temperature

andsalinityisbasedonthe Levitusetal. (1998)limatologywithanullinitial

veloity eld. For more details about the spae time-domain and the oean

physisofORCA-2, werefertothe pagedediated tothisongurationinthe

oialwebsiteof NEMO-OPA 1

.

TheongurationORCA-2isroutinelyusedbyMERCATOR/Meteo-Frane

to ompute the oeani omponent of their seasonal foreasting system. The

size ofOPA-9,200 modulesdening 800 proedures withover 100000 lines of

FORTRAN95, makes itthe largest appliation dierentiated by TAPENADE

to date. The omputational kernel whih is atually dierentiated aounts

for 330 proedures.

1

http://www.lody.jussieu.fr/NEMO/general/desription/ORCA_ong.html

(9)

3 Priniples of AD and the tool TAPENADE

TAPENADE [12℄ is an AD tool developed by the Tropis 2

team at INRIA.

Given the soure of anoriginal program that evaluates a mathematial fun-

tion, and given a seletion of input and output variablesto be dierentiated,

TAPENADEproduesanewsoureprogramthatomputesthepartialderiva-

tives of the seleted outputs with respet to the seleted inputs.

Basially,TAPENADE doesthatbyinsertingadditionalstatementsintoa

opy of the original program. Like other AD tools, TAPENADE is based on

thefundamentalobservationthattheoriginalprogramP,whateveritssizeand

run time, omputes a funtion

F, X ∈IR m 7→ Y ∈IR n

whih is the omposition

of the elementary funtions omputed by eah run-time instrution. In other

words if P exeutes a sequene of elementary statements

I k , k ∈ [1..p]

, then P

atually evaluates

F = f p ◦ f p − 1 ◦ · · · ◦ f 1 ,

whereeah

f k

isthefuntion implementedby

I k

. Therefore onean apply the

hain rule ofderivativealulusto getthe Jaobianmatrix

F

,i.e. the partial

derivatives of eah omponent of

Y

with respet to eah omponent of

X

.

Calling

X 0 = X

and

X k = f k (X k − 1 )

the suessive values of all intermediate variables, i.e. the suessive states of the memory throughout exeution of P,

we get

F (X) = f p (X p − 1 ) × f p− 1 (X p − 2 ) × · · · × f 1 (X 0 ) .

(1)

The derivatives

f k

of eah elementary instrution are easily built, and must

be inserted in the dierentiated program so that eah of them has the values

X k − 1

diretly available for use. This proess yields analyti derivatives, that

are exat up tonumerial auray.

In pratie, two sorts of derivatives are of partiular importane in sien-

ti omputing: the tangent (or diretional) derivatives, and the adjoint (or

reverse) derivatives. In partiular, tangent and adjoint are the two sorts of

derivative programs required for OPA, and TAPENADE provides both. The

tangent derivativeis the produt

Y ˙ = F (X) × X ˙

of the fullJaobiantimes a

diretion

X ˙

inthe input spae. From equation (1), we nd

Y ˙ = F (X) × X ˙ = f p (X p− 1 ) × f p 1 (X p− 2 ) × · · · × f 1 (X 0 ) × X ˙

(2)

2

http://www-sop.inria.fr/tropis/

(10)

whihismostheaplyexeutedfromrighttoleftbeausematrix

×

vetorprod-

uts are muh heaper than matrix

×

matrix produts. This is also the most

onvenient exeution order beause it uses the intermediate values

X k

in the

same order as the program P builds them. On the other hand the adjoint

derivativeis the produt

X = F ′∗ (X) × Y

of the transposed Jaobiantimes a

weight vetor

Y

in the output spae. The resulting

X

is the gradient of the

dot produt

(Y · Y )

. From equation (1), we nd

X = F ′∗ (X) × Y = f 1 ′∗ (X 0 ) × · · · × f p ′∗ 1 (X p− 2 ) × f p ′∗ (X p− 1 ) × Y

(3)

whih is alsomostheaply exeuted from righttoleft. However, this uses the

intermediate values

X k

in the inverse of their buildingorder inP.

Regarding the runtime ost for obtaining the derivatives, both tangent

Y ˙

and adjoint

X

ost only a small multiple of the original program P. The

slowdown fator is less than 4 in theory. In pratie it an be less than 2 for

thetangent,whereasitanreahupto10fortheadjointforareasondisussed

below. Despite its higher ost, the adjoint ode is still by large the heapest

way to obtain gradients. To get the gradient with the tangent mode would

require

m

runs ofthe tangentode, one perdimensionof

X

,whereas this ost

isindependent from

m

with the adjointmode.

The diulty of the adjoint mode lies in the fat that it needs the inter-

mediatevalues

X k

inreverse order. Tothis end, TAPENADE basially usesa

two-sweepsstrategy,alledStore-All. Intherstsweep(theforwardsweep),

aopy of the original programP isrun, together with Push statements that

store intermediate values on a stak just before they get overwritten. In the

seond sweep (the bakward sweep), the derivative statements ompute the

elementaryderivatives

f k ′∗ (X k − 1 )

for

k = p

down to

1

, using Pop statements

to restore the intermediate values as they are required. This inurs a ost in

memory spae as the maximum stak size needed is attained at the end of

the forward sweep, and is thus proportional to the length of the program P.

There is also a runtime penalty for these stak manipulations. TAPENADE

implements a number of strategies [11℄ to mitigate this ost, based on stati

data-owanalysis oftheprogram'sontrolowgraph,reduing thenumberof

values

X k

thatneedtobestored. HoweverforverylongprogramssuhasOPA,

involving unsteady simulations, Store-All an not work alone. TAPENADE

ombinesit with a storage/reomputationtrade-o alled hekpointing.

(11)

Chekpointing redues the maximum stak size at the ost of dupliated

exeutions. Consider apiee

C

of the originalprogramP.Chekpointing

C

as

illustrated on gure 2 means that during the main forward sweep,

C

pushes

no value on the stak. When the bakward sweep reahes bak to the plae

where intermediate values are now missing on the stak, it runs

C

a seond

time, this time with the Store-All strategy i.e. pushingvalues on the stak.

The bakward sweep an then resume safely. To run

C

twie requires that

enoughof its input values, a snapshot, are stored but the size of a snapshot

isgenerallymuhless thanthestaksize usedby

C

. Obviously,thisalsoslows

down the adjoint program. When

C

is well hosen, hekpointing an divide the peak size of the stak by a fator of two. Chekpoints an be nested, in

whih ase both the stak's peak size and the adjoint runtime slowdown an

growaslittleasthelogarithmofthesizeofP.Initsdefaultmode,TAPENADE

applieshekpointing to eah proedureall.

successive sweeps

{ C

Figure 2: Chekpointing applied to the program piee

C

. Rightwards arrows

represent forward sweeps, thik when they store intermediate values on the

stak, thin otherwise. Leftwards arrows represent bakward sweeps. Blak

dots are stores, white dots are retrieves. Small dots are Push and Pops, big

dots are snapshots.

TAPENADE apaity togeneraterobust and eienttangent andadjoint

odes has been demonstrated on several real-world test appliations[15, 7, 1,

13, 16℄. Regarding the appliation language, it an handle programs written

inFORTRAN.Takingintoaountthenewprogrammingonstrutsprovided

by FORTRAN95 has required an important programming eort in the past

few years, mostly to handle modules, strutured data types, array notation,

pointers,anddynamimemoryalloation. SinethenewOPA9isnowwritten

in FORTRAN95, dierentiation of OPA is a very realisti test for the new

TAPENADE 2.2.

(12)

There exist several other AD tools. Restriting to the tools whih, like

TAPENADE, operate by soure transformation, provide tangent and adjoint

modes, use global program analysis to optimize the dierentiated ode, and

have demonstrated their appliability on large industrialodes, we an men-

tion TAF [4℄a pioneer of AD for meteorology, now the standard AD tool for

thepopularMITGlobalCirulationModel. UnlikeTAPENADE's,theadjoint

mode of TAF regenerates the intermediate values

X k

by reomputation from angiven initialpoint. This is alleda Reompute-All strategy. Comparison

with Store-All strategy is getting blurred by nested hekpointing, as the ad-

joint odes grow more alike as more hekpoints are inserted. OpenAD [20℄,

suessor of ADIFOR and ADIC, uses the Store-All strategy. There are ex-

periments to also apply OpenAD to the MIT GCM. The tool Adol-C [10℄,

although using operator overloadinginstead of soure transformation, is very

popularand has been appliedsuessfully tomanyindustrialappliations. Its

adjoint mode an be seen as anextension of the Store-All strategy: not only

theintermediatevaluesarestoredonthestak,butalsotheomputationgraph

tobedierentiated. This allows theAD tooltoperform further optimizations

onthis graph, atthe ost of ahigher memory onsumption.

4 Applying TAPENADE to OPA

We generatedworking tangentand adjointodes forthe omputationalkernel

of OPA, using TAPENADE. Depending on the nal appliation (f setion

5), the atual funtion to dierentiate as well as the input and output vari-

ables may be dierent, but the tehnial diulties that we enountered are

essentially the same. This setion desribesthese points.

4.1 FORTRAN95 onstruts

ThenewOPA9usesextensivelythemodularonstruts ofFORTRAN95. We

hadtoextendthe all-graphinternalrepresentationof TAPENADE tohandle

the nesting of modules and proedures. Essentially this nesting is mirrored

intothe dierentiated ode.

Beause a module an dene private omponents, subroutines in the dif-

ferentiatedmodules donot haveaess toall variablesof the originalmodule.

(13)

Thereforethedierentiatedmodulemustontainitsownopyofalltheoriginal

module's variables,types, and proedures. This is a hange inTAPENADE's

dierentiation model: the dierentiated ode annot just all or use parts of

the originalode; itmust ontain itsown opies of those. In other words, the

dierentiated ode neednot be linked with the original.

The interfae mehanism of FORTRAN95 is a way to implement over-

loaded proedures. This is stati overloading, whih is resolved at ompile

time. Therefore we had to extend TAPENADE type-heking phase to om-

pletely solve the alls to interfaed proedures. Conversely, TAPENADE is

now able to generate interfaes on the dierentiated proedures, so that the

generalstruture of the ode is preserved.

The arraynotationof FORTRAN95is usedsystematiallyinOPA. At the

same time, dierentiation requires that many alls to intrinsi funtions be

split to propagate the derivatives. When these funtions are used on arrays

("elemental"intrinsis)TAPENADE must generate a ode whih is far from

trivial. Forinstane the single statementfrom OPA:

zws(:,:,:) = SQRT(ABS(psal(:,:,:)))

generates inthe adjointmode

abs1 = ABS(psal(:,:,:))

mask = (psal(:,:,:) .GT. 0.0)

...

WHERE (abs1 .EQ. 0.0)

abs1b = 0.0

ELSEWHERE

abs1b = zwsb(:, :, :)/(2.0*SQRT(abs1))

END WHERE

WHERE (.NOT.mask(:, :, :))

psalb(:, :, :) = psalb(:, :, :) - abs1b

ELSEWHERE

psalb(:, :, :) = psalb(:, :, :) + abs1b

END WHERE

Without going in too muh detail into the adjoint dierentiation model, we

observe that the test that is needed to protetthe dierentiated ode against

(14)

the non-dierentiability of SQRT at 0, as well as the test that ontrols the

dierentiation of ABS, have been turned into WHERE onstruts to keep the

runtime benets of array notation. Some temporary variables are introdued

automatially to store ontrol-ow deisions (e.g. abs1 and mask), although

TAPENADE stilldoesn't dothis in anoptimal way on the example.

OPA usespointers anddynamimemoryalloation(allstoALLOCATE and

DEALLOCATE). This is an appliation for the pointer analysis now available

in TAPENADE, nding whether a variable has a derivative, even when this

variable is aessed through a pointer. Unfortunately, dynami alloation is

handled partly, i.e. only in the tangent mode of TAPENADE. In the adjoint

mode, we have no general strategy for memory alloation and TAPENADE

sometimesannotprodueaworkingode. Weunderstandthat theadjointof

an alloate should be a DEALLOCATE, and vie-versa, but some hanges must

be made by hand onthe dierentiated ode to makeit work.

4.2 Chekpointing and hidden variables

OPA reads and writes several data les, not only during the pre- and post-

proessingstages,butalsoduringtheomputationalkernelitself. Soureterms

suhas the wind stress are being read at intermediatetime steps. Also, some

modulesandproeduresdeneprivateSAVEvariables,whosevalueispreserved

but annot be aessed from outside. Although unrelated, these two points

are just examples of a ommon problem: they an make a proedure non

reentrant".

Ifaalled proeduremodiesaninternal SAVEvariable,itbeomesimpos-

siblefromthe outsidealling ontextto allthe proedureaseondtime with

an idential result. Similarly if the alled proedure reads from a previously

openedle,andjustmovesthe readpointerfurtherinthele,thenitbeomes

impossibleto allthe proedure twie and obtain the same values read.

Non reentrant proedures are a problemfor the hekpointing strategy of

the adjointmode. We sawinsetion3 thathekpointing relieson allingthe

hekpointedpieetwie,insuhawaythattheseondallisequivalenttothe

rst. To this end, a suient subset of the exeution ontext, the snapshot,

must besaved andrestored. HiddenvariableslikeaninternalSAVEvariableor

the readpointer insideanopened leannotbesaved nor restored in general.

(15)

Whenhekpointingwould requirehiddenvariablestobeputinthe snapshot,

then hekpointing shouldbe forbidden.

Similarly, when a proedure only alloates some memory, the alloation

must not be done twie. If this proedure is hekpointed, then one must

dealloatethe memory when restoringthe snapshot beforethe dupliate all.

TAPENADE isnot yet able todo this automatially.

TAPENADE has some funtionalities to ope with this hidden variables

problem,but inall ases interation with the user isneessary. First, TAPE-

NADE issues a warning message when a subroutine annot be hekpointed

beauseof aprivate SAVEvariable. Themessageis issuedonlywhen this vari-

able would be part of the snapshot for this proedure. When this happened

for OPA, we just turned by hand the variables in question into publi global

variables in the original ode. In priniplethis ould also be done automati-

ally. Howeverthere are onlya handfulsuh variables,thus developingthis is

not our priority.

When asubroutine isnot reentrant beauseof I/Olepointers orbeause

of isolatedmemory alloationordealloation,then TAPENADE lets the user

label the subroutine so that it must not be hekpointed. For OPA, we took

another strategy: we modied the main I/O subroutines so that they always

rst make sure that the le is opened and then only use diret read into the

lewithoutusing aread I/O pointer. Thusall I/Osubroutines are reentrant.

4.3 Binomial Chekpointing

Automati Dierentiation of OPA is one of the most ambitious appliations

of TAPENADE so far. It means building the adjoint of a piee of ode that

performs an unsteady nonlinear simulation over a very large number of time

steps. Eahtime stepomputes anew statewhosesize rangesinthehundreds

of megabytes. In adjoint mode if no hekpointing was applied, whih means

that all intermediate values were to be stored on a stak, we ould exeute

onlya handfulof time steps before werun outof memory even onour largest

workstation. Chekpointingisompulsory toomputethe adjointoverseveral

thousands of time steps, whih is our goal.

We saw in setion 3 that TAPENADE applies hekpointing at the level

of subroutine alls, i.e. eah all is hekpointed. This easy strategy is often

(16)

far fromoptimal. On one hand several alls are better not hekpointed, and

TAPENADEnowoerstheoptiontomarkseletedallsfornothekpointing.

On the other hand, hekpointing should be applied at other loations. For

example at the top level of the simulation program is a loop over many time

steps. We denitely need an eient hekpointing sheme applied at this

level of time iterations.

One lassial solution used by TAF on the MIT GCM ode [14℄, is alled

multilevelreursivehekpointing. Basially,itsplitsthe ompletetimeinter-

valintoasmallnumberof equidistant intervals,thenapply the same strategy

to eah of the sub-intervals. For instane 64 time steps an be split into 4

largeintervalsof4smallintervalsof4timesteps,asskethed ongure3. This

onsumes a maximum of 9 simultaneous snapshots, and the average number

of dupliate exeutions for a time step is

2.25

. In a more realisti situation,

1000 time steps an be split into 10 large intervals of 10 small intervals of

10 time steps, and one an gure out that this onsumes a maximum of 27

simultaneous snapshots,and the average numberof dupliate exeutions fora

time step is

2.7

.

0 56 60 61 62

58 57 56 52

54 53 52 48

50 49 48 32

46 45 44 40

42 41 40 36

38 37 36 32

34 33 32 16

30 29 28 24

26 25 24 20

22 21 20 16

18 17 16 0

14 13 12 8

10 9 8 4

6 5 4 0

2 1 0

Figure3: Three-levelshekpointing with 64timesteps and9 snapshots. For-

ward omputations go right, adjoint omputations go left. Blak irles rep-

resent writing/taking a snapshot, white irles represent reading an available

snapshot.

(17)

However, itwasshown in [21℄ thatthis strategyis not optimal. Under the

reasonable assumptions that all time steps ost the same run time, and that

thesnapshotneededtorunagainfromtimestep

n

to

n +1

isthesameastorun

fromstep

n

toany later step

n + x

, Griewank andWaltherhaveharaterized the optimal distribution of nested hekpoints, whih follows a binomiallaw.

Withthisoptimalstrategy,bothspatialandtemporalomplexityoftheadjoint

ode grow logarithmially with respet to the number of time steps of the

original simulation. In other words, both the slowdown fator whih grows

like the number of times eah time step is exeuted, and the memory whih

grows like the number of simultaneous snapshots, grow logarithmially with

the total number of time steps.

In real appliations,run-timeand memoryspae donot behave symmetri-

ally. One an always waita little longer for the result, whereas the memory

spae is bounded. Therefore the maximum number of snapshots

d

that an

be stored simultaneously is xed. Then [8℄ shows that the optimal strategy

givesaslowdownfator thatgrowsonlylikethe

d th

rootofthetotalnumberof

timesteps, whihisstillverygood. Figure4shows the optimalhekpointing

strategy for the same problem as gure 3 i.e. 64 time steps with memory for

9 snapshots. The average number of dupliate exeutions for a time step is

only

2

. For the more realisti situation (1000 time steps and memory for 27

snapshots) the average number of dupliate exeutions is only

2.57

.

We implemented this optimal strategy in the adjoint ode of OPA. We

madeour rstexperiments byhand modiationofthe adjointodeprodued

by TAPENADE. Still, TAPENADE produed automatially the proedures

that storeand retrieve thesnapshot, andtherefore the handmodiationwas

benign: given the number of time steps, a general proedure 3

shedules the

optimalsequene of ations (store snapshot, retrievesnapshot, run time step,

run adjoint time step) to dierentiate the omplete simulation. Further ver-

sions of TAPENADE will fully automate this proess. Figure 5 shows the

performanes onOPA. Theyare ingoodagreementwiththe theory. Notie in

partiular the two small inetion points on the urve around 150 iterations

and 800 iterations. Going bak to the optimality proof in [8℄, we see that

the optimalstrategy is partiularlyeient whenthe numberof time steps is

3

AFORTRAN95implementationofthisshedulingproedureanbefoundinwww.inria-

sop/tropis/ftp/Hiham_Tber/

(18)

0 56 60 60 62 58 57 56 51

54 53 52 51 45

49 48 47 46 45 38

43 42 41 40 39 38 30

36 35 34 33 32 31 30 16

28 27 26 25 24 22 22 19

20 19 16

17 16 0

14 13 12 11 10 9 6

7 6 3

4 3 0

1 0

Figure4: Optimalbinomialhekpointing with 64timesteps and9 snapshots

200 400 600 800 1000 1200

4.5 5 5.5 6 6.5 7 7.5

Total number of time steps

Slowdown factor

Figure5: Optimalbinomialhekpointingwith15snapshots: slowdown fator

asafuntionofthe totallengthofthe initialsimulation. Theslowdown fator

isthe run-timeratio of the adjointode ompared tothe originalode.

(19)

exatly

η(d, t) = (d + t)!

d!t!

where

d

isthenumberofsnapshotsand

t

isthenumberofdupliateexeutions

allowed per time step. Forour target mahine

d = 15

and we nd

η(15, 2) = 136

and

η(15, 3) = 816

, whihorresponds tothe inetion pointsof gure 5.

ForthepreviousversionOPA8,theadjointwaswrittenbyhand. Neverthe-

less, even a hand-written adjoint must implementstrategies to retrieve inter-

mediatestates inreverse order that is,something verylose tohekpointing.

Looking at this hand-written adjoint, we rst observe that the hekpointing

strategyisneithermultilevelnoroptimalbinomial. Itismorelikeasinglelevel

strategy,withonesnapshotstoredeveryxednumberoftimesteps. Duringthe

reverse sweep, states between two stored snapshots are rebuiltapproximately

usinglinear interpolation. The advantageis that fewtime stepsare evaluated

twie, and therefore the slowdown fator remainswell below4. We an see at

leasttwodrawbaks. First,this handmanipulationrequiresdeepknowledgeof

the original program and of the underlyingequations. This method does not

blend easily with Automati Dierentiation. It is not yet automated in any

AD tool and thereforetedious and error-proneode manipulationswould still

beneessary. Seond, this introdues approximationerrors intothe omputed

derivatives,whosemathematialbehaviorisunlear. The gradientobtained in

theend isusedinomplexoptimizationsordata-assimilationloops,and small

errors may result in poor onvergene. In any ase, for very large numbers of

time steps, we believe a trade-o between exat binomial hekpointing and

approximate interpolation is worth experimenting. Interpolation is probably

good enough for many variables that vary very slowly, and whih ould be

designated by the end-user, and only the other variables would need to be

stored.

4.4 Iterative linear solver

TheOPAmodelsolvesanelliptiequation atthe endof eah timestep, using

aniterativemethod that generates asequene of approximationsof the exat

solution. The mehanial appliation of AD on this kind of methods gives a

sequene of derivatives of the approximate solutions with the same number

(20)

of iterations as the original solver. The reason is that AD keeps the ow of

ontroloftheoriginalprograminthe dierentiatedprogram. Inpartiularthe

onvergene tests are stillbasedonlyonthe non-dierentiatedvariables. Nat-

urally,one may askwhether and howAD-produedderivativesare reasonable

approximations to the desired derivative of the exat solution. The issues of

derivative onvergene for iterative solvers in relation to AD are disussed in

[6,9, 2℄.

OPA provides two alternative algorithms to solve the ellipti equation:

PCGforPreonditionedConjugateGradient,andSORforSuessiveOver-

Relaxationmethod. Bothalgorithmsgive orretresults forthe originalode,

butPCG isgenerallypreferredthanks toitseieny andvetorizationprop-

erties. However,theAD-dierentiatedodegivesdierentresultsusingthetwo

algorithms. Figure 6 ompares the AD-derivatives with approximate deriva-

tives obtained by divided dierenes. We see that the derivatives obtained

with the SOR algorithm remain orret when the number of time steps in-

reases. On the ontrary, the derivatives obtained with the PCG algorithm

beome ompletelywrong after 80time steps. Notie that this ours in tan-

gent mode as well as in adjoint mode: the derivatives obtained with PCG,

although wrong, remain idential in tangent and adjoint. Our explanation is

thateahiterationofPCGinvolvestheomputationofsalarprodutsofvari-

ables that depend on the state vetor, thus making the numerial algorithm

nonlineareven thoughthe elliptiequationislinear. In[6℄,Gilbert has shown

that the appliation of AD to a xed point iteration gives a derivative xed

pointiterationthat onverges R-linearlytothe desiredderivativeinpartiular

in the ase of a large ontrative iterate or seant updating. Unfortunately

this is not the ase for quasi-Newton iterative solvers suh as PCG, for whih

there isno similaronvergene result to our knowledge.

To solve this problem for the tangent-dierentiated OPA we exploit the

linearity of the ellipti system, and for the adjoint-dierentiated OPA we ex-

ploit the self adjointness property of the ellipti operator [22℄. We an thus

use the original PCG routine itself to solve for the dierentiated linear sys-

tems. Pratially, we do this using the so-alledblak-box feature provided

in TAPENADE. Figure 6 shows that (here for the tangent mode) the PCG

gives the same auray asthe SOR solver.

(21)

Figure6: evolutionoftherelativeerrorbetweentangentderivativeanddivided

dierenes, for the three strategies: SOR and straightforward AD, PCG and

straightforward AD, PCG with the blak box strategy

(22)

In another experiment, we tried to use straightforward AD with the PCG

solver, but this time xing the number of PCG iterations to some very high

value. We observed that the derivatives beome oherent again with divided

dierenes. Thisouldbeanotherwaytosolveourproblem,butitisertainly

expensiveand thehoie ofthe high iterationnumberisdeliate. Thisprob-

lemdenitelydeservesfurtherstudy,andonrmsthegeneralreommendation

not to dierentiate solvers of a nonlinear kind, and use a blak-box strategy

instead.

5 Validation Experiments

5.1 Corretness test

The lassialway tohek for orretnessof the automatiallygenerated tan-

gentand adjoint odes is as follows:

1. Choose an arbitrary input

X

and and arbitrary diretion

X ˙

. Compute

the Divided Dierene

DD = F (X + ε X) ˙ − F (X) ε

for agoodenoughsmall

ε

.

2. Usingthe tangent dierentiated program, ompute

Y ˙ = F (X) × X ˙

.

3. Usingthe adjoint dierentiated program, ompute

X = F ′∗ (X) × Y ˙

and nally hek that

(DD · DD) = ( ˙ Y · Y ˙ ) = (X · X) ˙

. We performed this

test for the omplete global ORCA-2 simulation on 1000 time steps and its

derivative odes. The results are shown in table 1. The values math, and

Table 1: Dot produt test for 1000 time steps

(DD · DD)

(

ε = 10 7

) 4.405352760987440e+08

( ˙ Y · Y ˙ )

4.405346876439977e+08

(X · X) ˙

4.405346876439867e+08

(23)

( ˙ Y · Y ˙ )

and

(X · X ˙ )

math very well, up to the last few digits, whih shows

that the tangent and adjoint odes really ompute the same derivatives, only

inadierentomputationorderasshown byequations(2)and(3). Thevalues

of

(DD · DD)

and

( ˙ Y · Y ˙ )

don't mathsowell,beause of the weakness of the

DividedDierenesapproximation. Figure7shows this weakness: Forasmall

10 −12 10 −10 10 −8 10 −6 10 −4 10 −2

10 −6 10 −4 10 −2 10 0

ε

Figure7: Relative error of Divided Dierenes with respet to AD-generated

derivatives, omputed for various values of of the step size

ε

value of

ε,

the dominant error is due to mahine auray. For a large value

of

ε,

the dominant error is due to the seond derivatives of

F

. The best

ε

minimizesboth errors, but annot eliminatethem ompletely.

5.2 Sensitivity analysis on a long simulation

Oneofthemainappliationofadjointmodelsisthesensitivityanalysisi.e. the

study of how modeloutputvaries with hanges inmodelinputs. Usingdiret

orstatistialmethodswould require many integration of the nonlinear model

whileoneadjointmodelintegrationisenoughtoomputethissensitivity. Asan

example,gure8shows theoutputmapofthesensitivityoftheNorthAtlanti

meridionalheat uxat 29°Nto hanges inthe initialsea surfae temperature

(

SST t 0

) over one year integration period, starting January 1, 1998. This is

(24)

done by omputing the gradient respet to

SST t 0

of

J = Z t N

t 0

Z Z

T.v dxdzdt

where

is the zonal ross setion at 29°N in the North Atlanti,

T

is the

temperature and

v

is the meridionalurrent veloity.

Contours in gure 8 show where variation of initialSST would eet the

mostuponheattransportat29°N.Itshowslargesalepatternsmainlyloated

north of the 29°N parallel and in the Caribbean sea with a strong spot o

Moroo. These results are onsistent with those obtained by Marotzke et al.

([18℄)

This mapwasomputedbythe TAPENADE-generated adjointofOPA on

the global ORCA2 grid, over 5475 time steps (1 year). This experiment was

donewiththe SORalgorithmasthe iterativelinearsolver. TheTAPENADE-

generated adjoint omputed this sensitivity map in a time that is only 8.03

times that of the originalsimulation.

5.3 Data Assimilation

For further validation of the automatially generated derivatives, we arried

out a data assimilation experiment. This was done in a so-alled twin ex-

periment framework whereby the diret modeltrajetory is used to generate

syntheti observations. The initial sea surfae temperature is perturbed by

a white noise and it has to be reovered using variational data assimilation

tehniques. Syntheti observation are given by the sea surfae height (SSH)

and thesea surfae salinity (SSS)generatedfromthe model's originaloutputs

startingfrom the unperturbed SST.

The ost funtion to be minimisedis

J(SST (t 0 )) = Z t N

t 0

k SSH(t) − SSH o (t) k 2 + k SSS(t) − SSS o (t) k 2 dt

(4)

Where the supersript

o

stands for syntheti observation and

SSH(t)

and

SSS(t)

are modeloutput.

For omputing ost issues, only the Antarti zoom of ORCA2 is onsid-

ered, the minimisation is done an iterative gradient searh algorithm where

(25)

Figure8: SensitivitymapoftheNorthAtlantiheattransportat29°N(dotted

line), with respet tohanges in the initialsurfae temperature

(26)

the gradient of

J

is umputed using adjoint tehniques. Figure 9 illustrates the performane ofthe optimizationloopforanintegration periodof 1month

i.e. 450 time steps. The ost funtion dereases by two orders of magnitude.

Figure 10 indiates that the true solution (top panel) is reovered with a

goodapproximation(bottompanel)fromthe randomlyperturbed one(middle

panel),showing the qualityof the derivativesobtained.

10 0 10 1 10 2

10 4 10 5 10 6

Iterations

Cost function

Figure 9: Twin experiment: Convergene of the ost funtion

6 Conlusion and Outlook

The eort to build the tangent and adjoint odes for the previous version 8

of the OPA oean General Cirulation Model has ost several months devel-

opment from an experiened researher. For the new version OPA 9 written

in FORTRAN95, the use of the AD tool TAPENADE signiantly redues

thiseort. Our rstnumerialappliationsshowthequality ofthe derivatives

obtained. This works validatesthe hoie of AD asthe strategy toobtain the

tangent and adjointfor OPA 9, and for the versionsto ome.

At the same time, OPA is the largest FORTRAN95 appliation dierenti-

ated with TAPENADE. This work has pointed at a number of limitationsof

(27)

Figure10: Twin experiment: True eld (top), Initialperturbed eld (middle)

and identied optimalsea surfae temperatures (bottom)

(28)

TAPENADE thathavebeen lifted. Otherlimitationsremain,suhasthenon-

reentrant proedures, whih need to be addressed in future work. Suessful

dierentiation of OPA denitely inreases our ondene inTAPENADE.

This works is also an additional illustration of the superiority of the bi-

nomialhekpointing strategy, ompared tomultilevelhekpointing. By the

standardsofotherappliationelds,e.g. CFD,aslowdownoftheadjointode

of only 7 for a nonsteady simulation on 1000 time steps would be onsidered

very good. By the standards of weather simulation or oean modeling how-

ever, sientists expet yet faster adjoints, at the ost of a radial approxima-

tion. Even ifweonsider that these approximations hange the mathematial

nature of the optimization proess, we understand they are neessary and we

shall study how they an beproposed as anoption by the AD tool.

This workhas underlined several diretionsfor furtherresearh inAD and

ADtools. Someofthemarealreadybeingstudiedbyresearhersinourgroups.

Consideringtheappliation language,two onstrutsneedtobedierentiated

better:

ˆ The next experiment tobe made very soon is to apply TAPENADE to

the parallelized version of OPA. This is neessary before the generated

tangent and adjointodes an be used inprodution ontext.

ˆ TheOPAsoure makesextensiveuse of the preproessor diretivessuh

as#IFDEF.TAPENADE doesnotdealwiththesediretivesbeausethey

donotrespetthesyntatistrutureofaode. Handlingthesediretives

inthe AD tool is inour opinion hopeless. What might bedone though,

is to generate dierentiated odes for eah possible preproessed ode,

anddevise atooltoputthe diretivesbakintothe dierentiatedodes.

Thisismadeeasierifthedierentiatedodelosely followsthe struture

of the original,as isthe ase with TAPENADE.

Consideringspeiallyadjointdierentiation,wehopetoobtainmoreeient

ode through a more systemati exploitation of self-adjointness, e.g. of the

ellipti operator. We also hope to optimize the hekpointing strategy. In

itspresent version, TAPENADE applieshekpointing toeahproedureall.

Usingprolinginformation,webelieveweandetetseveralproedureallsfor

whih hekpointing is useless or ounter-produtive. TAPENADE is already

able touse this informationtoprodue a better adjoint.

(29)

Referenes

[1℄ W. Castaings, D. Dartus, M. Honnorat, F.-X. Le Dimet,Y. Loukili, and

J.Monnier. Automatidierentiation: Atoolforvariationaldataassimi-

lationand adjointsensitivityanalysis foroodmodeling. InBüker etal,

editor, Automati Dierentiation: Appliations, Theory, and Implemen-

tations, LNCSE, pages 249262.Springer, 2005.

[2℄ B. Christianson. Reverse aumulationand attrative xed points. Opti-

mization Methods and Software, 3:311326, 1994.

[3℄ P. Courtier, J.-N. Thépaut, and A. Hollingsworth. A strategy for opera-

tionalimplementationof4d-var,using aninrementalapproah. Q. J.R.

Meteorol. So., 120:13671387,1994.

[4℄ R. Giering. Tangent linear and Adjoint Model Compiler, Users manual,

1997. http://www.autodi.om/tam.

[5℄ R.Gieringand T.Kaminski. Reipesforadjointodeonstrution. ACM

Transations on MathematialSoftware, 24(4):437474,1998.

[6℄ J.C.Gilbert. Automatidierentiationanditerativeproesses. Optimiza-

tion Methods and Software,1:1321, 1992.

[7℄ M.B.Giles,D.Ghate,andM.C.Duta.Usingautomatidierentiationfor

adjointCFD ode development. In Uthup etal, editor, Reent Trends in

Aerospae Design and Optimization, pages 363373. Tata-MGraw Hill,

New Delhi, 2005. Post-SAROD-2005, Bangalore,India.

[8℄ A.Griewank. Ahievinglogarithmi growthoftemporaland spatialom-

plexity in reverse automati dierentiation. Optimization Methods and

Software, 1:3554,1992.

[9℄ A.Griewank,C.Bishof,G.Corliss,A.Carle,andK.Williamson.Deriva-

tiveonvergene foriterativeequationsolvers. OptimizationMethods and

Software, 2:321355, 1993.

(30)

[10℄ A.Griewank,D.Juedes,J.Srinivasan,andC.Tyner. ADOL-C,apakage

for the automati dierentiation of algorithmswritten in C/C++. ACM

Trans.Math. Software,22(2):131167,1996.

[11℄ L. Hasoët and M. Araya-Polo. The adjoint data-ow analyses: For-

malization, properties, and appliations. In Automati Dierentiation:

Appliations,Theory, andTools,LetureNotes inComputationalSiene

and Engineering. Springer, 2005. Seleted papers fromAD2004 Chiago.

[12℄ L. Hasoëtand V. Pasual. Tapenade 2.1user's guide. Tehnial Report

0300, INRIA,2004.

[13℄ L. Hasoët, M. Vázquez, and A. Dervieux. Automati dierentiation for

optimumdesign,appliedtosoniboomredution.InKumaretal.,editor,

Proeedings of ICCSA'03, Montreal, Canada, pages 8594. LNCS 2668,

Springer, 2003.

[14℄ Heimbah, P. and Hill, C. and Giering, R. An eient exat adjoint

of the parallel MIT general irulation model, generated via automati

dierentiation. Future Generation Computer Systems, 21(8):13561371,

2005.

[15℄ J. Kim, E.Hunke, and W. Lipsomb. Sensitivity analysis and parameter

tuningshemeforglobalsea-iemodeling.OeanModelingJournal,14(1

2):6180, 2006.

[16℄ C. Lauvernet, F. Baret, L. Hasoët, and F.-X. LeDimet. Improved esti-

mates of vegetation biophysialvariables frommeris toa images by using

spatial and temporal onstraints. In proeedings of the 9th ISPMSRS,

Beijing, China,2005.

[17℄ G. Made, P. Deleluse, M. Imbard, and C. Levy. Opa8.1 oean general

irulation model referene manual. Tehnial report, Pole de Modelisa-

tion, IPSL, 1998.

[18℄ J. Marotzke, R. Giering, K.Q. Zhang, D. Stammer, C. Hill, and T. Lee.

Constrution of the adjoint MIT oean General Cirulation Model and

(31)

appliation to atlanti heat transport sensitivity. J. Geophys. Res.,

(104(C12)):29,52929,548,1999.

[19℄ O.Talagrand. The use of adjointequationsin numerialmodeling of the

atmospheri irulation. In A. Griewank and G. Corliss, editors, Auto-

mati Dierentiation of Algorithms: Theory, Implementation and Appli-

ation, pages 169180,1991. Philadelphia,Penn: SIAM.

[20℄ J. Utke, U. Naumann, M. Fagan, N. Tallent, Strout M., P. Heimbah,

C. Hill, and C. Wunsh. OpenAD/F: A modular, open-soure tool for

AutomatiDierentiationofFortranodes. TehnialreportANL/MCS-

P1230-0205, Argonne National Laboratory, 2006. Submitted to ACM

TOMS.

[21℄ A. Walther and A. Griewank. Advantages of binomial hekpointing for

memory-reduedadjointalulations.InFeistaueretal,editor,Numerial

MathematisandAdvanedAppliations,pages834843.Springer, Berlin,

2003. Proeedingof ENUMATH 2003.

[22℄ A. Weaver, J. Vialard, and D. Anderson. Three- and four-dimensional

variational assimilation with an oean general irulation model of the

tropialpaioean: I.formulation,internaldiagnostisandonsisteny

heks. Monthly Weather Review, 131(7):13601378, 2003.

(32)

4, rue Jacques Monod - 91893 ORSAY Cedex (France)

Unité de recherche INRIA Lorraine : LORIA, Technopôle de Nancy-Brabois - Campus scientifique 615, rue du Jardin Botanique - BP 101 - 54602 Villers-lès-Nancy Cedex (France)

Unité de recherche INRIA Rennes : IRISA, Campus universitaire de Beaulieu - 35042 Rennes Cedex (France) Unité de recherche INRIA Rhône-Alpes : 655, avenue de l’Europe - 38334 Montbonnot Saint-Ismier (France) Unité de recherche INRIA Rocquencourt : Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex (France)

Éditeur

Références

Documents relatifs

This diagnostic approach suggests that two different dynamical regimes shape up the EUC seasonal cycle: in summer and autumn, local forcing by the equatorial zonal wind component

Ein verwandt- schaftliches Empfinden ergab sich vor allem aus dem gemeinsamen Status einer Republik, aber auch aufgrund der ähnliehen politischen Institutionen und

We compare three strategies for reducing excited-state con- tamination: optimizing the smearing parameters to reduce excited-state contamination in correlation functions; using

The variation in the reflected P-wave amplitude evaluated with the ASA, as a function of the inci- dence angle, is compared with the plane wave (PW) reflection coefficient and with

Agricultural waste valuation, Thermal insulation materials, Bio-based composites, Hemp shiv, Black liquor, Moisture Buffer Value, Thermal conductivity.. 1

This requieres using the look-up table for material cost to determine the rate for that component in the given manufacturing location, and scaling by the total

unique firm identifiers in the electronics and related indus- tries for two groups of FDI firms operating in China: firms funded by Taiwanese investors and firms funded by

We investigate here the impact of a range of surface conditions in the South- ern Ocean in the iLOVECLIM model using nine simulations obtained with different LGM boundary