HAL Id: inria-00192415
https://hal.inria.fr/inria-00192415v2
Submitted on 29 Nov 2007
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub-
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non,
General Circulation Model OPA with the Automatic Differentiation tool TAPENADE
Moulay Hicham Tber, Laurent Hascoet, Arthur Vidard, Benjamin Dauvergne
To cite this version:
Moulay Hicham Tber, Laurent Hascoet, Arthur Vidard, Benjamin Dauvergne. Building the Tangent
and Adjoint codes of the Ocean General Circulation Model OPA with the Automatic Differentiation
tool TAPENADE. [Research Report] RR-6372, INRIA. 2007, pp.28. �inria-00192415v2�
inria-00192415, version 2 - 29 Nov 2007 a p p o r t
Thème NUM
Building the Tangent and Adjoint codes of the Ocean General Circulation Model OPA with the Automatic
Differentiation tool TAPENADE
M.H. Tber — L. Hascoët — A. Vidard — B. Dauvergne
N° 6372
Novembre 2007
Automati Dierentiation tool TAPENADE
M.H. Tber
∗
, L. Hasoët
∗
, A. Vidard
†
, B. Dauvergne
∗
Thème NUM Systèmes numériques
Projet Tropis
Rapport de reherhe n°6372 Novembre 2007 28 pages
Abstrat: The oean general irulation model OPA is developed by the
LODYC team at Paris VI university. OPA has reently undergone a major
rewriting, migrating to FORTRAN95, and its adjoint ode needs to be re-
built. For earlier versions, the adjoint of OPA was written by hand at a high
development ost. We use the Automati Dierentiation tool TAPENADE
to build mehanialy the tangent and adjoint odes of OPA. We validate the
dierentiated odes by omparison with divided dierenes, and alsowith an
idential twin experiment. We apply state-of-the-art methods toimprove the
performane of the adjoint ode. In partiular we implement the Griewank
and Walther's binomial hekpointing algorithm whih gives us an optimal
trade-o between time and memory onsumption. We apply a spei strat-
egytodierentiatetheiterativelinearsolverthatomesfromtheimpliittime
stepping sheme.
Key-words: OPA, generalirulation model, TAPENADE, Automati Dif-
ferentiation,reverse mode, Adjoint Code, Chekpointing
∗
INRIASophiaAntipolis,Frane
†
INRIAGrenoble,Frane
modèle de irulation générale oéanique OPA
par l'outil de Diérentiation Automatique
TAPENADE
Résumé : Le modèle de irulation générale oéanique OPA est développé
par l'équipe LODYC de l'université Paris VI. La nouvelle version 9 d'OPA
onstitueuneévolutionmajeure,aveenpartiulierunemigrationversFORTRAN95.
LesodesLinéaireTangentetAdjointd'OPA,quiauparavantétaientéritsàla
main,doiventdon être redéveloppés. Nousutilisonsl'outil de Diérentiation
AutomatiqueTAPENADEpouronstruirelesodesTangentetAdjointd'OPA
9. Nous validons les dérivées obtenues par omparaison ave les Diérenes
Divisées etsur deux appliationstest inluant des expérienes jumelles. Nous
utilisonsleshéma de hekpointing réursifbinomialdeGriewanketWalther
pour améliorerlesperformanes du ode adjoint. Nousutilisons une stratégie
spéique pour diérentier le solveur linéaire itératif provenant du shéma
impliited'avanementen temps. Nosrésultatsmontrentunoûtraisonnable,
tanten onsommationmémoireque pour le temps d'exéution de l'adjoint.
Mots-lés : OPA, Cirulation Oéanique, TAPENADE, Diérentiation
Automatique, mode inverse, Code Adjoint, Chekpointing
1 Introdution
The development of tangent and adjoint models is an important step in ad-
dressingsensitivityanalysisandvariationaldataassimilationproblemsinOe-
anography. Sensitivity analysis is the study of how modeloutput varies with
hanges in model inputs. The sensitivity information given by the adjoint
modelis used diretly to gain anunderstanding of the physial proesses. In
data assimilation, one onsiders a ost funtion whih is a measure of the
model-datamist. The adjointsensitivities are used to build the gradient for
desent algorithms. Similarly the tangent modelis used inthe ontextof the
inrementalalgorithms[3℄to linearizethe ost funtion around abakground
ontrol. For the previous version 8 of the Oean General Cirulation Model
OPA[17℄,Weaveretal[22℄developed thenumerialtangentandadjointodes
by handusing lassialtehniques [5,19℄. Sine then,the OPA modelhas un-
dergone a major update. Partiularly the new versions are fully rewritten in
FORTRAN95. In this paper, we report on the development of tangent and
adjoint odes of OPA using the Automati Dierentiation (AD) tool TAPE-
NADE [12℄. A briefdesription of the OPA modeland the ongurationused
in this work is given in the next setion. In setion 3 we present the prini-
ples of AD and how they are reeted into the funtionalities of the AD tool
TAPENADE. Insetion4wefous onthe mostinterestingdiulties thatwe
enountered in the appliation of AD to suh a large ode. Setion 5 shows
some experiments that validate our derivatives and presents two illustrative
appliations, fousing on omputational aspets rather than impliations for
oeanography. An outlook of further work is given inthe onlusion.
2 The Oean General Cirulation Model OPA
Developed bythe LODYCteamatParisVIuniversity,OPAisaexibleoean
irulation model that an be used either in a regional or in a global oean
onguration. OPA is the oean model omponent of NEMO (Nuleus For
European Modelling of the Oean) and is widely used in the sienti om-
munity. Moreover it is beoming a major ator in operational oeanography
(Merator, ECMWF, UK-Met oe) Itsformulationisbased on the so-alled
primitiveequationsforthetemporalevolutionof oeanveloityurrents,tem-
perature and salinity in its three horizontal and vertial dimensions. These
equationsarederived fromNavier-Stokesequationsoupledwith astate equa-
tion for water density and heat equation, under Boussinesq and hydrostati
approximations.
Let us introdue the following variables:
U
the veloity vetor,U = U h + w k
(the subsripth
denotes the loal horizontal vetor),T
the poten-tialtemperature,
S
the salinity,p
the pressure andρ
the in-situdensity. Thevetor invariant form of the primitive equations in an orthogonal set of unit
vetors linked to the earth are writtenas follows
∂ U h
∂t = −
(∇ × U ) × U + 1
2 ∇ ( U ) 2
h
− f k × U h − 1 ρ 0
∇ h p + D U
∂p
∂z = −ρg
∇ · U = 0
∂T
∂t = −∇ · (T U ) + D T
∂S
∂t = −∇ · (S U ) + D S ρ = ρ (T, S, p)
where
∇
isthegeneralizedderivativevetoroperator,t
thetime,z
the vertialoordinate,
ρ 0
areferenedensity,f
theCoriolisaeleration,andg
thegravityaeleration.
D U , D T
andD S
aretheparametrizationofsmallsalephysisfor momentum, temperature and salinity, inluding surfae foring terms. A fulldesription of the modelbasis, disretization, physial and numerialdetails
an be found in [17℄.
Through this paper, OPA is used in its global free surfae onguration
ORCA-2. In this onguration the model uses a rotated grid with poles on
NorthAmeriaandAsiainordertoavoidthesingularityproblemontheNorth
Pole. The spae resolution is roughly equivalent toa geographial mesh of 2°
by1.3°withameridionalresolutionof0.5°neartheEquator(seegure1). The
Vertial domain, spreading from the surfae to a depth of 5000m, is meshed
using 31 levels with levels 1 to10 in the top 100 meters. The time step is 96
minutes sothat there are 15time steps per day. The modelis fored by heat,
freshwater, and momentum uxes from the atmosphere and/or the sea-ie.
Figure1: ORCA2 Mesh
The solar radiation penetrates the upper layers of the oean. Zero uxes of
heatand saltareappliedthrough the bottom. On the lateralsolid boundaries
ano-slip ondition isalsoapplied. Initializationof the modelfor temperature
andsalinityisbasedonthe Levitusetal. (1998)limatologywithanullinitial
veloity eld. For more details about the spae time-domain and the oean
physisofORCA-2, werefertothe pagedediated tothisongurationinthe
oialwebsiteof NEMO-OPA 1
.
TheongurationORCA-2isroutinelyusedbyMERCATOR/Meteo-Frane
to ompute the oeani omponent of their seasonal foreasting system. The
size ofOPA-9,200 modulesdening 800 proedures withover 100000 lines of
FORTRAN95, makes itthe largest appliation dierentiated by TAPENADE
to date. The omputational kernel whih is atually dierentiated aounts
for 330 proedures.
1
http://www.lody.jussieu.fr/NEMO/general/desription/ORCA_ong.html
3 Priniples of AD and the tool TAPENADE
TAPENADE [12℄ is an AD tool developed by the Tropis 2
team at INRIA.
Given the soure of anoriginal program that evaluates a mathematial fun-
tion, and given a seletion of input and output variablesto be dierentiated,
TAPENADEproduesanewsoureprogramthatomputesthepartialderiva-
tives of the seleted outputs with respet to the seleted inputs.
Basially,TAPENADE doesthatbyinsertingadditionalstatementsintoa
opy of the original program. Like other AD tools, TAPENADE is based on
thefundamentalobservationthattheoriginalprogramP,whateveritssizeand
run time, omputes a funtion
F, X ∈IR m 7→ Y ∈IR n
whih is the ompositionof the elementary funtions omputed by eah run-time instrution. In other
words if P exeutes a sequene of elementary statements
I k , k ∈ [1..p]
, then Patually evaluates
F = f p ◦ f p − 1 ◦ · · · ◦ f 1 ,
whereeah
f k
isthefuntion implementedbyI k
. Therefore onean apply thehain rule ofderivativealulusto getthe Jaobianmatrix
F ′
,i.e. the partialderivatives of eah omponent of
Y
with respet to eah omponent ofX
.Calling
X 0 = X
andX k = f k (X k − 1 )
the suessive values of all intermediate variables, i.e. the suessive states of the memory throughout exeution of P,we get
F ′ (X) = f p ′ (X p − 1 ) × f p− ′ 1 (X p − 2 ) × · · · × f 1 ′ (X 0 ) .
(1)The derivatives
f k ′
of eah elementary instrution are easily built, and mustbe inserted in the dierentiated program so that eah of them has the values
X k − 1
diretly available for use. This proess yields analyti derivatives, thatare exat up tonumerial auray.
In pratie, two sorts of derivatives are of partiular importane in sien-
ti omputing: the tangent (or diretional) derivatives, and the adjoint (or
reverse) derivatives. In partiular, tangent and adjoint are the two sorts of
derivative programs required for OPA, and TAPENADE provides both. The
tangent derivativeis the produt
Y ˙ = F ′ (X) × X ˙
of the fullJaobiantimes adiretion
X ˙
inthe input spae. From equation (1), we ndY ˙ = F ′ (X) × X ˙ = f p ′ (X p− 1 ) × f p ′ − 1 (X p− 2 ) × · · · × f 1 ′ (X 0 ) × X ˙
(2)2
http://www-sop.inria.fr/tropis/
whihismostheaplyexeutedfromrighttoleftbeausematrix
×
vetorprod-uts are muh heaper than matrix
×
matrix produts. This is also the mostonvenient exeution order beause it uses the intermediate values
X k
in thesame order as the program P builds them. On the other hand the adjoint
derivativeis the produt
X = F ′∗ (X) × Y
of the transposed Jaobiantimes aweight vetor
Y
in the output spae. The resultingX
is the gradient of thedot produt
(Y · Y )
. From equation (1), we ndX = F ′∗ (X) × Y = f 1 ′∗ (X 0 ) × · · · × f p ′∗ − 1 (X p− 2 ) × f p ′∗ (X p− 1 ) × Y
(3)whih is alsomostheaply exeuted from righttoleft. However, this uses the
intermediate values
X k
in the inverse of their buildingorder inP.Regarding the runtime ost for obtaining the derivatives, both tangent
Y ˙
and adjointX
ost only a small multiple of the original program P. Theslowdown fator is less than 4 in theory. In pratie it an be less than 2 for
thetangent,whereasitanreahupto10fortheadjointforareasondisussed
below. Despite its higher ost, the adjoint ode is still by large the heapest
way to obtain gradients. To get the gradient with the tangent mode would
require
m
runs ofthe tangentode, one perdimensionofX
,whereas this ostisindependent from
m
with the adjointmode.The diulty of the adjoint mode lies in the fat that it needs the inter-
mediatevalues
X k
inreverse order. Tothis end, TAPENADE basially usesatwo-sweepsstrategy,alledStore-All. Intherstsweep(theforwardsweep),
aopy of the original programP isrun, together with Push statements that
store intermediate values on a stak just before they get overwritten. In the
seond sweep (the bakward sweep), the derivative statements ompute the
elementaryderivatives
f k ′∗ (X k − 1 )
fork = p
down to1
, using Pop statementsto restore the intermediate values as they are required. This inurs a ost in
memory spae as the maximum stak size needed is attained at the end of
the forward sweep, and is thus proportional to the length of the program P.
There is also a runtime penalty for these stak manipulations. TAPENADE
implements a number of strategies [11℄ to mitigate this ost, based on stati
data-owanalysis oftheprogram'sontrolowgraph,reduing thenumberof
values
X k
thatneedtobestored. HoweverforverylongprogramssuhasOPA,involving unsteady simulations, Store-All an not work alone. TAPENADE
ombinesit with a storage/reomputationtrade-o alled hekpointing.
Chekpointing redues the maximum stak size at the ost of dupliated
exeutions. Consider apiee
C
of the originalprogramP.ChekpointingC
asillustrated on gure 2 means that during the main forward sweep,
C
pushesno value on the stak. When the bakward sweep reahes bak to the plae
where intermediate values are now missing on the stak, it runs
C
a seondtime, this time with the Store-All strategy i.e. pushingvalues on the stak.
The bakward sweep an then resume safely. To run
C
twie requires thatenoughof its input values, a snapshot, are stored but the size of a snapshot
isgenerallymuhless thanthestaksize usedby
C
. Obviously,thisalsoslowsdown the adjoint program. When
C
is well hosen, hekpointing an divide the peak size of the stak by a fator of two. Chekpoints an be nested, inwhih ase both the stak's peak size and the adjoint runtime slowdown an
growaslittleasthelogarithmofthesizeofP.Initsdefaultmode,TAPENADE
applieshekpointing to eah proedureall.
successive sweeps
{ C
Figure 2: Chekpointing applied to the program piee
C
. Rightwards arrowsrepresent forward sweeps, thik when they store intermediate values on the
stak, thin otherwise. Leftwards arrows represent bakward sweeps. Blak
dots are stores, white dots are retrieves. Small dots are Push and Pops, big
dots are snapshots.
TAPENADE apaity togeneraterobust and eienttangent andadjoint
odes has been demonstrated on several real-world test appliations[15, 7, 1,
13, 16℄. Regarding the appliation language, it an handle programs written
inFORTRAN.Takingintoaountthenewprogrammingonstrutsprovided
by FORTRAN95 has required an important programming eort in the past
few years, mostly to handle modules, strutured data types, array notation,
pointers,anddynamimemoryalloation. SinethenewOPA9isnowwritten
in FORTRAN95, dierentiation of OPA is a very realisti test for the new
TAPENADE 2.2.
There exist several other AD tools. Restriting to the tools whih, like
TAPENADE, operate by soure transformation, provide tangent and adjoint
modes, use global program analysis to optimize the dierentiated ode, and
have demonstrated their appliability on large industrialodes, we an men-
tion TAF [4℄a pioneer of AD for meteorology, now the standard AD tool for
thepopularMITGlobalCirulationModel. UnlikeTAPENADE's,theadjoint
mode of TAF regenerates the intermediate values
X k
by reomputation from angiven initialpoint. This is alleda Reompute-All strategy. Comparisonwith Store-All strategy is getting blurred by nested hekpointing, as the ad-
joint odes grow more alike as more hekpoints are inserted. OpenAD [20℄,
suessor of ADIFOR and ADIC, uses the Store-All strategy. There are ex-
periments to also apply OpenAD to the MIT GCM. The tool Adol-C [10℄,
although using operator overloadinginstead of soure transformation, is very
popularand has been appliedsuessfully tomanyindustrialappliations. Its
adjoint mode an be seen as anextension of the Store-All strategy: not only
theintermediatevaluesarestoredonthestak,butalsotheomputationgraph
tobedierentiated. This allows theAD tooltoperform further optimizations
onthis graph, atthe ost of ahigher memory onsumption.
4 Applying TAPENADE to OPA
We generatedworking tangentand adjointodes forthe omputationalkernel
of OPA, using TAPENADE. Depending on the nal appliation (f setion
5), the atual funtion to dierentiate as well as the input and output vari-
ables may be dierent, but the tehnial diulties that we enountered are
essentially the same. This setion desribesthese points.
4.1 FORTRAN95 onstruts
ThenewOPA9usesextensivelythemodularonstruts ofFORTRAN95. We
hadtoextendthe all-graphinternalrepresentationof TAPENADE tohandle
the nesting of modules and proedures. Essentially this nesting is mirrored
intothe dierentiated ode.
Beause a module an dene private omponents, subroutines in the dif-
ferentiatedmodules donot haveaess toall variablesof the originalmodule.
Thereforethedierentiatedmodulemustontainitsownopyofalltheoriginal
module's variables,types, and proedures. This is a hange inTAPENADE's
dierentiation model: the dierentiated ode annot just all or use parts of
the originalode; itmust ontain itsown opies of those. In other words, the
dierentiated ode neednot be linked with the original.
The interfae mehanism of FORTRAN95 is a way to implement over-
loaded proedures. This is stati overloading, whih is resolved at ompile
time. Therefore we had to extend TAPENADE type-heking phase to om-
pletely solve the alls to interfaed proedures. Conversely, TAPENADE is
now able to generate interfaes on the dierentiated proedures, so that the
generalstruture of the ode is preserved.
The arraynotationof FORTRAN95is usedsystematiallyinOPA. At the
same time, dierentiation requires that many alls to intrinsi funtions be
split to propagate the derivatives. When these funtions are used on arrays
("elemental"intrinsis)TAPENADE must generate a ode whih is far from
trivial. Forinstane the single statementfrom OPA:
zws(:,:,:) = SQRT(ABS(psal(:,:,:)))
generates inthe adjointmode
abs1 = ABS(psal(:,:,:))
mask = (psal(:,:,:) .GT. 0.0)
...
WHERE (abs1 .EQ. 0.0)
abs1b = 0.0
ELSEWHERE
abs1b = zwsb(:, :, :)/(2.0*SQRT(abs1))
END WHERE
WHERE (.NOT.mask(:, :, :))
psalb(:, :, :) = psalb(:, :, :) - abs1b
ELSEWHERE
psalb(:, :, :) = psalb(:, :, :) + abs1b
END WHERE
Without going in too muh detail into the adjoint dierentiation model, we
observe that the test that is needed to protetthe dierentiated ode against
the non-dierentiability of SQRT at 0, as well as the test that ontrols the
dierentiation of ABS, have been turned into WHERE onstruts to keep the
runtime benets of array notation. Some temporary variables are introdued
automatially to store ontrol-ow deisions (e.g. abs1 and mask), although
TAPENADE stilldoesn't dothis in anoptimal way on the example.
OPA usespointers anddynamimemoryalloation(allstoALLOCATE and
DEALLOCATE). This is an appliation for the pointer analysis now available
in TAPENADE, nding whether a variable has a derivative, even when this
variable is aessed through a pointer. Unfortunately, dynami alloation is
handled partly, i.e. only in the tangent mode of TAPENADE. In the adjoint
mode, we have no general strategy for memory alloation and TAPENADE
sometimesannotprodueaworkingode. Weunderstandthat theadjointof
an alloate should be a DEALLOCATE, and vie-versa, but some hanges must
be made by hand onthe dierentiated ode to makeit work.
4.2 Chekpointing and hidden variables
OPA reads and writes several data les, not only during the pre- and post-
proessingstages,butalsoduringtheomputationalkernelitself. Soureterms
suhas the wind stress are being read at intermediatetime steps. Also, some
modulesandproeduresdeneprivateSAVEvariables,whosevalueispreserved
but annot be aessed from outside. Although unrelated, these two points
are just examples of a ommon problem: they an make a proedure non
reentrant".
Ifaalled proeduremodiesaninternal SAVEvariable,itbeomesimpos-
siblefromthe outsidealling ontextto allthe proedureaseondtime with
an idential result. Similarly if the alled proedure reads from a previously
openedle,andjustmovesthe readpointerfurtherinthele,thenitbeomes
impossibleto allthe proedure twie and obtain the same values read.
Non reentrant proedures are a problemfor the hekpointing strategy of
the adjointmode. We sawinsetion3 thathekpointing relieson allingthe
hekpointedpieetwie,insuhawaythattheseondallisequivalenttothe
rst. To this end, a suient subset of the exeution ontext, the snapshot,
must besaved andrestored. HiddenvariableslikeaninternalSAVEvariableor
the readpointer insideanopened leannotbesaved nor restored in general.
Whenhekpointingwould requirehiddenvariablestobeputinthe snapshot,
then hekpointing shouldbe forbidden.
Similarly, when a proedure only alloates some memory, the alloation
must not be done twie. If this proedure is hekpointed, then one must
dealloatethe memory when restoringthe snapshot beforethe dupliate all.
TAPENADE isnot yet able todo this automatially.
TAPENADE has some funtionalities to ope with this hidden variables
problem,but inall ases interation with the user isneessary. First, TAPE-
NADE issues a warning message when a subroutine annot be hekpointed
beauseof aprivate SAVEvariable. Themessageis issuedonlywhen this vari-
able would be part of the snapshot for this proedure. When this happened
for OPA, we just turned by hand the variables in question into publi global
variables in the original ode. In priniplethis ould also be done automati-
ally. Howeverthere are onlya handfulsuh variables,thus developingthis is
not our priority.
When asubroutine isnot reentrant beauseof I/Olepointers orbeause
of isolatedmemory alloationordealloation,then TAPENADE lets the user
label the subroutine so that it must not be hekpointed. For OPA, we took
another strategy: we modied the main I/O subroutines so that they always
rst make sure that the le is opened and then only use diret read into the
lewithoutusing aread I/O pointer. Thusall I/Osubroutines are reentrant.
4.3 Binomial Chekpointing
Automati Dierentiation of OPA is one of the most ambitious appliations
of TAPENADE so far. It means building the adjoint of a piee of ode that
performs an unsteady nonlinear simulation over a very large number of time
steps. Eahtime stepomputes anew statewhosesize rangesinthehundreds
of megabytes. In adjoint mode if no hekpointing was applied, whih means
that all intermediate values were to be stored on a stak, we ould exeute
onlya handfulof time steps before werun outof memory even onour largest
workstation. Chekpointingisompulsory toomputethe adjointoverseveral
thousands of time steps, whih is our goal.
We saw in setion 3 that TAPENADE applies hekpointing at the level
of subroutine alls, i.e. eah all is hekpointed. This easy strategy is often
far fromoptimal. On one hand several alls are better not hekpointed, and
TAPENADEnowoerstheoptiontomarkseletedallsfornothekpointing.
On the other hand, hekpointing should be applied at other loations. For
example at the top level of the simulation program is a loop over many time
steps. We denitely need an eient hekpointing sheme applied at this
level of time iterations.
One lassial solution used by TAF on the MIT GCM ode [14℄, is alled
multilevelreursivehekpointing. Basially,itsplitsthe ompletetimeinter-
valintoasmallnumberof equidistant intervals,thenapply the same strategy
to eah of the sub-intervals. For instane 64 time steps an be split into 4
largeintervalsof4smallintervalsof4timesteps,asskethed ongure3. This
onsumes a maximum of 9 simultaneous snapshots, and the average number
of dupliate exeutions for a time step is
2.25
. In a more realisti situation,1000 time steps an be split into 10 large intervals of 10 small intervals of
10 time steps, and one an gure out that this onsumes a maximum of 27
simultaneous snapshots,and the average numberof dupliate exeutions fora
time step is
2.7
.0 56 60 61 62
58 57 56 52
54 53 52 48
50 49 48 32
46 45 44 40
42 41 40 36
38 37 36 32
34 33 32 16
30 29 28 24
26 25 24 20
22 21 20 16
18 17 16 0
14 13 12 8
10 9 8 4
6 5 4 0
2 1 0
Figure3: Three-levelshekpointing with 64timesteps and9 snapshots. For-
ward omputations go right, adjoint omputations go left. Blak irles rep-
resent writing/taking a snapshot, white irles represent reading an available
snapshot.
However, itwasshown in [21℄ thatthis strategyis not optimal. Under the
reasonable assumptions that all time steps ost the same run time, and that
thesnapshotneededtorunagainfromtimestep
n
ton +1
isthesameastorunfromstep
n
toany later stepn + x
, Griewank andWaltherhaveharaterized the optimal distribution of nested hekpoints, whih follows a binomiallaw.Withthisoptimalstrategy,bothspatialandtemporalomplexityoftheadjoint
ode grow logarithmially with respet to the number of time steps of the
original simulation. In other words, both the slowdown fator whih grows
like the number of times eah time step is exeuted, and the memory whih
grows like the number of simultaneous snapshots, grow logarithmially with
the total number of time steps.
In real appliations,run-timeand memoryspae donot behave symmetri-
ally. One an always waita little longer for the result, whereas the memory
spae is bounded. Therefore the maximum number of snapshots
d
that anbe stored simultaneously is xed. Then [8℄ shows that the optimal strategy
givesaslowdownfator thatgrowsonlylikethe
d th
rootofthetotalnumberoftimesteps, whihisstillverygood. Figure4shows the optimalhekpointing
strategy for the same problem as gure 3 i.e. 64 time steps with memory for
9 snapshots. The average number of dupliate exeutions for a time step is
only
2
. For the more realisti situation (1000 time steps and memory for 27snapshots) the average number of dupliate exeutions is only
2.57
.We implemented this optimal strategy in the adjoint ode of OPA. We
madeour rstexperiments byhand modiationofthe adjointodeprodued
by TAPENADE. Still, TAPENADE produed automatially the proedures
that storeand retrieve thesnapshot, andtherefore the handmodiationwas
benign: given the number of time steps, a general proedure 3
shedules the
optimalsequene of ations (store snapshot, retrievesnapshot, run time step,
run adjoint time step) to dierentiate the omplete simulation. Further ver-
sions of TAPENADE will fully automate this proess. Figure 5 shows the
performanes onOPA. Theyare ingoodagreementwiththe theory. Notie in
partiular the two small inetion points on the urve around 150 iterations
and 800 iterations. Going bak to the optimality proof in [8℄, we see that
the optimalstrategy is partiularlyeient whenthe numberof time steps is
3
AFORTRAN95implementationofthisshedulingproedureanbefoundinwww.inria-
sop/tropis/ftp/Hiham_Tber/
0 56 60 60 62 58 57 56 51
54 53 52 51 45
49 48 47 46 45 38
43 42 41 40 39 38 30
36 35 34 33 32 31 30 16
28 27 26 25 24 22 22 19
20 19 16
17 16 0
14 13 12 11 10 9 6
7 6 3
4 3 0
1 0
Figure4: Optimalbinomialhekpointing with 64timesteps and9 snapshots
200 400 600 800 1000 1200
4.5 5 5.5 6 6.5 7 7.5
Total number of time steps
Slowdown factor
Figure5: Optimalbinomialhekpointingwith15snapshots: slowdown fator
asafuntionofthe totallengthofthe initialsimulation. Theslowdown fator
isthe run-timeratio of the adjointode ompared tothe originalode.
exatly
η(d, t) = (d + t)!
d!t!
where
d
isthenumberofsnapshotsandt
isthenumberofdupliateexeutionsallowed per time step. Forour target mahine
d = 15
and we ndη(15, 2) = 136
andη(15, 3) = 816
, whihorresponds tothe inetion pointsof gure 5.ForthepreviousversionOPA8,theadjointwaswrittenbyhand. Neverthe-
less, even a hand-written adjoint must implementstrategies to retrieve inter-
mediatestates inreverse order that is,something verylose tohekpointing.
Looking at this hand-written adjoint, we rst observe that the hekpointing
strategyisneithermultilevelnoroptimalbinomial. Itismorelikeasinglelevel
strategy,withonesnapshotstoredeveryxednumberoftimesteps. Duringthe
reverse sweep, states between two stored snapshots are rebuiltapproximately
usinglinear interpolation. The advantageis that fewtime stepsare evaluated
twie, and therefore the slowdown fator remainswell below4. We an see at
leasttwodrawbaks. First,this handmanipulationrequiresdeepknowledgeof
the original program and of the underlyingequations. This method does not
blend easily with Automati Dierentiation. It is not yet automated in any
AD tool and thereforetedious and error-proneode manipulationswould still
beneessary. Seond, this introdues approximationerrors intothe omputed
derivatives,whosemathematialbehaviorisunlear. The gradientobtained in
theend isusedinomplexoptimizationsordata-assimilationloops,and small
errors may result in poor onvergene. In any ase, for very large numbers of
time steps, we believe a trade-o between exat binomial hekpointing and
approximate interpolation is worth experimenting. Interpolation is probably
good enough for many variables that vary very slowly, and whih ould be
designated by the end-user, and only the other variables would need to be
stored.
4.4 Iterative linear solver
TheOPAmodelsolvesanelliptiequation atthe endof eah timestep, using
aniterativemethod that generates asequene of approximationsof the exat
solution. The mehanial appliation of AD on this kind of methods gives a
sequene of derivatives of the approximate solutions with the same number
of iterations as the original solver. The reason is that AD keeps the ow of
ontroloftheoriginalprograminthe dierentiatedprogram. Inpartiularthe
onvergene tests are stillbasedonlyonthe non-dierentiatedvariables. Nat-
urally,one may askwhether and howAD-produedderivativesare reasonable
approximations to the desired derivative of the exat solution. The issues of
derivative onvergene for iterative solvers in relation to AD are disussed in
[6,9, 2℄.
OPA provides two alternative algorithms to solve the ellipti equation:
PCGforPreonditionedConjugateGradient,andSORforSuessiveOver-
Relaxationmethod. Bothalgorithmsgive orretresults forthe originalode,
butPCG isgenerallypreferredthanks toitseieny andvetorizationprop-
erties. However,theAD-dierentiatedodegivesdierentresultsusingthetwo
algorithms. Figure 6 ompares the AD-derivatives with approximate deriva-
tives obtained by divided dierenes. We see that the derivatives obtained
with the SOR algorithm remain orret when the number of time steps in-
reases. On the ontrary, the derivatives obtained with the PCG algorithm
beome ompletelywrong after 80time steps. Notie that this ours in tan-
gent mode as well as in adjoint mode: the derivatives obtained with PCG,
although wrong, remain idential in tangent and adjoint. Our explanation is
thateahiterationofPCGinvolvestheomputationofsalarprodutsofvari-
ables that depend on the state vetor, thus making the numerial algorithm
nonlineareven thoughthe elliptiequationislinear. In[6℄,Gilbert has shown
that the appliation of AD to a xed point iteration gives a derivative xed
pointiterationthat onverges R-linearlytothe desiredderivativeinpartiular
in the ase of a large ontrative iterate or seant updating. Unfortunately
this is not the ase for quasi-Newton iterative solvers suh as PCG, for whih
there isno similaronvergene result to our knowledge.
To solve this problem for the tangent-dierentiated OPA we exploit the
linearity of the ellipti system, and for the adjoint-dierentiated OPA we ex-
ploit the self adjointness property of the ellipti operator [22℄. We an thus
use the original PCG routine itself to solve for the dierentiated linear sys-
tems. Pratially, we do this using the so-alledblak-box feature provided
in TAPENADE. Figure 6 shows that (here for the tangent mode) the PCG
gives the same auray asthe SOR solver.
Figure6: evolutionoftherelativeerrorbetweentangentderivativeanddivided
dierenes, for the three strategies: SOR and straightforward AD, PCG and
straightforward AD, PCG with the blak box strategy
In another experiment, we tried to use straightforward AD with the PCG
solver, but this time xing the number of PCG iterations to some very high
value. We observed that the derivatives beome oherent again with divided
dierenes. Thisouldbeanotherwaytosolveourproblem,butitisertainly
expensiveand thehoie ofthe high iterationnumberisdeliate. Thisprob-
lemdenitelydeservesfurtherstudy,andonrmsthegeneralreommendation
not to dierentiate solvers of a nonlinear kind, and use a blak-box strategy
instead.
5 Validation Experiments
5.1 Corretness test
The lassialway tohek for orretnessof the automatiallygenerated tan-
gentand adjoint odes is as follows:
1. Choose an arbitrary input
X
and and arbitrary diretionX ˙
. Computethe Divided Dierene
DD = F (X + ε X) ˙ − F (X) ε
for agoodenoughsmall
ε
.2. Usingthe tangent dierentiated program, ompute
Y ˙ = F ′ (X) × X ˙
.3. Usingthe adjoint dierentiated program, ompute
X = F ′∗ (X) × Y ˙
and nally hek that
(DD · DD) = ( ˙ Y · Y ˙ ) = (X · X) ˙
. We performed thistest for the omplete global ORCA-2 simulation on 1000 time steps and its
derivative odes. The results are shown in table 1. The values math, and
Table 1: Dot produt test for 1000 time steps
(DD · DD)
(ε = 10 − 7
) 4.405352760987440e+08( ˙ Y · Y ˙ )
4.405346876439977e+08(X · X) ˙
4.405346876439867e+08( ˙ Y · Y ˙ )
and(X · X ˙ )
math very well, up to the last few digits, whih showsthat the tangent and adjoint odes really ompute the same derivatives, only
inadierentomputationorderasshown byequations(2)and(3). Thevalues
of
(DD · DD)
and( ˙ Y · Y ˙ )
don't mathsowell,beause of the weakness of theDividedDierenesapproximation. Figure7shows this weakness: Forasmall
10 −12 10 −10 10 −8 10 −6 10 −4 10 −2
10 −6 10 −4 10 −2 10 0
ε
Figure7: Relative error of Divided Dierenes with respet to AD-generated
derivatives, omputed for various values of of the step size
ε
value of
ε,
the dominant error is due to mahine auray. For a large valueof
ε,
the dominant error is due to the seond derivatives ofF
. The bestε
minimizesboth errors, but annot eliminatethem ompletely.
5.2 Sensitivity analysis on a long simulation
Oneofthemainappliationofadjointmodelsisthesensitivityanalysisi.e. the
study of how modeloutputvaries with hanges inmodelinputs. Usingdiret
orstatistialmethodswould require many integration of the nonlinear model
whileoneadjointmodelintegrationisenoughtoomputethissensitivity. Asan
example,gure8shows theoutputmapofthesensitivityoftheNorthAtlanti
meridionalheat uxat 29°Nto hanges inthe initialsea surfae temperature
(
SST t 0
) over one year integration period, starting January 1, 1998. This isdone by omputing the gradient respet to
SST t 0
ofJ = Z t N
t 0
Z Z
Ω
T.v dxdzdt
where
Ω
is the zonal ross setion at 29°N in the North Atlanti,T
is thetemperature and
v
is the meridionalurrent veloity.Contours in gure 8 show where variation of initialSST would eet the
mostuponheattransportat29°N.Itshowslargesalepatternsmainlyloated
north of the 29°N parallel and in the Caribbean sea with a strong spot o
Moroo. These results are onsistent with those obtained by Marotzke et al.
([18℄)
This mapwasomputedbythe TAPENADE-generated adjointofOPA on
the global ORCA2 grid, over 5475 time steps (1 year). This experiment was
donewiththe SORalgorithmasthe iterativelinearsolver. TheTAPENADE-
generated adjoint omputed this sensitivity map in a time that is only 8.03
times that of the originalsimulation.
5.3 Data Assimilation
For further validation of the automatially generated derivatives, we arried
out a data assimilation experiment. This was done in a so-alled twin ex-
periment framework whereby the diret modeltrajetory is used to generate
syntheti observations. The initial sea surfae temperature is perturbed by
a white noise and it has to be reovered using variational data assimilation
tehniques. Syntheti observation are given by the sea surfae height (SSH)
and thesea surfae salinity (SSS)generatedfromthe model's originaloutputs
startingfrom the unperturbed SST.
The ost funtion to be minimisedis
J(SST (t 0 )) = Z t N
t 0
k SSH(t) − SSH o (t) k 2 + k SSS(t) − SSS o (t) k 2 dt
(4)Where the supersript
o
stands for syntheti observation and
SSH(t)
andSSS(t)
are modeloutput.For omputing ost issues, only the Antarti zoom of ORCA2 is onsid-
ered, the minimisation is done an iterative gradient searh algorithm where
Figure8: SensitivitymapoftheNorthAtlantiheattransportat29°N(dotted
line), with respet tohanges in the initialsurfae temperature
the gradient of
J
is umputed using adjoint tehniques. Figure 9 illustrates the performane ofthe optimizationloopforanintegration periodof 1monthi.e. 450 time steps. The ost funtion dereases by two orders of magnitude.
Figure 10 indiates that the true solution (top panel) is reovered with a
goodapproximation(bottompanel)fromthe randomlyperturbed one(middle
panel),showing the qualityof the derivativesobtained.
10 0 10 1 10 2
10 4 10 5 10 6
Iterations
Cost function
Figure 9: Twin experiment: Convergene of the ost funtion
6 Conlusion and Outlook
The eort to build the tangent and adjoint odes for the previous version 8
of the OPA oean General Cirulation Model has ost several months devel-
opment from an experiened researher. For the new version OPA 9 written
in FORTRAN95, the use of the AD tool TAPENADE signiantly redues
thiseort. Our rstnumerialappliationsshowthequality ofthe derivatives
obtained. This works validatesthe hoie of AD asthe strategy toobtain the
tangent and adjointfor OPA 9, and for the versionsto ome.
At the same time, OPA is the largest FORTRAN95 appliation dierenti-
ated with TAPENADE. This work has pointed at a number of limitationsof
Figure10: Twin experiment: True eld (top), Initialperturbed eld (middle)
and identied optimalsea surfae temperatures (bottom)
TAPENADE thathavebeen lifted. Otherlimitationsremain,suhasthenon-
reentrant proedures, whih need to be addressed in future work. Suessful
dierentiation of OPA denitely inreases our ondene inTAPENADE.
This works is also an additional illustration of the superiority of the bi-
nomialhekpointing strategy, ompared tomultilevelhekpointing. By the
standardsofotherappliationelds,e.g. CFD,aslowdownoftheadjointode
of only 7 for a nonsteady simulation on 1000 time steps would be onsidered
very good. By the standards of weather simulation or oean modeling how-
ever, sientists expet yet faster adjoints, at the ost of a radial approxima-
tion. Even ifweonsider that these approximations hange the mathematial
nature of the optimization proess, we understand they are neessary and we
shall study how they an beproposed as anoption by the AD tool.
This workhas underlined several diretionsfor furtherresearh inAD and
ADtools. Someofthemarealreadybeingstudiedbyresearhersinourgroups.
Consideringtheappliation language,two onstrutsneedtobedierentiated
better:
The next experiment tobe made very soon is to apply TAPENADE to
the parallelized version of OPA. This is neessary before the generated
tangent and adjointodes an be used inprodution ontext.
TheOPAsoure makesextensiveuse of the preproessor diretivessuh
as#IFDEF.TAPENADE doesnotdealwiththesediretivesbeausethey
donotrespetthesyntatistrutureofaode. Handlingthesediretives
inthe AD tool is inour opinion hopeless. What might bedone though,
is to generate dierentiated odes for eah possible preproessed ode,
anddevise atooltoputthe diretivesbakintothe dierentiatedodes.
Thisismadeeasierifthedierentiatedodelosely followsthe struture
of the original,as isthe ase with TAPENADE.
Consideringspeiallyadjointdierentiation,wehopetoobtainmoreeient
ode through a more systemati exploitation of self-adjointness, e.g. of the
ellipti operator. We also hope to optimize the hekpointing strategy. In
itspresent version, TAPENADE applieshekpointing toeahproedureall.
Usingprolinginformation,webelieveweandetetseveralproedureallsfor
whih hekpointing is useless or ounter-produtive. TAPENADE is already
able touse this informationtoprodue a better adjoint.
Referenes
[1℄ W. Castaings, D. Dartus, M. Honnorat, F.-X. Le Dimet,Y. Loukili, and
J.Monnier. Automatidierentiation: Atoolforvariationaldataassimi-
lationand adjointsensitivityanalysis foroodmodeling. InBüker etal,
editor, Automati Dierentiation: Appliations, Theory, and Implemen-
tations, LNCSE, pages 249262.Springer, 2005.
[2℄ B. Christianson. Reverse aumulationand attrative xed points. Opti-
mization Methods and Software, 3:311326, 1994.
[3℄ P. Courtier, J.-N. Thépaut, and A. Hollingsworth. A strategy for opera-
tionalimplementationof4d-var,using aninrementalapproah. Q. J.R.
Meteorol. So., 120:13671387,1994.
[4℄ R. Giering. Tangent linear and Adjoint Model Compiler, Users manual,
1997. http://www.autodi.om/tam.
[5℄ R.Gieringand T.Kaminski. Reipesforadjointodeonstrution. ACM
Transations on MathematialSoftware, 24(4):437474,1998.
[6℄ J.C.Gilbert. Automatidierentiationanditerativeproesses. Optimiza-
tion Methods and Software,1:1321, 1992.
[7℄ M.B.Giles,D.Ghate,andM.C.Duta.Usingautomatidierentiationfor
adjointCFD ode development. In Uthup etal, editor, Reent Trends in
Aerospae Design and Optimization, pages 363373. Tata-MGraw Hill,
New Delhi, 2005. Post-SAROD-2005, Bangalore,India.
[8℄ A.Griewank. Ahievinglogarithmi growthoftemporaland spatialom-
plexity in reverse automati dierentiation. Optimization Methods and
Software, 1:3554,1992.
[9℄ A.Griewank,C.Bishof,G.Corliss,A.Carle,andK.Williamson.Deriva-
tiveonvergene foriterativeequationsolvers. OptimizationMethods and
Software, 2:321355, 1993.
[10℄ A.Griewank,D.Juedes,J.Srinivasan,andC.Tyner. ADOL-C,apakage
for the automati dierentiation of algorithmswritten in C/C++. ACM
Trans.Math. Software,22(2):131167,1996.
[11℄ L. Hasoët and M. Araya-Polo. The adjoint data-ow analyses: For-
malization, properties, and appliations. In Automati Dierentiation:
Appliations,Theory, andTools,LetureNotes inComputationalSiene
and Engineering. Springer, 2005. Seleted papers fromAD2004 Chiago.
[12℄ L. Hasoëtand V. Pasual. Tapenade 2.1user's guide. Tehnial Report
0300, INRIA,2004.
[13℄ L. Hasoët, M. Vázquez, and A. Dervieux. Automati dierentiation for
optimumdesign,appliedtosoniboomredution.InKumaretal.,editor,
Proeedings of ICCSA'03, Montreal, Canada, pages 8594. LNCS 2668,
Springer, 2003.
[14℄ Heimbah, P. and Hill, C. and Giering, R. An eient exat adjoint
of the parallel MIT general irulation model, generated via automati
dierentiation. Future Generation Computer Systems, 21(8):13561371,
2005.
[15℄ J. Kim, E.Hunke, and W. Lipsomb. Sensitivity analysis and parameter
tuningshemeforglobalsea-iemodeling.OeanModelingJournal,14(1
2):6180, 2006.
[16℄ C. Lauvernet, F. Baret, L. Hasoët, and F.-X. LeDimet. Improved esti-
mates of vegetation biophysialvariables frommeris toa images by using
spatial and temporal onstraints. In proeedings of the 9th ISPMSRS,
Beijing, China,2005.
[17℄ G. Made, P. Deleluse, M. Imbard, and C. Levy. Opa8.1 oean general
irulation model referene manual. Tehnial report, Pole de Modelisa-
tion, IPSL, 1998.
[18℄ J. Marotzke, R. Giering, K.Q. Zhang, D. Stammer, C. Hill, and T. Lee.
Constrution of the adjoint MIT oean General Cirulation Model and
appliation to atlanti heat transport sensitivity. J. Geophys. Res.,
(104(C12)):29,52929,548,1999.
[19℄ O.Talagrand. The use of adjointequationsin numerialmodeling of the
atmospheri irulation. In A. Griewank and G. Corliss, editors, Auto-
mati Dierentiation of Algorithms: Theory, Implementation and Appli-
ation, pages 169180,1991. Philadelphia,Penn: SIAM.
[20℄ J. Utke, U. Naumann, M. Fagan, N. Tallent, Strout M., P. Heimbah,
C. Hill, and C. Wunsh. OpenAD/F: A modular, open-soure tool for
AutomatiDierentiationofFortranodes. TehnialreportANL/MCS-
P1230-0205, Argonne National Laboratory, 2006. Submitted to ACM
TOMS.
[21℄ A. Walther and A. Griewank. Advantages of binomial hekpointing for
memory-reduedadjointalulations.InFeistaueretal,editor,Numerial
MathematisandAdvanedAppliations,pages834843.Springer, Berlin,
2003. Proeedingof ENUMATH 2003.
[22℄ A. Weaver, J. Vialard, and D. Anderson. Three- and four-dimensional
variational assimilation with an oean general irulation model of the
tropialpaioean: I.formulation,internaldiagnostisandonsisteny
heks. Monthly Weather Review, 131(7):13601378, 2003.