Variational approach to data assimilation: optimization aspects and adjoint method


Eric Blayo

University Grenoble Alpes and INRIA

Objectives

- introduce data assimilation as an optimization problem
- discuss the different forms of the objective functions
- discuss their properties w.r.t. optimization
- introduce the adjoint technique for the computation of the gradient

Link with statistical methods: cf. lectures by E. Cosme

Variational data assimilation algorithms, tangent and adjoint codes: cf. lectures by M. Nodet and A. Vidard

Outline

- Introduction: model problem
- Definition and minimization of the cost function
- The adjoint method

Model problem

Two different measurements of a single quantity are available. Which estimate of its true value should be retained? $\longrightarrow$ least squares approach

Example: two observations $y_1 = 19\,^\circ\mathrm{C}$ and $y_2 = 21\,^\circ\mathrm{C}$ of the (unknown) present temperature $x$.

- Let $J(x) = \frac{1}{2}\left[(x - y_1)^2 + (x - y_2)^2\right]$
- $\min_x J(x) \longrightarrow \hat{x} = \dfrac{y_1 + y_2}{2} = 20\,^\circ\mathrm{C}$


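Model problem: observation operator

If the units differ: $y_1 = 66.2\,^\circ\mathrm{F}$ and $y_2 = 69.8\,^\circ\mathrm{F}$

- Let $H(x) = \frac{9}{5}x + 32$
- Let $J(x) = \frac{1}{2}\left[(H(x) - y_1)^2 + (H(x) - y_2)^2\right]$
- $\min_x J(x) \longrightarrow \hat{x} = 20\,^\circ\mathrm{C}$

Drawback #1: if observation units are inhomogeneous, e.g. $y_1 = 66.2\,^\circ\mathrm{F}$ and $y_2 = 21\,^\circ\mathrm{C}$

- $J(x) = \frac{1}{2}\left[(H(x) - y_1)^2 + (x - y_2)^2\right] \longrightarrow \hat{x} = 19.47\,^\circ\mathrm{C}$ !!

Drawback #2: if observation accuracies are inhomogeneous. If $y_1$ is twice as accurate as $y_2$ (i.e. its error variance is half that of $y_2$), one should obtain $\hat{x} = \dfrac{2y_1 + y_2}{3} = 19.67\,^\circ\mathrm{C}$

$\longrightarrow$ $J$ should be
$$J(x) = \frac{1}{2}\left[\left(\frac{x - y_1}{1/\sqrt{2}}\right)^2 + \left(\frac{x - y_2}{1}\right)^2\right]$$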
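The three estimates quoted above can be checked numerically. The following is only a minimal sketch, not part of the original slides: the helper name `weighted_estimate` is hypothetical, and it relies on the fact that each cost function here has the form $\frac{1}{2}\sum_i ((a_i x + b_i - y_i)/\sigma_i)^2$, whose minimizer is known in closed form.

```python
import numpy as np

def weighted_estimate(a, b, y, sigma):
    """Minimize 0.5 * sum(((a_i*x + b_i - y_i) / sigma_i)**2) over a scalar x.

    Setting the derivative to zero gives
    x = sum(a_i*(y_i - b_i)/sigma_i^2) / sum(a_i^2/sigma_i^2).
    """
    a, b, y, sigma = (np.asarray(v, dtype=float) for v in (a, b, y, sigma))
    w = 1.0 / sigma**2
    return float(np.sum(w * a * (y - b)) / np.sum(w * a**2))

# Both observations in Fahrenheit: H(x) = 9/5 x + 32 for each of them.
x_same_units = weighted_estimate([1.8, 1.8], [32.0, 32.0], [66.2, 69.8], [1.0, 1.0])

# Drawback 1: y1 in Fahrenheit, y2 in Celsius, both given the same weight.
x_mixed_units = weighted_estimate([1.8, 1.0], [32.0, 0.0], [66.2, 21.0], [1.0, 1.0])

# Drawback 2 fixed: both in Celsius, y1 with half the error variance of y2.
x_weighted = weighted_estimate([1.0, 1.0], [0.0, 0.0], [19.0, 21.0],
                               [1.0 / np.sqrt(2.0), 1.0])

print(x_same_units, x_mixed_units, x_weighted)
```

The mixed-unit case shows why each misfit has to be scaled consistently: the Fahrenheit residual is implicitly weighted by $(9/5)^2$, which pulls the estimate towards $y_1$.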


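Model problem: general form

Minimize
$$J(x) = \frac{1}{2}\left[\frac{(H_1(x) - y_1)^2}{\sigma_1^2} + \frac{(H_2(x) - y_2)^2}{\sigma_2^2}\right]$$

If $H_1 = H_2 = \mathrm{Id}$:
$$J(x) = \frac{1}{2}\frac{(x - y_1)^2}{\sigma_1^2} + \frac{1}{2}\frac{(x - y_2)^2}{\sigma_2^2}$$
which leads to
$$\hat{x} = \frac{\dfrac{1}{\sigma_1^2}\,y_1 + \dfrac{1}{\sigma_2^2}\,y_2}{\dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}} \qquad \text{(weighted average)}$$

Remark:
$$\underbrace{J''(\hat{x})}_{\text{convexity}} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} = \underbrace{[\mathrm{Var}(\hat{x})]^{-1}}_{\text{accuracy}} \qquad \text{(cf. BLUE)}$$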
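The remark relating the convexity of $J$ to the accuracy of $\hat{x}$ can be illustrated by a small Monte Carlo experiment. This is a sketch under assumptions that are not in the slides: Gaussian, unbiased, independent observation errors, and arbitrary values for `x_true`, `sigma1` and `sigma2`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so that the run is reproducible

x_true = 20.0                 # hypothetical true value
sigma1, sigma2 = 1.0, 2.0     # assumed observation error standard deviations
n_draws = 200_000

# Draw many independent pairs of unbiased observations of x_true.
y1 = x_true + sigma1 * rng.standard_normal(n_draws)
y2 = x_true + sigma2 * rng.standard_normal(n_draws)

# Weighted average that minimizes J when H1 = H2 = Id.
w1, w2 = 1.0 / sigma1**2, 1.0 / sigma2**2
x_hat = (w1 * y1 + w2 * y2) / (w1 + w2)

second_derivative = w1 + w2               # J''(x_hat), independent of x
empirical_variance = x_hat.var()          # spread of the estimates over all draws
theoretical_variance = 1.0 / second_derivative

print(empirical_variance, theoretical_variance)
```

The empirical variance of the estimates should be close to $1/J''(\hat{x})$: the more sharply curved the cost function, the more accurate the estimate.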


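Model problem: alternative formulation (background + observation)

If one considers that $y_1$ is a prior (or background) estimate $x_b$ for $x$, and $y_2 = y$ is an independent observation, then:
$$J(x) = \underbrace{\frac{1}{2}\frac{(x - x_b)^2}{\sigma_b^2}}_{J_b} + \underbrace{\frac{1}{2}\frac{(x - y)^2}{\sigma_o^2}}_{J_o}$$
and
$$\hat{x} = \frac{\dfrac{1}{\sigma_b^2}\,x_b + \dfrac{1}{\sigma_o^2}\,y}{\dfrac{1}{\sigma_b^2} + \dfrac{1}{\sigma_o^2}} = x_b + \underbrace{\frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}}_{\text{gain}}\;\underbrace{(y - x_b)}_{\text{innovation}}$$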
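The equivalence between the weighted average and the "background + gain × innovation" form can be verified on a few values. The sketch below is illustrative only; the function names and the numerical values of the standard deviations are assumptions.

```python
def analysis(x_b, y, sigma_b, sigma_o):
    """Scalar analysis written as background plus gain times innovation."""
    gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    innovation = y - x_b
    return x_b + gain * innovation

def weighted_average(x_b, y, sigma_b, sigma_o):
    """Same estimate written as an inverse-variance weighted average."""
    w_b, w_o = 1.0 / sigma_b**2, 1.0 / sigma_o**2
    return (w_b * x_b + w_o * y) / (w_b + w_o)

x_b, y = 19.0, 21.0
# Equal accuracies, accurate background, very accurate observation,
# very accurate background.
cases = [(1.0, 1.0), (0.5, 1.0), (1.0, 1e-3), (1e-3, 1.0)]
for sigma_b, sigma_o in cases:
    print(sigma_b, sigma_o,
          analysis(x_b, y, sigma_b, sigma_o),
          weighted_average(x_b, y, sigma_b, sigma_o))
```

The gain lies between 0 and 1: it tends to 1 (the analysis follows the observation) when $\sigma_o \ll \sigma_b$, and to 0 (the analysis stays at the background) when $\sigma_b \ll \sigma_o$.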

Definition and minimization of the cost function: Least squares problems

Generalization: arbitrary number of unknowns and observations

To be estimated: $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$

Observations: $y = (y_1, \dots, y_p)^T \in \mathbb{R}^p$

Observation operator: $y \equiv H(x)$, with $H : \mathbb{R}^n \longrightarrow \mathbb{R}^p$

A simple example of observation operator

If $x = (x_1, x_2, x_3, x_4)^T$ and
$$y = \begin{pmatrix} \text{an observation of } \dfrac{x_1 + x_2}{2} \\ \text{an observation of } x_4 \end{pmatrix}$$
then $H(x) = \mathbf{H}x$ with
$$\mathbf{H} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Cost function: $J(x) = \frac{1}{2}\|H(x) - y\|^2$, with the norm $\|\cdot\|$ to be chosen.

Reminder: norms and scalar products

$u = (u_1, \dots, u_n)^T \in \mathbb{R}^n$

Euclidean norm: $\|u\|^2 = u^T u = \sum_{i=1}^{n} u_i^2$

Associated scalar product: $(u, v) = u^T v = \sum_{i=1}^{n} u_i v_i$

Generalized norm: let $M$ be a symmetric positive definite matrix.

$M$-norm: $\|u\|_M^2 = u^T M u = \sum_{i=1}^{n}\sum_{j=1}^{n} m_{ij}\,u_i u_j$

Associated scalar product: $(u, v)_M = u^T M v = \sum_{i=1}^{n}\sum_{j=1}^{n} m_{ij}\,u_i v_j$

(Intuitive) necessary (but not sufficient) condition for the existence of a unique minimum: $p \geq n$


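Formalism "background value + new observations"

$$Y = \begin{pmatrix} x_b \\ y \end{pmatrix} \begin{matrix} \longleftarrow \text{background} \\ \longleftarrow \text{new obs} \end{matrix}$$

The cost function becomes:
$$J(x) = \underbrace{\frac{1}{2}\|x - x_b\|_b^2}_{J_b} + \underbrace{\frac{1}{2}\|H(x) - y\|_o^2}_{J_o} = \frac{1}{2}(x - x_b)^T B^{-1}(x - x_b) + \frac{1}{2}(H(x) - y)^T R^{-1}(H(x) - y)$$

The necessary condition for the existence of a unique minimum ($p \geq n$) is automatically fulfilled, since the extended observation vector $Y$ has $n + p \geq n$ components.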


If the problem is time dependent

- Observations are distributed in time: $y = y(t)$.
- The observation cost function becomes:
  $$J_o(x) = \frac{1}{2}\sum_{i=0}^{N} \|H_i(x(t_i)) - y(t_i)\|_o^2$$
- There is a model describing the evolution of $x$: $\dfrac{dx}{dt} = M(x)$ with $x(t = 0) = x_0$. Then $J$ is often no longer minimized w.r.t. $x$, but w.r.t. $x_0$ only, or w.r.t. some other parameters.
  $$J_o(x_0) = \frac{1}{2}\sum_{i=0}^{N} \|H_i(x(t_i)) - y(t_i)\|_o^2 = \frac{1}{2}\sum_{i=0}^{N} \|H_i(M_{0 \to t_i}(x_0)) - y(t_i)\|_o^2$$


If the problem is time dependent (continued)

$$J(x_0) = \underbrace{\frac{1}{2}\|x_0 - x_0^b\|_b^2}_{\text{background term } J_b} + \underbrace{\frac{1}{2}\sum_{i=0}^{N} \|H_i(x(t_i)) - y(t_i)\|_o^2}_{\text{observation term } J_o}$$

Uniqueness of the minimum?

$$J(x_0) = J_b(x_0) + J_o(x_0) = \frac{1}{2}\|x_0 - x_b\|_b^2 + \frac{1}{2}\sum_{i=0}^{N} \|H_i(M_{0 \to t_i}(x_0)) - y(t_i)\|_o^2$$

- If $H$ and $M$ are linear then $J_o$ is quadratic.
- However it generally does not have a unique minimum, since the number of observations is generally less than the size of $x_0$ (the problem is underdetermined: $p < n$).

Example: let $(x_1^t, x_2^t) = (1, 1)$ and $y = 1.1$ an observation of $\frac{1}{2}(x_1 + x_2)$.
$$J_o(x_1, x_2) = \frac{1}{2}\left(\frac{x_1 + x_2}{2} - 1.1\right)^2$$


Uniqueness of the minimum? (continued)

- Adding $J_b$ makes the problem of minimizing $J = J_o + J_b$ well posed.

Example: let $(x_1^t, x_2^t) = (1, 1)$ and $y = 1.1$ an observation of $\frac{1}{2}(x_1 + x_2)$. Let $(x_1^b, x_2^b) = (0.9, 1.05)$.
$$J(x_1, x_2) = \underbrace{\frac{1}{2}\left(\frac{x_1 + x_2}{2} - 1.1\right)^2}_{J_o} + \underbrace{\frac{1}{2}\left[(x_1 - 0.9)^2 + (x_2 - 1.05)^2\right]}_{J_b}$$
$$\longrightarrow (x_1^*, x_2^*) = (0.94166\ldots,\ 1.09166\ldots)$$
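The effect of the background term in this two-variable example can be reproduced with a few lines of linear algebra. This is a minimal sketch: it assumes, as the example does implicitly, that both $J_b$ and $J_o$ use the Euclidean norm (identity weight matrices), and the variable names are illustrative.

```python
import numpy as np

H = np.array([[0.5, 0.5]])      # observation of (x1 + x2) / 2
y = np.array([1.1])
x_b = np.array([0.9, 1.05])
B_inv = np.eye(2)               # background term uses the Euclidean norm
R_inv = np.eye(1)               # observation term uses the Euclidean norm

hess_o = H.T @ R_inv @ H        # Hessian of J_o alone
hess = B_inv + hess_o           # Hessian of J = J_o + J_b

# Stationarity of J: (B^-1 + H^T R^-1 H) x = B^-1 x_b + H^T R^-1 y
x_star = np.linalg.solve(hess, B_inv @ x_b + H.T @ R_inv @ y)

print(np.linalg.matrix_rank(hess_o))   # rank deficient: J_o is flat along x1 + x2 = const
print(np.linalg.eigvalsh(hess))        # all positive: J is strictly convex
print(x_star)
```

The Hessian of $J_o$ alone is singular, so every point of the line $x_1 + x_2 = 2.2$ minimizes $J_o$; adding $J_b$ makes the Hessian positive definite and selects a single minimizer.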

Uniqueness of the minimum? (continued)

- If $H$ and/or $M$ are nonlinear then $J_o$ is no longer quadratic.

Example: the Lorenz system (1963)
$$\begin{cases} \dfrac{dx}{dt} = \alpha(y - x) \\[4pt] \dfrac{dy}{dt} = \beta x - y - xz \\[4pt] \dfrac{dz}{dt} = -\gamma z + xy \end{cases}$$


http://www.chaos-math.org

For the Lorenz system, with observations $x_{\mathrm{obs}}(t_i)$ of $x$:
$$J_o(y_0) = \frac{1}{2}\sum_{i=0}^{N} \left(x(t_i) - x_{\mathrm{obs}}(t_i)\right)^2\,dt$$
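The non-quadratic shape of this cost function can be explored numerically. The sketch below rests on several assumptions that are not stated in the slides: the classical parameter values $\alpha = 10$, $\beta = 28$, $\gamma = 8/3$, a fourth-order Runge-Kutta integrator, a hypothetical true initial state, perfect observations of $x$ at every time step, and the reading of $J_o(y_0)$ as the cost seen as a function of the initial value of $y$ only.

```python
import numpy as np

ALPHA, BETA, GAMMA = 10.0, 28.0, 8.0 / 3.0   # assumed classical parameter values

def lorenz_rhs(state):
    x, y, z = state
    return np.array([ALPHA * (y - x), BETA * x - y - x * z, -GAMMA * z + x * y])

def trajectory_x(state0, n_steps, dt):
    """Integrate with classical RK4 and return x(t_i) for i = 0..n_steps."""
    state = np.array(state0, dtype=float)
    xs = [state[0]]
    for _ in range(n_steps):
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(state + 0.5 * dt * k1)
        k3 = lorenz_rhs(state + 0.5 * dt * k2)
        k4 = lorenz_rhs(state + dt * k3)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(state[0])
    return np.array(xs)

dt, n_steps = 0.01, 300                        # assimilation window of 3 time units
x0_true, y0_true, z0_true = 1.0, 1.0, 20.0     # hypothetical "true" initial state
x_obs = trajectory_x([x0_true, y0_true, z0_true], n_steps, dt)  # perfect observations

def cost_o(y0):
    """J_o as a function of the initial value y0 only (x0 and z0 held at truth)."""
    x_model = trajectory_x([x0_true, y0, z0_true], n_steps, dt)
    return 0.5 * np.sum((x_model - x_obs) ** 2) * dt

y0_values = np.linspace(y0_true - 10.0, y0_true + 10.0, 41)
costs = np.array([cost_o(y0) for y0 in y0_values])

# Count interior grid points lower than both neighbours (candidate local minima).
is_local_min = (costs[1:-1] < costs[:-2]) & (costs[1:-1] < costs[2:])
print("candidate local minima on the grid:", int(is_local_min.sum()))
```

Plotting `costs` against `y0_values` for increasing window lengths is a simple way to see how the cost function departs from a parabola as the nonlinearity of the model has more time to act.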


Uniqueness of the minimum? (continued)

$$J(x_0) = J_b(x_0) + J_o(x_0) = \frac{1}{2}\|x_0 - x_b\|_b^2 + \frac{1}{2}\sum_{i=0}^{N} \|H_i(M_{0 \to t_i}(x_0)) - y(t_i)\|_o^2$$

- Adding $J_b$ makes it "more quadratic" ($J_b$ is a regularization term), but $J = J_o + J_b$ may nevertheless have several (local) minima.


A fundamental remark before going into minimization aspects

Once $J$ is defined (i.e. once all the ingredients are chosen: control variables, norms, observations...), the problem is entirely defined, and hence so is its solution.

The "physical" (i.e. the most important) part of data assimilation lies in the definition of $J$.

The rest of the job, i.e. minimizing $J$, is "only" technical work.

Definition and minimization of the cost function: Linear (time independent) problems


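Reminder: norms and scalar products (functions)

$u : \Omega \subset \mathbb{R}^n \longrightarrow \mathbb{R}$, $x \longmapsto u(x)$, with $u \in L^2(\Omega)$

Euclidean (or $L^2$) norm: $\|u\|^2 = \displaystyle\int_\Omega u^2(x)\,dx$

Associated scalar product: $(u, v) = \displaystyle\int_\Omega u(x)\,v(x)\,dx$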

Reminder: derivatives and gradients

$f : E \longrightarrow \mathbb{R}$ ($E$ being of finite or infinite dimension)

Directional (or Gâteaux) derivative of $f$ at point $x \in E$ in direction $d \in E$:
$$\frac{\partial f}{\partial d}(x) = \hat{f}[x](d) = \lim_{\alpha \to 0} \frac{f(x + \alpha d) - f(x)}{\alpha}$$

Example: partial derivatives $\dfrac{\partial f}{\partial x_i}$ are directional derivatives in the direction of the members of the canonical basis ($d = e_i$).

Reminder: derivatives and gradients (continued)

$f : E \longrightarrow \mathbb{R}$ ($E$ being of finite or infinite dimension)

Gradient (or Fréchet derivative): $E$ being a Hilbert space, $f$ is Fréchet differentiable at point $x \in E$ iff
$$\exists\, p \in E \text{ such that } f(x + h) = f(x) + (p, h) + o(\|h\|) \quad \forall h \in E$$

$p$ is the derivative or gradient of $f$ at point $x$, denoted $f'(x)$ or $\nabla f(x)$.

$h \mapsto (p(x), h)$ is a linear function, called the differential function, tangent linear function, or Jacobian of $f$ at point $x$.

Important (obvious) relationship: $\dfrac{\partial f}{\partial d}(x) = (\nabla f(x), d)$
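The relationship between the directional derivative and the gradient is also the basis of the usual "gradient test" for checking a gradient code. The sketch below uses a hypothetical smooth function `f` on $\mathbb{R}^3$ with a hand-written gradient; neither comes from the slides.

```python
import numpy as np

def f(x):
    """Hypothetical smooth test function on R^3."""
    return np.sin(x[0]) + x[1] ** 2 * x[2]

def grad_f(x):
    """Analytic gradient of f for the Euclidean scalar product."""
    return np.array([np.cos(x[0]), 2.0 * x[1] * x[2], x[1] ** 2])

x = np.array([0.3, -1.2, 2.0])
d = np.array([1.0, 0.5, -2.0])

exact = grad_f(x) @ d                      # (grad f(x), d)
errors = []
for alpha in [1e-2, 1e-4, 1e-6]:
    growth_rate = (f(x + alpha * d) - f(x)) / alpha   # finite-difference quotient
    errors.append(abs(growth_rate - exact))
    print(alpha, growth_rate, errors[-1])
```

The error between the finite-difference quotient and $(\nabla f(x), d)$ should decrease roughly linearly with $\alpha$, until rounding errors take over for very small $\alpha$.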


Minimum of a quadratic function in finite dimension

Theorem: Generalized (or Moore-Penrose) inverse

Let $M$ be a $p \times n$ matrix, with rank $n$ (hence $p \geq n$), and $b \in \mathbb{R}^p$.

Let $J(x) = \|Mx - b\|^2 = (Mx - b)^T (Mx - b)$.

$J$ is minimum for $\hat{x} = M^+ b$, where $M^+ = (M^T M)^{-1} M^T$ (generalized, or Moore-Penrose, inverse).

Corollary: with a generalized norm

Let $N$ be a $p \times p$ symmetric positive definite matrix.

Let $J_1(x) = \|Mx - b\|_N^2 = (Mx - b)^T N (Mx - b)$.

$J_1$ is minimum for $\hat{x} = (M^T N M)^{-1} M^T N\, b$.
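Both formulas can be cross-checked against standard library routines. The following sketch uses random test data (an assumption made purely for illustration) and NumPy's `pinv`, `cholesky` and `lstsq` as independent references.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 6, 3
M = rng.standard_normal((p, n))        # random p x n matrix, full column rank almost surely
b = rng.standard_normal(p)

# Unweighted case: x_hat = (M^T M)^-1 M^T b, computed by solving the normal equations.
x_hat = np.linalg.solve(M.T @ M, M.T @ b)

# Weighted case with a symmetric positive definite N (here: a random SPD matrix).
A = rng.standard_normal((p, p))
N = A @ A.T + p * np.eye(p)
x_hat_N = np.linalg.solve(M.T @ N @ M, M.T @ N @ b)

# Cross-checks against library routines.
x_pinv = np.linalg.pinv(M) @ b
L = np.linalg.cholesky(N)              # N = L L^T, so ||r||_N = ||L^T r||
x_lstsq = np.linalg.lstsq(L.T @ M, L.T @ b, rcond=None)[0]

print(np.allclose(x_hat, x_pinv), np.allclose(x_hat_N, x_lstsq))
```

In practice the normal equations are rarely formed explicitly for ill-conditioned problems; the Cholesky change of variable shown here turns the weighted problem into an ordinary least squares problem.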


Link with data assimilation

This gives the solution to the problem
$$\min_{x \in \mathbb{R}^n} J_o(x) = \frac{1}{2}\|Hx - y\|_o^2$$
in the case of a linear observation operator $H$.

$$J_o(x) = \frac{1}{2}(Hx - y)^T R^{-1} (Hx - y) \longrightarrow \hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} y$$

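Link with data assimilation (continued)

Similarly, for a linear observation operator $H$:
$$\begin{aligned} J(x) &= J_b(x) + J_o(x) \\ &= \frac{1}{2}\|x - x_b\|_b^2 + \frac{1}{2}\|Hx - y\|_o^2 \\ &= \frac{1}{2}(x - x_b)^T B^{-1}(x - x_b) + \frac{1}{2}(Hx - y)^T R^{-1}(Hx - y) \\ &= \frac{1}{2}(Mx - b)^T N (Mx - b) = \frac{1}{2}\|Mx - b\|_N^2 \end{aligned}$$
with
$$M = \begin{pmatrix} I_n \\ H \end{pmatrix}, \qquad b = \begin{pmatrix} x_b \\ y \end{pmatrix}, \qquad N = \begin{pmatrix} B^{-1} & 0 \\ 0 & R^{-1} \end{pmatrix}$$
which leads to
$$\hat{x} = x_b + \underbrace{(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1}}_{\text{gain matrix}}\;\underbrace{(y - Hx_b)}_{\text{innovation vector}}$$

Remark: The gain matrix also reads $BH^T (HBH^T + R)^{-1}$ (Sherman-Morrison-Woodbury formula).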
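The two expressions of the gain matrix can be compared numerically on a small problem. This is a sketch on random data; the sizes `n` and `p`, the helper `random_spd`, and the dense inverses are assumptions made for illustration (explicit inverses are only affordable at toy sizes).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3
H = rng.standard_normal((p, n))

def random_spd(size):
    """Random symmetric positive definite matrix (illustrative helper)."""
    a = rng.standard_normal((size, size))
    return a @ a.T + size * np.eye(size)

B, R = random_spd(n), random_spd(p)
x_b = rng.standard_normal(n)
y = rng.standard_normal(p)

B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)

# Gain in "state space" form: an n x n system has to be solved.
K_state = np.linalg.solve(B_inv + H.T @ R_inv @ H, H.T @ R_inv)
# Gain in "observation space" form: a p x p matrix has to be inverted.
K_obs = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

x_hat = x_b + K_state @ (y - H @ x_b)

# Gradient of J at x_hat; it should vanish at the minimum.
grad = B_inv @ (x_hat - x_b) + H.T @ R_inv @ (H @ x_hat - y)
print(np.allclose(K_state, K_obs), np.linalg.norm(grad))
```

Which form is cheaper depends on the relative sizes of the state and observation spaces: the first works with an $n \times n$ matrix, the second with a $p \times p$ matrix.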


Link with data assimilation (continued)

Remark:
$$\underbrace{\mathrm{Hess}(J)}_{\text{convexity}} = B^{-1} + H^T R^{-1} H = \underbrace{[\mathrm{Cov}(\hat{x})]^{-1}}_{\text{accuracy}} \qquad \text{(cf. BLUE)}$$

Remark

Given the size of $n$ and $p$, it is generally impossible to handle $H$, $B$ and $R$ explicitly. So the direct computation of the gain matrix is impossible.

Hence, even in the linear case (for which we have an explicit expression for $\hat{x}$), the computation of $\hat{x}$ is performed using an optimization algorithm.

The adjoint method: Rationale

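Descent methods

Descent methods for minimizing the cost function require the knowledge of (an estimate of) its gradient.

$$x_{k+1} = x_k + \alpha_k d_k$$
with
$$d_k = \begin{cases} -\nabla J(x_k) & \text{gradient method} \\ -[\mathrm{Hess}(J)(x_k)]^{-1}\,\nabla J(x_k) & \text{Newton method} \\ -B_k \nabla J(x_k) & \text{quasi-Newton methods (BFGS, \ldots)} \\ -\nabla J(x_k) + \dfrac{\|\nabla J(x_k)\|^2}{\|\nabla J(x_{k-1})\|^2}\, d_{k-1} & \text{conjugate gradient} \\ \ldots & \ldots \end{cases}$$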

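The computation of $\nabla J(x_k)$ may be difficult if the dependency of $J$ with regard to the control variable $x$ is not direct.

Example:

- $u(x)$ solution of an ODE
- $K$ a coefficient of this ODE
- $u^{\mathrm{obs}}(x)$ an observation of $u(x)$
- $J(K) = \frac{1}{2}\|u(x) - u^{\mathrm{obs}}(x)\|^2$

$$\hat{J}[K](k) = (\nabla J(K), k) = \langle \hat{u},\, u - u^{\mathrm{obs}} \rangle \quad \text{with} \quad \hat{u} = \frac{\partial u}{\partial k}(K) = \lim_{\alpha \to 0} \frac{u_{K + \alpha k} - u_K}{\alpha}$$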


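It is often difficult (or even impossible) to obtain the gradient through the computation of growth rates (i.e. finite-difference quotients).

Example:
$$\begin{cases} \dfrac{dx(t)}{dt} = M(x(t)), \quad t \in [0, T] \\[4pt] x(t = 0) = u \end{cases} \qquad \text{with } u = (u_1, \dots, u_N)^T$$

$$J(u) = \frac{1}{2}\int_0^T \|x(t) - x^{\mathrm{obs}}(t)\|^2\,dt \quad \longrightarrow \text{requires one model run}$$

$$\nabla J(u) = \begin{pmatrix} \dfrac{\partial J}{\partial u_1}(u) \\ \vdots \\ \dfrac{\partial J}{\partial u_N}(u) \end{pmatrix} \simeq \begin{pmatrix} [J(u + \alpha e_1) - J(u)]/\alpha \\ \vdots \\ [J(u + \alpha e_N) - J(u)]/\alpha \end{pmatrix} \quad \longrightarrow N + 1 \text{ model runs}$$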
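The cost of this approach can be made concrete by counting model integrations. The sketch below is an illustration only: the linear model $dx/dt = Ax$, the explicit Euler scheme, the synthetic observations and the dictionary `counter` are all assumptions introduced for the example.

```python
import numpy as np

N = 8                                   # size of the control vector (kept tiny here)
rng = np.random.default_rng(4)
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))   # hypothetical linear model dx/dt = A x
dt, n_steps = 0.01, 100
counter = {"runs": 0}                   # number of model integrations performed

def run_model(u):
    """Explicit Euler integration from x(0) = u; returns the whole trajectory."""
    counter["runs"] += 1
    traj = [np.asarray(u, dtype=float)]
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * (A @ traj[-1]))
    return np.array(traj)

x_obs = run_model(rng.standard_normal(N))   # synthetic observations from a "true" state
counter["runs"] = 0                          # do not count the run that made the data

def cost(u):
    """Discrete version of J(u) = 0.5 * integral of ||x(t) - x_obs(t)||^2 dt."""
    return 0.5 * dt * np.sum((run_model(u) - x_obs) ** 2)

def finite_difference_gradient(u, alpha=1e-6):
    j0 = cost(u)                             # 1 model run
    grad = np.empty(N)
    for i in range(N):                       # N more model runs
        e_i = np.zeros(N)
        e_i[i] = 1.0
        grad[i] = (cost(u + alpha * e_i) - j0) / alpha
    return grad

u = np.zeros(N)
grad = finite_difference_gradient(u)
print(counter["runs"])                       # N + 1 model runs
```

With $N = 8$ this is harmless, but the number of model runs grows linearly with the size of the control vector, which is what rules the approach out for large systems.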

In most actual applications, the size $N$ of the control vector $u$ is large (or even very large: e.g. $N = O(10^8 - 10^9)$ in meteorology) $\longrightarrow$ this method cannot be used.

Alternatively, the adjoint method provides a very efficient way to compute $\nabla J$.

Conversely, do not forget that, if the size of the control variable is very small (fewer than 10 to 20 components), $\nabla J$ can easily be estimated by the computation of growth rates.


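Reminder: adjoint operator

General definition:

Let $\mathcal{X}$ and $\mathcal{Y}$ be two pre-Hilbert spaces (i.e. vector spaces with scalar products).

Let $A : \mathcal{X} \longrightarrow \mathcal{Y}$ be an operator.

The adjoint operator $A^* : \mathcal{Y} \longrightarrow \mathcal{X}$ is defined by:
$$\forall x \in \mathcal{X},\ \forall y \in \mathcal{Y}, \quad \langle Ax, y \rangle_{\mathcal{Y}} = \langle x, A^* y \rangle_{\mathcal{X}}$$

In the case where $\mathcal{X}$ and $\mathcal{Y}$ are Hilbert spaces and $A$ is linear (and bounded), $A^*$ always exists (and is unique).

Adjoint operator in finite dimension:

$A : \mathbb{R}^n \longrightarrow \mathbb{R}^m$ a linear operator (i.e. a matrix). Then its adjoint operator $A^*$ (w.r.t. Euclidean norms) is $A^T$.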
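The defining identity of the adjoint can be tested numerically, which is also how adjoint codes are usually validated. The sketch below uses random data; the second part, with generalized scalar products $(u, v)_M = u^T M v$, goes slightly beyond the statement above and uses the expression $A^* = M_X^{-1} A^T M_Y$, which follows directly from the definition.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 3
A = rng.standard_normal((m, n))          # linear operator from R^n to R^m
x = rng.standard_normal(n)
y = rng.standard_normal(m)

# Euclidean scalar products: the adjoint is the transpose.
lhs = (A @ x) @ y                        # <Ax, y> in R^m
rhs = x @ (A.T @ y)                      # <x, A^T y> in R^n

def random_spd(size):
    """Random symmetric positive definite matrix (illustrative helper)."""
    a = rng.standard_normal((size, size))
    return a @ a.T + size * np.eye(size)

# Generalized scalar products (u, v)_M = u^T M v on both spaces.
M_X, M_Y = random_spd(n), random_spd(m)
A_star = np.linalg.solve(M_X, A.T @ M_Y) # M_X^-1 A^T M_Y
lhs_M = (A @ x) @ M_Y @ y                # <Ax, y>_{M_Y}
rhs_M = x @ M_X @ (A_star @ y)           # <x, A* y>_{M_X}

print(abs(lhs - rhs), abs(lhs_M - rhs_M))
```

The adjoint therefore depends on the choice of scalar products: it reduces to the transpose only when both spaces carry the Euclidean one.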


The adjoint method: A simple example

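The continuous case

The assimilation problem

- $\begin{cases} -u''(x) + c(x)\,u'(x) = f(x), \quad x \in\, ]0, 1[ \\ u(0) = u(1) = 0 \end{cases} \qquad f \in L^2(]0, 1[)$
- $c(x)$ is unknown
- $u^{\mathrm{obs}}(x)$ an observation of $u(x)$
- Cost function: $J(c) = \dfrac{1}{2}\displaystyle\int_0^1 \left(u(x) - u^{\mathrm{obs}}(x)\right)^2 dx$

$\nabla J \rightarrow$ Gâteaux derivative: $\hat{J}[c](\delta c) = \langle \nabla J(c), \delta c \rangle$

$$\hat{J}[c](\delta c) = \int_0^1 \hat{u}(x)\left(u(x) - u^{\mathrm{obs}}(x)\right) dx \quad \text{with} \quad \hat{u} = \lim_{\alpha \to 0} \frac{u_{c + \alpha\,\delta c} - u_c}{\alpha}$$

What is the equation satisfied by $\hat{u}$?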
