• Aucun résultat trouvé

Variational approach to data assimilation: optimization aspects and adjoint method

N/A
N/A
Protected

Academic year: 2022

Partager "Variational approach to data assimilation: optimization aspects and adjoint method"

Copied!
102
0
0

Texte intégral

(1)

Variational approach to data assimilation: optimization aspects and adjoint method

Eric Blayo

University Grenoble Alpes and INRIA

(2)

Objectives

I

introduce data assimilation as an optimization problem

I

discuss the different forms of the objective functions

I

discuss their properties w.r.t. optimization

I

introduce the adjoint technique for the computation of the gradient

Link with statistical methods: cf lectures by E. Cosme

Variational data assimilation algorithms, tangent and adjoint

codes: cf lectures by M. Nodet and A. Vidard

(3)

Introduction: model problem

Outline

Introduction: model problem

Definition and minimization of the cost function

The adjoint method

(4)

Introduction: model problem

Model problem

Two different available measurements of a single quantity. Which estimation of its true value ?

−→

least squares approach

Example

2 obs y

1

= 19

C and y

2

= 21

C of the (unknown) present temperature x .

I

Let J(x ) =

12

(x

y

1

)

2

+ (x

y

2

)

2

I

Min

x

J (x)

−→

ˆ x = y

1

+ y

2

2 = 20

C

(5)

Introduction: model problem

Model problem

Two different available measurements of a single quantity. Which estimation of its true value ?

−→

least squares approach

Example

2 obs y

1

= 19

C and y

2

= 21

C of the (unknown) present temperature x.

I

Let J(x ) =

12

(x

y

1

)

2

+ (x

y

2

)

2

I

Min

x

J (x)

−→

ˆ x = y

1

+ y

2

2 = 20

C

(6)

Introduction: model problem

Model problem

Observation operator If6=units: y1=66.2F andy2=69.8F

I LetH(x) = 9 5x+32

I LetJ(x) = 1 2

(H(x)−y1)2+ (H(x)−y2)2

I Minx J(x) −→ˆx=20C

Drawback # 1: if observation units are inhomogeneous y1=66.2F andy2=21C

I J(x) =1 2

(H(x)−y1)2+ (x−y2)2

−→xˆ=19.47C !! Drawback # 2: if observation accuracies are inhomogeneous Ify1is twice more accurate than y2, one should obtainˆx= 2y1+y2

3 =19.67C

−→J should be J(x) =1 2

"xy1 1/2

2

+

xy2 1

2#

(7)

Introduction: model problem

Model problem

Observation operator If6=units: y1=66.2F andy2=69.8F

I LetH(x) = 9 5x+32

I LetJ(x) = 1 2

(H(x)−y1)2+ (H(x)−y2)2

I Minx J(x) −→ˆx=20C

Drawback # 1: if observation units are inhomogeneous y1=66.2F andy2=21C

I J(x) =1 2

(H(x)−y1)2+ (x−y2)2

−→xˆ=19.47C !!

Drawback # 2: if observation accuracies are inhomogeneous Ify1is twice more accurate than y2, one should obtainˆx= 2y1+y2

3 =19.67C

−→J should be J(x) =1 2

"xy1 1/2

2

+

xy2 1

2#

(8)

Introduction: model problem

Model problem

Observation operator If6=units: y1=66.2F andy2=69.8F

I LetH(x) = 9 5x+32

I LetJ(x) = 1 2

(H(x)−y1)2+ (H(x)−y2)2

I Minx J(x) −→ˆx=20C

Drawback # 1: if observation units are inhomogeneous y1=66.2F andy2=21C

I J(x) =1 2

(H(x)−y1)2+ (x−y2)2

−→xˆ=19.47C !!

Drawback # 2: if observation accuracies are inhomogeneous Ify1is twice more accurate than y2, one should obtainˆx= 2y1+y2

3 =19.67C

−→J should be J(x) =1 2

"xy1 1/2

2

+

xy2 1

2#

(9)

Introduction: model problem

Model problem

General form

Minimize J (x) = 1 2

(H

1

(x )

y

1

)

2

σ12

+ (H

2

(x)

y

2

)

2 σ22

If

H1 =H2=Id:

J(x) = 1 2

(x

y

1

)

2 σ12

+ 1

2

(x

y

2

)

2 σ22

which leads to ˆ x = 1

σ12

y

1

+ 1

σ22

y

2

1

σ12

+ 1

σ22

(weighted average)

Remark: J”(ˆ x)

| {z }

convexity

= 1

σ21

+ 1

σ22

= [Var (ˆ x)]

−1

| {z }

accuracy

(cf BLUE)

(10)

Introduction: model problem

Model problem

General form

Minimize J (x) = 1 2

(H

1

(x )

y

1

)

2

σ12

+ (H

2

(x)

y

2

)

2 σ22

If

H1 =H2=Id:

J(x) = 1 2

(x

y

1

)

2 σ12

+ 1

2

(x

y

2

)

2 σ22

which leads to ˆ x = 1

σ12

y

1

+ 1

σ22

y

2

1

σ12

+ 1

σ22

(weighted average)

Remark: J”(ˆ x)

| {z }

convexity

= 1

σ21

+ 1

σ22

= [Var (ˆ x)]

−1

| {z }

accuracy

(cf BLUE)

(11)

Introduction: model problem

Model problem

General form

Minimize J (x) = 1 2

(H

1

(x )

y

1

)

2

σ12

+ (H

2

(x)

y

2

)

2 σ22

If

H1 =H2=Id:

J(x) = 1 2

(x

y

1

)

2 σ12

+ 1

2

(x

y

2

)

2 σ22

which leads to ˆ x = 1

σ12

y

1

+ 1

σ22

y

2

1

σ12

+ 1

σ22

(weighted average)

Remark: J”(ˆ x)

| {z }

convexity

= 1

σ21

+ 1

σ22

= [Var (ˆ x)]

−1

| {z }

accuracy

(cf BLUE)

(12)

Introduction: model problem

Model problem

Alternative formulation:

background + observation If one considers that y

1

is a prior (or background) estimate x

b

for x, and y

2

= y is an independent observation, then:

J(x) = 1 2

(x

x

b

)

2 σ2b

| {z } Jb

+ 1 2

(x

y)

2 σ2o

| {z } Jo

and

ˆ x =

1

σb2

x

b

+ 1

σo2

y 1

σ2b

+ 1

σo2

= x

b

+

σb2 σ2b

+

σ2o

| {z }

gain

(y

x

b

)

| {z }

innovation

(13)

Definition and minimization of the cost function

Outline

Introduction: model problem

Definition and minimization of the cost function

Least squares problems

Linear (time independent) problems

The adjoint method

(14)

Definition and minimization of the cost function Least squares problems

Outline

Introduction: model problem

Definition and minimization of the cost function

Least squares problems

Linear (time independent) problems

The adjoint method

(15)

Definition and minimization of the cost function Least squares problems

Generalization: arbitrary number of unknowns and observations

To be estimated: x=

x1

... xn

 ∈IRn

Observations: y=

y1

... yp

 ∈IRp

Observation operator: yH(x), withH:IRn−→IRp

(16)

Definition and minimization of the cost function Least squares problems

Generalization: arbitrary number of unknowns and observations

A simple example of observation operator

If x=

x1

x2

x3

x4

and y=

an observation ofx1+x22 an observation ofx4

then H(x) =Hx withH=

 1 2

1

2 0 0

0 0 0 1

(17)

Definition and minimization of the cost function Least squares problems

Generalization: arbitrary number of unknowns and observations

To be estimated: x=

x1

... xn

 ∈IRn

Observations: y=

y1

... yp

 ∈IRp

Observation operator: yH(x), withH:IRn−→IRp Cost function: J(x) = 1

2kH(x)−yk2 withk.kto be chosen.

(18)

Definition and minimization of the cost function Least squares problems

Reminder: norms and scalar products

u=

u1

... un

 ∈IRn

Euclidian norm: kuk2=uTu=

n

X

i=1

ui2

Associated scalar product: (u,v) =uTv=

n

X

i=1

uivi

Generalized norm: letMa symmetric positive definite matrix M-norm: kuk2M=uTM u=

n

X

i=1 n

X

j=1

mijuiuj

Associated scalar product: (u,v)M=uTM v=

n

X

i=1 n

X

j=1

mijuivj

(19)

Definition and minimization of the cost function Least squares problems

Generalization: arbitrary number of unknowns and observations To be estimated:

x

=

x

1

.. . x

n

 ∈IRn

Observations:

y

=

y

1

.. . y

p

 ∈IRp

Observation operator:

y

H(x), with H :

IRn−→IRp

Cost function: J(x) = 1

2

kH(x)−yk2

with

k.k

to be chosen.

(Intuitive) necessary (but not sufficient) condition for the existence of a unique minimum:

p

n

(20)

Definition and minimization of the cost function Least squares problems

Generalization: arbitrary number of unknowns and observations To be estimated:

x

=

x

1

.. . x

n

 ∈IRn

Observations:

y

=

y

1

.. . y

p

 ∈IRp

Observation operator:

y

H(x), with H :

IRn−→IRp

Cost function: J(x) = 1

2

kH(x)−yk2

with

k.k

to be chosen.

(Intuitive) necessary (but not sufficient) condition for the existence of a unique minimum:

p

n

(21)

Definition and minimization of the cost function Least squares problems

Formalism “background value + new observations”

Y

=

xb

y

←−

background

←−

new obs

The cost function becomes:

J(x) = 1

2

kx−xbk2b

| {z } Jb

+ 1

2

kH(x)−yk2o

| {z } Jo

= (x

xb

)

TB−1

(x

xb

) + (H(x)

y)TR−1

(H(x)

y)

The necessary condition for the existence of a unique minimum

(p

n) is automatically fulfilled.

(22)

Definition and minimization of the cost function Least squares problems

Formalism “background value + new observations”

Y

=

xb

y

←−

background

←−

new obs

The cost function becomes:

J(x) = 1

2

kx−xbk2b

| {z } Jb

+ 1

2

kH(x)−yk2o

| {z } Jo

= (x

xb

)

TB−1

(x

xb

) + (H(x)

y)TR−1

(H(x)

y)

The necessary condition for the existence of a unique minimum

(p

n) is automatically fulfilled.

(23)

Definition and minimization of the cost function Least squares problems

Formalism “background value + new observations”

Y

=

xb

y

←−

background

←−

new obs

The cost function becomes:

J(x) = 1

2

kx−xbk2b

| {z } Jb

+ 1

2

kH(x)−yk2o

| {z } Jo

= (x

xb

)

TB−1

(x

xb

) + (H(x)

y)TR−1

(H(x)

y)

The necessary condition for the existence of a unique minimum

(p

n) is automatically fulfilled.

(24)

Definition and minimization of the cost function Least squares problems

If the problem is time dependent

I

Observations are distributed in time:

y

=

y(t).

I

The observation cost function becomes:

J

o

(x) = 1 2

N

X

i=0

kHi

(x(t

i

))

y(ti

)k

2o

I

There is a model describing the evolution of

x:

d

x

dt = M (x) with

x(t

= 0) =

x0

. Then J is often no longer minimized w.r.t.

x, but w.r.t. x0

only, or to some other parameters.

J

o

(x

0

) = 1 2

N

X

i=0

kHi

(x(t

i

))−y(t

i

)k

2o

= 1 2

N

X

i=0

kHi

(M

0→ti(x0))−y(ti

)k

2o

(25)

Definition and minimization of the cost function Least squares problems

If the problem is time dependent

I

Observations are distributed in time:

y

=

y(t).

I

The observation cost function becomes:

J

o

(x) = 1 2

N

X

i=0

kHi

(x(t

i

))

y(ti

)k

2o

I

There is a model describing the evolution of

x:

d

x

dt = M (x) with

x(t

= 0) =

x0

. Then J is often no longer minimized w.r.t.

x, but w.r.t. x0

only, or to some other parameters.

J

o

(x

0

) = 1 2

N

X

i=0

kHi

(x(t

i

))−y(t

i

)k

2o

= 1 2

N

X

i=0

kHi

(M

0→ti(x0))−y(ti

)k

2o

(26)

Definition and minimization of the cost function Least squares problems

If the problem is time dependent

J(x

0

) = 1

2

kx0xb0k2b

| {z }

background term

Jb

+ 1 2

N

X

i=0

kHi

(x(t

i

))

y(ti

)k

2o

| {z }

observation term

Jo

(27)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH andM are linearthen Jo is quadratic.

I However it generally does not have a unique minimum, since the number of observations is generally less than the size ofx0 (the problem is underdetermined: p<n).

Example: let(x1t,x2t) = (1,1)and y=1.1 an observa- tion of 12(x1+x2).

Jo(x1,x2) = 1 2

x1+x2

2 1.1 2

(28)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH andM are linearthen Jo is quadratic.

I However it generally does not have a unique minimum, since the number of observations is generally less than the size ofx0 (the problem is underdetermined: p<n).

Example: let(x1t,x2t) = (1,1)andy =1.1 an observa- tion of 12(x1+x2).

Jo(x1,x2) = 1 2

x1+x2

2 1.1 2

(29)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH andM are linearthen Jo is quadratic.

I However it generally does not have a unique minimum, since the number of observations is generally less than the size ofx0 (the problem is underdetermined).

I AddingJb makes the problem of minimizingJ =Jo+Jbwell posed.

Example: let(x1t,x2t) = (1,1)andy =1.1 an observa- tion of 12(x1+x2). Let(x1b,x2b) = (0.9,1.05)

J(x1,x2) =1 2

x1+x2

2 1.1 2

| {z }

Jo

+1 2

(x10.9)2+ (x21.05)2

| {z }

Jb

−→(x1,x2) = (0.94166...,1.09166...)

(30)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH and/orM are nonlinearthenJo is no longer quadratic.

Example: the Lorenz system (1963)













 dx

dt =α(y−x) dy

dt =βx−yxz dz

dt =−γz+xy

(31)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH and/orM are nonlinearthenJo is no longer quadratic.

Example: the Lorenz system (1963)













 dx

dt =α(y−x) dy

dt =βx−yxz dz

dt =−γz+xy

(32)

Definition and minimization of the cost function Least squares problems

http://www.chaos-math.org

(33)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH and/orM are nonlinearthenJo is no longer quadratic.

Example: the Lorenz system (1963)













 dx

dt =α(y−x) dy

dt =βx−yxz dz

dt =−γz+xy

Jo(y0) = 1 2

N

X

i=0

(x(ti)−xobs(ti))2 dt

(34)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH and/orM are nonlinearthenJo is no longer quadratic.

Example: the Lorenz system (1963)













 dx

dt =α(y−x) dy

dt =βx−yxz dz

dt =−γz+xy

Jo(y0) = 1 2

N

X

i=0

(x(ti)−xobs(ti))2 dt

(35)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH and/orM are nonlinearthenJo is no longer quadratic.

I AddingJb makes it “more quadratic” (Jb is a regularization term), butJ =Jo+Jb may however have several (local) minima.

(36)

Definition and minimization of the cost function Least squares problems

Uniqueness of the minimum ?

J(x0) =Jb(x0) +Jo(x0) =1

2kx0xbk2b+1 2

N

X

i=0

kHi(M0→ti(x0))−y(ti)k2o

I IfH and/orM are nonlinearthenJo is no longer quadratic.

I AddingJb makes it “more quadratic” (Jb is a regularization term), butJ =Jo+Jb may however have several (local) minima.

(37)

Definition and minimization of the cost function Least squares problems

A fundamental remark before going into minimization aspects

Once J is defined (i.e. once all the ingredients are chosen: control variables, norms, observations. . . ), the problem is entirely defined.

Hence its solution.

The “physical” (i.e. the most important) part of data assimilation lies in the definition of J.

The rest of the job, i.e. minimizing J, is “only” technical work.

(38)

Definition and minimization of the cost function Linear (time independent) problems

Outline

Introduction: model problem

Definition and minimization of the cost function

Least squares problems

Linear (time independent) problems

The adjoint method

(39)

Definition and minimization of the cost function Linear (time independent) problems

Reminder: norms and scalar products

u=

u1

... un

 ∈IRn

Euclidian norm: kuk2=uTu=

n

X

i=1

ui2

Associated scalar product: (u,v) =uTv=

n

X

i=1

uivi

Generalized norm: letMa symmetric positive definite matrix M-norm: kuk2M=uTM u=

n

X

i=1 n

X

j=1

mijuiuj

Associated scalar product: (u,v)M=uTM v=

n

X

i=1 n

X

j=1

mijuivj

(40)

Definition and minimization of the cost function Linear (time independent) problems

Reminder: norms and scalar products

u: Ω⊂IRn −→IR

x −→u(x) uL2(Ω)

Euclidian (orL2) norm: kuk2= Z

u2(x)dx Associated scalar product: (u,v) =

Z

u(x)v(x)dx

(41)

Definition and minimization of the cost function Linear (time independent) problems

Reminder: derivatives and gradients

f :E −→IR (E being of finite or infinite dimension)

Directional (or Gˆateaux) derivativeoff at pointxE in direction dE:

∂f

∂d(x) = ˆf[x](d) = lim

α→0

f(x+αd)−f(x) α

Example: partial derivatives ∂f

∂xi

are directional derivatives in the direction of the members of the canonical basis (d=ei)

(42)

Definition and minimization of the cost function Linear (time independent) problems

Reminder: derivatives and gradients

f :E −→IR (E being of finite or infinite dimension) Gradient (or Fr´echet derivative): E being an Hilbert space,f is

Fr´echet differentiable at pointxE iff

∃p∈E such that f(x+h) =f(x) + (p,h) +o(khk) ∀h∈E p is thederivativeorgradientoff at pointx, denoted f0(x)or

∇f(x).

h→(p(x),h)is a linearfunction, calleddifferential functionor tangent linear functionorJacobianoff at pointx

Important (obvious) relationship: ∂f

∂d(x) = (∇f(x),d)

(43)

Definition and minimization of the cost function Linear (time independent) problems

Reminder: derivatives and gradients

f :E −→IR (E being of finite or infinite dimension) Gradient (or Fr´echet derivative): E being an Hilbert space,f is

Fr´echet differentiable at pointxE iff

∃p∈E such that f(x+h) =f(x) + (p,h) +o(khk) ∀h∈E p is thederivativeorgradientoff at pointx, denoted f0(x)or

∇f(x).

h→(p(x),h)is a linearfunction, calleddifferential functionor tangent linear functionorJacobianoff at pointx

Important (obvious) relationship: ∂f

∂d(x) = (∇f(x),d)

(44)

Definition and minimization of the cost function Linear (time independent) problems

Minimum of a quadratic function in finite dimension

Theorem: Generalized (or Moore-Penrose) inverse

Let

M

a p

×

n matrix, with rank n, and

bIRp

.

(hence pn)

Let J(x) =

kMx−bk2

= (Mx

b)T

(Mx

b).

J is minimum for ˆ

x

=

M+b

, where

M+

= (M

TM)−1MT

(generalized, or Moore-Penrose, inverse).

Corollary: with a generalized norm

Let

N

a p

×

p symmetric definite positive matrix.

Let J

1

(x) =

kMx−bk2N

= (Mx

b)TN

(Mx

b).

J

1

is minimum for ˆ

x

= (M

TNM)−1MTN b.

(45)

Definition and minimization of the cost function Linear (time independent) problems

Minimum of a quadratic function in finite dimension

Theorem: Generalized (or Moore-Penrose) inverse

Let

M

a p

×

n matrix, with rank n, and

bIRp

.

(hence pn)

Let J(x) =

kMx−bk2

= (Mx

b)T

(Mx

b).

J is minimum for ˆ

x

=

M+b

, where

M+

= (M

TM)−1MT

(generalized, or Moore-Penrose, inverse).

Corollary: with a generalized norm

Let

N

a p

×

p symmetric definite positive matrix.

Let J

1

(x) =

kMx−bk2N

= (Mx

b)TN

(Mx

b).

J

1

is minimum for ˆ

x

= (M

TNM)−1MTN b.

(46)

Definition and minimization of the cost function Linear (time independent) problems

Link with data assimilation

This gives the solution to the problem min

x∈IRn

J

o

(x) = 1

2

kHx−yk2o

in the case of a linear observation operator

H.

J

o

(x) = 1

2 (Hx−y)

TR−1

(Hx−y)

−→

ˆ

x

= (H

TR−1H)−1HTR−1y

(47)

Definition and minimization of the cost function Linear (time independent) problems

Link with data assimilation

Similarly:

J(x) = J

b

(x) + J

o

(x)

= 1

2

kx−xbk2b

+ 1

2

kH(x)−yk2o

= 1

2 (x

xb

)

TB−1

(x

xb

) + 1

2 (Hx

y)TR−1

(Hx

y)

= (Mx

b)TN

(Mx

b) =kMx−bk2N

with

M

=

In

H

b

=

xb

y

N

=

B−1 0 0 R−1

which leads to ˆ

x

=

xb

+ (B

−1

+

HTR−1H)−1HTR−1

| {z }

gain matrix

(y

Hxb

)

| {z }

innovation vector

Remark: The gain matrix also readsBHT(HBHT+R)−1

(Sherman-Morrison-Woodbury formula)

(48)

Definition and minimization of the cost function Linear (time independent) problems

Link with data assimilation

Similarly:

J(x) = J

b

(x) + J

o

(x)

= 1

2

kx−xbk2b

+ 1

2

kH(x)−yk2o

= 1

2 (x

xb

)

TB−1

(x

xb

) + 1

2 (Hx

y)TR−1

(Hx

y)

= (Mx

b)TN

(Mx

b) =kMx−bk2N

with

M

=

In

H

b

=

xb

y

N

=

B−1 0 0 R−1

which leads to ˆ

x

=

xb

+ (B

−1

+

HTR−1H)−1HTR−1

| {z }

gain matrix

(y

Hxb

)

| {z }

innovation vector

Remark: The gain matrix also readsBHT(HBHT+R)−1

(Sherman-Morrison-Woodbury formula)

(49)

Definition and minimization of the cost function Linear (time independent) problems

Link with data assimilation

Remark

Hess(J)

| {z }

convexity

=

B−1

+

HTR−1H

= [Cov (ˆ

x)]−1

| {z }

accuracy

(cf BLUE)

(50)

Definition and minimization of the cost function Linear (time independent) problems

Remark

Given the size of n and p, it is generally impossible to handle explicitly

H,B

and

R. So the direct computation of the gain

matrix is impossible.

even in the linear case (for which we have an explicit expression

for

ˆx), the computation ofˆx

is performed using an optimization

algorithm.

(51)

The adjoint method

Outline

Introduction: model problem

Definition and minimization of the cost function

The adjoint method

Rationale

A simple example

A more complex (but still linear) example Control of the initial condition

The adjoint method as a constrained minimization

(52)

The adjoint method Rationale

Outline

Introduction: model problem

Definition and minimization of the cost function

The adjoint method

Rationale

A simple example

A more complex (but still linear) example Control of the initial condition

The adjoint method as a constrained minimization

(53)

The adjoint method Rationale

Descent methods

Descent methods for minimizing the cost function require the knowledge of (an estimate of) its gradient.

xk+1=xkkdk

withdk =













−∇J(xk) gradient method

−[Hess(J)(xk)]−1∇J(xk) Newton method

−Bk∇J(xk) quasi-Newton methods (BFGS, . . . )

−∇J(xk) +k∇J(xk∇J(xk)k2

k−1)k2dk−1 conjugate gradient

... ...

(54)

The adjoint method Rationale

The computation of∇J(xk)may be difficult if the dependency of J with regard to the control variablexis not direct.

Example:

I u(x)solution of an ODE

I K a coefficient of this ODE

I uobs(x)an observation ofu(x)

I J(K) =1

2ku(x)−uobs(x)k2

Jˆ[K](k) = (∇J(K),k) =<ˆu,uuobs > withuˆ= ∂u

∂k(K) = lim

α→0

uK+αkuK

α

(55)

The adjoint method Rationale

The computation of∇J(xk)may be difficult if the dependency of J with regard to the control variablexis not direct.

Example:

I u(x)solution of an ODE

I K a coefficient of this ODE

I uobs(x)an observation ofu(x)

I J(K) =1

2ku(x)−uobs(x)k2

Jˆ[K](k) = (∇J(K),k) =<u,ˆ uuobs >

withuˆ= ∂u

∂k(K) = lim

α→0

uK+αkuK

α

(56)

The adjoint method Rationale

It is often difficult (or even impossible) to obtain the gradient through the computation of growth rates.

Example:

( dx(t))

dt =M(x(t)) t ∈[0,T] x(t =0) =u

withu=

u1

... uN

J(u) =1 2

Z T 0

kx(t)−xobs(t)k2 −→ requires one model run

∇J(u) =

∂J

∂u1(u) ...

∂J

∂uN

(u)

 '

[J(u+αe1)−J(u)]/α ...

[J(u+αeN)−J(u)]

−→N+1 model runs

(57)

The adjoint method Rationale

In most actual applications,N= [u] is large (or even very large: e.g.

N=O(108−109)in meteorology) −→ this method cannot be used.

Alternatively, the adjoint method provides a very efficient way to compute∇J.

On the contrary, do not forget that, if the size of the control variable is very small (<10−20), ∇J can be easily estimated by the computation of growth rates.

(58)

The adjoint method Rationale

In most actual applications,N= [u] is large (or even very large: e.g.

N=O(108−109)in meteorology) −→ this method cannot be used.

Alternatively, the adjoint method provides a very efficient way to compute∇J.

On the contrary, do not forget that, if the size of the control variable is very small (<10−20), ∇J can be easily estimated by the computation of growth rates.

(59)

The adjoint method Rationale

Reminder: adjoint operator

General definition:

LetX andY two prehilbertian spaces (i.e. vector spaces with scalar products).

LetA:X −→ Y an operator.

The adjoint operatorA:Y −→ X is defined by:

∀x∈ X,∀y ∈ Y, <Ax,y >Y=<x,Ay>X

In the case whereX andY are Hilbert spaces andAis linear, then A always exists (and is unique).

Adjoint operator in finite dimension:

A:IRn−→IRm a linear operator (i.e. a matrix). Then its adjoint operatorA (w.r. to Euclidian norms) isAT.

(60)

The adjoint method Rationale

Reminder: adjoint operator

General definition:

LetX andY two prehilbertian spaces (i.e. vector spaces with scalar products).

LetA:X −→ Y an operator.

The adjoint operatorA:Y −→ X is defined by:

∀x∈ X,∀y ∈ Y, <Ax,y >Y=<x,Ay>X

In the case whereX andY are Hilbert spaces andAis linear, then A always exists (and is unique).

Adjoint operator in finite dimension:

A:IRn−→IRm a linear operator (i.e. a matrix). Then its adjoint operatorA (w.r. to Euclidian norms) isAT.

(61)

The adjoint method A simple example

Outline

Introduction: model problem

Definition and minimization of the cost function

The adjoint method

Rationale

A simple example

A more complex (but still linear) example Control of the initial condition

The adjoint method as a constrained minimization

(62)

The adjoint method A simple example

The continuous case

The assimilation problem

I

−u00(x) +c(x)u0(x) =f(x) x ∈]0,1[

u(0) =u(1) =0 fL2(]0,1[)

I c(x)isunknown

I uobs(x)anobservationofu(x)

I Cost function: J(c) = 1 2

Z 1 0

u(x)uobs(x)2

dx

∇J →Gˆateaux-derivative: ˆJ[c](δc) =<∇J(c), δc>

ˆJ[c](δc) = Z 1

0

ˆ u(x)

u(x)uobs(x)

dx withˆu= lim

α→0

uc+αδcuc

α

What is the equation satisfied byˆu?

(63)

The adjoint method A simple example

The continuous case

The assimilation problem

I

−u00(x) +c(x)u0(x) =f(x) x ∈]0,1[

u(0) =u(1) =0 fL2(]0,1[)

I c(x)isunknown

I uobs(x)anobservationofu(x)

I Cost function: J(c) = 1 2

Z 1 0

u(x)uobs(x)2

dx

∇J →Gˆateaux-derivative: ˆJ[c](δc) =<∇J(c), δc>

ˆJ[c](δc) = Z 1

0

ˆ u(x)

u(x)uobs(x)

dx withˆu= lim

α→0

uc+αδcuc

α

What is the equation satisfied byuˆ ?

Références

Documents relatifs

Sur internet – Bibliographie – Méthodologie des espaces entre elles: court entre les lettres d’un mot, plus large entre les mots; la tentation d’une copie lettre à lettre retarde

Seven other samples were investigated for their living (stained) and dead (unstained) assemblages. Hierarchical cluster analysis allows the recognition of five benthic species

For the known Lorenz-63 ODE model, we report these energy pathways for fixed-step gradient descent of variational cost (4) (magenta dashed) and learned NN solvers using

It is no longer the case with the modified 3D-Var analysis (panel (b) in Fig. 4), as the structure function takes on the shape of the gradient field used as the sensitive

These methods focus respectively on correcting the amplitude of different sources ([Desroziers and Ivanov, 2001]), the evolved background matrix (associated to a prior

We will show the performance and effectiveness of the EnVar method on Kinect captured depth-range data combined with shallow water model as well as synthetic SST image combined

2 nd Workshop on Data Assimilation &amp; CFD Processing for Particle Image and Tracking Velocimetry December 13 to 14, 217, Delft, The Netherlands.. A new hybrid

A SR problem is covered by the linear direct model (30) with a system matrix A gathering the warping, blurring and down-sampling operators. In fact, the main concern of this section