Adaptive Observations And Multilevel Optimization In Data Assimilation

(1)

RESEARCH OUTPUTS / RÉSULTATS DE RECHERCHE

Author(s) - Auteur(s) :

Publication date - Date de publication :

Permanent link - Permalien :

Rights / License - Licence de droit d’auteur :

Bibliothèque Universitaire Moretus Plantin

Institutional Repository - Research Portal

Dépôt Institutionnel - Portail de la Recherche

researchportal.unamur.be

University of Namur

Adaptive Observations And Multilevel Optimization In Data Assimilation

Gratton, Serge; Rincon-Camacho, Monserrat; Toint, Ph

Publication date:

2013

Document Version

Early version, also known as pre-print Link to publication

Citation for pulished version (HARVARD):

Gratton, S, Rincon-Camacho, M & Toint, P 2013 'Adaptive Observations And Multilevel Optimization In Data Assimilation' Namur center for complex systems, Namur.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal ?

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

(2)

by S. Gratton, M. M. Rincon-Camacho and Ph. L. Toint Report NAXYS-06-2013 21 May 2013

ENSEEIHT, 2, rue Camichel, 31000 Toulouse, France

CERFACS, 42, avenue Gaspard Coriolis, 31057 Toulouse, France

University of Namur, 61, rue de Bruxelles, B5000 Namur (Belgium)

(3)

In Data Assimilation

Serge Gratton

∗

, M. Monserrat Rincon-Camacho

†

, and Philippe L. Toint

‡

June 17, 2013

Abstract

We propose to use a decomposition of large-scale incremental four dimensional (4D-Var) data assimilation problems in order to make their numerical solution more efficient. This decomposition is based on exploiting an adaptive hierarchy of the observations. Starting with a low-cardinality set and the solution of its corresponding optimization problem, observations are adaptively added based on a posteriori error estimates. The particular structure of the sequence of associated linear systems allows the use of a variant of the conjugate gradient algorithm which effectively exploits the fact that the number of observations is smaller than the size of the vector state in the 4D-Var model. The method proposed is justified by deriving the relevant error estimates at different levels of the hierarchy and a practical computational technique is then derived. The new algorithm is tested on a 1D-wave equation and on the Lorenz-96 system, the latter one being of special interest because of its similarity with Nu-merical Weather Prediction (NWP) systems.

Keywords: Data assimilation, adaptive observations, numerical algorithms, multilevel opti-mization, a posteriori errors.

1 Introduction

In data assimilation, a substantial body of research has been conducted in the field of adaptive observations. The main idea in this area is to adapt the set of observations used in the model calibration to improve computational efficiency while not sacrificing accuracy. Several techniques have been considered, depending on the practical context of the relevant application. In a Kalman filter framework, Leutbecher (2003b) introduces a Hessian reduced rank estimate whose aim is to predict expected change in the square norm of forecast error due to intermittent modifications of the observation network; its application to the Lorenz96 model (see Lorenz, 1995) is presented in Leutbecher (2003a). Lorenz and Emanuel (1998) present different strategies for weather forecast-ing which target the locations where the first-guess errors are largest, and verify if the includforecast-ing additional data improve the analyses and forecasts. Berliner, Lu and Snyder (1999) introduce rigorous mathematical criteria to select future data in a dynamical process, based on the uncer-tainty of the current analysis, the dynamics of error evolution, the form and errors of observations and data assimilation. In Hansen and Smith (2000) the performance of various observation tar-geting strategies is compared for both direct insertion and ensemble Kalman filter assimilation. The authors conclude that the optimal adaptive observation strategies depend on the combination of model, assimilation scheme and observational network. Daescu and Navon (2004) describe a method based on a periodic update of the adjoint sensitivity field which takes into account the

∗_{ENSEEIHT, 2, rue Camichel, 31000 Toulouse, France and CERFACS, 42, avenue Gaspard Coriolis, 31057}

Toulouse, France. ([email protected]).

†_{CERFACS, 42, avenue Gaspard Coriolis, 31057 Toulouse, France. ([email protected]).}

‡_{Namur Center for Complex Systems, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium.}

([email protected]).

(4)

interaction between time-distributed adaptive and routine observations. Cardinali, Pezzulli and Andersson (2004) introduce an influence matrix whose purpose is to identify influential data in a 4D-Var framework . They observe that low-influence data occur in areas with a large number of observations while high-influence data points occur in regions with sparse ones or in dynamically active areas. Finally, Liu, Kalnay, Miyoshi and Cardinali (2009) describe, in a Kalman filter set-ting, the impact of the self-sensitivity and the cross sensitivity (off-diagonal elements of influence matrix).

Our motivation for the present paper lies in several of the conclusions drawn in this literature. In particular, our work aims at a good numerical exploitation of the conclusion that the spatial distribution of observations may be influential. More specifically, we introduce new techniques for adaptively selecting observations in a 4D-Var setting, based on mathematically sound a posteriori error estimates within a dynamic hierarchy. The technique used for constructing the adaptive set of observations is inspired by adaptive finite elements techniques studied in Section 12.2 (p. 100) of Rincon-Camacho (2011). Related techniques for adaptive multigrid can be found in Brandt (1973) and McCormick (1984).

The paper is structured as follows: Section 2 introduces the 4D-Var vocabulary and the consid-ered hierarchy of observations. Section 3 then covers the associated error estimates and Section 4 presents the new adaptive algorithm. Section 5 discusses its application to a one-dimensional nonlinear wave equation and to the Lorenz96 model. Conclusions and perspectives are discussed in Section 6.

2 The problem and associated observation hierarchy

Consider the nonlinear least-squares problems in 4D-Var data assimilation, whose objective is to find an initial vector state at an initial time denoted as x = x(t0) ∈ IRn. The structure of the problem is as follows: min x∈IRn 1 2kx − xbk 2 B−1+1₂ Nt X j=0 kHj(x(tj)) − yjk2 R−1_j (2.1)

where the squared norm kxk2

M is induced by the inner product xTM x for a symmetric positive definite matrix M ∈ IRl×l and a vector x ∈ IRl. Here xb∈ IRn is the background vector, which is an a priori estimate. The vector yj ∈ IRmj _{is the vector of observations at time tj} _{and Hj} _{is the}

operator modeling the observation process at the same time. The state vector x(tj) satisfies the nonlinear model of evolution x(tj) = M0→j[x(t0)]. The matrix B ∈ IRn×n is a symmetric positive definite matrix representing the background-error covariance and the matrix Rj ∈ IRmj×mj _{is also}

a symmetric positive definite matrix representing the observation-error covariance at time tj. The nonlinear least-squares problem (2.1) is solved iteratively, and, at iteration k, the problem is simplified by linearizing the nonlinear observation operator Hj(x(tj)) at the iterate xk (which for the moment we denote only by x), leading to the optimization problem

min δx∈IRn 1 2kx − xb+ δxk 2 B−1+1₂kH δx − dk2_R−1 (2.2)

where R = diag(R0, R1, . . . , RN_t) ∈ IRm×m, d ∈ IRm denotes the concatenated misfits over time yj− Hj(x(tj)) and H ∈ IRm×n is the concatenated version of the linearized observation process

d =      d0 d1 .. . dN_t      ∈ IRm_{, H =}      H0 H1M0→1 .. . HN_tM0→N_t      ∈ IRn×m_.

Here Hj and M0→j are a (possibly approximate) linearization of the observation operator Hjand the model M0→j around x = xk(t0) respectively.

(5)

Diverse techniques for solving problem (2.1) have been proposed, see for instance Courtier, Th´epaut and Hollingsworth (1994) for the so called incremental method which is equivalent to applying a truncated Gauss-Newton iteration to problem (2.1) (see Gratton, Lawless and Nichols, 2007). The general algorithm is the following:

Algorithm 2.1 Incremental 4D-Var 1. Initialize x0= xb∈ IRn _{and set k = 0.}

2. Compute xk(tj) from xk(t0) = xk by running the nonlinear model M from time t0 to tN_t. 3. Calculate the vectors dk,j= yj− Hj(xk(tj)) for j = 1, . . . , Nt.

4. Find an approximate solution δxkof the minimization problem (2.2), where x = xk, H = Hk and d = dk.

5. Update the current solution as xk+1= xk+ δxk.

6. Set k := k + 1. If convergence is not achieved return to Step 2.

This algorithm is known as the outer loop and the minimization step (Step 4) as the inner loop. Our interest is to reduce the cost of this optimization problem. Termination criteria for the outer and inner loops of this algorithm are discussed in Gratton et al. (2007).

It is most natural to solve the subproblem in Step 4 of Algorithm 2.1 directly in the space of dimension n, the size of x. This technique is referred to as the primal approach. At variance, the quadratic optimization problem (2.2) may be rewritten in a space of dimension m, the number of observations. This is known as the dual approach and it is especially useful when m is signifi-cantly smaller than n, in which case the second term in (2.2) is of relatively low rank compared to the first. A first approach of this type is the Physical-space Statistical Analysis System (PSAS) method (see Courtier, 1997), where the “low rank” observation term in (2.2) is handled by using a Sherman-Morrison formula and applying the standard conjugate-gradient algorithm (see Hestenes and Stiefel, 1952, or Golub and Van Loan, 1996, Section 10.2, p. 520) to the reduced system. This method has the drawback of not maintaining the inherent monotonicity of the conjugate-gradient algorithm in IRn, thereby making any stopping rule of the inner minimization difficult to define (see El Akkroui, Gauthier, Pellerin and Buis, 2008, or Gratton, G¨urol and Toint, 2013). Fortunately, a better alternative is available, where the conjugate-gradient algorithm is reformulated in IRmusing the inner product defined by the metric HBHT _{in order to guarantee the desired monotonicity} properties without incurring additional cost. This technique, known as the Restricted Precondi-tioned Conjugate Gradient method (RPCG, see Gratton and Tshimanga, 2009 and Gratton et al., 2013), provides an efficient numerical procedure where substantial computing gains are obtained when m n. We refer the reader to the cited publications for details.

Our aim in this work is to make the best possible use of this dual technique and to propose an adaptive observations’ strategy for solving the data assimilation problem (2.1). Suppose that we have a large set of m observations O which can be decomposed into a hierarchical collection of sets {Oi}ri=0, each with cardinality mi, such that

Oi⊂ Oi+1 and mi< mi+1 (i = 0, . . . , r − 1)

where, by convention, Or= O and mr= m. To each set of observations Oi we associate a misfit vector yi ∈ IRmi _{(by selecting the relevant components in the vector y), and the corresponding}

observation-error covariance matrix Ri. Given {Oi}r

i=0, we may therefore consider the hierarchical collection of minimization problems

min x∈IRn 1 2kx − xbk 2 B−1+1₂kHi(x) − yik2 R−1_i , i = 0, . . . , r, (2.3) where the vector Hi(x) denotes the nonlinear observation operator concatenated over time.

Thus our objective is to construct the collection {Oi}r

i=0 such that the sequential solution of the problems (2.3) is obtained at significantly lower cost compared to solving (2.2) directly, while at the same time maintaining equivalent accuracy requirements.

(6)

3 Mathematical analysis

Our adaptive method is based on the exploitation of a posteriori bounds for the error between the solutions obtained using few observations or many. In particular, we are interested in comparing the accuracy of the solutions of problems (2.3) for Oi and Oi+1. For simplicity of notation, we denote them as Oc a set with mcobservations and Of a set containing mf observations such that mc < mf and Oc⊂ Of, where the indices c and f stand for ‘coarse’ and ‘fine’, respectively. The ‘fine’ optimization problem (2.3) is given, for the starting point x, by

min δxf∈IRn 1 2kx + δxf− xbk 2 B−1+1₂kHfδx − dfk2_R−1 f (3.4) where df = yf− Hf(x) and Hf is the linearized version of the observation process at x involving the set of observations Of. We may reformulate (3.4) as a convex quadratic problem with linear equality constraints given by

min δxf∈IRn, vf∈IRmf 1 2kx + δxf− xbk 2 B−1+1₂kvfk2_R−1 f , subject to vf = Hfδxf− df, (3.5) see Gratton et al. (2013). The Lagrange function corresponding to this reformulated problem is given by L(δxf, vf, λf) def = 1 2kx + δxf− xbk 2 B−1+1₂kvfk2_R−1 f − λT f(Hfδxf− vf− df). Using this function, the optimality conditions for problem (3.5) can then be expressed as

∇δxfL(δxf, vf, λf) = B −1_{(x + δx} f− xb) − HfTλf = 0, ∇vfL(δxf, vf, λf) = R −1 f (vf) + λf = 0, ∇λ_fL(δxf, vf, λf) = vf− Hfδxf+ df = 0, which leads to the system

H_fTλf = B−1(x + δxf− xb),

−λf = R−1_f (Hfδxf − df). (3.6)

Gratton et al. (2013) and Gratton and Tshimanga (2009) show that the solution of this system can be obtained by solving

(R_f−1HfBH_fT + Im_f)λf = R−1_f (df− Hf(xb− x)), (3.7) for λf and then substituting in the second equation of (3.6) for δxf and vf.

We now compare the solution (δxf, λf) to the solution of the coarse level minimization problem min δxc∈IRn 1 2kx + δxc− xbk 2 B−1+12kΓf(Hfδxc− df)k 2 R−1c , (3.8)

where Γf is the restriction operator Γf : IRmf _{→ IR}mc _{from the fine observation space to the}

coarse one. As above, we reformulate (3.8) as a convex quadratic problem with linear equality constraints, and obtain

min δxc∈IR n , vc∈IRmc 1 2kx + δxc− xbk 2 B−1+12kvck 2 Rc−1, subject to vc= Γf(Hfδxc− df), (3.9)

whose Lagrangian function is given by L(δxc, vc, λc) def = 1 2kx + δxc− xbk 2 B−1+1₂kvck2_R−1 c − λ T c(ΓfHfδxc− vc− Γfdf). The optimality conditions now become

HT

fΓTfλc = B−1(x + δxc− xb),

−λc = R−1

c Γf(Hfδxc− df)

(7)

These conditions may again be solved by computing λc such that

(R−1_c ΓfHfBH_fTΓT_f + Im_c)λc= R−1_c Γf(df− Hf(xb− x)). (3.11) Since the dimension of the system (3.11) is smaller than that of the system (3.7), solving problem (3.8) is (often much) cheaper than solving problem (3.4). The RPCG algorithm mentioned in the previous section derives its efficiency by using the formulation (3.11) rather than (3.7), see Gratton et al. (2013) and Gratton and Tshimanga (2009).

After obtaining the solution (δxc, λc), our objective is now to compute the difference between the (still unknown) λf and Πcλc in order to identify the set of observations where this difference is large. Here Πc is the prolongation operator given by

Πcdef= σfΓT_f (3.12)

for some σf > 0. Our intention is then to construct the “fine” problem from the “coarse” one by adding to the coarse the observations that are singled out by this comparison.

We start by computing the difference between λf and Πcλc in an appropriate norm. Using (3.6) and (3.10), we define E1 def= kλf− Πcλck2 Rf = hRf(λf− Πcλc) , −Rf−1(Hfδxf − df) − Πcλci = hλf− Πcλc, −Hfδxf+ df− RfΠcλci, and E2 def = kλf− Πcλck2_H fBHfT = kHT fλf− H T fΠcλck 2 B = hB(HT fλf− HfTΠcλc), B−1(x + δxf− xb) − HcTΠcλci = hλf− Πcλc, Hf(x − xb) + Hfδxf− HfBHfTΠcλci. By adding errors E1 and E2we obtain that

E1+ E2 = hλf− Πcλc, −Hfδxf+ df− RfΠcλci

+ hλf− Πcλc, Hf(x − xb) + Hfδxf − HfBHfTΠcλci + hλf− Πcλc, Hfδxc− Hfδxci

= hλf− Πcλc, df− RfΠcλc− Hfδxci

+ hλf− Πcλc, Hf(x − xb+ δxc) − HfBHfTΠcλci. As a consequence, we deduce that

E1+ E2 = hR 1/2 f (λf− Πcλc), R −1/2 f (df− Hfδxc− RfΠcλc)i + hHfBH_fT(λf− Πcλc), (HfBH_fT)−1(Hf(x − xb+ δxc) − HfBH_fTΠcλc)i = hR1/2_f (λf− Πcλc), R −1/2 f (df− Hfδxc− RfΠcλc)i + hB1/2_HT f(λf− Πcλc), B1/2HfT(HfBHfT)−1(Hf(x − xb+ δxc) − HfBHfTΠcλc)i. Thus, we obtain that

E1+ E2 ≤ kλf− ΠcλckR_f kdf− Hfδxc− RfΠcλck_R−1 f + kHT fλf− HfTΠcλckBkHfT(HfBHfT) −1_(H f(x − xb+ δxc) − HfBHfTΠcλc)kB ≤ kλf− ΠcλckR_f kdf− Hfδxc− RfΠcλck_R−1 f + kλf− Πcλck_H_f_BHT f kHf(x − xb+ δxc) − HfBH T fΠcλck(HfBHT_f)−1.

(8)

The Cauchy-Schwarz inequality then yields that E1+ E2 ≤ hkλf− Πcλck2 Rf+ kλf − Πcλck 2 HfBHfT i1/2 × kdf− Hfδxc− RfΠcλck2 R−1_f + kHf(x − xb+ δxc) − HfBH T fΠcλck 2 (HfBHfT)−1 1/2 = [ E1 + E2]1/2× kdf− Hfδxc− RfΠcλck2 R−1_f + kHf(x − xb+ δxc) − HfBH T fΠcλck2(HfBH_fT)−1 1/2 . This inequality leads to the a posteriori error bound

E1+ E2≤ kdf− Hfδxc− RfΠcλck2

R−1_f + kHf(x − xb+ δxc) − HfBH T

fΠcλck2(HfBH_fT)−1. (3.13)

The question then arises of how to compute the second term in the right hand side of this inequality, i.e. kHf(x − xb+ δxc) − HfBHT fΠcλck2(HfBH_fT)−1 = hHf(x − xb+ δxc) − HfBH_fTΠcλc, (HfBH_fT)−1Hf(x − xb+ δxc) − HfBH_fTΠcλci = hHf(x − xb+ δxc) − HfBHfTΠcλc, (HfBHfT)−1Hf(x − xb+ δxc) − Πcλci. (3.14) But the first equation of (3.10) multiplied by HfB gives that

HfBHfTΓTfλc= Hf(x − xb+ δxc), (3.15) and hence that

ΓT_fλc= (HfBH_fT)−1Hf(x − xb+ δxc). (3.16) Therefore kHf(x − xb+ δxc) − HfBHfTΠcλck2_(H fBHfT)−1 = hHf(x − xb+ δxc) − HfBH_fTΠcλc, ΓT_fλc− Πcλci = (_σ1 f − 1)hHf(x − xb+ δxc) − HfBH T fΠcλc, Πcλci, (3.17)

because of (3.12). We note also that kλf− Πcλck2 Rf+HfBHT_f = kλf− Πcλck 2 Rf + kλf− Πcλck 2 HfBHT_f = E1+ E2. (3.18)

As a consequence, the left-hand side of this inequality (the sought error estimate) can be bounded above using inequality (3.13), giving the following a posteriori error.

Theorem 3.1 Let δxf be the solution to the problem (3.5) and λf the corresponding Lagrange multiplier to the constraint in (3.5) such that the couple (δxf, λf) satisfies (3.6). Analogously, let δxc be the solution to (3.9) and λc the corresponding Lagrange multiplier such that (δxc, λc) satisfies (3.10). Then the a posteriori error bound satisfies the inequality

kλf− Πcλck2 Rf+HfBH_fT ≤ kdf− Hfδxc− RfΠcλck 2 R−1_f + (1 σf − 1)hHf(x − xb+ δxc) − HfBH T fΠcλc, Πcλci. (3.19)

Note that the bound (3.19) does not involve the computation of λf or δxf.

The use of an a posteriori error bound depending on a primal-dual problem in adaptive finite elements was studied in Rincon-Camacho (2011).

In the next section we describe how to make use of the bound (3.19) in order to algorithmically construct the ‘fine’ problem from the ‘coarse’ one.

(9)

4 A new adaptive algorithm

Starting from a small set of observations Oc, our goal is to add only significant observations to produce Of so that the a posteriori error (3.19) is reduced. Our strategy is to define an auxiliary set of potential fine observations ˜Of from which the observations in Of are selected. However, describing our strategy (and algorithm) requires additional assumptions on the hierarchy of (potential) observations. More specifically, we complete our assumptions as follows.

• The observations correspond to localizations in some underlying continuous measurable “ob-servation space”.

• The coarse observation set partitions the observation space in a finite number of cells {cj}pc

j=1 of measures {wj}pc

j=1.

• The auxiliary set ˜Of is constructed by considering all observations in Oc with the addition of a single additional potential observation point in the interior of each cell. The cell is said to be associated with this additional potential observation.

• There exists a prolongation operator ˜Πcfrom Octo ˜Of such that, for each potential observa-tion oj in ˜Of\ Oc, ˜Πcdefines the value of this observation only in terms of the observations of the associated cell cj. As expected, we define ˜Πc = σfΓ˜T

f.

We illustrate these assumptions by an example: suppose that the observation space is the plane and the coarse observation set Oc is the rectangular ‘grid’ shown in Figure 4.1 (a): the cells are elementary rectangles in this grid, whose measure is given by their surface. We may then define

˜

Of as the grid shown in Figure 4.1 (b), which we obtained by locally adding a new potential observation in the center of each rectangle and four additional ones on its boundary (effectively doubling the mesh in every direction). The observations in Of (as shown in Figure 4.1 (c)) can then be extracted from ˜Of.

(a) Coarse observation set Oc (b) Auxiliary observation set ˜Of

(c) Fine observation set Of

Figure 4.1: Auxiliary observations

In this example, the restriction operator ˜Γf can be defined as the usual full weighting oper-ator associated with bilinear interpolation prolongations (which justifies the introduction of the

(10)

boundary points). This full-weighting restriction operator is given, on every grid node, by the stencil 1 16   1 2 1 2 4 2 1 2 1  . (4.20)

In order to achieve our goal to select ‘important’ observations from ˜Of, we need to compute localized error indicators (3.19). We define them as

ηj def = wjh( ˜df− ˜Hfδxc− ˜RfΠ˜cλc)|j, ˜R−1f ( ˜df− ˜Hfδxc− ˜RfΠ˜cλc)|ji, νj def = ( 1 σf − 1)wjh( ˜Hf(x − xb+ δxc) − ˜HfB ˜H T fΠ˜cλc)|j, ˜Πcλc|ji,

where ˜df, ˜Hf, ˜Rf are constructed from the set of observations ˜Of and where the symbol |jdenotes the restriction of the associated quantity to the cell cj. Note that the η’s correspond to the first term on the right-hand side of (3.19) while the ν’s correspond the the second term.

In order to decide which cells will be chosen to include a new interior potential observation, we use the bulk criterion (also known as D¨orfler marking, see, for instance, D¨orfler, 1996, Morin, Nochetto and Siebert, 2000, Logg, Mardal and Wells, 2012). For some constants θ1, θ2 ∈ (0, 1), we construct the sets Sη and Sν such that

θ1   p X j=1 ηj  ≤ X k∈Sη ηk and θ2   p X j=1 νj  ≤ X k∈Sν νk. (4.21)

In practice this construction is carried out by progressively constructing each set using a greedy heuristic which includes first the non-included cell with maximal indicator value.

Once Sη and Sν are constructed, we decide that a cell k of Oc is ‘refined’ if k ∈ Sη or k ∈ Sν, meaning that the observations associated with the corresponding cell in ˜Of are added to the set Oc to construct the new set Of. More formally,

Of def= Oc∪   [ k∈Sη∪Sν ok  . (4.22)

Thus, starting with a small set of observations O0, we progressively add observations using the method just described, resulting in the following algorithm.

Algorithm 4.1 An algorithm using adaptive observations 1. Set i = 0, initialize x and the coarse observation set O0.

2. Given the set of observations Oi, construct the auxiliary set ˜Oi+1 such that the conditions described at the beginning of this section hold.

3. Find the solution (δxi, λi) to the problem min

δxi∈IRn 1

2kxi+ δxi− xbk

2

B−1+1₂k˜Γi+1( ˜Hi+1δxi− ˜di+1)k2_R−1 i

, (4.23)

by approximately solving the system (R−1_i Γ˜iH˜i+1B ˜Hi+1T Γ˜

T

i+1+ Imi)λi= R

−1

i Γ˜i+1( ˜di+1− ˜Hi+1(xb− xi)) using RPCG and then setting δxi = xb− xi+ B ˜Hi+1T ˜ΓTiλi.

4. For each cell cj of observation set Oi compute the error indicators

ηj = wjh( ˜di+1− ˜Hi+1δxi− ˜Ri+1Π˜iλi)|j, ˜R−1i+1( ˜di+1− ˜Hi+1δxi− ˜Ri+1Π˜iλi)|ji, νj = (_σ_i+11 − 1)wjh( ˜Hi+1(xi− xb+ δxi) − ˜Hi+1B ˜Hi+1T Π˜iλi)|j, ˜Πiλi|ji.

(11)

5. Build the sets Sη and Sν such that θ1   pi X j=1 ηj  ≤ X k∈Sη ηk, and θ2   pi X j=1 νj  ≤ X k∈Sν ηk.

using the bulk criterion. 6. Construct the set Oi+1 as

Oi+1:= Oi∪   [ k∈Sη∪Sν ok   . 7. Update x ← x + δxi.

8. Increment i and return to Step 2.

We note that the computation of (δxi, λi) in Step 3 corresponds to applying the RCPG al-gorithm to (4.23), thereby making this computation essentially dependent on m, the number of observations, which the algorithm maintains as small as necessary by design. We also note that the bulk of the remaining work is in the computation of the error indicators in Step 4, as this requires the product of ˜Hi+1 with δxi. Observe that, because of (3.12), ˜Hi+1T Π˜iλi = σi+1H˜i+1T Γ˜Ti+1λi and that both this quantity and the product ˜Hi+1(xi− xb) are already available as by-products of Step 3.

Similar methods for constructing adaptive grids are the multi-level adaptive technique (MLAT) studied in Brandt (1973) and the fast adaptive composite grid (FAC) presented in McCormick (1984).

5 Numerical experiments

5.1 Two test problems

This section is devoted to showing the performance of our new adaptive algorithm on two test cases. The first one is a one-dimensional wave equation system, which we refer to as the 1D-Wave model from now on. The dynamics on this model are governed by the following nonlinear wave equations: ∂2 ∂t2u(z, t) − ∂ 2 ∂z2u(z, t) + f (u) = 0, u(0, t) = u(1, t) = 0, u(z, 0) = u0(z), ∂_∂tu(z, 0) = 0, 0 ≤ t ≤ T, 0 ≤ z ≤ 1, (5.24)

where we have chosen f (u) = µeηu_{. In this case we look for the initial function u0(z), which} corresponds to x in the data assimilation problem (2.1). We illustrate in Figure 5.2 (a) the initial state vector u0 (x = u0) and the evolution of the system in Figure 5.2 (b) (view from the top) where the space domain corresponds to the horizontal axis and the time domain to the vertical axis.

Our second example is the model referred as Lorenz96 presented in Lorenz (1995). The vari-able ¯u is a vector of N -equally spaced entries around a circle of constant latitude, i.e. ¯u(t) = (u1(t), u2(t), . . . , uN(t)). The N -dimensional system is determined by the following N equations

duj dt =

1

(12)

(a) Initial x0:= u0(z) (b) Dynamical system over space and time

Figure 5.2: Nonlinear 1D Wave equation

where F and κ are constants independent of j. To form a cyclic chain, we set uN = u0, u−1 = uN −1 and uN +1 = u1. This system is known to have a chaotic behaviour over time depending on the parameters N , F and κ (see for instance Karimi and Paul, 2009) For a given set of parameters N , F and κ for which a stable behavior is observed, i.e. where data assimilation can be performed, we consider then the following dynamical system:

duj+θ dt =

1

κ(−uj+θ−2uj+θ−1+ uj+θ−1uj+θ+1− uj+θ+ F ), j = 1, . . . N, θ = 1, . . . , Θ, (5.26) where θ and Θ are integers. Thus, the new size of the vector ¯u(t) is N × Θ, which may be specified as large as needed in our numerical experiments. The dynamical system is plotted for N = 40, F = 8, κ = 120 and Θ = 10, using the coordinate graph described in Figure 5.3 (a). For an initial state x = ¯u(0) as that shown in Figure 5.3 (b) the system develops over time as described in Figure 5.3 (c) (view from the top). As we observe that the system becomes chaotic after a certain time, we consider a reduced window of assimilation plotted in Figure 5.3 (d).

5.2 Results

We now provide numerical results using Algorithm 4.1, which uses the RPCG algorithm in Step 3, as was mentioned in Section 2. When tuning the accuracy parameter for stopping the inner iterations, we noticed that it was suitable to choose a value in the middle of the possible range. In the case of the 1DWave equation the range of the parameter is more or less defined by [10−1, 10−4], thus we choose 10−2, while the range is approximately [10−1, 10−6] in the case of the Lorenz96 model, and we chose the value 10−4 in our experiment.

For the 1DWave problem, we depict the background vector in a black dashed line together with the true solution as a red line, in Figure 6.4 (a). The result given by our algorithm is plotted as a blue line in Figure 6.4 (b). The background vector and the true solution are plotted in Figure 6.5 (a) and the result given by our algorithm is presented in Figure 6.5 (b) for the Lorenz96 model.

In order to observe the adaptive nature of the algorithm, for an intermediate iteration i, we display two consecutive sets of observations Oi and Oi+1 in Figures 6.6 (a)-(b) and 6.7 (a)-(b) for the 1DWave and the Lorenz96 respectively. In order to appreciate the impact of the local error indicators defined in Step 4, we also illustrate, in Figures 6.6 (e) and 6.7 (e), the local behaviour of the error between the prolongation of the current λito the set Oi+1and the true ˜λi+1, as given by

j= D

(˜λi+1− Πiλi)|j, [( ˜Ri+1+ ˜Hi+1B ˜Hi+1T )(˜λi+1− Πiλi)]|j E

,

together with that of the local error indicators themselves (ηj and νj are displayed Figures 6.6 (c)-(d) and 6.7 (c)-(d), respectively).

(13)

(a) Coordinate system

(b) Initial x0:= (u1(0), u2(0), . . . , uN(0))

(c) Dynamical system

over space and time (d) Window of assimilation

Figure 5.3: Lorenz96 problem

In the case of the 1DWave equation we observe in Figure 5.2 (a) that the peak in the middle of the signal produces a dynamical reaction over space and time, shown in part (b) of the same figure. The errors ηjand νj in Figure 6.6 (c) and (d) are larger in the regions of strong dynamical activity, whose identification is clear in Figure 6.6 (a)-(b) which shows the evolution of the observation sets. In Figure 6.6 (e), we also observe that the regions where the difference j is large coincide with those where ηj and νj are also large in Figure 6.6 (c)-(d).

In the case of the Lorenz96 model where the initial signal has also a peak in the middle (Figure 5.3 (b)), the high dynamics are on the left part of the space and time graph (Figure 5.3 (d)). In this case the quantities ηjand νjalso provide correct indication of where more observations are needed, see Figure 6.7 (c)-(d). We also note that the distance jin Figure 6.7 (c) is consistent with ηj and νj. Indeed, the sets of observations in Figure 6.7 (a)-(b) show that the observations concentrate on the left part of the graph where the highly dynamic nature of the model is most evident.

In Figures 6.8 and 6.9, we compare the performance of the new algorithm with the simple use of uniform observations and a benchmark method where a pre-established hierarchy of uniform distributed and progressively denser observations sets is used in Algorithm 4.1 (skipping Steps 4-6). This last method bypasses the new adaptive features of the method and its associated computing cost. For this comparison, we define, for a given iterate x resulting from Step 7 of Algorithm 4.1, the cost function as the value of the function in (2.3) when i = r, i.e. when all the possible observations are used. We then plot the evolution of this cost function (in logarithmic scale) against the number of observations used and the associated flop (floating-point operations) counts for the three algorithms. (The curves for the uniform case do not correspond to an algorithm, but show the accuracy and computational costs associated with directly solving the problem on a uniform grid for each specific size.)

(14)

that obtained by using uniformly distributed observations (plotted in blue) or the benchmark (plotted in black), which is explained by the fact that most observations used by the adaptive algorithm are in regions of space where their contribution to accuracy is largest.

Computational experience not reported here also indicates that the effect of the choice of starting value of x does not affect significantly the hierarchy of observation sets beyond the fact that observations concentrate in regions of high dynamical activity. We also found that [0.4, 0.7] appears to be an adequate range for the parameters θ1 and θ2 in (4.21), yielding a satisfactory rate of inclusion of new observations at each iteration.

Both results on accuracy and computational costs are therefore highly encouraging.

6 Conclusions and perspectives

A method which identifies the influential data on a 4D-Var data assimilation problem is proposed, as well as an algorithm that exploits this identification to improve on computing efficiency. The cpu-time gains are obtained for two reasons, the first being that the available number of obser-vations is used very effectively, and the second the fact that the cost of the subproblem solution is signficantly reduced by the use of dual-space conjugate-gradient techniques like RPCG. Nu-merical experience has been presented on two nonlinear test problems, and the results are so far encouraging.

Further refinements of the algorithm could be considered, such as the use of adaptive precondi-tioners in the subproblem solver, and continued experience with the method is of course desirable to assess its true potential. Extensions of these ideas in other domains are also possible: we think in particular of data assimilation problems in frameworks where each state of the system is itself an image on which adaptive reconstruction techniques could be applied.

Acknowledgments.

(15)

(a) Background vector and true initial u(0)

(b) True u(0) and algorithm solution

Figure 6.4: Results: Nonlinear 1D Wave equation

(a) Background vector and true initial u(0)

(b) True u(0) and algorithm solution

(16)

(a) Oi

−→

(b) Oi+1

(c) ηj (d) νj (e) j

Figure 6.6: Observations set and adaptive errors

(a) Oi

−→

(b) Oi+1

(c) ηj (d) νj (e) j

(17)

(a) Cost function versus number of observations (b) Cost function versus flops Figure 6.8: Performance of the algorithm on the nonlinear wave equation

(a) Cost function versus number of observations (b) Cost function versus flops Figure 6.9: Performance of the algorithm on the Lorenz96 problem

(18)

References

L.M. Berliner, Z.Q. Lu, and C. Snyder. Statistical design for adaptive weather observations. Journal of the Atmospheric Sciences, 56(15), 2536–2552, 1999.

A. Brandt. Multi-level adaptive technique (MLAT) for fast numerical solution to boundary value problems. in ‘Proceedings of the Third International Conference on Numerical Methods in Fluid Mechanics’, pp. 82–89, Heidelberg, Berlin, New York, 1973. Springer Verlag.

C. Cardinali, S. Pezzulli, and E. Andersson. Influence-matrix diagnostic of a data assimilation system. Quarterly Journal of the Royal Meteorological Society, 130(603), 2767–2786, 2004. Ph. Courtier. Dual formulation of four-dimensional variational assimilation. Quarterly Journal of

the Royal Meteorological Society, 123, 2449–2461, 1997.

Ph. Courtier, J.-N. Th´epaut, and A. Hollingsworth. A strategy for operational implementation of 4D-Var using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120, 1367–1388, 1994.

D.N. Daescu and I.M. Navon. Adaptive observations in the context of 4D-Var data assimilation. Meteorology and Atmospheric Physics, 85(4), 205–226, 2004.

W. D¨orfler. A convergent adaptive algorithm for Poisson’s equation. SIAM Journal on Numerical Analysis, 33(3), 1106–1124, 1996.

A. El Akkroui, P. Gauthier, S. Pellerin, and S. Buis. Intercomparison of the primal and dual formulations of variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 134(633), 1015–1025, 2008.

G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Balti-more, third edn, 1996.

S. Gratton and J. Tshimanga. An observation-space formulation of variational assimilation using a restricted preconditioned conjugate-gradient algorithm. Quarterly Journal of the Royal Meteorological Society, 135, 1573–1585, 2009.

S. Gratton, S. G¨urol, and Ph. L. Toint. Preconditioning and globalizing conjugate gradients in dual space for quadratically penalized nonlinear-least squares problems. Computational Optimization and Applications, 54(1), 1–25, 2013.

S. Gratton, A. Lawless, and N. K. Nichols. Approximate Gauss-Newton methods for nonlinear least-squares problems. SIAM Journal on Optimization, 18, 106–132, 2007.

J.A. Hansen and L.A. Smith. The role of operational constraints in selecting supplementary observations. Journal of the Atmospheric Sciences, 57(17), 2859–2871, 2000.

M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of the National Bureau of Standards, 49, 409–436, 1952.

A. Karimi and M.R. Paul. Extensive chaos in the Lorenz-96 model. Arxiv preprint arXiv:0906.3496, 2009.

M. Leutbecher. Adaptive observations, the Hessian metric and singular vectors. in ‘Proc. ECMWF Seminar on Recent developments in data assimilation for atmosphere and ocean, Reading, UK’, pp. 8–12, 2003a.

M. Leutbecher. A reduced rank estimate of forecast error variance changes due to intermittent modifications of the observing network. Journal of the Atmospheric Sciences, 60(5), 729–742, 2003b.

(19)

J. Liu, E. Kalnay, T. Miyoshi, and C. Cardinali. Analysis sensitivity calculation in an ensemble Kalman filter. Quarterly Journal of the Royal Meteorological Society, 135(644), 1842–1851, 2009.

A. Logg, K.A. Mardal, and G.N. Wells. Automated solution of differential equations by the finite element method, Vol. 84 of Lecture Notes in Computational Science and Engineering. Springer Verlag, Heidelberg, Berlin, New York, 2012.

E.N. Lorenz and K.A. Emanuel. Optimal sites for supplementary weather observations: Simulation with a small model. Journal of the Atmospheric Sciences, 55(3), 399–414, 1998.

E. N. Lorenz. Predictability: a problem partly solved. in ‘Seminar on Predictability’, Vol. 1, pp. 1–18, Reading, UK, 1995. ECMWF.

S. T. McCormick. Fast adaptive composite grid (FAC) methods: Theory for the variational case. in ‘Defect correction methods’, pp. 115–121. Springer Verlag, Heidelberg, Berlin, New York, 1984.

P. Morin, R.H. Nochetto, and K.G. Siebert. Data oscillation and convergence of adaptive FEM. SIAM Journal on Numerical Analysis, 38(2), 466–488, 2000.

M. M. Rincon-Camacho. Adaptive methods for Total Variation based image restoration. PhD thesis, University of Graz, Graz, Austria, 2011.