Optimal control under probability constraint

(1)

HAL Id: inria-00585861

https://hal.inria.fr/inria-00585861

Submitted on 14 Apr 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Optimal control under probability constraint

Pierre Carpentier, Jean-Philippe Chancelier, Guy Cohen

To cite this version:

Pierre Carpentier, Jean-Philippe Chancelier, Guy Cohen. Optimal control under probability con-straint. SADCO Kick off, Mar 2011, Paris, France. �inria-00585861�

(2)

Problem formulation Modeling improvement Stochastic Arrow-Hurwicz algorithm Numerical results

Optimal control under probability constraint

P. Carpentier — ENSTA ParisTech

J.-P. Chancelier — ´Ecole des Ponts ParisTech

G. Cohen — ´Ecole des Ponts ParisTech

This work has been supported by

Thal`es-Al´enia-Space (Cannes) and CNES (Toulouse)

SADCO Workshop

(3)

Problem formulation Modeling improvement Stochastic Arrow-Hurwicz algorithm Numerical results

Presentation outline

1 Problem formulation

2 Modeling improvement

3 Stochastic Arrow-Hurwicz algorithm

(4)

Problem formulation

Modeling improvement Stochastic Arrow-Hurwicz algorithm Numerical results

Satellite model and deterministic optimization problem Engine failure

Stochastic formulation

(5)

Problem formulation

Satellite model and deterministic optimization problem

Engine failure Stochastic formulation

Satellite model

dr dt = v , dv dt = −µ r kr k3 + F mκ, (1a) dm dt = − T g0Isp δ . (1b)

(1a) : 6-dimensional state vector (position r and velocityv).

(1b) : 1-dimensional state vector (massm including fuel).

κinvolves the direction cosines of the thrust and the on-off switch

δ of the engine (3 controls), and µ, F , T , g0, Isp are constants. The deterministic control problem is to drive the satellite from the initial condition at ti to a known final position rf and velocity vf

(6)

Problem formulation

Engine failure Stochastic formulation

Deterministic optimization problem

Using equinoctial coordinates for the position and velocity ; state vector x ∈ R7,

and cartesian coordinates for the thrust of the engine

; control vector u ∈ R3,

the deterministic optimization problem is written as follows: min u(·) K x (tf) (2a) subject to: x (ti) = xi, • x (t) = f x (t), u(t) , (2b) ku(t)k ≤ 1 ∀t ∈ [t_i, tf] , (2c) C x (tf) = 0 . (2d)

(7)

Problem formulation

Engine failure

Sometimes, the engine may fail to work when needed: the satellite drift away from the deterministic optimal trajectory. After the engine control is recovered, it is not always possible to drive the satellite to the final target at tf.

By anticipating such possible failures and by modifying the trajectory followed before any such failure occurs, one may increase the possibility of eventually reaching the target. But such a deviation from the deterministic optimal trajectory results in a deterioration of the economic performance. The problem is thus to balance the increased probability of eventually reaching the target despite possible failures against the expected economic performance, that is, to quantify the price of safety one is ready to pay for.

(8)

Problem formulation

Stochastic formulation (1)

A failure is modeled using two random variables: tp : random initial time of the failure,

td : random duration of the failure.

For every realization (tpξ, t_dξ):

1 u(·) denotes the control used prior to any failure

; u is defined over [ti, tf] but implemented over [ti, tpξ] and corresponds to an open-loop control,

2 _{the control is 0 in [t}_pξ_{, t}_pξ_{+ t}ξ

d],

3 vξ(·) denotes the control used after the end of the failure

; vξ _{is defined over [t}ξ

p+ t_dξ, tf] (if nonempty) and corresponds to a closed-loop strategy v. The satellite dynamics in the stochastic formulation writes:

xξ(ti) = xi, •

(9)

Problem formulation

Stochastic formulation (2)

The problem is to minimize the expected cost (fuel consumption) w.r.t. the open-loop control u and the closed-loop strategy v,

the probability to hit the target at time tf being at least p.

min u(·) E min vξ_(·)K x ξ_(t f) (3a) subject to: xξ(ti) = xi, • xξ(t) = fξ xξ(t), u(t), vξ(t) , (3b) ku(t)k ≤ 1 ∀t ∈ [t_i, tf] , kvξ(t)k ≤ 1 ∀t ∈ [tpξ+ t ξ d, tf] , (3c) P C xξ(tf) = 0 ≥ p. (3d)

(10)

Problem formulation

Modeling improvement

Stochastic Arrow-Hurwicz algorithm Numerical results Probability as an expectation Cost refinement Final formulation 1 Problem formulation 2 Modeling improvement Probability as an expectation Cost refinement Final formulation

(11)

Problem formulation

Stochastic Arrow-Hurwicz algorithm Numerical results Probability as an expectation Cost refinement Final formulation

Probability as an expectation

Let I(y ) = ( 1 if y = 0, 0 otherwise. Then P C xξ(tf) = 0 = EI C xξ(tf) . Thus, Problem (3) can (shortly) be written:

min u(·) E min vξ_(·)K x ξ_(t f) (4a) s.t. E I C xξ(tf) ≥ p . (4b)

Formulation (4) opens the possibility to make use of a stochastic

(12)

Problem formulation

Stochastic Arrow-Hurwicz algorithm Numerical results

Probability as an expectation

Cost refinement

Final formulation

Missing the target should not contribute to the cost

Whenever a failure occurred, setting vξ ≡ 0 over [tpξ+ t_dξ, tf] generally makes the satellite misses the target and does not contribute to the probability constraint,

but it forces fuel consumption to its minimum and contributes to minimize the cost function.

This yields an artificially good expected cost; however, one is not interested in what happens whenever the whole mission has failed. Therefore, a better formulation is to register only scenarios when the target is effectively hit; instead of the original expected cost in (4), we thus prefer to deal with a conditional expected cost:

E K xξ(tf) C x ξ_(t f) = 0 = E K xξ(tf) × I C xξ(tf) E I C xξ_(t f) .

(13)

Problem formulation

Cost refinement

Final formulation

Conditional expected cost

The problem is reformulated as

min u(·),v(·) E K xξ(tf) × I C xξ(t_f) E I C xξ(t_f) (5a) s.t. E I C xξ(tf) ≥ p . (5b)

Such a formulation is however not well-suited for the stochastic Arrow-Hurwicz algorithm:

(14)

Problem formulation

Cost refinement

Final formulation

An useful lemma

Using compact notation, Problem (5) is:

min

u

J(u)

Θ(u) s.t. Θ(u) ≥ p , (6) in which J and Θ assume positive values.

1 _{If u}]_{is a solution of (6) and if Θ(u}]_{) = p, then u}]_{is also a solution of}

min

u J(u) s.t. Θ(u) ≥ p . (7) 2 Conversely, if u]_{is a solution of (7), and if an optimal Kuhn-Tucker}

multiplier β]satisfies the condition

β]≥ J(u

]

) Θ(u]₎ ,

(15)

Problem formulation

Cost refinement

Final formulation

Back to a cost in expectation

Finally, instead of (5) we aim at solving

min u(·),v(·)E K xξ(tf) × I C xξ(tf) (8a) s.t. E I C xξ(t_f) ≥ p . (8b)

(16)

Problem formulation

Probability as an expectation Cost refinement

Final formulation

Final formulation and associated Lagrangian

min u(·) E min vξ_(·)K x ξ_(t f) × I C xξ(tf) s.t. µ p − E I C xξ(t_f) ≤ 0 .

Assuming there exists a saddle point for the associated Lagrangian, we have finally to solve

max µ≥0 minu(·) µp + Emin vξ_(·) K x ξ_(t f)−µ×I C xξ(tf) .

Remark: it is convenient to put apart the no-failure event, namelytp≥ tf

in the problem formulation. For the sake of simplicity, we do not present this last problem improvement. We will denote byπf the probability of this event.

(17)

Problem formulation Modeling improvement

Stochastic Arrow-Hurwicz algorithm

Numerical results

Algorithm overview

Downstream closed-loop problem Smoothing function I

Solving the resulting approximated problem

Algorithm overview

(18)

Numerical results

Algorithm overview

Algorithm overview (1)

In order to solve max µ≥0 minu(·) ( µ p + Emin vξ_(·) K x ξ_(t f) − µ × I C xξ(tf) | {z } W (u, µ, ξ) ) . that is, max µ≥0 minu(·) n µ p + E W (u, µ, ξ)o ,

(19)

Numerical results

Algorithm overview

Algorithm overview (2)

Arrow-Hurwicz algorithm At iteration k, 1 draw a failure ξk = (tξ k p , tξ k

d ) according to its probability law,

2 _{update u(·):} uk+1= Π_Buk− εk_∇ uW (uk, µk, ξk) , 3 _{update µ:} µk+1 = max 0, µk + ρk p + ∇µW (uk+1, µk, ξk) .

(20)

Numerical results

Algorithm overview

Downstream closed-loop problem

Smoothing function I

Downstream closed-loop problem

Consider the inner optimization problem: W (u, ξ, µ) = min vξ_(·) n K xξ(tf) − µ ×I C xξ(tf) o ,

If reaching the target is impossible (C xξ(tf) 6= 0 ∀vξ),

the best thing to do is to stop immediately: W ≡ 0. If reaching the target is possible but too expensive (that

is K xξ(tf) ≥ µ), the best thing to do is again to stop

immediately: W ≡ 0.

Therefore, it must be checked that one can reach the target while maintaining fuel consumption below a threshold, and

the corresponding optimal control v∗ξ(·) must be computed.

At every iteration k, we must evaluate function W as well as its derivatives w.r.t. u(.) and µ. But W is not differentiable!

(21)

Numerical results

Algorithm overview

Downstream closed-loop problem

Smoothing function I

First difficulty: I is not a smooth function

I(y ) = ( 1 if y = 0, 0 otherwise, Ir(y ) =    1 −y_r22 2 if y ∈ [−r , r ], 0 otherwise. 0.2 0.4 0.6 0.8 1.0 0.2 0.4 0.6 0.8 1.0

There are specific rules to drive rk to 0 as the iteration number

k goes to infinity in order to obtain the best asymptotic Mean Quadratic Error of the gradient estimates (see [1]).

(22)

Numerical results

Algorithm overview

Second difficulty: solving the approximated problem

The approximated closed-loop problem to solve at each iteration is: Wr(uk, ξk, µk) = min vξ_(·) n K xξ(tf) − µk ×Ir C xξ(t_f) o .

In this setting, we have to check if the target is reachedup to r.

In practice, the solution of the approximated problem is derived from the resolution of two standard optimal control problems.

(23)

Problem formulation Modeling improvement Stochastic Arrow-Hurwicz algorithm

Numerical results

Brief description

Results for different values of p The price of safety

4 Numerical results

Brief description

(24)

Numerical results

Brief description

Mission description

Interplanetary mission (Earth-Mars trajectory): duration of the mission: 450 days,

tp: exponential distribution s.t. P tp≥ tf =πf ≈ 0.58, td: exponential distribution s.t. P 2 ≤ td≤ 7 ≈ 0.80. Comp. normale w Comp. tangentielle s Comp. radiale q 0 1 2 3 4 5 6 7 8 9 −0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0

Using normalized units: ti= 0.69 and tf = 8.73.

The deterministic optimal control has a “bang–off–bang” shape. Along the deterministic optimal path, the probability to recover a failure is:

(25)

Numerical results

Brief description

Parameters tuning

Gradient step length:

εk = a

b +k , ρ

k ₌ c

d +k ,

usual for a stochastic gradient algorithm.

Smoothing parameter:

rk = α

β +k13

,

(26)

Numerical results

Brief description

Results for different values of p

The price of safety

Masse finale / iterations

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.675 0.676 0.677 0.678 0.679 0.680 0.681

Multiplicateur probabilite / iterations

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Comp. normale w Comp. tangentielle s Comp. radiale q 0 1 2 3 4 5 6 7 8 9 −0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0

(27)

Numerical results

Brief description

The price of safety

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.674 0.675 0.676 0.677 0.678 0.679 0.680 0.681

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.314 0.316 0.318 0.320 0.322 0.324 0.326 Comp. normale w Comp. tangentielle s Comp. radiale q 0 1 2 3 4 5 6 7 8 9 −0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0

(28)

Numerical results

Brief description

The price of safety

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.673 0.674 0.675 0.676 0.677 0.678 0.679 0.680 0.681

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.318 0.320 0.322 0.324 0.326 0.328 0.330 0.332 Comp. normale w Comp. tangentielle s Comp. radiale q 0 1 2 3 4 5 6 7 8 9 −0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0

(29)

Numerical results

Brief description

The price of safety

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.674 0.675 0.676 0.677 0.678 0.679 0.680 0.681

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0.320 0.325 0.330 0.335 0.340 0.345 0.350 0.355 0.360 0.365 Comp. normale w Comp. tangentielle s Comp. radiale q 0 1 2 3 4 5 6 7 8 9 −0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0

(30)

Numerical results

Brief description

The price of safety

Fuel consumption versus probability level

Consommation sans panne / Probabilite

0.85 0.90 0.95 1.00 0.3204 0.3206 0.3208 0.3210 0.3212 0.3214 0.3216 0.3218 0.3220 0.3222

(31)

Numerical results

Brief description

The price of safety

Conclusion

Main conclusion

We are able to deal with probability constraints in the optimal

control framework.

Future works

From the theoretical point of view:

existence of a saddle point for the constrained problem, smoothing process (results available only for inequality constraints).

From the numerical point of view:

efficient solver for the downstream problem, computer parallelization.

(32)

Laetitia Andrieu, Guy Cohen and Felisa Vazquez-Abad.

Stochastic Programming with Probability.

arXiv, math.OC 0708.0281, 2007.

Jean-Christophe Culioli and Guy Cohen.

Decomposition/Coordination Algorithms in Stochastic Optimization.

SIAM Journal on Control and Optimization, 28, 1372-1403, 1990.

Jean-Christophe Culioli and Guy Cohen.

Optimisation stochastique sous contraintes en esp´erance.

CRAS, t.320, S´erie I, pp. 75-758, 1995.

Thierry Dargent.

Transfert d’orbite optimal par application du principe du minimum.

Logiciel T 3D, version 2, 19 mars 2004.

Andr´as Pr´ekopa.

Stochastic Programming.

(33)

Numerical results

Brief description

The price of safety