Lagrange multipliers
Sylvain Rubenthaler
28 October 2019
Gradients
When f : \mathbb{R}^n \to \mathbb{R} is sufficiently differentiable, we can write a first-order expansion, using several equivalent notations for the first-order differential of f:
\[
f(x+u) = f(x) + df(x)(u) + o(\|u\|) = f(x) + f'(x).u + o(\|u\|).
\]
We have
\[
f'(x).u = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x) \, u_i.
\]
So, if we define the gradient of f by
\[
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right)^{T},
\]
we then get
\[
f'(x).u = \langle \nabla f(x), u \rangle
\]
(where \langle \cdot, \cdot \rangle is the scalar product).
Warning: this is not a rigorous proof.
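As a quick numerical sanity check of the identity f'(x).u = \langle \nabla f(x), u \rangle, here is a small sketch; the function f below is a made-up example, not one from the text.

```python
import numpy as np

# Hypothetical example function f(x) = x1^2 + 3*x1*x2 and its gradient.
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 2.0])
u = np.array([1e-6, -2e-6])  # a "small" move u

# First-order expansion: f(x+u) - f(x) = <grad f(x), u> + o(||u||).
first_order = np.dot(grad_f(x), u)
remainder = (f(x + u) - f(x)) - first_order
print(remainder)  # far smaller than ||u||, as o(||u||) requires
```

The remainder is quadratic in u here, so it shrinks much faster than \|u\| as u gets small.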
Maximization of f
Suppose we have f : \mathbb{R}^n \to \mathbb{R} and g_1, \dots, g_p : \mathbb{R}^n \to \mathbb{R}. We set M = \{x : g_i(x) = c_i, 1 \le i \le p\} (for some constants c_i). We are interested in
\[
\operatorname*{arg\,max}_{x \in M} f(x),
\]
which is the same as solving
\[
\begin{cases}
\max f(x) \\
\text{under } g_i(x) = c_i, \ 1 \le i \le p.
\end{cases}
\tag{1}
\]
Optimization under equality constraints
Necessary condition
Suppose we have found a maximum at x_0. Then, for any "small" move u such that g_i'(x_0).u = 0 for all i (that is, a move that stays in M, to first order), we have
\[
f(x_0 + u) = f(x_0) + f'(x_0).u + o(\|u\|).
\]
So f'(x_0).u = 0 (otherwise replacing u by -u would increase f). Thus \nabla f(x_0) is orthogonal to every vector that is orthogonal to all the \nabla g_i(x_0), which means that
\[
\nabla f(x_0) \in \operatorname{Span}(\nabla g_1(x_0), \dots, \nabla g_p(x_0)).
\]
Lagrangian function
We set
\[
L(x, \lambda_1, \dots, \lambda_p) = f(x) - \sum_{i=1}^{p} \lambda_i (g_i(x) - c_i)
\]
(L : \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}). At x_0, there exist \lambda_1^{(0)}, \dots, \lambda_p^{(0)} such that
\[
\nabla f(x_0) = \sum_{i=1}^{p} \lambda_i^{(0)} \nabla g_i(x_0).
\]
Let us compute
\[
\nabla L(x, \lambda_1, \dots, \lambda_p) =
\begin{pmatrix}
\frac{\partial f}{\partial x_1}(x) - \sum_{i=1}^{p} \lambda_i \frac{\partial g_i}{\partial x_1}(x) \\
\vdots \\
\frac{\partial f}{\partial x_n}(x) - \sum_{i=1}^{p} \lambda_i \frac{\partial g_i}{\partial x_n}(x) \\
-(g_1(x) - c_1) \\
\vdots \\
-(g_p(x) - c_p)
\end{pmatrix}.
\]
We observe that
\[
\nabla L(x_0, \lambda_1^{(0)}, \dots, \lambda_p^{(0)}) = 0.
\]
Working our way back
The conclusion is that, when trying to find the solution of (1), a good candidate is an x_0 such that there exist (\lambda_i^{(0)})_{1 \le i \le p} with
\[
\nabla L(x_0, \lambda_1^{(0)}, \dots, \lambda_p^{(0)}) = 0.
\]
The coefficients \lambda_i^{(0)} are called "Lagrange multipliers".
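To make this recipe concrete, here is a minimal Python sketch (using NumPy and SciPy) on a made-up instance of (1): maximize f(x, y) = x + y under g(x, y) = x^2 + y^2 = 1, by solving \nabla L = 0 numerically.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical instance of problem (1):
# maximize f(x, y) = x + y under g(x, y) = x^2 + y^2 = 1 (so c = 1).
# The Lagrangian is L(x, y, lam) = x + y - lam * (x^2 + y^2 - 1).
def grad_L(z):
    x, y, lam = z
    return [1.0 - 2.0 * lam * x,           # dL/dx
            1.0 - 2.0 * lam * y,           # dL/dy
            -(x ** 2 + y ** 2 - 1.0)]      # dL/dlam = -(g(x, y) - c)

# Solve grad L = 0 from a starting guess.
x, y, lam = fsolve(grad_L, [0.7, 0.7, 0.7])
print(x, y, lam)  # all three close to 1/sqrt(2)
```

Here x_0 = (1/\sqrt{2}, 1/\sqrt{2}) is indeed the maximum of x + y on the unit circle. Note that \nabla L = 0 is only a necessary condition: the minimum (-1/\sqrt{2}, -1/\sqrt{2}) also solves it, so candidates must still be compared.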
Optimization under inequality constraints
Maximization of f
Suppose we have f : \mathbb{R}^n \to \mathbb{R} and g_1, \dots, g_p : \mathbb{R}^n \to \mathbb{R}. We are interested in
\[
\begin{cases}
\max f(x) \\
\text{under } g_i(x) \le c_i, \ 1 \le i \le p,
\end{cases}
\tag{2}
\]
for some c_1, \dots, c_p.
Necessary condition
Suppose we have found a maximum x_0. We then divide the indices (up to reordering) into:
- for 1 \le i \le k, g_i(x_0) = c_i (we say the constraint is binding);
- for k+1 \le i \le p, g_i(x_0) < c_i (we say the constraint is not binding);
for some k in \{0, 1, \dots, p\}.
For v such that g_i'(x_0).v = 0 (1 \le i \le k), we have f'(x_0).v = 0 (because x_0 is a maximum of f restricted to \{x : g_i(x) = c_i, 1 \le i \le k\}). So there exist \lambda_1^{(0)}, \dots, \lambda_k^{(0)} such that
\[
\nabla f(x_0) = \sum_{i=1}^{k} \lambda_i^{(0)} \nabla g_i(x_0).
\]
We set \lambda_{k+1}^{(0)} = \dots = \lambda_p^{(0)} = 0.
Let us now take h > 0 ("small"). We have, for all i in \{1, 2, \dots, k\},
\[
f(x_0 - h \nabla g_i(x_0)) = f(x_0) - h \langle \nabla f(x_0), \nabla g_i(x_0) \rangle + o(h),
\]
and we should have f(x_0 - h \nabla g_i(x_0)) \le f(x_0). So
\[
\langle \nabla f(x_0), \nabla g_i(x_0) \rangle \ge 0.
\]
In the case where the \nabla g_i(x_0) are orthogonal, the above implies \lambda_i^{(0)} \ge 0.
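This sign condition can be checked numerically on a made-up example with one binding constraint: maximize f(x, y) = -(x-2)^2 - y^2 under g(x, y) = x \le 1, whose constrained maximum is x_0 = (1, 0).

```python
import numpy as np

# Hypothetical example: maximize f(x, y) = -(x - 2)^2 - y^2
# under g(x, y) = x <= 1 (c = 1). The constrained maximum is x0 = (1, 0),
# where the constraint is binding: g(x0) = 1 = c.
def grad_f(x):
    return np.array([-2.0 * (x[0] - 2.0), -2.0 * x[1]])

def grad_g(x):
    return np.array([1.0, 0.0])

x0 = np.array([1.0, 0.0])
s = float(np.dot(grad_f(x0), grad_g(x0)))
print(s)  # 2.0: nonnegative, as the necessary condition predicts
```

Intuitively, moving from x_0 toward the interior (decreasing g) moves away from the unconstrained maximum (2, 0), so f can only decrease.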
Lagrangian function
We define
\[
L(x, \lambda_1, \dots, \lambda_p) = f(x) - \sum_{i=1}^{p} \lambda_i (g_i(x) - c_i)
\]
(L : \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}). The gradient of L is (the same as before)
\[
\nabla L(x, \lambda_1, \dots, \lambda_p) =
\begin{pmatrix}
\frac{\partial f}{\partial x_1}(x) - \sum_{i=1}^{p} \lambda_i \frac{\partial g_i}{\partial x_1}(x) \\
\vdots \\
\frac{\partial f}{\partial x_n}(x) - \sum_{i=1}^{p} \lambda_i \frac{\partial g_i}{\partial x_n}(x) \\
-(g_1(x) - c_1) \\
\vdots \\
-(g_p(x) - c_p)
\end{pmatrix}.
\]
We get that
\[
\begin{cases}
\frac{\partial L}{\partial x_j}(x_0, \lambda_1^{(0)}, \dots, \lambda_p^{(0)}) = 0, \ 1 \le j \le n, \\
\lambda_i^{(0)} (g_i(x_0) - c_i) = 0, \ 1 \le i \le p, \\
g_i(x_0) \le c_i, \ 1 \le i \le p, \\
\langle \nabla f(x_0), \nabla g_i(x_0) \rangle \ge 0, \ 1 \le i \le k.
\end{cases}
\tag{3}
\]
Working our way back
The conclusion is that, when trying to find the solution of (2), a good candidate is an x_0 such that there exist (\lambda_i^{(0)})_{1 \le i \le p} such that Equation (3) is satisfied.
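As a sketch, system (3) can be solved on a made-up instance of (2): maximize f(x, y) = -(x-2)^2 - (y-2)^2 under g(x, y) = x + y \le 2. Guessing that the constraint is binding (k = 1), the stationarity equations together with g(x, y) = 2 form a linear system.

```python
import numpy as np

# Hypothetical instance of (2): maximize f(x, y) = -(x - 2)^2 - (y - 2)^2
# under g(x, y) = x + y <= 2 (c = 2). Assuming the constraint is binding,
# system (3) reduces to the linear equations
#   -2(x - 2) - lam = 0,   -2(y - 2) - lam = 0,   x + y = 2.
A = np.array([[-2.0,  0.0, -1.0],
              [ 0.0, -2.0, -1.0],
              [ 1.0,  1.0,  0.0]])
b = np.array([-4.0, -4.0, 2.0])
x, y, lam = np.linalg.solve(A, b)
print(x, y, lam)  # x ≈ 1, y ≈ 1, lam ≈ 2
```

Since lam = 2 \ge 0 and (1, 1) is feasible, x_0 = (1, 1) is a valid candidate; the unconstrained maximum (2, 2) violates the constraint, which is consistent with the binding guess.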