HAL Id: hal-01941591

https://hal.archives-ouvertes.fr/hal-01941591

Preprint submitted on 1 Dec 2018


Linear-Quadratic McKean-Vlasov Stochastic Differential Games

Enzo Miller, Huyen Pham

To cite this version:

Enzo Miller, Huyen Pham. Linear-Quadratic McKean-Vlasov Stochastic Differential Games. 2018.

⟨hal-01941591⟩


Linear-Quadratic McKean-Vlasov Stochastic Differential Games

Enzo MILLER

Huyên PHAM

December 1, 2018

Abstract

We consider a multi-player stochastic differential game with linear McKean-Vlasov dynamics and quadratic cost functional depending on the variance and mean of the state and control actions of the players in open-loop form. Finite and infinite horizon problems with possibly some random coefficients as well as common noise are addressed. We propose a simple direct approach based on weak martingale optimality principle together with a fixed point argument in the space of controls for solving this game problem. The Nash equilibria are characterized in terms of systems of Riccati ordinary differential equations and linear mean-field backward stochastic differential equations: existence and uniqueness conditions are provided for such systems. Finally, we illustrate our results on a toy example.

MSC Classification: 49N10, 49L20, 91A13.

Key words: Mean-field SDEs, stochastic differential game, linear-quadratic, open-loop controls, Nash equilibria, weak martingale optimality principle.

1 Introduction

1.1 General introduction and motivation

The study of large populations of interacting individuals (agents, computers, firms) is a central issue in many fields of science, and finds numerous relevant applications in economics/finance (systemic risk with strongly interconnected financial entities), sociology (regulation of crowd motion, herding behavior, social networks), physics, biology, and electrical engineering (telecommunication). Rationality in the behavior of the population is a natural requirement, especially in social sciences, and is addressed by including individual decisions, where each individual optimizes some criterion: e.g., an investor maximizes her/his wealth, a firm chooses how much output to produce (goods, electricity, etc.) or how much advertising to post for a large population. The criterion and optimal decision of each individual depend on the others and affect the whole group, and one then typically looks for an equilibrium among the population, where the dynamics of the system evolve endogenously as a consequence of the optimal choices made by each individual. When the number of indistinguishable agents in the population tends to infinity, and by considering cooperation between the agents, we are reduced in the asymptotic formulation to a McKean-Vlasov (McKV) control problem where the dynamics and

This work is supported by FiME (Finance for Energy Market Research Centre) and the “Finance et Développement Durable - Approches Quantitatives” EDF - CACIB Chair.

LPSM, University Paris Diderot, enzo.miller at polytechnique.edu

LPSM, University Paris Diderot and CREST-ENSAE, pham at lpsm.paris


the cost functional depend upon the law of the stochastic process. This corresponds to a Pareto optimum where a social planner/influencer decides the strategies for each individual. The theory of McKV control problems, also called mean-field type control, has generated recent advances in the literature, either by the maximum principle [5] or by the dynamic programming approach [14]; see also the recent books [3] and [6] and the references therein. Linear quadratic (LQ) models provide an important class of solvable applications studied in many papers; see, e.g., [15], [11], [10], [2].

In this paper, we consider multi-player stochastic differential games for McKean-Vlasov dynamics. This corresponds to, and is motivated by, the competitive interaction of multiple populations with a large number of indistinguishable agents. In this context, we are then looking for a Nash equilibrium among the multiple classes of populations. Such a problem, sometimes referred to as a mean-field-type game, allows one to incorporate competition and heterogeneity in the population, and is a natural extension of McKean-Vlasov (or mean-field-type) control by including multiple decision makers. It finds natural applications in engineering, power systems, social sciences and cybersecurity, and has attracted recent attention in the literature; see, e.g., [1], [7], [8], [4]. We focus more specifically on the case of linear McKean-Vlasov dynamics and a quadratic cost functional for each player (social planner).

Linear quadratic McKean-Vlasov stochastic differential games have been studied in [9] for a one-dimensional state process, restricting to closed-loop controls. Here, we consider both finite and infinite horizon problems in a multi-dimensional framework, with random coefficients for the affine terms of the McKean-Vlasov dynamics and random coefficients for the linear terms of the cost functional. Moreover, the controls of each player are in open-loop form. Our main contribution is to provide a simple and direct approach, based on the weak martingale optimality principle developed in [2] for McKean-Vlasov control problems, which we extend to the stochastic differential game, together with a fixed point argument in the space of open-loop controls, for finding a Nash equilibrium.

The key point is to find a suitable ansatz for determining the fixed point corresponding to the Nash equilibria, which we characterize explicitly in terms of systems of Riccati ordinary differential equations and linear mean-field backward stochastic differential equations; existence and uniqueness conditions are provided for such systems.

The rest of this paper is organized as follows. We continue Section 1 by formulating the Nash equilibrium problem in the linear quadratic McKean-Vlasov finite horizon framework, and by giving some notations and assumptions. Section 2 presents the verification lemma based on the weak submartingale optimality principle for finding a Nash equilibrium, and details each step of the method to compute a Nash equilibrium. We give some extensions in Section 3 to the case of infinite horizon and common noise. Finally, we illustrate our results in Section 4 on a toy example.

1.2 Problem formulation

Let T > 0 be a finite given horizon. Let (Ω, F, F = (F_t)_{t∈[0,T]}, P) be a fixed filtered probability space, where F is the natural filtration of a real Brownian motion W = (W_t)_{t∈[0,T]}. In this section, for simplicity, we deal with the case of a single real-valued Brownian motion; the case of multiple Brownian motions will be addressed later in Section 3. We consider a multi-player game with n players, and define the set of admissible controls for each player i ∈ ⟦1,n⟧ as:

A_i = { α_i : Ω × [0,T] → R^{d_i} s.t. α_i is F-adapted and ∫_0^T e^{−ρt} E[|α_{i,t}|²] dt < ∞ },

where ρ is a nonnegative constant discount factor. We denote by A = A_1 × … × A_n, and for any α = (α_1, …, α_n) ∈ A and i ∈ ⟦1,n⟧, we set α_{−i} = (α_1, …, α_{i−1}, α_{i+1}, …, α_n) ∈ A_{−i} = A_1 × … × A_{i−1} × A_{i+1} × … × A_n.

Given a square integrable measurable random variable X_0 and a control α = (α_1, …, α_n) ∈ A, we consider the controlled linear mean-field stochastic differential equation in R^d:

dX^α_t = b(t, X^α_t, E[X^α_t], α_t, E[α_t]) dt + σ(t, X^α_t, E[X^α_t], α_t, E[α_t]) dW_t,  0 ≤ t ≤ T,  X^α_0 = X_0,   (1)

where for t ∈ [0,T], x, x̄ ∈ R^d, a_i, ā_i ∈ R^{d_i}:

b(t, x, x̄, a, ā) = β_t + b_{x,t} x + b̃_{x,t} x̄ + Σ_{i=1}^n (b_{i,t} a_i + b̃_{i,t} ā_i) = β_t + b_{x,t} x + b̃_{x,t} x̄ + B_t a + B̃_t ā,
σ(t, x, x̄, a, ā) = γ_t + σ_{x,t} x + σ̃_{x,t} x̄ + Σ_{i=1}^n (σ_{i,t} a_i + σ̃_{i,t} ā_i) = γ_t + σ_{x,t} x + σ̃_{x,t} x̄ + Σ_t a + Σ̃_t ā.   (2)

Here all the coefficients are deterministic matrix-valued processes, except β and γ, which are vector-valued F-progressively measurable processes.
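Before turning to the cost functionals, it may help to see the dynamics (1)-(2) in simulation. The sketch below approximates E[X_t] by the empirical mean of a particle system and integrates a scalar instance of (1) by Euler-Maruyama; all numerical coefficients are illustrative choices (not taken from the paper), and the control is frozen at zero.

```python
import numpy as np

# Particle approximation of a scalar instance of the linear McKean-Vlasov dynamics (1):
#   dX_t = (beta + b_x X_t + btilde_x E[X_t] + alpha_t) dt
#          + (gamma + sigma_x X_t + sigmatilde_x E[X_t]) dW_t,
# with E[X_t] replaced by the empirical mean of N particles.
# All scalar coefficients below are illustrative assumptions.

def simulate_mkv(n_particles=5000, n_steps=200, T=1.0, x0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    beta, b_x, btilde_x = 0.1, -0.5, 0.2
    gamma, sigma_x, sigmatilde_x = 0.05, 0.3, 0.0
    alpha = 0.0  # uncontrolled for this illustration
    X = np.full(n_particles, x0)
    means = [X.mean()]
    for _ in range(n_steps):
        m = X.mean()                      # empirical proxy for E[X_t]
        drift = beta + b_x * X + btilde_x * m + alpha
        diff = gamma + sigma_x * X + sigmatilde_x * m
        X = X + drift * dt + diff * np.sqrt(dt) * rng.standard_normal(n_particles)
        means.append(X.mean())
    return np.array(means)

means = simulate_mkv()
```

As a consistency check, E[X_t] itself solves the linear ODE dm/dt = β + (b_x + b̃_x) m, so the particle mean should track its explicit solution up to Monte Carlo error.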

The goal of each player i ∈ ⟦1,n⟧ during the game is to minimize her cost functional over α_i ∈ A_i, given the actions α_{−i} of the other players:

J_i(α_i, α_{−i}) = E[ ∫_0^T e^{−ρt} f_i(t, X^α_t, E[X^α_t], α_t, E[α_t]) dt + g_i(X^α_T, E[X^α_T]) ],
V_i(α_{−i}) = inf_{α_i ∈ A_i} J_i(α_i, α_{−i}),

where for each t ∈ [0,T], x, x̄ ∈ R^d, a_i, ā_i ∈ R^{d_i}, we have set the running cost and terminal cost for each player:

f_i(t, x, x̄, a, ā) = (x − x̄)^⊤ Q^i_t (x − x̄) + x̄^⊤ [Q^i_t + Q̃^i_t] x̄ + Σ_{k=1}^n [ a_k^⊤ I^i_{k,t} (x − x̄) + ā_k^⊤ (I^i_{k,t} + Ĩ^i_{k,t}) x̄ ]
  + Σ_{k=1}^n [ (a_k − ā_k)^⊤ N^i_{k,t} (a_k − ā_k) + ā_k^⊤ (N^i_{k,t} + Ñ^i_{k,t}) ā_k ]
  + Σ_{1≤k≠l≤n} [ (a_k − ā_k)^⊤ G^i_{k,l,t} (a_l − ā_l) + ā_k^⊤ (G^i_{k,l,t} + G̃^i_{k,l,t}) ā_l ]
  + 2 [ L^{i⊤}_{x,t} x + Σ_{k=1}^n L^{i⊤}_{k,t} a_k ],

g_i(x, x̄) = (x − x̄)^⊤ P^i (x − x̄) + x̄^⊤ (P^i + P̃^i) x̄ + 2 r^{i⊤} x.   (3)

Here all the coefficients are deterministic matrix-valued processes, except L^i_x, L^i_k, r^i, which are vector-valued F-progressively measurable processes, and ⊤ denotes the transpose of a vector or matrix.

We say that α* = (α*_1, …, α*_n) ∈ A is a Nash equilibrium if for any i ∈ ⟦1,n⟧,

J_i(α*) ≤ J_i(α_i, α*_{−i}),  ∀ α_i ∈ A_i,  i.e.  J_i(α*) = V_i(α*_{−i}).

As is well known, the search for a Nash equilibrium can be formulated as a fixed point problem as follows: first, each player i computes her best response given the controls of the other players, α*_i = BR_i(α*_{−i}), where BR_i is the best response function defined (when it exists) as:

BR_i : A_{−i} → A_i,  α_{−i} ↦ argmin_{α ∈ A_i} J_i(α, α_{−i}).

Then, in order to ensure that (α*_1, …, α*_n) is a Nash equilibrium, we have to check that this candidate verifies the fixed point equation (α*_1, …, α*_n) = BR(α*_1, …, α*_n), where BR := (BR_1, …, BR_n).
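To make the fixed point structure concrete, here is a schematic sketch on a deterministic two-player quadratic game (chosen for illustration only; it is not the McKean-Vlasov game of this paper), where the best responses are explicit and Picard iteration on BR converges to the Nash equilibrium:

```python
import numpy as np

# Illustrative two-player quadratic game:
#   J_i(a_i, a_j) = 0.5*a_i**2 + c*a_i*a_j + d_i*a_i,
# whose best response is BR_i(a_j) = -(c*a_j + d_i).
# The constants c, d are our own illustrative choices; |c| < 1 makes BR a contraction.

c, d = 0.4, np.array([1.0, -0.5])

def best_response(a):
    # simultaneous best responses: each player reacts to the other player's action
    return -(c * a[::-1] + d)

a = np.zeros(2)
for _ in range(100):          # Picard iteration on BR
    a = best_response(a)

# The Nash equilibrium solves the fixed point equation a = BR(a),
# here a linear system M a* = -d with M = [[1, c], [c, 1]]:
M = np.array([[1.0, c], [c, 1.0]])
a_star = np.linalg.solve(M, -d)
```

The iterated best responses converge to a_star, which satisfies best_response(a_star) = a_star, i.e. the fixed point property stated above.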

The main goal of this paper is to state a general martingale optimality principle for the search of Nash equilibria and to apply it to the linear quadratic case. We first obtain best response functions (or the optimal control of each agent conditioned on the controls of the others) of each player i of the following form:

α*_{i,t} = −(S^i_{i,t})^{−1} U^i_{i,t} (X_t − E[X_t]) − (S^i_{i,t})^{−1} (ξ^i_{i,t} − ξ̄^i_{i,t}) − (Ŝ^i_{i,t})^{−1} (V^i_{i,t} E[X_t] + Ō^i_{i,t}),

where the coefficients in the right-hand side, defined in (5) and (6), depend on the actions α_{−i} of the other players. We then proceed to a fixed point search for the best response functions in order to exhibit a Nash equilibrium, which is described in Theorem 2.3.

1.3 Notations and Assumptions

Given a normed space (K, |·|) and T ∈ R*_+, we set:

L^∞([0,T], K) = { φ : [0,T] → K s.t. φ is measurable and sup_{t∈[0,T]} |φ_t| < ∞ },
L²([0,T], K) = { φ : [0,T] → K s.t. φ is measurable and ∫_0^T e^{−ρu} |φ_u|² du < ∞ },
L²_{F_T}(K) = { φ : Ω → K s.t. φ is F_T-measurable and E[|φ|²] < ∞ },
S²_F(Ω × [0,T], K) = { φ : Ω × [0,T] → K s.t. φ is F-adapted and E[sup_{t∈[0,T]} |φ_t|²] < ∞ },
L²_F(Ω × [0,T], K) = { φ : Ω × [0,T] → K s.t. φ is F-adapted and ∫_0^T e^{−ρu} E[|φ_u|²] du < ∞ }.

Note that when we tackle the infinite horizon case we will set T = ∞. To make the notations less cluttered, we sometimes write X = X^α when there is no ambiguity. If C and C̃ are coefficients of our model, either in the dynamics or in a cost function, we write Ĉ = C + C̃. Given a random variable Z with a first moment, we denote Z̄ = E[Z]. For M ∈ R^{n×n} and X ∈ R^n, we denote M·X^{⊗2} = X^⊤ M X ∈ R. We denote by S^d the set of symmetric d × d matrices and by S^d_+ the subset of nonnegative symmetric matrices.

Let us now detail the assumptions on the coefficients.

(H1) The coefficients in the dynamics (2) satisfy:

a) β, γ ∈ L²_F(Ω × [0,T], R^d);
b) b_x, b̃_x, σ_x, σ̃_x ∈ L^∞([0,T], R^{d×d}); b_i, b̃_i, σ_i, σ̃_i ∈ L^∞([0,T], R^{d×d_i}).

(H2) The coefficients of the cost functional (3) satisfy:

a) Q^i, Q̃^i ∈ L^∞([0,T], S^d_+), P^i, P̃^i ∈ S^d, N^i_k, Ñ^i_k ∈ L^∞([0,T], S^{d_k}_+), I^i_k, Ĩ^i_k ∈ L^∞([0,T], R^{d_k×d});
b) L^i_x ∈ L²_F(Ω × [0,T], R^d), L^i_k ∈ L²_F(Ω × [0,T], R^{d_k}), r^i ∈ L²_{F_T}(R^d);
c) ∃ δ > 0, ∀ t ∈ [0,T]:  N^i_{i,t} ≥ δ I_{d_i},  P^i ≥ 0,  Q^i_t − I^{i⊤}_{i,t} (N^i_{i,t})^{−1} I^i_{i,t} ≥ 0;
d) ∃ δ > 0, ∀ t ∈ [0,T]:  N̂^i_{i,t} ≥ δ I_{d_i},  P̂^i ≥ 0,  Q̂^i_t − Î^{i⊤}_{i,t} (N̂^i_{i,t})^{−1} Î^i_{i,t} ≥ 0.

Under the above conditions, we easily derive some standard estimates on the mean-field SDE:

- By (H1) there exists a unique strong solution to the mean-field SDE (1), which verifies:

E[ sup_{t∈[0,T]} |X^α_t|² ] ≤ C_α (1 + E[|X_0|²]) < ∞,   (4)

where C_α is a constant depending on α only through ∫_0^T e^{−ρt} E[|α_t|²] dt.

- By (H2) and (4) we have J_i(α) ∈ R for each α ∈ A, which means that the optimisation problem is well defined for each player.

2 A weak submartingale optimality principle to compute a Nash equilibrium

2.1 A verification Lemma

We first present the lemma on which the method is based.

Lemma 2.1 (Weak submartingale optimality principle). Suppose there exists a couple (α*, (W^{·,i})_{i∈⟦1,n⟧}), where α* ∈ A and W^{·,i} = {W^{α,i}_t, t ∈ [0,T], α ∈ A} is a family of adapted processes indexed by A for each i ∈ ⟦1,n⟧, such that:

(i) For every α ∈ A, E[W^{α,i}_0] is independent of the control α_i ∈ A_i;
(ii) For every α ∈ A, E[W^{α,i}_T] = E[g_i(X^α_T, P_{X^α_T})];
(iii) For every α ∈ A, the map t ∈ [0,T] ↦ E[S^{α,i}_t], with S^{α,i}_t = e^{−ρt} W^{α,i}_t + ∫_0^t e^{−ρu} f_i(u, X^α_u, P_{X^α_u}, α_u, P_{α_u}) du, is well defined and nondecreasing;
(iv) The map t ↦ E[S^{α*,i}_t] is constant for every t ∈ [0,T].

Then α* is a Nash equilibrium and J_i(α*) = E[W^{α*,i}_0]. Moreover, any other Nash equilibrium α̃ such that E[W^{α̃,i}_0] = E[W^{α*,i}_0] and J_i(α̃) = J_i(α*) for any i ∈ ⟦1,n⟧ satisfies condition (iv).

Proof. Let i ∈ ⟦1,n⟧ and α_i ∈ A_i. From (ii) we immediately have J_i(α) = E[S^{α,i}_T] for any α ∈ A. We then have:

E[W^{(α_i,α*_{−i}),i}_0] = E[S^{(α_i,α*_{−i}),i}_0] ≤ E[S^{(α_i,α*_{−i}),i}_T] = J_i(α_i, α*_{−i}).

Moreover, for α_i = α*_i we have:

E[W^{(α*_i,α*_{−i}),i}_0] = E[S^{(α*_i,α*_{−i}),i}_0] = E[S^{(α*_i,α*_{−i}),i}_T] = J_i(α*_i, α*_{−i}).

Since, by (i), E[W^{(α_i,α*_{−i}),i}_0] = E[W^{(α*_i,α*_{−i}),i}_0], combining the two relations proves that α* is a Nash equilibrium and J_i(α*) = E[W^{α*,i}_0]. Finally, let us suppose that α̃ ∈ A is another Nash equilibrium such that E[W^{α̃,i}_0] = E[W^{α*,i}_0] and J_i(α̃) = J_i(α*) for any i ∈ ⟦1,n⟧. Then, for i ∈ ⟦1,n⟧ we have:

E[S^{α̃,i}_0] = E[W^{α̃,i}_0] = E[W^{α*,i}_0] = E[S^{α*,i}_T] = J_i(α*) = J_i(α̃) = E[S^{α̃,i}_T].

Since t ↦ E[S^{α̃,i}_t] is nondecreasing for every i ∈ ⟦1,n⟧, this implies that the map is actually constant, and (iv) is verified.


2.2 The method and the solution

Let us now apply the optimality principle in Lemma 2.1 in order to find a Nash equilibrium. In the linear-quadratic case the laws of the state and the controls intervene only through their expectations.

Thus we will use a simplified optimality principle where P is simply replaced by E in conditions (ii) and (iii) of Lemma 2.1. The general procedure is the following:

Step 1. We guess a candidate for W^{α,i}. To do so, we suppose that W^{α,i}_t = w^i_t(X^α_t, E[X^α_t]) for some parametric adapted random field {w^i_t(x, x̄), t ∈ [0,T], x, x̄ ∈ R^d} of the form w^i_t(x, x̄) = K^i_t·(x − x̄)^{⊗2} + Λ^i_t·x̄^{⊗2} + 2 Y^{i⊤}_t x + R^i_t.

Step 2. We set S^{α,i}_t = e^{−ρt} w^i_t(X^α_t, E[X^α_t]) + ∫_0^t e^{−ρu} f_i(u, X^α_u, E[X^α_u], α_u, E[α_u]) du for i ∈ ⟦1,n⟧ and α ∈ A. We then compute dE[S^{α,i}_t]/dt = e^{−ρt} E[D^{α,i}_t] (with Itô's formula), where the drift D^{α,i} takes the form:

E[D^{α,i}_t] = E[ −ρ w^i_t(X^α_t, E[X^α_t]) ] + (d/dt) E[ w^i_t(X^α_t, E[X^α_t]) ] + E[ f_i(t, X^α_t, E[X^α_t], α_t, E[α_t]) ].

Step 3. We then constrain the coefficients of the random field so that the conditions of Lemma 2.1 are satisfied. This leads to a system of backward ordinary and stochastic differential equations for the coefficients of w^i.

Step 4. At time t, given the state and the controls of the other players, we seek the action α_i cancelling the drift. We thus obtain the best response function of each player.

Step 5. We compute the fixed point of the best response functions in order to find an open loop Nash equilibrium t ↦ α*_t.

Step 6. We check the validity of our computations.

2.2.1 Step 1: guess the random fields

The process t ↦ E[w^i_t(X^α_t, E[X^α_t])] is meant to be equal to E[g_i(X^α_T, E[X^α_T])] at time T, where g_i(x, x̄) = P^i·(x − x̄)^{⊗2} + (P^i + P̃^i)·x̄^{⊗2} + 2 r^{i⊤} x, with (P^i, P̃^i, r^i) ∈ (S^d)² × L²_{F_T}(R^d). It is then natural to search for a field w^i of the form w^i_t(x, x̄) = K^i_t·(x − x̄)^{⊗2} + Λ^i_t·x̄^{⊗2} + 2 Y^{i⊤}_t x + R^i_t, with the processes (K^i, Λ^i, Y^i, R^i) in (L^∞([0,T], S^d_+))² × S²_F(Ω × [0,T], R^d) × L^∞([0,T], R) and solution to:

dK^i_t = K̇^i_t dt,  K^i_T = P^i,
dΛ^i_t = Λ̇^i_t dt,  Λ^i_T = P^i + P̃^i,
dY^i_t = Ẏ^i_t dt + Z^i_t dW_t,  0 ≤ t ≤ T,  Y^i_T = r^i,
dR^i_t = Ṙ^i_t dt,  R^i_T = 0,

where (K̇^i, Λ̇^i, Ṙ^i) are deterministic processes valued in S^d × S^d × R and (Ẏ^i, Z^i) are adapted processes valued in R^d.

2.2.2 Step 2: derive their drifts

For i ∈ ⟦1,n⟧, t ∈ [0,T] and α ∈ A, we set:

S^{α,i}_t := e^{−ρt} w^i_t(X^α_t, E[X^α_t]) + ∫_0^t e^{−ρu} f_i(u, X^α_u, E[X^α_u], α_u, E[α_u]) du,

and then compute the drift of the deterministic function t ↦ E[S^{α,i}_t]:

dE[S^{α,i}_t]/dt = e^{−ρt} E[D^{α,i}_t]
 = e^{−ρt} E[ (X_t − X̄_t)^⊤ [K̇^i_t + Φ^i_t] (X_t − X̄_t) + X̄_t^⊤ (Λ̇^i_t + Ψ^i_t) X̄_t + 2 [Ẏ^i_t + Δ^i_t]^⊤ X_t + Ṙ^i_t + Γ^i_t + χ^i_t(α_{i,t}) ],

where we have defined:

χ^i_t(α_{i,t}) := (α_{i,t} − ᾱ_{i,t})^⊤ S^i_{i,t} (α_{i,t} − ᾱ_{i,t}) + ᾱ_{i,t}^⊤ Ŝ^i_{i,t} ᾱ_{i,t} + 2 [ U^i_{i,t} (X_t − X̄_t) + V^i_{i,t} X̄_t + Ō^i_{i,t} + ξ^i_{i,t} − ξ̄^i_{i,t} ]^⊤ α_{i,t},

with the following coefficients:

Φ^i_t = Q^i_t + σ_{x,t}^⊤ K^i_t σ_{x,t} + K^i_t b_{x,t} + b_{x,t}^⊤ K^i_t − ρ K^i_t,
Ψ^i_t = Q̂^i_t + σ̂_{x,t}^⊤ K^i_t σ̂_{x,t} + Λ^i_t b̂_{x,t} + b̂_{x,t}^⊤ Λ^i_t − ρ Λ^i_t,
Δ^i_t = L^i_{x,t} + b_{x,t}^⊤ Y^i_t + b̃_{x,t}^⊤ Ȳ^i_t + σ_{x,t}^⊤ Z^i_t + σ̃_{x,t}^⊤ Z̄^i_t + Λ^i_t β̄_t + σ_{x,t}^⊤ K^i_t γ_t + σ̃_{x,t}^⊤ K^i_t γ̄_t + K^i_t (β_t − β̄_t) − ρ Y^i_t
  + Σ_{k≠i} [ U^{i⊤}_{k,t} (α_{k,t} − ᾱ_{k,t}) + V^{i⊤}_{k,t} ᾱ_{k,t} ],
Γ^i_t = γ_t^⊤ K^i_t γ_t + 2 β_t^⊤ Y^i_t + 2 γ_t^⊤ Z^i_t
  + Σ_{k≠i} [ (α_{k,t} − ᾱ_{k,t})^⊤ S^i_{k,t} (α_{k,t} − ᾱ_{k,t}) + ᾱ_{k,t}^⊤ Ŝ^i_{k,t} ᾱ_{k,t} + 2 [ Ō^i_{k,t} + ξ^i_{k,t} − ξ̄^i_{k,t} ]^⊤ α_{k,t} ] − ρ R^i_t,   (5)

and

S^i_{k,t} = N^i_{k,t} + σ_{k,t}^⊤ K^i_t σ_{k,t},
Ŝ^i_{k,t} = N̂^i_{k,t} + σ̂_{k,t}^⊤ K^i_t σ̂_{k,t},
U^i_{k,t} = I^i_{k,t} + σ_{k,t}^⊤ K^i_t σ_{x,t} + b_{k,t}^⊤ K^i_t,
V^i_{k,t} = Î^i_{k,t} + σ̂_{k,t}^⊤ K^i_t σ̂_{x,t} + b̂_{k,t}^⊤ Λ^i_t,
O^i_{k,t} = L^i_{k,t} + b̂_{k,t}^⊤ Ȳ^i_t + σ̂_{k,t}^⊤ Z̄^i_t + σ̂_{k,t}^⊤ K^i_t γ̄_t + ½ Σ_{l≠k} ( Ĵ^i_{k,l,t} + Ĵ^{i⊤}_{l,k,t} ) ᾱ_{l,t},
J^i_{k,l,t} = G^i_{k,l,t} + σ_{k,t}^⊤ K^i_t σ_{l,t},
Ĵ^i_{k,l,t} = Ĝ^i_{k,l,t} + σ̂_{k,t}^⊤ K^i_t σ̂_{l,t},
ξ^i_{k,t} = L^i_{k,t} + b_{k,t}^⊤ Y^i_t + σ_{k,t}^⊤ Z^i_t + σ_{k,t}^⊤ K^i_t γ_t + ½ Σ_{l≠k} ( J^i_{k,l,t} + J^{i⊤}_{l,k,t} ) α_{l,t}.   (6)

2.2.3 Step 3: constrain their coefficients

Now that we have computed the drift, we need to constrain the coefficients so that S^{α,i} satisfies the conditions of Lemma 2.1. Let us assume for the moment that S^i_{i,t} and Ŝ^i_{i,t} are positive definite matrices (this will be ensured by the positive definiteness of K^i). This implies that there exists an invertible matrix θ^i_t such that θ^i_t S^i_{i,t} θ^{i⊤}_t = Ŝ^i_{i,t} for all t ∈ [0,T]. We can now rewrite the drift as "a square in α_i" plus "other terms not depending on α_i". Indeed, we can form the following square:

E[χ^i_t(α_{i,t})] = E[ (α_{i,t} − ᾱ_{i,t} + θ^{i⊤}_t ᾱ_{i,t} − η^i_t)^⊤ S^i_{i,t} (α_{i,t} − ᾱ_{i,t} + θ^{i⊤}_t ᾱ_{i,t} − η^i_t) − ζ^i_t ]

with:

η^i_t = a^{i,0}_t(X_t, X̄_t) + θ^{i⊤}_t a^{i,1}_t(X̄_t),
a^{i,0}_t(x, x̄) = −(S^i_{i,t})^{−1} U^i_{i,t} (x − x̄) − (S^i_{i,t})^{−1} (ξ^i_{i,t} − ξ̄^i_{i,t}),
a^{i,1}_t(x̄) = −(Ŝ^i_{i,t})^{−1} (V^i_{i,t} x̄ + Ō^i_{i,t}),
ζ^i_t = (X_t − X̄_t)^⊤ U^{i⊤}_{i,t} (S^i_{i,t})^{−1} U^i_{i,t} (X_t − X̄_t) + X̄_t^⊤ V^{i⊤}_{i,t} (Ŝ^i_{i,t})^{−1} V^i_{i,t} X̄_t
  + 2 ( U^{i⊤}_{i,t} (S^i_{i,t})^{−1} (ξ^i_{i,t} − ξ̄^i_{i,t}) + V^{i⊤}_{i,t} (Ŝ^i_{i,t})^{−1} Ō^i_{i,t} )^⊤ X_t
  + (ξ^i_{i,t} − ξ̄^i_{i,t})^⊤ (S^i_{i,t})^{−1} (ξ^i_{i,t} − ξ̄^i_{i,t}) + Ō^{i⊤}_{i,t} (Ŝ^i_{i,t})^{−1} Ō^i_{i,t}.

We can then rewrite the drift in the following form:

E[D^{α,i}_t] = E[ (X_t − X̄_t)^⊤ [K̇^i_t + Φ^{i,0}_t] (X_t − X̄_t) + X̄_t^⊤ (Λ̇^i_t + Ψ^{i,0}_t) X̄_t + 2 [Ẏ^i_t + Δ^{i,0}_t]^⊤ X_t + Ṙ^i_t + Γ^{i,0}_t
  + (α_{i,t} − ᾱ_{i,t} + θ^{i⊤}_t ᾱ_{i,t} − η^i_t)^⊤ S^i_{i,t} (α_{i,t} − ᾱ_{i,t} + θ^{i⊤}_t ᾱ_{i,t} − η^i_t) ],

where

Φ^{i,0}_t = Φ^i_t − U^{i⊤}_{i,t} (S^i_{i,t})^{−1} U^i_{i,t},
Ψ^{i,0}_t = Ψ^i_t − V^{i⊤}_{i,t} (Ŝ^i_{i,t})^{−1} V^i_{i,t},
Δ^{i,0}_t = Δ^i_t − U^{i⊤}_{i,t} (S^i_{i,t})^{−1} (ξ^i_{i,t} − ξ̄^i_{i,t}) − V^{i⊤}_{i,t} (Ŝ^i_{i,t})^{−1} Ō^i_{i,t},
Γ^{i,0}_t = Γ^i_t − (ξ^i_{i,t} − ξ̄^i_{i,t})^⊤ (S^i_{i,t})^{−1} (ξ^i_{i,t} − ξ̄^i_{i,t}) − Ō^{i⊤}_{i,t} (Ŝ^i_{i,t})^{−1} Ō^i_{i,t}.   (7)

We can finally constrain the coefficients. By choosing the coefficients K^i, Λ^i, Y^i and R^i so that only the square remains, the drift for each player i ∈ ⟦1,n⟧ can be rewritten as a square only (in the next step we will verify that we can indeed choose such coefficients). More precisely, we set K^i, Λ^i, Y^i and R^i as the solution of:

dK^i_t = −Φ^{i,0}_t dt,  K^i_T = P^i,
dΛ^i_t = −Ψ^{i,0}_t dt,  Λ^i_T = P^i + P̃^i,
dY^i_t = −Δ^{i,0}_t dt + Z^i_t dW_t,  Y^i_T = r^i,
dR^i_t = −Γ^{i,0}_t dt,  R^i_T = 0,   (8)

and stress the fact that Y^i, Z^i, R^i depend on α_{−i}, which appears in the coefficients Δ^{i,0} and Γ^{i,0}. With such coefficients the drift now takes the form:

E[D^{α,i}_t] = E[ (α_{i,t} − ᾱ_{i,t} + θ^{i⊤}_t ᾱ_{i,t} − η^i_t)^⊤ S^i_{i,t} (α_{i,t} − ᾱ_{i,t} + θ^{i⊤}_t ᾱ_{i,t} − η^i_t) ]
 = E[ (α_{i,t} − ᾱ_{i,t} − a^{i,0}_t + θ^{i⊤}_t (ᾱ_{i,t} − a^{i,1}_t))^⊤ S^i_{i,t} (α_{i,t} − ᾱ_{i,t} − a^{i,0}_t + θ^{i⊤}_t (ᾱ_{i,t} − a^{i,1}_t)) ],

and thus satisfies the nonnegativity constraint E[D^{α,i}_t] ≥ 0, for all t ∈ [0,T], i ∈ ⟦1,n⟧, and α ∈ A.

2.2.4 Step 4: find the best response functions

Proposition 2.2. Assume that for all i ∈ ⟦1,n⟧, (K^i, Λ^i, Y^i, Z^i, R^i) is a solution of (8) given α_{−i} ∈ A_{−i}. Then the set of processes

α_{i,t} = a^{i,0}_t(X_t, E[X_t]) + a^{i,1}_t(E[X_t])
        = −(S^i_{i,t})^{−1} U^i_{i,t} (X_t − E[X_t]) − (S^i_{i,t})^{−1} (ξ^i_{i,t} − ξ̄^i_{i,t}) − (Ŝ^i_{i,t})^{−1} (V^i_{i,t} E[X_t] + Ō^i_{i,t}),   (9)

(depending on α_{−i}), where X is the state process with the feedback controls α = (α_1, …, α_n), are best-response functions, i.e., J_i(α_i, α_{−i}) = V_i(α_{−i}) for all i ∈ ⟦1,n⟧. Moreover, we have

V_i(α_{−i}) = E[W^{α,i}_0] = E[ K^i_0·(X_0 − X̄_0)^{⊗2} + Λ^i_0·X̄_0^{⊗2} + 2 Y^{i⊤}_0 X_0 + R^i_0 ].

Proof. We check that the assumptions of Lemma 2.1 are satisfied. Since W^{α,i} is of the form W^{α,i}_t = w^i_t(X^α_t, E[X^α_t]), condition (i) is verified. Condition (ii) is satisfied thanks to the terminal conditions imposed on the system (8). Since (K^i, Λ^i, Y^i, Z^i, R^i) is a solution to (8), the drift of t ↦ E[S^{α,i}_t] is nonnegative for all i ∈ ⟦1,n⟧ and all α ∈ A, which implies condition (iii). Finally, for α ∈ A, we see that E[D^{α,i}_t] ≡ 0 for t ∈ [0,T] and i ∈ ⟦1,n⟧ if and only if:

α_{i,t} − ᾱ_{i,t} − a^{i,0}_t(X^α_t, E[X^α_t]) + θ^{i⊤}_t (ᾱ_{i,t} − a^{i,1}_t(E[X^α_t])) = 0  a.s.,  t ∈ [0,T].

Since θ^i_t is invertible, we get ᾱ_{i,t} = a^{i,1}_t by taking the expectation in the above formula. Thus E[D^{α,i}_t] ≡ 0 for every i ∈ ⟦1,n⟧ and t ∈ [0,T] if and only if α_{i,t} = ᾱ_{i,t} + a^{i,0}_t = a^{i,0}_t + a^{i,1}_t for every i ∈ ⟦1,n⟧ and t ∈ [0,T]. For such controls, condition (iv) is satisfied. We now check that α_i ∈ A_i for every i ∈ ⟦1,n⟧ (i.e. it satisfies the square integrability condition). Since X is the solution to a linear McKean-Vlasov dynamics and satisfies the square integrability condition E[sup_{0≤t≤T} |X_t|²] < ∞, it implies that α_i ∈ L²_F(Ω × [0,T], R^{d_i}), since S^i_i, U^i_i, Ŝ^i_i, V^i_i are bounded and (Ō^i_i, ξ^i_i) ∈ L²([0,T], R^{d_i}) × L²_F(Ω × [0,T], R^{d_i}). Therefore α_i ∈ A_i for every i ∈ ⟦1,n⟧.
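As an illustration of how a system like (8) can be handled numerically, the sketch below integrates a one-dimensional analogue of the equation for K^i backward in time by explicit Euler stepping. The coefficient names mirror the paper's notation, but the scalar reduction and all numerical values are our own illustrative assumptions.

```python
import numpy as np

# Backward integration of a scalar analogue of the Riccati equation dK_t = -Phi0(K_t) dt,
# K_T = P, with (cf. (5)-(7), scalar case)
#   Phi0(K) = Q + (2*b_x + sigma_x**2 - rho)*K - U**2 / S,
#   U = I + (b_k + sigma_k*sigma_x)*K,   S = N + sigma_k**2 * K.
# This is a one-dimensional sketch, not the full matrix system of the paper.

def solve_riccati(Q, b_x, sigma_x, b_k, sigma_k, I, N, rho, P, T=1.0, n=10000):
    dt = T / n
    K = P
    for _ in range(n):  # step backward from t = T down to t = 0
        U = I + (b_k + sigma_k * sigma_x) * K
        S = N + sigma_k**2 * K
        Kdot = -(Q + (2*b_x + sigma_x**2 - rho) * K - U**2 / S)
        K = K - Kdot * dt   # K(t - dt) = K(t) - K'(t) dt
    return K

# Special case Q = b_x = sigma_x = sigma_k = I = rho = 0: dK/dt = b_k^2 K^2 / N,
# with closed-form terminal-value solution K(0) = 1 / (1/P + b_k**2 * T / N).
K0 = solve_riccati(Q=0.0, b_x=0.0, sigma_x=0.0, b_k=1.0, sigma_k=0.0,
                   I=0.0, N=1.0, rho=0.0, P=1.0)
```

Comparing K0 against the closed form in the special case provides a cheap sanity check of the backward integration.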

2.2.5 Step 5: search for a fixed point

We now find semi-explicit expressions for the optimal controls of each player. The difficulty here is that the controls of the other players appear in the best response functions of each player through the vectors (Y^1, Z^1), …, (Y^n, Z^n). To solve this fixed point problem, we first rewrite (9) and the backward equations satisfied by (Y, Z) = ((Y^1, Z^1), …, (Y^n, Z^n)) in the following way (we omit the time dependence of the coefficients to make the notations less cluttered):

α*_t − ᾱ*_t = S_x (X_t − X̄_t) + S_y (Y_t − Ȳ_t) + S_z (Z_t − Z̄_t) + H − H̄,
ᾱ*_t = Ŝ_x X̄_t + Ŝ_y Ȳ_t + Ŝ_z Z̄_t + Ĥ,
dY_t = [ P_y (Y_t − Ȳ_t) + P_z (Z_t − Z̄_t) + P_α (α*_t − ᾱ*_t) + F − F̄ + P̂_y Ȳ_t + P̂_z Z̄_t + P̂_α ᾱ*_t + F̂ ] dt + Z_t dW_t,   (10)

where we define

S = ( (S^i_i)^{−1} 1_{i=j} )_{i,j∈⟦1,n⟧},  Ŝ = ( (Ŝ^i_i)^{−1} 1_{i=j} )_{i,j∈⟦1,n⟧},
J = ( ½ (J^i_{i,j} + J^{i⊤}_{j,i}) 1_{i≠j} )_{i,j∈⟦1,n⟧},  Ĵ = ( ½ (Ĵ^i_{i,j} + Ĵ^{i⊤}_{j,i}) 1_{i≠j} )_{i,j∈⟦1,n⟧},
𝒥 = −(I_d + S J)^{−1} S,  𝒥̂ = −(I_d + Ŝ Ĵ)^{−1} Ŝ,
S_x = 𝒥 (U^i_i)_{i∈⟦1,n⟧},  Ŝ_x = 𝒥̂ (V^i_i)_{i∈⟦1,n⟧},
S_y = 𝒥 (1_{i=j} b_i^⊤)_{i,j∈⟦1,n⟧},  Ŝ_y = 𝒥̂ (1_{i=j} b̂_i^⊤)_{i,j∈⟦1,n⟧},
S_z = 𝒥 (σ_i^⊤)_{i∈⟦1,n⟧},  Ŝ_z = 𝒥̂ (σ̂_i^⊤)_{i∈⟦1,n⟧},
H = 𝒥 (L^i_i + σ_i^⊤ K^i γ)_{i∈⟦1,n⟧},  Ĥ = 𝒥̂ (L^i_i + σ̂_i^⊤ K^i γ)_{i∈⟦1,n⟧},
P_y = ( 1_{i=j} (U^i_i (S^i_i)^{−1} b_i^⊤ − b_x^⊤ + ρ) )_{i,j∈⟦1,n⟧},  P̂_y = ( 1_{i=j} (V^i_i (Ŝ^i_i)^{−1} b̂_i^⊤ − b̂_x^⊤ + ρ) )_{i,j∈⟦1,n⟧},
P_z = ( 1_{i=j} (U^i_i (S^i_i)^{−1} σ_i^⊤ − σ_x^⊤) )_{i,j∈⟦1,n⟧},  P̂_z = ( 1_{i=j} (V^i_i (Ŝ^i_i)^{−1} σ̂_i^⊤ − σ̂_x^⊤) )_{i,j∈⟦1,n⟧},
P_α = −( 1_{i≠j} (U^i_j + U^i_i (S^i_i)^{−1} (J^i_{i,j} + J^{i⊤}_{j,i})) )_{i,j∈⟦1,n⟧},  P̂_α = −( 1_{i≠j} (V^i_j + V^i_i (Ŝ^i_i)^{−1} (Ĵ^i_{i,j} + Ĵ^{i⊤}_{j,i})) )_{i,j∈⟦1,n⟧},
F = ( K^i β + σ_x^⊤ K^i γ )_{i∈⟦1,n⟧},  F̂ = ( U^i_i (S^i_i)^{−1} (L^i_i + σ_i^⊤ K^i γ) − L^i_x − σ_x^⊤ K^i γ − K^i β )_{i∈⟦1,n⟧}.

(11)

Now, the strategy is to propose an ansatz for t ∈ [0,T] ↦ Y_t in the form:

Y_t = π_t (X_t − X̄_t) + π̂_t X̄_t + η_t,   (12)

where (π, π̂, η) ∈ L^∞([0,T], R^{nd×d}) × L^∞([0,T], R^{nd×d}) × S²_F(Ω × [0,T], R^{nd}) satisfy:

dη_t = ψ_t dt + φ_t dW_t,  η_T = r = (r^i)_{i∈⟦1,n⟧},
dπ_t = π̇_t dt,  π_T = 0,
dπ̂_t = π̂̇_t dt,  π̂_T = 0.

By applying Itô's formula to the ansatz we then obtain:

dY_t = π̇_t (X_t − X̄_t) dt + π_t d(X_t − X̄_t) + π̂̇_t X̄_t dt + π̂_t dX̄_t + ψ_t dt + φ_t dW_t
     = [ π̇_t (X_t − X̄_t) + ψ_t − ψ̄_t + π_t ( β_t − β̄_t + b_x (X_t − X̄_t) + B (α_t − ᾱ_t) ) ] dt
     + [ π̂̇_t X̄_t + ψ̄_t + π̂_t ( β̄_t + b̂_x X̄_t + B̂ ᾱ_t ) ] dt
     + [ φ_t + π_t ( γ + σ_x X_t + σ̃_x X̄_t + Σ α_t + Σ̃ ᾱ_t ) ] dW_t.

By comparing the two Itô decompositions of Y, we get:

P_y (Y_t − Ȳ_t) + P_z (Z_t − Z̄_t) + P_α (α*_t − ᾱ*_t) + F − F̄ = π̇_t (X_t − X̄_t) + ψ_t − ψ̄_t + π_t ( β − β̄ + b_x (X_t − X̄_t) + B (α*_t − ᾱ*_t) ),
P̂_y Ȳ_t + P̂_z Z̄_t + P̂_α ᾱ*_t + F̂ = π̂̇_t X̄_t + ψ̄_t + π̂_t ( β̄ + b̂_x X̄_t + B̂ ᾱ*_t ),
Z_t = φ_t + π_t ( γ + σ_x X_t + σ̃_x X̄_t + Σ α*_t + Σ̃ ᾱ*_t ).   (13)

We now substitute Y by its ansatz in the best response equation (10), and obtain the system:

(Id − S_z π_t Σ)(α*_t − ᾱ*_t) = (S_x + S_y π_t + S_z π_t σ_x)(X_t − X̄_t) + ( H − H̄ + S_y (η_t − η̄_t) + S_z (φ_t − φ̄_t + π_t (γ − γ̄)) ),
(Id − Ŝ_z π_t Σ̂) ᾱ*_t = (Ŝ_x + Ŝ_y π̂_t + Ŝ_z π_t σ̂_x) X̄_t + ( Ĥ + Ŝ_y η̄_t + Ŝ_z (φ̄_t + π_t γ̄) ).   (14)

To make the next computations slightly less painful, we rewrite (14) as:

α*_t − ᾱ*_t = A_x (X_t − X̄_t) + R_t − R̄_t,
ᾱ*_t = Â_x X̄_t + R̂_t,

where

A_x := (Id − S_z π_t Σ)^{−1} (S_x + S_y π_t + S_z π_t σ_x),
Â_x := (Id − Ŝ_z π_t Σ̂)^{−1} (Ŝ_x + Ŝ_y π̂_t + Ŝ_z π_t σ̂_x),
R_t := (Id − S_z π_t Σ)^{−1} ( H + S_y η_t + S_z (φ_t + π_t γ) ),
R̂_t := (Id − Ŝ_z π_t Σ̂)^{−1} ( Ĥ + Ŝ_y η̄_t + Ŝ_z (φ̄_t + π_t γ̄) ).   (15)

By injecting (15) into (13) we have:

0 = [ π̇_t + π_t b_x − P_y π_t − P_z π_t (σ_x + Σ A_x) − (P_α − π_t B) A_x ] (X_t − X̄_t)
  + ψ_t − ψ̄_t + π_t (β − β̄) − (P_α − π_t B)(R_t − R̄_t) − (F − F̄)
  − P_z ( φ_t − φ̄_t + π_t (γ − γ̄ + Σ (R_t − R̄_t)) ) − P_y (η_t − η̄_t),

0 = [ π̂̇_t + π̂_t b̂_x − P̂_y π̂_t − P̂_z π_t (σ̂_x + Σ̂ Â_x) − (P̂_α − π̂_t B̂) Â_x ] X̄_t
  + ψ̄_t + π̂_t β̄ − (P̂_α − π̂_t B̂) R̂_t − F̂ − P̂_z ( φ̄_t + π_t (γ̄ + Σ̂ R̂_t) ) − P̂_y η̄_t.

Thus we constrain the coefficients (π, π̂, ψ, φ) of the ansatz for Y to satisfy:

π̇_t = −π_t b_x + P_y π_t + P_z π_t σ_x + (P_α + P_z π_t Σ)(Id − S_z π_t Σ)^{−1} (S_x + S_y π_t + S_z π_t σ_x)
      − π_t B (Id − S_z π_t Σ)^{−1} (S_x + S_y π_t + S_z π_t σ_x),  π_T = 0,
π̂̇_t = −π̂_t b̂_x + P̂_y π̂_t + P̂_z π_t σ̂_x + (P̂_α + P̂_z π_t Σ̂)(Id − Ŝ_z π_t Σ̂)^{−1} (Ŝ_x + Ŝ_y π̂_t + Ŝ_z π_t σ̂_x)
      − π̂_t B̂ (Id − Ŝ_z π_t Σ̂)^{−1} (Ŝ_x + Ŝ_y π̂_t + Ŝ_z π_t σ̂_x),  π̂_T = 0,
dη_t = ψ_t dt + φ_t dW_t,  η_T = r,

where:

ψ_t − ψ̄_t = −π_t (β − β̄) + (P_α − π_t B)(R_t − R̄_t) + (F − F̄) + P_z ( φ_t − φ̄_t + π_t (γ − γ̄ + Σ (R_t − R̄_t)) ) + P_y (η_t − η̄_t),
ψ̄_t = −π̂_t β̄ + (P̂_α − π̂_t B̂) R̂_t + F̂ + P̂_z ( φ̄_t + π_t (γ̄ + Σ̂ R̂_t) ) + P̂_y η̄_t,
R_t := (Id − S_z π_t Σ)^{−1} ( H + S_y η_t + S_z (φ_t + π_t γ) ),
R̂_t := (Id − Ŝ_z π_t Σ̂)^{−1} ( Ĥ + Ŝ_y η̄_t + Ŝ_z (φ̄_t + π_t γ̄) ).   (16)

We now have a feedback form for (Y, Z) = ((Y^1, Z^1), …, (Y^n, Z^n)). We can inject it into the best response functions α* in order to obtain the optimal controls in feedback form. We then inject the latter into the state equation in order to obtain an explicit expression of t ↦ X*_t.
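As a simple sanity check of the overall verification approach, one can specialize to a single player without mean-field interaction (n = 1, no mean terms, ρ = 0), where the method reduces to the classical LQ problem. With dX_t = α_t dt + dW_t on [0,1], cost E[∫_0^1 (X_t² + α_t²) dt + X_1²], the Riccati equation K' = K² − 1, K(1) = 1 has the constant solution K ≡ 1, so the candidate optimal feedback is α*_t = −X_t. The Monte Carlo sketch below (all parameter choices are ours) checks that this feedback indeed beats, e.g., the zero control:

```python
import numpy as np

# Single-player LQ sanity check: dX = alpha dt + dW,
# cost E[int_0^1 (X^2 + alpha^2) dt + X_1^2], with optimal feedback alpha = -X
# (constant Riccati solution K = 1).  Compared to the zero control by Monte Carlo.

def mc_cost(feedback, n_paths=20000, n_steps=100, T=1.0, x0=1.0, seed=1):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        a = feedback(X)
        cost += (X**2 + a**2) * dt            # running cost
        X = X + a * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    cost += X**2                               # terminal cost
    return cost.mean()

J_opt = mc_cost(lambda x: -x)        # candidate optimal feedback alpha = -X
J_zero = mc_cost(lambda x: 0.0 * x)  # benchmark: no control
# The exact optimal value is K_0 * x0^2 + int_0^1 K_t dt = 1 + 1 = 2.
```

Up to Monte Carlo and discretization error, J_opt should be close to the exact value 2 and strictly below the uncontrolled cost.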
