HAL Id: hal-01941591
https://hal.archives-ouvertes.fr/hal-01941591
Preprint submitted on 1 Dec 2018
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Linear-Quadratic McKean-Vlasov Stochastic Differential Games∗

Enzo MILLER†, Huyên PHAM‡

December 1, 2018
Abstract
We consider a multi-player stochastic differential game with linear McKean-Vlasov dynamics and quadratic cost functionals depending on the variance and mean of the state and of the control actions of the players in open-loop form. Finite and infinite horizon problems, possibly with random coefficients and with common noise, are addressed. We propose a simple direct approach, based on a weak martingale optimality principle together with a fixed point argument in the space of controls, for solving this game problem. The Nash equilibria are characterized in terms of systems of Riccati ordinary differential equations and linear mean-field backward stochastic differential equations; existence and uniqueness conditions are provided for such systems. Finally, we illustrate our results on a toy example.
MSC Classification: 49N10, 49L20, 91A13.
Key words: Mean-field SDEs, stochastic differential game, linear-quadratic, open-loop controls, Nash equilibria, weak martingale optimality principle.
1 Introduction
1.1 General introduction-Motivation
The study of large populations of interacting individuals (agents, computers, firms) is a central issue in many fields of science, and finds numerous relevant applications in economics/finance (systemic risk with strongly interconnected financial entities), sociology (regulation of crowd motion, herding behavior, social networks), physics, biology, and electrical engineering (telecommunication). Rationality in the behavior of the population is a natural requirement, especially in social sciences, and is addressed by including individual decisions, where each individual optimizes some criterion: e.g., an investor maximizes her/his wealth, a firm chooses how much output to produce (goods, electricity, etc.) or how much advertising to post for a large population. The criterion and optimal decision of each individual depend on the others and affect the whole group, and one then typically looks for an equilibrium among the population, where the dynamics of the system evolves endogenously as a consequence of the optimal choices made by each individual. When the number of indistinguishable agents in the population tends to infinity, and by considering cooperation between the agents, we are reduced in the asymptotic formulation to a McKean-Vlasov (McKV) control problem where the dynamics and
∗This work is supported by FiME (Finance for Energy Market Research Centre) and the “Finance et Développement Durable - Approches Quantitatives” EDF - CACIB Chair.
†LPSM, University Paris Diderot, enzo.miller at polytechnique.edu
‡LPSM, University Paris Diderot and CREST-ENSAE, pham at lpsm.paris
the cost functional depend upon the law of the stochastic process. This corresponds to a Pareto optimum where a social planner/influencer decides the strategies of each individual. The theory of McKV control problems, also called mean-field type control, has generated recent advances in the literature, either by the maximum principle [5] or by the dynamic programming approach [14]; see also the recent books [3] and [6] and the references therein. Linear-quadratic (LQ) models provide an important class of solvable applications studied in many papers, see, e.g., [15], [11], [10], [2].
In this paper, we consider multi-player stochastic differential games for McKean-Vlasov dynamics. This setting corresponds to, and is motivated by, the competitive interaction of multiple populations, each with a large number of indistinguishable agents. In this context, we then look for a Nash equilibrium among the classes of populations. Such a problem, sometimes referred to as a mean-field-type game, makes it possible to incorporate competition and heterogeneity in the population, and is a natural extension of McKean-Vlasov (or mean-field-type) control obtained by including multiple decision makers. It finds natural applications in engineering, power systems, social sciences and cybersecurity, and has attracted recent attention in the literature, see, e.g., [1], [7], [8], [4]. We focus more specifically on the case of linear McKean-Vlasov dynamics and a quadratic cost functional for each player (social planner).
Linear-quadratic McKean-Vlasov stochastic differential games have been studied in [9] for a one-dimensional state process, with a restriction to closed-loop controls. Here, we consider both finite and infinite horizon problems in a multi-dimensional framework, with random coefficients for the affine terms of the McKean-Vlasov dynamics and random coefficients for the linear terms of the cost functional. Moreover, the controls of each player are in open-loop form. Our main contribution is a simple and direct approach based on the weak martingale optimality principle developed in [2] for McKean-Vlasov control problems, which we extend to the stochastic differential game, together with a fixed point argument in the space of open-loop controls, for finding a Nash equilibrium. The key point is to find a suitable ansatz for determining the fixed point corresponding to the Nash equilibria, which we characterize explicitly in terms of systems of Riccati ordinary differential equations and linear mean-field backward stochastic differential equations; existence and uniqueness conditions are provided for such systems.
The rest of this paper is organized as follows. We continue Section 1 by formulating the Nash equilibrium problem in the linear-quadratic McKean-Vlasov finite horizon framework, and by giving some notations and assumptions. Section 2 presents the verification lemma based on the weak submartingale optimality principle for finding a Nash equilibrium, and details each step of the method to compute one. We give some extensions in Section 3 to the case of infinite horizon and common noise. Finally, we illustrate our results in Section 4 on a toy example.
1.2 Problem formulation
Let $T > 0$ be a given finite horizon, and let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a fixed filtered probability space, where $\mathbb{F} = (\mathcal{F}_t)_{t\in[0,T]}$ is the natural filtration of a real Brownian motion $W = (W_t)_{t\in[0,T]}$. In this section, for simplicity, we deal with the case of a single real-valued Brownian motion; the case of multiple Brownian motions will be addressed later in Section 3. We consider a multi-player game with $n$ players, and define the set of admissible controls for each player $i \in \llbracket 1,n \rrbracket$ as
\[
\mathcal{A}^i \;=\; \Big\{ \alpha^i : \Omega\times[0,T] \to \mathbb{R}^{d_i} \ \text{s.t.}\ \alpha^i \text{ is } \mathbb{F}\text{-adapted and } \int_0^T e^{-\rho t}\,\mathbb{E}\big[|\alpha^i_t|^2\big]\,dt < \infty \Big\},
\]
where $\rho$ is a nonnegative constant discount factor. We denote $\mathcal{A} = \mathcal{A}^1 \times \dots \times \mathcal{A}^n$, and for any $\alpha = (\alpha^1, \dots, \alpha^n) \in \mathcal{A}$ and $i \in \llbracket 1,n \rrbracket$, we set $\alpha^{-i} = (\alpha^1, \dots, \alpha^{i-1}, \alpha^{i+1}, \dots, \alpha^n) \in \mathcal{A}^{-i} = \mathcal{A}^1 \times \dots \times \mathcal{A}^{i-1} \times \mathcal{A}^{i+1} \times \dots \times \mathcal{A}^n$.
Given a square integrable measurable random variable $X_0$ and a control $\alpha = (\alpha^1,\dots,\alpha^n) \in \mathcal{A}$, we consider the controlled linear mean-field stochastic differential equation in $\mathbb{R}^d$:
\[
\begin{cases}
dX_t = b(t, X^\alpha_t, \mathbb{E}[X^\alpha_t], \alpha_t, \mathbb{E}[\alpha_t])\,dt + \sigma(t, X^\alpha_t, \mathbb{E}[X^\alpha_t], \alpha_t, \mathbb{E}[\alpha_t])\,dW_t, \quad 0 \le t \le T,\\
X^\alpha_0 = X_0,
\end{cases} \tag{1}
\]
where for $t \in [0,T]$, $x, \bar x \in \mathbb{R}^d$, $a_i, \bar a_i \in \mathbb{R}^{d_i}$:
\[
\begin{cases}
b(t, x, \bar x, a, \bar a) = \beta_t + b_{x,t}\,x + \tilde b_{x,t}\,\bar x + \sum_{i=1}^n \big( b_{i,t}\,a_i + \tilde b_{i,t}\,\bar a_i \big) = \beta_t + b_{x,t}\,x + \tilde b_{x,t}\,\bar x + B_t\,a + \tilde B_t\,\bar a,\\
\sigma(t, x, \bar x, a, \bar a) = \gamma_t + \sigma_{x,t}\,x + \tilde\sigma_{x,t}\,\bar x + \sum_{i=1}^n \big( \sigma_{i,t}\,a_i + \tilde\sigma_{i,t}\,\bar a_i \big) = \gamma_t + \sigma_{x,t}\,x + \tilde\sigma_{x,t}\,\bar x + \Sigma_t\,a + \tilde\Sigma_t\,\bar a.
\end{cases} \tag{2}
\]
Here all the coefficients are deterministic matrix-valued processes, except $\beta$ and $\gamma$, which are vector-valued $\mathbb{F}$-progressively measurable processes.
The goal of each player $i \in \llbracket 1,n \rrbracket$ during the game is to minimize her cost functional over $\alpha^i \in \mathcal{A}^i$, given the actions $\alpha^{-i}$ of the other players:
\[
J^i(\alpha^i, \alpha^{-i}) = \mathbb{E}\Big[ \int_0^T e^{-\rho t} f^i(t, X^\alpha_t, \mathbb{E}[X^\alpha_t], \alpha_t, \mathbb{E}[\alpha_t])\,dt + g^i(X^\alpha_T, \mathbb{E}[X^\alpha_T]) \Big],
\qquad
V^i(\alpha^{-i}) = \inf_{\alpha^i \in \mathcal{A}^i} J^i(\alpha^i, \alpha^{-i}),
\]
where for each $t \in [0,T]$, $x, \bar x \in \mathbb{R}^d$, $a_i, \bar a_i \in \mathbb{R}^{d_i}$, we have set the running cost and terminal cost for each player:
\[
\begin{cases}
f^i(t, x, \bar x, a, \bar a) = (x - \bar x)^\top Q^i_t (x - \bar x) + \bar x^\top [Q^i_t + \tilde Q^i_t]\,\bar x
+ \sum_{k=1}^n \big( a_k^\top I^i_{k,t}(x - \bar x) + \bar a_k^\top (I^i_{k,t} + \tilde I^i_{k,t})\,\bar x \big)\\
\qquad + \sum_{k=1}^n \big( (a_k - \bar a_k)^\top N^i_{k,t}(a_k - \bar a_k) + \bar a_k^\top (N^i_{k,t} + \tilde N^i_{k,t})\,\bar a_k \big)\\
\qquad + \sum_{1 \le k \ne l \le n} \big( (a_k - \bar a_k)^\top G^i_{k,l,t}(a_l - \bar a_l) + \bar a_k^\top (G^i_{k,l,t} + \tilde G^i_{k,l,t})\,\bar a_l \big)
+ 2\Big[ L^{i\top}_{x,t}\,x + \sum_{k=1}^n L^{i\top}_{k,t}\,a_k \Big],\\
g^i(x, \bar x) = (x - \bar x)^\top P^i (x - \bar x) + \bar x^\top (P^i + \tilde P^i)\,\bar x + 2\,r^{i\top} x.
\end{cases} \tag{3}
\]
Here all the coefficients are deterministic matrix-valued processes, except $L^i_x$, $L^i_k$, $r^i$, which are vector-valued $\mathbb{F}$-progressively measurable processes, and $\top$ denotes the transpose of a vector or matrix.
We say that $\alpha^* = (\alpha^{*,1}, \dots, \alpha^{*,n}) \in \mathcal{A}$ is a Nash equilibrium if for any $i \in \llbracket 1,n \rrbracket$,
\[
J^i(\alpha^*) \le J^i(\alpha^i, \alpha^{*,-i}), \quad \forall \alpha^i \in \mathcal{A}^i, \qquad \text{i.e.,} \quad J^i(\alpha^*) = V^i(\alpha^{*,-i}).
\]
As is well known, the search for a Nash equilibrium can be formulated as a fixed point problem: first, each player $i$ computes her best response given the controls of the other players, $\alpha^{\star,i} = \mathrm{BR}^i(\alpha^{-i})$, where $\mathrm{BR}^i$ is the best response function defined (when it exists) by
\[
\mathrm{BR}^i : \mathcal{A}^{-i} \to \mathcal{A}^i, \qquad \alpha^{-i} \mapsto \operatorname*{argmin}_{\alpha \in \mathcal{A}^i} J^i(\alpha, \alpha^{-i}).
\]
Then, in order to ensure that $(\alpha^{\star,1}, \dots, \alpha^{\star,n})$ is a Nash equilibrium, we have to check that this candidate satisfies the fixed point equation $(\alpha^{\star,1}, \dots, \alpha^{\star,n}) = \mathrm{BR}(\alpha^{\star,1}, \dots, \alpha^{\star,n})$, where $\mathrm{BR} := (\mathrm{BR}^1, \dots, \mathrm{BR}^n)$.
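The best-response fixed point mechanism can be illustrated on a much simpler object than the dynamic game studied here: a static two-player quadratic game with linear best responses. This is only a sketch under illustrative assumptions (all matrices and vectors below are arbitrary choices, not objects from the paper); iterating $\mathrm{BR} = (\mathrm{BR}^1, \mathrm{BR}^2)$ converges to the Nash equilibrium when the interaction terms are weak relative to the individual cost curvatures.

```python
import numpy as np

# Static two-player quadratic game (illustrative stand-in): player i minimises
#   J_i(a_i, a_j) = 0.5 * a_i' Q_i a_i + a_i' C_i a_j + b_i' a_i,
# whose best response is the linear map  BR_i(a_j) = -Q_i^{-1} (C_i a_j + b_i).
Q1, Q2 = 2.0 * np.eye(2), 3.0 * np.eye(2)          # positive definite own costs
C1 = np.array([[0.3, 0.1], [0.0, 0.2]])            # weak interaction terms
C2 = np.array([[0.2, 0.0], [0.1, 0.3]])
b1, b2 = np.array([1.0, -1.0]), np.array([0.5, 2.0])

def br1(a2):
    # first-order condition of player 1: Q1 a1 + C1 a2 + b1 = 0
    return -np.linalg.solve(Q1, C1 @ a2 + b1)

def br2(a1):
    return -np.linalg.solve(Q2, C2 @ a1 + b2)

# Best-response (fixed point) iteration: a Nash equilibrium is a fixed
# point of BR = (BR_1, BR_2); convergence holds here because the
# composed best-response map is a contraction.
a1, a2 = np.zeros(2), np.zeros(2)
for _ in range(200):
    a1, a2 = br1(a2), br2(a1)

# At a Nash equilibrium each control is a best response to the other.
assert np.allclose(a1, br1(a2), atol=1e-10)
assert np.allclose(a2, br2(a1), atol=1e-10)
```

In the paper's dynamic setting the same logic applies, except that the best responses are stochastic processes and the fixed point is computed analytically through a suitable ansatz rather than by iteration.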
The main goal of this paper is to state a general martingale optimality principle for the search of Nash equilibria and to apply it to the linear-quadratic case. We first obtain the best response function (i.e., the optimal control of each player given the controls of the others) of each player $i$ in the following form:
\[
\alpha^i_t = -\big(S^i_{i,t}\big)^{-1} U^i_{i,t}\,(X_t - \mathbb{E}[X_t]) - \big(S^i_{i,t}\big)^{-1}\big(\xi^i_{i,t} - \bar\xi^i_{i,t}\big) - \big(\hat S^i_{i,t}\big)^{-1}\big( V^i_{i,t}\,\mathbb{E}[X_t] + \bar O^i_{i,t} \big),
\]
where the coefficients in the right-hand side, defined in (5) and (6), depend on the actions $\alpha^{-i}$ of the other players. We then proceed to a fixed point search over the best response functions in order to exhibit a Nash equilibrium, which is described in Theorem 2.3.
1.3 Notations and Assumptions
Given a normed space $(K, |\cdot|)$ and $T \in \mathbb{R}^\star_+$, we set:
\[
\begin{aligned}
L^\infty([0,T], K) &= \Big\{ \phi : [0,T] \to K \ \text{s.t.}\ \phi \text{ is measurable and } \sup_{t\in[0,T]} |\phi_t| < \infty \Big\},\\
L^2([0,T], K) &= \Big\{ \phi : [0,T] \to K \ \text{s.t.}\ \phi \text{ is measurable and } \int_0^T e^{-\rho u}\,|\phi_u|^2\,du < \infty \Big\},\\
L^2_{\mathcal{F}_T}(K) &= \Big\{ \phi : \Omega \to K \ \text{s.t.}\ \phi \text{ is } \mathcal{F}_T\text{-measurable and } \mathbb{E}[|\phi|^2] < \infty \Big\},\\
\mathcal{S}^2_{\mathbb{F}}(\Omega\times[0,T], K) &= \Big\{ \phi : \Omega\times[0,T] \to K \ \text{s.t.}\ \phi \text{ is } \mathbb{F}\text{-adapted and } \mathbb{E}\big[ \sup_{t\in[0,T]} |\phi_t|^2 \big] < \infty \Big\},\\
L^2_{\mathbb{F}}(\Omega\times[0,T], K) &= \Big\{ \phi : \Omega\times[0,T] \to K \ \text{s.t.}\ \phi \text{ is } \mathbb{F}\text{-adapted and } \int_0^T e^{-\rho u}\,\mathbb{E}[|\phi_u|^2]\,du < \infty \Big\}.
\end{aligned}
\]
Note that when we tackle the infinite horizon case we will set $T = \infty$. To make the notations less cluttered, we sometimes write $X = X^\alpha$ when there is no ambiguity. If $C$ and $\tilde C$ are coefficients of our model, either in the dynamics or in a cost functional, we write $\hat C = C + \tilde C$. Given a random variable $Z$ with a first moment, we denote $\bar Z = \mathbb{E}[Z]$. For $M \in \mathbb{R}^{n\times n}$ and $X \in \mathbb{R}^n$, we write $M.X^{\otimes 2} = X^\top M X \in \mathbb{R}$. We denote by $\mathbb{S}^d$ the set of symmetric $d\times d$ matrices and by $\mathbb{S}^d_+$ the subset of nonnegative symmetric matrices.
Let us now detail the assumptions on the coefficients.
(H1) The coefficients in the dynamics (2) satisfy:

a) $\beta, \gamma \in L^2_{\mathbb{F}}(\Omega\times[0,T], \mathbb{R}^d)$;

b) $b_x, \tilde b_x, \sigma_x, \tilde\sigma_x \in L^\infty([0,T], \mathbb{R}^{d\times d})$; $b_i, \tilde b_i, \sigma_i, \tilde\sigma_i \in L^\infty([0,T], \mathbb{R}^{d\times d_i})$.

(H2) The coefficients of the cost functional (3) satisfy:

a) $Q^i, \tilde Q^i \in L^\infty([0,T], \mathbb{S}^d_+)$, $P^i, \tilde P^i \in \mathbb{S}^d$, $N^i_k, \tilde N^i_k \in L^\infty([0,T], \mathbb{S}^{d_k}_+)$, $I^i_k, \tilde I^i_k \in L^\infty([0,T], \mathbb{R}^{d_k\times d})$;

b) $L^i_x \in L^2_{\mathbb{F}}(\Omega\times[0,T], \mathbb{R}^d)$, $L^i_k \in L^2_{\mathbb{F}}(\Omega\times[0,T], \mathbb{R}^{d_k})$, $r^i \in L^2_{\mathcal{F}_T}(\mathbb{R}^d)$;

c) $\exists \delta > 0$, $\forall t \in [0,T]$: $N^i_{i,t} \ge \delta I_{d_i}$, $P^i \ge 0$, $Q^i_t - I^{i\top}_{i,t}\big(N^i_{i,t}\big)^{-1} I^i_{i,t} \ge 0$;

d) $\exists \delta > 0$, $\forall t \in [0,T]$: $\hat N^i_{i,t} \ge \delta I_{d_i}$, $\hat P^i \ge 0$, $\hat Q^i_t - \hat I^{i\top}_{i,t}\big(\hat N^i_{i,t}\big)^{-1} \hat I^i_{i,t} \ge 0$.
Under the above conditions, we easily derive some standard estimates on the mean-field SDE:

- By (H1) there exists a unique strong solution to the mean-field SDE (1), which satisfies
\[
\mathbb{E}\Big[ \sup_{t\in[0,T]} |X^\alpha_t|^2 \Big] \le C_\alpha \big( 1 + \mathbb{E}[|X_0|^2] \big) < \infty, \tag{4}
\]
where $C_\alpha$ is a constant which depends on $\alpha$ only through $\int_0^T e^{-\rho t}\,\mathbb{E}[|\alpha_t|^2]\,dt$.

- By (H2) and (4) we have $J^i(\alpha) \in \mathbb{R}$ for each $\alpha \in \mathcal{A}$, which means that the optimisation problem is well defined for each player.
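The moment estimate (4) can be observed numerically with a standard particle approximation: an Euler-Maruyama scheme in which $\mathbb{E}[X_t]$ is replaced by the empirical mean over the particles. The sketch below uses a scalar linear McKean-Vlasov dynamics with illustrative coefficients (chosen only to satisfy (H1), not taken from the text), and records the largest empirical second moment along the trajectory.

```python
import numpy as np

# Euler-Maruyama particle scheme for a scalar linear McKean-Vlasov SDE
#   dX_t = (b_x X_t + b~_x E[X_t]) dt + (s_x X_t + s~_x E[X_t]) dW_t,
# with E[X_t] approximated by the empirical mean over N particles.
# All numerical coefficients are illustrative.
b_x, b_tilde = -1.0, 0.5
s_x, s_tilde = 0.2, 0.1
N, n_steps, T = 10_000, 200, 1.0
dt = T / n_steps

rng = np.random.default_rng(42)
X = rng.standard_normal(N)          # square-integrable initial law, X_0 ~ N(0, 1)
sup_second_moment = np.mean(X**2)

for _ in range(n_steps):
    m = X.mean()                    # particle approximation of E[X_t]
    dW = rng.standard_normal(N) * np.sqrt(dt)
    X = X + (b_x * X + b_tilde * m) * dt + (s_x * X + s_tilde * m) * dW
    sup_second_moment = max(sup_second_moment, np.mean(X**2))

# In the spirit of (4): the second moment stays bounded along [0, T]
assert np.isfinite(sup_second_moment)
```

Note that this tracks $\sup_t \mathbb{E}[|X_t|^2]$ (via sample averages), a weaker quantity than the $\mathbb{E}[\sup_t |X_t|^2]$ appearing in (4), but it suffices to illustrate that the bound is uniform in time.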
2 A weak submartingale optimality principle to compute a Nash equilibrium
2.1 A verification Lemma
We first present the lemma on which the method is based.
Lemma 2.1 (Weak submartingale optimality principle). Suppose there exists a pair $(\alpha^\star, (\mathcal{W}^{\cdot,i})_{i\in\llbracket 1,n\rrbracket})$, where $\alpha^\star \in \mathcal{A}$ and $\mathcal{W}^{\cdot,i} = \{ \mathcal{W}^{\alpha,i}_t,\ t\in[0,T],\ \alpha\in\mathcal{A} \}$ is a family of adapted processes indexed by $\mathcal{A}$, for each $i \in \llbracket 1,n\rrbracket$, such that:

(i) For every $\alpha \in \mathcal{A}$, $\mathbb{E}[\mathcal{W}^{\alpha,i}_0]$ is independent of the control $\alpha^i \in \mathcal{A}^i$;

(ii) For every $\alpha \in \mathcal{A}$, $\mathbb{E}[\mathcal{W}^{\alpha,i}_T] = \mathbb{E}[g^i(X^\alpha_T, \mathbb{P}_{X^\alpha_T})]$;

(iii) For every $\alpha \in \mathcal{A}$, the map $t \in [0,T] \mapsto \mathbb{E}[\mathcal{S}^{\alpha,i}_t]$, with $\mathcal{S}^{\alpha,i}_t = e^{-\rho t}\,\mathcal{W}^{\alpha,i}_t + \int_0^t e^{-\rho u} f^i(u, X^\alpha_u, \mathbb{P}_{X^\alpha_u}, \alpha_u, \mathbb{P}_{\alpha_u})\,du$, is well defined and nondecreasing;

(iv) The map $t \mapsto \mathbb{E}[\mathcal{S}^{\alpha^\star,i}_t]$ is constant for every $t \in [0,T]$.

Then $\alpha^\star$ is a Nash equilibrium and $J^i(\alpha^\star) = \mathbb{E}[\mathcal{W}^{\alpha^\star,i}_0]$. Moreover, any other Nash equilibrium $\tilde\alpha$ such that $\mathbb{E}[\mathcal{W}^{\tilde\alpha,i}_0] = \mathbb{E}[\mathcal{W}^{\alpha^\star,i}_0]$ and $J^i(\tilde\alpha) = J^i(\alpha^\star)$ for every $i \in \llbracket 1,n\rrbracket$ satisfies condition (iv).
Proof. Let $i \in \llbracket 1,n\rrbracket$ and $\alpha^i \in \mathcal{A}^i$. From (ii) we immediately get $J^i(\alpha) = \mathbb{E}[\mathcal{S}^{\alpha,i}_T]$ for any $\alpha \in \mathcal{A}$. Using (iii), we then have
\[
\mathbb{E}\big[ \mathcal{W}^{(\alpha^i, \alpha^{\star,-i}),i}_0 \big] = \mathbb{E}\big[ \mathcal{S}^{(\alpha^i, \alpha^{\star,-i}),i}_0 \big] \le \mathbb{E}\big[ \mathcal{S}^{(\alpha^i, \alpha^{\star,-i}),i}_T \big] = J^i(\alpha^i, \alpha^{\star,-i}).
\]
Moreover, for $\alpha^i = \alpha^{\star,i}$, condition (iv) gives
\[
\mathbb{E}\big[ \mathcal{W}^{(\alpha^{\star,i}, \alpha^{\star,-i}),i}_0 \big] = \mathbb{E}\big[ \mathcal{S}^{(\alpha^{\star,i}, \alpha^{\star,-i}),i}_0 \big] = \mathbb{E}\big[ \mathcal{S}^{(\alpha^{\star,i}, \alpha^{\star,-i}),i}_T \big] = J^i(\alpha^{\star,i}, \alpha^{\star,-i}).
\]
Since, by (i), $\mathbb{E}[\mathcal{W}^{(\alpha^i,\alpha^{\star,-i}),i}_0] = \mathbb{E}[\mathcal{W}^{(\alpha^{\star,i},\alpha^{\star,-i}),i}_0]$, this proves that $\alpha^\star$ is a Nash equilibrium and $J^i(\alpha^\star) = \mathbb{E}[\mathcal{W}^{\alpha^\star,i}_0]$. Finally, suppose that $\tilde\alpha \in \mathcal{A}$ is another Nash equilibrium such that $\mathbb{E}[\mathcal{W}^{\tilde\alpha,i}_0] = \mathbb{E}[\mathcal{W}^{\alpha^\star,i}_0]$ and $J^i(\tilde\alpha) = J^i(\alpha^\star)$ for every $i \in \llbracket 1,n\rrbracket$. Then, for each $i \in \llbracket 1,n\rrbracket$, we have
\[
\mathbb{E}[\mathcal{S}^{\tilde\alpha,i}_0] = \mathbb{E}[\mathcal{W}^{\tilde\alpha,i}_0] = \mathbb{E}[\mathcal{W}^{\alpha^\star,i}_0] = \mathbb{E}[\mathcal{S}^{\alpha^\star,i}_T] = J^i(\alpha^\star) = J^i(\tilde\alpha) = \mathbb{E}[\mathcal{S}^{\tilde\alpha,i}_T].
\]
Since $t \mapsto \mathbb{E}[\mathcal{S}^{\tilde\alpha,i}_t]$ is nondecreasing for every $i \in \llbracket 1,n\rrbracket$, the map is actually constant, and (iv) is verified. $\square$
2.2 The method and the solution
Let us now apply the optimality principle of Lemma 2.1 to find a Nash equilibrium. In the linear-quadratic case, the laws of the state and of the controls intervene only through their expectations. Thus we will use a simplified optimality principle where the law $\mathbb{P}_\cdot$ is simply replaced by the expectation $\mathbb{E}[\cdot]$ in conditions (ii) and (iii) of Lemma 2.1. The general procedure is the following:
Step 1. We guess a candidate for $\mathcal{W}^{\alpha,i}$. To do so, we suppose that $\mathcal{W}^{\alpha,i}_t = w^i_t(X^\alpha_t, \mathbb{E}[X^\alpha_t])$ for some parametric adapted random field $\{ w^i_t(x,\bar x),\ t\in[0,T],\ x,\bar x\in\mathbb{R}^d \}$ of the form $w^i_t(x,\bar x) = K^i_t.(x-\bar x)^{\otimes 2} + \Lambda^i_t.\bar x^{\otimes 2} + 2\,Y^{i\top}_t x + R^i_t$.

Step 2. We set $\mathcal{S}^{\alpha,i}_t = e^{-\rho t}\,w^i_t(X^\alpha_t, \mathbb{E}[X^\alpha_t]) + \int_0^t e^{-\rho u} f^i(u, X^\alpha_u, \mathbb{E}[X^\alpha_u], \alpha_u, \mathbb{E}[\alpha_u])\,du$ for $i\in\llbracket 1,n\rrbracket$ and $\alpha\in\mathcal{A}$. We then compute $\frac{d}{dt}\mathbb{E}[\mathcal{S}^{\alpha,i}_t] = e^{-\rho t}\,\mathbb{E}[\mathcal{D}^{\alpha,i}_t]$ (with Itô's formula), where the drift $\mathcal{D}^{\alpha,i}$ takes the form
\[
\mathbb{E}[\mathcal{D}^{\alpha,i}_t] = \mathbb{E}\Big[ -\rho\,w^i_t(X^\alpha_t, \mathbb{E}[X^\alpha_t]) + f^i(t, X^\alpha_t, \mathbb{E}[X^\alpha_t], \alpha_t, \mathbb{E}[\alpha_t]) \Big] + \frac{d}{dt}\,\mathbb{E}\big[ w^i_t(X^\alpha_t, \mathbb{E}[X^\alpha_t]) \big].
\]

Step 3. We then constrain the coefficients of the random field so that the conditions of Lemma 2.1 are satisfied. This leads to a system of backward ordinary and stochastic differential equations for the coefficients of $w^i$.

Step 4. At each time $t$, given the state and the controls of the other players, we seek the action $\alpha^i$ cancelling the drift. We thus obtain the best response function of each player.

Step 5. We compute the fixed point of the best response functions in order to find an open-loop Nash equilibrium $t \mapsto \alpha^\star_t$.

Step 6. We check the validity of our computations.
2.2.1 Step 1: guess the random fields

The process $t \mapsto \mathbb{E}\big[ w^i_t(X^\alpha_t, \mathbb{E}[X^\alpha_t]) \big]$ is meant to be equal to $\mathbb{E}\big[ g^i(X^\alpha_T, \mathbb{E}[X^\alpha_T]) \big]$ at time $T$, where $g^i(x,\bar x) = P^i.(x-\bar x)^{\otimes 2} + (P^i + \tilde P^i).\bar x^{\otimes 2} + 2\,r^{i\top} x$ with $(P^i, \tilde P^i, r^i) \in (\mathbb{S}^d)^2 \times L^2_{\mathcal{F}_T}(\mathbb{R}^d)$. It is then natural to search for a field $w^i$ of the form $w^i_t(x,\bar x) = K^i_t.(x-\bar x)^{\otimes 2} + \Lambda^i_t.\bar x^{\otimes 2} + 2\,Y^{i\top}_t x + R^i_t$, with the processes $(K^i, \Lambda^i, Y^i, R^i)$ in $L^\infty([0,T], \mathbb{S}^d_+)^2 \times \mathcal{S}^2_{\mathbb{F}}(\Omega\times[0,T], \mathbb{R}^d) \times L^\infty([0,T], \mathbb{R})$ and solution to
\[
\begin{cases}
dK^i_t = \dot K^i_t\,dt, & K^i_T = P^i,\\
d\Lambda^i_t = \dot\Lambda^i_t\,dt, & \Lambda^i_T = P^i + \tilde P^i,\\
dY^i_t = \dot Y^i_t\,dt + Z^i_t\,dW_t, \quad 0 \le t \le T, & Y^i_T = r^i,\\
dR^i_t = \dot R^i_t\,dt, & R^i_T = 0,
\end{cases}
\]
where $(\dot K^i, \dot\Lambda^i, \dot R^i)$ are deterministic processes valued in $\mathbb{S}^d \times \mathbb{S}^d \times \mathbb{R}$ and $(\dot Y^i, Z^i)$ are adapted processes valued in $\mathbb{R}^d$.
2.2.2 Step 2: derive their drifts

For $i \in \llbracket 1,n\rrbracket$, $t \in [0,T]$ and $\alpha \in \mathcal{A}$, we set
\[
\mathcal{S}^{\alpha,i}_t := e^{-\rho t}\,w^i_t(X_t, \mathbb{E}[X_t]) + \int_0^t e^{-\rho u} f^i(u, X^\alpha_u, \mathbb{E}[X^\alpha_u], \alpha_u, \mathbb{E}[\alpha_u])\,du,
\]
and then compute the drift of the deterministic function $t \mapsto \mathbb{E}[\mathcal{S}^{\alpha,i}_t]$:
\[
\frac{d\,\mathbb{E}[\mathcal{S}^{\alpha,i}_t]}{dt} = e^{-\rho t}\,\mathbb{E}[\mathcal{D}^{\alpha,i}_t]
= e^{-\rho t}\,\mathbb{E}\Big[ (X_t - \bar X_t)^\top[\dot K^i_t + \Phi^i_t](X_t - \bar X_t) + \bar X_t^\top(\dot\Lambda^i_t + \Psi^i_t)\bar X_t + 2\,[\dot Y^i_t + \Delta^i_t]^\top X_t + \dot R^i_t + \Gamma^i_t + \chi^i_t(\alpha_{i,t}) \Big],
\]
where we have defined
\[
\chi^i_t(\alpha_{i,t}) := (\alpha_{i,t} - \bar\alpha_{i,t})^\top S^i_{i,t}\,(\alpha_{i,t} - \bar\alpha_{i,t}) + \bar\alpha_{i,t}^\top \hat S^i_{i,t}\,\bar\alpha_{i,t} + 2\big[ U^i_{i,t}(X_t - \bar X_t) + V^i_{i,t}\bar X_t + \bar O^i_{i,t} + \xi^i_{i,t} - \bar\xi^i_{i,t} \big]^\top \alpha_{i,t},
\]
with the following coefficients:
\[
\begin{cases}
\Phi^i_t = Q^i_t + \sigma_{x,t}^\top K^i_t\,\sigma_{x,t} + K^i_t\,b_{x,t} + b_{x,t}^\top K^i_t - \rho K^i_t,\\
\Psi^i_t = \hat Q^i_t + \hat\sigma_{x,t}^\top K^i_t\,\hat\sigma_{x,t} + \Lambda^i_t\,\hat b_{x,t} + \hat b_{x,t}^\top \Lambda^i_t - \rho \Lambda^i_t,\\
\Delta^i_t = L^i_{x,t} + b_{x,t}^\top Y^i_t + \tilde b_{x,t}^\top \bar Y^i_t + \sigma_{x,t}^\top Z^i_t + \tilde\sigma_{x,t}^\top \bar Z^i_t + \Lambda^i_t\,\bar\beta_t + \sigma_{x,t}^\top K^i_t\,\gamma_t + \tilde\sigma_{x,t}^\top K^i_t\,\bar\gamma_t + K^i_t(\beta_t - \bar\beta_t) - \rho Y^i_t\\
\qquad\quad + \sum_{k\ne i} \big( U^{i\top}_{k,t}(\alpha_{k,t} - \bar\alpha_{k,t}) + V^{i\top}_{k,t}\,\bar\alpha_{k,t} \big),\\
\Gamma^i_t = \gamma_t^\top K^i_t\,\gamma_t + 2\,\beta_t^\top Y^i_t + 2\,\gamma_t^\top Z^i_t + \sum_{k\ne i} \big( (\alpha_{k,t} - \bar\alpha_{k,t})^\top S^i_{k,t}(\alpha_{k,t} - \bar\alpha_{k,t}) + \bar\alpha_{k,t}^\top \hat S^i_{k,t}\,\bar\alpha_{k,t} + 2\,[\bar O^i_{k,t} + \xi^i_{k,t} - \bar\xi^i_{k,t}]^\top \alpha_{k,t} \big) - \rho R^i_t,
\end{cases} \tag{5}
\]
and
\[
\begin{cases}
S^i_{k,t} = N^i_{k,t} + \sigma_{k,t}^\top K^i_t\,\sigma_{k,t}, \qquad \hat S^i_{k,t} = \hat N^i_{k,t} + \hat\sigma_{k,t}^\top K^i_t\,\hat\sigma_{k,t},\\
U^i_{k,t} = I^i_{k,t} + \sigma_{k,t}^\top K^i_t\,\sigma_{x,t} + b_{k,t}^\top K^i_t, \qquad V^i_{k,t} = \hat I^i_{k,t} + \hat\sigma_{k,t}^\top K^i_t\,\hat\sigma_{x,t} + \hat b_{k,t}^\top \Lambda^i_t,\\
\bar O^i_{k,t} = \bar L^i_{k,t} + \hat b_{k,t}^\top \bar Y^i_t + \hat\sigma_{k,t}^\top \bar Z^i_t + \hat\sigma_{k,t}^\top K^i_t\,\bar\gamma_t + \tfrac12 \sum_{l\ne i} \big( \hat J^i_{k,l,t} + \hat J^{i\top}_{l,k,t} \big)\,\bar\alpha_{l,t},\\
J^i_{k,l,t} = G^i_{k,l,t} + \sigma_{k,t}^\top K^i_t\,\sigma_{l,t}, \qquad \hat J^i_{k,l,t} = \hat G^i_{k,l,t} + \hat\sigma_{k,t}^\top K^i_t\,\hat\sigma_{l,t},\\
\xi^i_{k,t} = L^i_{k,t} + b_{k,t}^\top Y^i_t + \sigma_{k,t}^\top Z^i_t + \sigma_{k,t}^\top K^i_t\,\gamma_t + \tfrac12 \sum_{l\ne i} \big( J^i_{k,l,t} + J^{i\top}_{l,k,t} \big)\,\alpha_{l,t}.
\end{cases} \tag{6}
\]
2.2.3 Step 3: constrain their coefficients
Now that we have computed the drift, we need to constrain the coefficients so that $\mathcal{S}^{\alpha,i}$ satisfies the conditions of Lemma 2.1. Let us assume for the moment that $S^i_{i,t}$ and $\hat S^i_{i,t}$ are positive definite matrices (this will be ensured by the positive definiteness of $K^i$). This implies that there exists an invertible matrix $\theta^i_t$ such that $\theta^i_t\,S^i_{i,t}\,\theta^{i\top}_t = \hat S^i_{i,t}$ for all $t \in [0,T]$. We can now rewrite the drift as "a square in $\alpha_i$" plus "terms not depending on $\alpha_i$". Indeed, we can form the square
\[
\mathbb{E}[\chi^i_t(\alpha_{i,t})] = \mathbb{E}\big[ (\alpha_{i,t} - \bar\alpha_{i,t} + \theta^{i\top}_t\bar\alpha_{i,t} - \eta^i_t)^\top S^i_{i,t}\,(\alpha_{i,t} - \bar\alpha_{i,t} + \theta^{i\top}_t\bar\alpha_{i,t} - \eta^i_t) - \zeta^i_t \big]
\]
with
\[
\begin{cases}
\eta^i_t = a^{i,0}_t(X_t, \bar X_t) + \theta^{i\top}_t\,a^{i,1}_t(\bar X_t),\\
a^{i,0}_t(x,\bar x) = -\big(S^i_{i,t}\big)^{-1} U^i_{i,t}(x - \bar x) - \big(S^i_{i,t}\big)^{-1}\big(\xi^i_{i,t} - \bar\xi^i_{i,t}\big),\\
a^{i,1}_t(\bar x) = -\big(\hat S^i_{i,t}\big)^{-1}\big( V^i_{i,t}\,\bar x + \bar O^i_{i,t} \big),\\
\zeta^i_t = (X_t - \bar X_t)^\top U^{i\top}_{i,t}\big(S^i_{i,t}\big)^{-1} U^i_{i,t}(X_t - \bar X_t) + \bar X_t^\top V^{i\top}_{i,t}\big(\hat S^i_{i,t}\big)^{-1} V^i_{i,t}\,\bar X_t\\
\qquad + 2\big( U^{i\top}_{i,t}\big(S^i_{i,t}\big)^{-1}(\xi^i_{i,t} - \bar\xi^i_{i,t}) + V^{i\top}_{i,t}\big(\hat S^i_{i,t}\big)^{-1}\bar O^i_{i,t} \big)^\top X_t\\
\qquad + (\xi^i_{i,t} - \bar\xi^i_{i,t})^\top \big(S^i_{i,t}\big)^{-1}(\xi^i_{i,t} - \bar\xi^i_{i,t}) + \bar O^{i\top}_{i,t}\big(\hat S^i_{i,t}\big)^{-1}\bar O^i_{i,t},
\end{cases}
\]
so that the drift can be rewritten in the following form:
\[
\mathbb{E}[\mathcal{D}^{\alpha,i}_t] = \mathbb{E}\big[ (X_t - \bar X_t)^\top[\dot K^i_t + \Phi^{i,0}_t](X_t - \bar X_t) + \bar X_t^\top(\dot\Lambda^i_t + \Psi^{i,0}_t)\bar X_t + 2\,[\dot Y^i_t + \Delta^{i,0}_t]^\top X_t + \dot R^i_t + \Gamma^{i,0}_t\\
\qquad\qquad + (\alpha_{i,t} - \bar\alpha_{i,t} + \theta^{i\top}_t\bar\alpha_{i,t} - \eta^i_t)^\top S^i_{i,t}\,(\alpha_{i,t} - \bar\alpha_{i,t} + \theta^{i\top}_t\bar\alpha_{i,t} - \eta^i_t) \big],
\]
where
\[
\begin{cases}
\Phi^{i,0}_t = \Phi^i_t - U^{i\top}_{i,t}\big(S^i_{i,t}\big)^{-1} U^i_{i,t},\\
\Psi^{i,0}_t = \Psi^i_t - V^{i\top}_{i,t}\big(\hat S^i_{i,t}\big)^{-1} V^i_{i,t},\\
\Delta^{i,0}_t = \Delta^i_t - U^{i\top}_{i,t}\big(S^i_{i,t}\big)^{-1}\big(\xi^i_{i,t} - \bar\xi^i_{i,t}\big) - V^{i\top}_{i,t}\big(\hat S^i_{i,t}\big)^{-1}\bar O^i_{i,t},\\
\Gamma^{i,0}_t = \Gamma^i_t - \big(\xi^i_{i,t} - \bar\xi^i_{i,t}\big)^\top\big(S^i_{i,t}\big)^{-1}\big(\xi^i_{i,t} - \bar\xi^i_{i,t}\big) - \bar O^{i\top}_{i,t}\big(\hat S^i_{i,t}\big)^{-1}\bar O^i_{i,t}.
\end{cases} \tag{7}
\]
We can finally constrain the coefficients. By choosing $K^i$, $\Lambda^i$, $Y^i$ and $R^i$ so that only the square remains, the drift for each player $i \in \llbracket 1,n\rrbracket$ can be rewritten as a square only (in the next step we will verify that such coefficients can indeed be chosen). More precisely, we set $K^i$, $\Lambda^i$, $Y^i$ and $R^i$ as the solution of
\[
\begin{cases}
dK^i_t = -\Phi^{i,0}_t\,dt, & K^i_T = P^i,\\
d\Lambda^i_t = -\Psi^{i,0}_t\,dt, & \Lambda^i_T = P^i + \tilde P^i,\\
dY^i_t = -\Delta^{i,0}_t\,dt + Z^i_t\,dW_t, & Y^i_T = r^i,\\
dR^i_t = -\Gamma^{i,0}_t\,dt, & R^i_T = 0,
\end{cases} \tag{8}
\]
and stress the fact that $Y^i, Z^i, R^i$ depend on $\alpha^{-i}$, which appears in the coefficients $\Delta^{i,0}$ and $\Gamma^{i,0}$. With such coefficients the drift now takes the form
\[
\begin{aligned}
\mathbb{E}[\mathcal{D}^{\alpha,i}_t] &= \mathbb{E}\big[ (\alpha_{i,t} - \bar\alpha_{i,t} + \theta^{i\top}_t\bar\alpha_{i,t} - \eta^i_t)^\top S^i_{i,t}\,(\alpha_{i,t} - \bar\alpha_{i,t} + \theta^{i\top}_t\bar\alpha_{i,t} - \eta^i_t) \big]\\
&= \mathbb{E}\big[ \big(\alpha_{i,t} - \bar\alpha_{i,t} - a^{i,0}_t + \theta^{i\top}_t(\bar\alpha_{i,t} - a^{i,1}_t)\big)^\top S^i_{i,t}\,\big(\alpha_{i,t} - \bar\alpha_{i,t} - a^{i,0}_t + \theta^{i\top}_t(\bar\alpha_{i,t} - a^{i,1}_t)\big) \big],
\end{aligned}
\]
and thus satisfies the nonnegativity constraint $\mathbb{E}[\mathcal{D}^{\alpha,i}_t] \ge 0$ for all $t \in [0,T]$, $i \in \llbracket 1,n\rrbracket$, and $\alpha \in \mathcal{A}$.

2.2.4 Step 4: find the best response functions
Proposition 2.2. Assume that for all $i \in \llbracket 1,n\rrbracket$, $(K^i, \Lambda^i, Y^i, Z^i, R^i)$ is a solution of (8) given $\alpha^{-i} \in \mathcal{A}^{-i}$. Then the set of processes
\[
\begin{aligned}
\alpha^i_t &= a^{i,0}_t(X_t, \mathbb{E}[X_t]) + a^{i,1}_t(\mathbb{E}[X_t])\\
&= -\big(S^i_{i,t}\big)^{-1} U^i_{i,t}\big(X_t - \mathbb{E}[X_t]\big) - \big(S^i_{i,t}\big)^{-1}\big(\xi^i_{i,t} - \bar\xi^i_{i,t}\big) - \big(\hat S^i_{i,t}\big)^{-1}\big( V^i_{i,t}\,\mathbb{E}[X_t] + \bar O^i_{i,t} \big)
\end{aligned} \tag{9}
\]
(depending on $\alpha^{-i}$), where $X$ is the state process with the feedback controls $\alpha = (\alpha^1, \dots, \alpha^n)$, are best-response functions, i.e., $J^i(\alpha^i, \alpha^{-i}) = V^i(\alpha^{-i})$ for all $i \in \llbracket 1,n\rrbracket$. Moreover, we have
\[
V^i(\alpha^{-i}) = \mathbb{E}[\mathcal{W}^{\alpha,i}_0] = \mathbb{E}\big[ K^i_0.(X_0 - \bar X_0)^{\otimes 2} + \Lambda^i_0.\bar X_0^{\otimes 2} + 2\,Y^{i\top}_0 X_0 + R^i_0 \big].
\]
Proof. We check that the assumptions of Lemma 2.1 are satisfied. Since $\mathcal{W}^{\alpha,i}$ is of the form $\mathcal{W}^{\alpha,i}_t = w^i_t(X^\alpha_t, \mathbb{E}[X^\alpha_t])$, condition (i) is verified. Condition (ii) is satisfied thanks to the terminal conditions imposed in the system (8). Since $(K^i, \Lambda^i, Y^i, Z^i, R^i)$ is solution to (8), the drift of $t \mapsto \mathbb{E}[\mathcal{S}^{\alpha,i}_t]$ is nonnegative for all $i \in \llbracket 1,n\rrbracket$ and all $\alpha \in \mathcal{A}$, which implies condition (iii). Finally, for $\alpha \in \mathcal{A}$, we see that $\mathbb{E}[\mathcal{D}^{\alpha,i}_t] \equiv 0$ for $t \in [0,T]$ and $i \in \llbracket 1,n\rrbracket$ if and only if
\[
\alpha_{i,t} - \bar\alpha_{i,t} - a^{i,0}_t(X^\alpha_t, \mathbb{E}[X^\alpha_t]) + \theta^{i\top}_t\big( \bar\alpha_{i,t} - a^{i,1}_t(\mathbb{E}[X^\alpha_t]) \big) = 0 \quad \text{a.s.}, \ t \in [0,T].
\]
Since $\theta^i_t$ is invertible, taking expectations in the above formula gives $\bar\alpha_{i,t} = a^{i,1}_t$ (recall that $a^{i,0}_t(X^\alpha_t, \mathbb{E}[X^\alpha_t])$ has zero mean). Thus $\mathbb{E}[\mathcal{D}^{\alpha,i}_t] \equiv 0$ for every $i \in \llbracket 1,n\rrbracket$ and $t \in [0,T]$ if and only if $\alpha_{i,t} = \bar\alpha_{i,t} + a^{i,0}_t = a^{i,0}_t + a^{i,1}_t$ for every $i \in \llbracket 1,n\rrbracket$ and $t \in [0,T]$. For such controls, condition (iv) is satisfied. We now check that $\alpha^i \in \mathcal{A}^i$ for every $i \in \llbracket 1,n\rrbracket$ (i.e., that it satisfies the square integrability condition). Since $X$ is solution to a linear McKean-Vlasov dynamics and satisfies the square integrability condition $\mathbb{E}[\sup_{0\le t\le T} |X_t|^2] < \infty$, it follows that $\alpha^i \in L^2_{\mathbb{F}}(\Omega\times[0,T], \mathbb{R}^{d_i})$, since $S^i_i, U^i_i, \hat S^i_i, V^i_i$ are bounded and $(\bar O^i_i, \xi^i_i) \in L^2([0,T], \mathbb{R}^{d_i}) \times L^2_{\mathbb{F}}(\Omega\times[0,T], \mathbb{R}^{d_i})$. Therefore $\alpha^i \in \mathcal{A}^i$ for every $i \in \llbracket 1,n\rrbracket$. $\square$
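Numerically, the matrix equations of type (8) are backward Riccati ODEs that can be integrated by reversing time. The sketch below solves a one-player specimen of the $K$-equation in (8) with all $\sigma$ and $I$ coefficients set to zero, so that $dK_t = -(Q + K b + b^\top K - K B N^{-1} B^\top K - \rho K)\,dt$, $K_T = P$; after the change of variable $s = T - t$ it becomes a forward ODE. All matrices are illustrative placeholders, not data from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward matrix Riccati ODE (simplified one-player version of (8)):
#   dK_t = -(Q + K b + b' K - K B N^{-1} B' K - rho K) dt,   K_T = P.
# Time reversal s = T - t turns it into a forward initial value problem.
d, T, rho = 2, 1.0, 0.1
Q = np.eye(d)                                   # state cost, Q >= 0
b = np.array([[0.0, 1.0], [-1.0, 0.0]])         # state drift matrix
B = np.eye(d)                                   # control drift matrix
N = 0.5 * np.eye(d)                             # control cost, N > 0
P = np.eye(d)                                   # terminal cost, P >= 0

def riccati_rhs(s, k_flat):
    # dK/ds = +(Q + Kb + b'K - K B N^{-1} B' K - rho K)  (sign flipped by s = T - t)
    K = k_flat.reshape(d, d)
    dK = Q + K @ b + b.T @ K - K @ B @ np.linalg.solve(N, B.T) @ K - rho * K
    return dK.ravel()

sol = solve_ivp(riccati_rhs, (0.0, T), P.ravel(), rtol=1e-8, atol=1e-10)
K0 = sol.y[:, -1].reshape(d, d)                 # value of K at t = 0

# Under (H2)-type conditions K_t stays symmetric and positive semidefinite
assert np.allclose(K0, K0.T, atol=1e-6)
assert np.min(np.linalg.eigvalsh(K0)) >= 0.0
```

In the full game, one such system is solved per player, coupled with the linear mean-field BSDEs for $(Y^i, Z^i)$; the scalar structure above only illustrates the backward integration step.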
2.2.5 Step 5: search for a fixed point
We now find semi-explicit expressions for the optimal controls of each player. The difficulty here is that the controls of the other players appear in the best response function of each player through the processes $(Y^1, Z^1), \dots, (Y^n, Z^n)$. To solve this fixed point problem, we first rewrite (9) and the backward equation satisfied by $(Y, Z) = ((Y^1, Z^1), \dots, (Y^n, Z^n))$ in the following way (note that we omit the time dependence of the coefficients to make the notations less cluttered):
\[
\begin{cases}
\alpha^\star_t - \bar\alpha^\star_t = S_x(X_t - \bar X_t) + S_y(Y_t - \bar Y_t) + S_z(Z_t - \bar Z_t) + H - \bar H,\\
\bar\alpha^\star_t = \hat S_x\,\bar X_t + \hat S_y\,\bar Y_t + \hat S_z\,\bar Z_t + \hat H,\\
dY_t = \big[ P_y(Y_t - \bar Y_t) + P_z(Z_t - \bar Z_t) + P_\alpha(\alpha_t - \bar\alpha_t) + F - \bar F + \hat P_y\,\bar Y_t + \hat P_z\,\bar Z_t + \hat P_\alpha\,\bar\alpha_t + \hat F \big]\,dt + Z_t\,dW_t,
\end{cases} \tag{10}
\]
where we define
\[
\begin{cases}
S = \big( (S^i_i)^{-1}\,1_{i=j} \big)_{i,j\in\llbracket 1,n\rrbracket}, \qquad \hat S = \big( (\hat S^i_i)^{-1}\,1_{i=j} \big)_{i,j\in\llbracket 1,n\rrbracket},\\
J = \big( \tfrac12 (J^i_{ij} + J^i_{ji})\,1_{i\ne j} \big)_{i,j\in\llbracket 1,n\rrbracket}, \qquad \hat J = \big( \tfrac12 (\hat J^i_{ij} + \hat J^i_{ji})\,1_{i\ne j} \big)_{i,j\in\llbracket 1,n\rrbracket},\\
\mathcal{J} = -(I + SJ)^{-1} S, \qquad \hat{\mathcal{J}} = -(I + \hat S\hat J)^{-1} \hat S,\\
S_x = \mathcal{J}\,\big( U^i_i \big)_{i\in\llbracket 1,n\rrbracket}, \qquad \hat S_x = \hat{\mathcal{J}}\,\big( V^i_i \big)_{i\in\llbracket 1,n\rrbracket},\\
S_y = \mathcal{J}\,\big( 1_{i=j}\,b_i^\top \big)_{i,j\in\llbracket 1,n\rrbracket}, \qquad \hat S_y = \hat{\mathcal{J}}\,\big( 1_{i=j}\,\hat b_i^\top \big)_{i,j\in\llbracket 1,n\rrbracket},\\
S_z = \mathcal{J}\,\big( \sigma_i^\top \big)_{i\in\llbracket 1,n\rrbracket}, \qquad \hat S_z = \hat{\mathcal{J}}\,\big( \hat\sigma_i^\top \big)_{i\in\llbracket 1,n\rrbracket},\\
H = \mathcal{J}\,\big( L^i_i + \sigma_i^\top K^i \gamma \big)_{i\in\llbracket 1,n\rrbracket}, \qquad \hat H = \hat{\mathcal{J}}\,\big( \bar L^i_i + \hat\sigma_i^\top K^i \bar\gamma \big)_{i\in\llbracket 1,n\rrbracket},\\
P_y = \big( 1_{i=j}\big( U^i_i (S^i_i)^{-1} b_i^\top - b_x^\top + \rho \big) \big)_{i,j\in\llbracket 1,n\rrbracket}, \qquad \hat P_y = \big( 1_{i=j}\big( V^i_i (\hat S^i_i)^{-1} \hat b_i^\top - \hat b_x^\top + \rho \big) \big)_{i,j\in\llbracket 1,n\rrbracket},\\
P_z = \big( 1_{i=j}\big( U^i_i (S^i_i)^{-1} \sigma_i^\top - \sigma_x^\top \big) \big)_{i,j\in\llbracket 1,n\rrbracket}, \qquad \hat P_z = \big( 1_{i=j}\big( V^i_i (\hat S^i_i)^{-1} \hat\sigma_i^\top - \hat\sigma_x^\top \big) \big)_{i,j\in\llbracket 1,n\rrbracket},\\
P_\alpha = -\big( 1_{i\ne j}\big( U^i_j + U^i_i (S^i_i)^{-1}(J^i_{ij} + J^{i\top}_{ji}) \big) \big)_{i,j\in\llbracket 1,n\rrbracket}, \qquad \hat P_\alpha = -\big( 1_{i\ne j}\big( V^i_j + V^i_i (\hat S^i_i)^{-1}(\hat J^i_{ij} + \hat J^{i\top}_{ji}) \big) \big)_{i,j\in\llbracket 1,n\rrbracket},\\
F = \big( K^i \beta + \sigma_x^\top K^i \gamma \big)_{i\in\llbracket 1,n\rrbracket}, \qquad \hat F = \big( U^i_i (S^i_i)^{-1}( L^i + \sigma_i^\top K^i \gamma ) - L_x - \sigma_x^\top K^i \gamma - K^i \beta \big)_{i\in\llbracket 1,n\rrbracket}.
\end{cases} \tag{11}
\]
Now, the strategy is to propose an ansatz for $t \mapsto Y_t$ in the form
\[
Y_t = \pi_t(X_t - \bar X_t) + \hat\pi_t\,\bar X_t + \eta_t, \tag{12}
\]
where $(\pi, \hat\pi, \eta) \in L^\infty([0,T], \mathbb{R}^{nd\times d}) \times L^\infty([0,T], \mathbb{R}^{nd\times d}) \times \mathcal{S}^2_{\mathbb{F}}(\Omega\times[0,T], \mathbb{R}^{nd})$ satisfy
\[
d\eta_t = \psi_t\,dt + \phi_t\,dW_t, \qquad \eta_T = r = (r^i)_{i\in\llbracket 1,n\rrbracket}
\]