• Aucun résultat trouvé

I Colloque Jeunes Probabilistes et Statisticiens (3 → 7 of May – Mont Dore) J Advisors:

N/A
N/A
Protected

Academic year: 2022

Partager "I Colloque Jeunes Probabilistes et Statisticiens (3 → 7 of May – Mont Dore) J Advisors:"

Copied!
24
0
0

Texte intégral

(1)

M-estimation in Complex Modeling

Nabil Rachdi (PhD Student) [email protected]

I Colloque Jeunes Probabilistes et Statisticiens (3 → 7 of May – Mont Dore) J Advisors:

Jean-Claude Fort (Paris V), Thierry Klein (Toulouse III)

Fabien Mangeant (EADS IW), Régis Lebrun (EADS IW).

(2)

Outline

1 Motivations and notations

2 Model Selection

3 Risk excess bound

4 Theorical Results

5 Examples

(3)

General framework

Variable of Interest: Y, density f , probability measure Q unknown Feature: Density distribution, mean, threshold probability, quantile etc...

Experimental data: Y

nexp

= Y

1exp

, ..., Y

nexp

(a priori training data) supposed i.i.d from Q - Link to history

- Arise from experiments, complex codes etc...

- Small number - Difficult to obtain

Simulation model: h ∈ H : (x, θ) ∈ X × Θ 7→ y = h(x, θ)

For a configuration x

0

, the model gives y

0

= h(x, θ

0

) ⇒ Deterministic !!

Goal: Use Simulated data to improve feature estimation of Y :

⇒ 1. Estimation procedure: choice of the model h, and parameter θ

⇒ 2. Study of a feature based on ( Y b

1sim

, ..., Y b

msim

) (a posteriori training data)

to compare with the feature based on (Y

1exp

, ..., Y

nexp

)

(4)

General framework

Variable of Interest: Y, density f , probability measure Q unknown Feature: Density distribution, mean, threshold probability, quantile etc...

Experimental data: Y

nexp

= Y

1exp

, ..., Y

nexp

(a priori training data) supposed i.i.d from Q - Link to history

- Arise from experiments, complex codes etc...

- Small number - Difficult to obtain

Simulated data: Consider that X ↔ (X ,B,P

X

) i.e x (deterministic) ↔ X (random).

For m ≥ 0, let Y

msim

= Y

1sim

, ..., Y

msim

(depends on the model h and the parameter θ) - h ∈ H (set of models), θ ∈ Θ (set of parameters)

- Y

isim

= h(X

i

,θ), i = 1, ..., m, X

i

i.i.d random variables (distribution P

X

).

- X t Y

exp

Goal: Use Simulated data to improve feature estimation of Y :

⇒ 1. Estimation procedure: choice of the model h, and parameter θ

⇒ 2. Study of a feature based on ( Y b

1sim

, ..., Y b

msim

) (a posteriori training data)

to compare with the feature based on (Y

1exp

, ..., Y

nexp

)

(5)

General framework

Variable of Interest: Y, density f , probability measure Q unknown Feature: Density distribution, mean, threshold probability, quantile etc...

Experimental data: Y

nexp

= Y

1exp

, ..., Y

nexp

(a priori training data) supposed i.i.d from Q - Link to history

- Arise from experiments, complex codes etc...

- Small number - Difficult to obtain

Simulated data: Consider that X ↔ (X ,B,P

X

) i.e x (deterministic) ↔ X (random).

For m ≥ 0, let Y

msim

= Y

1sim

, ..., Y

msim

(depends on the model h and the parameter θ) - h ∈ H (set of models), θ ∈ Θ (set of parameters)

- Y

isim

= h(X

i

,θ), i = 1, ..., m, X

i

i.i.d random variables (distribution P

X

).

- X t Y

exp

Goal: Use Simulated data to improve feature estimation of Y :

⇒ 1. Estimation procedure: choice of the model h, and parameter θ

⇒ 2. Study of a feature based on ( Y b

1sim

, ..., Y b

msim

) (a posteriori training data)

to compare with the feature based on (Y

1exp

, ..., Y

nexp

)

(6)

Simulated data estimation

Illustration of Experimental & Simulated Data: Example of a 2D-Performance Y = (Y

1

,Y

2

)

Choice of h ∈ H and θ ∈ Θ ? ⇒ driven by the feature considered

(7)

Simulated data estimation

Illustration of Experimental & Simulated Data: Example of a 2D-Performance Y = (Y

1

,Y

2

)

Choice of h ∈ H and θ ∈ Θ ? ⇒ driven by the feature considered

(8)

Definition: Feature

Let P be the set of all probability measures, D a metric space, we define a feature as a function

ρ : P −→ D

µ 7−→ ρ(µ).

Some feature:

- ρ(µ) = E

W∼µ

(W ) (mean), q

Wα

(α-quantile)

⇒ D = R

- ρ(µ) = P(W > s) ⇒ D = [0, 1]

- ρ(µ) = f

µ

⇒ D = {set of distributions}

- etc ...

(9)

Choice of (h, θ) for a Feature ρ

In our framework:

the model

P

X (h,θ)

−→ P

h,θ featureρ

−→ ρ(P

h,θ

) := ρ

h

(θ)

Important Remark: the feature ρ

h

(θ) can be unreachable → we say that h is complex denote the "true" feature by ρ(Q) := ρ

For fix h ∈ H, choice of θ

θ

0

= Argmin

Θ

D(ρ

h

(θ), ρ

)

with D a measure of distance D × D.

We use the form:

M(h, θ) = Z

γ

h,θ

(y) Q(dy)

where the function γ

h,θ

is called contrast of (h,θ).

We have

γ

h,θ

= Ψ (ρ

h

(θ))

with Ψ some operator on D.

(10)

Example of contrasts:

If the feature is the density

- γ

h,θ

(y) = − ln (ρ

h

(θ)) (y) ⇒ M(h,θ) = K(ρ

, ρ

h

(θ)) - γ

h,θ

(y) = ||ρ

h

(θ)||

22

− 2ρ

h

(θ)(y ) ⇒ M(h, θ) = ||ρ

− ρ

h

(θ)||

22

. - etc...

The same for mean, quantile, threshold probability etc...

Criterion to minimize:

M(h, θ) = Z

γ

h,θ

(y) Q(dy)

Assume that (h

, θ

) is the unique minimum of M(·, ·) Difficulties:

- The measure Q is unknown (Classical !)

- For complex models h, the feature ρ

h

(θ) is unreachable so the contrast too

h,θ

= Ψ (ρ

h

(θ))).

(11)

M(h, θ) = Z

γ

h,θ

(y ) Q(dy)

Alternative: Use of Experimental & Simulated data

- Replace Q by its empirical version Q

n

(depends on n-Experimental data Y

nexp

) We have

M

n

(h, θ) = 1 n

n

X

i=1

γ

h,θ

(Y

iexp

)

- Since γ

h,θm

= Ψ ρ

mh

(θ)

, replace the feature ρ

h

(θ) by an estimator ρ

mh

(θ) (depends on m-Simulated data Y

msim

).

B Pakes and Pollard (1989) studied estimators minimizing an approximation M e

n

(h, θ) ∼ M

n

(h, θ) (Optimization Estimators).

B They prove √

n-consistency and give limit theorems

B The conditions are given on the (approximated) criterion M e

n

(h, θ)

I We want conditions on the simulation (quantification) and on the feature (regularity)

(12)

Criterion to minimize in practice:

M

n,m

(h, θ) := 1 n

n

X

i=1

γ

h,θm

(Y

iexp

)

Estimator of θ

0

(h) = Argmin

θ∈Θ

M(h,θ) b θ

n,m

(h) = Argmin

θ∈Θ

M

n,m

(h,θ) .

First question: Consistency b θ

n,m

(h) −→

n→+∞

m→+∞

θ

0

(h) ?

(13)

Source of errors

Proposition: Model Error Upper Bound We prove

M(h, b θ

n,m

(h)) − M(h

, θ

)

| {z }

risk excess of(h,θbn,m(h))

. 1

√ n || G

n

γ

h,·m

||

Θ

+ ||E

hm

||

Θ

+ ∆

h

Variance terms (random terms):

- 1

√ n || G

n

γ

mh,·

||

Θ

= sup

θ∈Θ

1 n

n

X

i=1

γ

h,θm

(Y

iexp

) − E

Y

mh,θ

(Y ))

(deviation)

⇒ Estimation Error of Statistical Data (G

n

:= √

n(Q

n

− Q)) - ||E

hm

||

Θ

= sup

θ∈Θ

|| γ

h,θm

− γ

h,θ

||

1,Q

, with ||g||

1,Q

= R

R

|g(y )| Q (dy)

⇒ Simulation Error Bias term :

- ∆

h

= M(h,θ

0

(h)) − M(h

, θ

)

⇒ Approximation Error of the model h

(14)

Link with classical methods

Classical Inequality Recall M(h, θ) = R

γ

h,θ

(y) Q (dy), we have

M(h, b θ

n

(h)) − M(h

, θ

) ≤ 1

√ n || G

n

γ

h,·

||

Θ

+ ∆

h

For no complex models: (Statistical models, simplified models etc...)

⇒ the feature ρ

h

(θ) is reachable ⇒ Simulation is useless Advantages

- Well studied

- Maximum Likelihood, Regression etc...

Drawback

- ∆

h

can be large (due to simplification of h) Trade-off Bias-Variance

OUR GOAL: Extend the theory to physical models

(15)

Theorical result

Recall:

model h + parameter θ ⇒ Feature ρ

h

(θ) ⇒ Contrast γ

h,θ

In our framework:

model h + parameter θ ⇒ Feature ρ

mh

(θ) ⇒ Contrast γ

h,θm

Inequality:

M(h, b θ

n,m

(h)) − M(h

, θ

) . 1

√ n || G

n

γ

h,·m

||

Θ

+ ||E

hm

||

Θ

+ ∆

h

Key point: Let the class of functions

Γ

mh

:= {γ

h,θm

, θ ∈ Θ} = {Ψ(ρ

mh

(θ)), θ ∈ Θ}

- the class is random (simulated data) - the class changes with m

Important Remark: || G

n

γ

h,·m

||

Θ

= || G

n

||

Γm h

Main difficulty : Prove the tightness of the random variable sequence

|| G

n

||

Γm h

n,m

(16)

Definition: L

2

(Q) Bracketing numbers and Entropy with bracketing

B Let G be a class of functions. Given two functions l, u, the bracket [l, u] is the set of all functions g with l ≤ g ≤ u. An ε-bracket is a bracket [l, u] with ||u − l||

2,Q

< ε. The bracketing number N

[ ]

(ε, G, L

2

(Q)) is the minimum number of -brackets needed to cover the class of functions G.

B The entropy with bracketing is the logarithm of the bracketing number.

Definition: L

2

(Q) Bracketing integral The bracketing integral is defined as

J

[ ]

(δ, G, L

2

(Q)) :=

Z

δ 0

q

log N

[ ]

(ε, G, L

2

(Q))dε .

References:

v d Vaart & Wellner (1996) Weak

Convergence and Empirical Processes

v d Vaart (1998) Asymptotics Statistics

S van de Geer (2000) Empirical

Processes in M-estimation

(17)

Particular classes

Recall that G

n

:= √

n(Q

n

− Q)

Definition: Glivenko-Cantelli classes

A class of function G is ( Q )-Glivenko-Cantelli if

√ 1

n || G

n

||

G

−→

p.s

0 .

Definition: Donsker classes

A class of function G is (Q)-Donsker if

G

n

G in l

(G)

Theorems

1. If for all ε > 0, N

[ ]

(ε, G, L

1

(Q)) < ∞ then G is Glivenko-Cantelli.

2. If J

[ ]

(1, G, L

2

(Q)) < ∞ then G is Donsker.

The class Γ

mh

is random and changes with m. We need more results.

(18)

Our Results

Theorical study of the Γ

mh

-indexed empirical G

n

= √

n(Q

n

− Q) process

Consistency

We prove that b θ

n,m

−→

n→+∞

m→+∞

θ

0

, under some conditions in terms of :

- Model Complexity (Bracketing Numbers) (model h + feature regularity) - Simulation Speed (Size of Simulated Data set, m)

- Control of Simulated contrasts (Modified Lindeberg conditions)

Limiting Distribution

1. It exists r

n,m

≥ 0 such that r

n,m

(b θ

n,m

− θ

0

) = O

P

(1)

2. Under some conditions on the model h and on the feature ρ, we can define m

n

:= inf{m ≥ 0, r

n,m

& n

12−εn

}, where ε

n

> 0 is a nonincreasing sequence.

3. Under some conditions,

b θ

n,m

− θ

0

N (0,Σ)

(19)

Example of the Range study

Phenomenon : Y = Range (distance an aircraft can travel), feature = density distribution A priori training data : Experimental data, n = 20, Y

nexp

= Y

1exp

, ..., Y

nexp

(obtained from complex model h

supposed to be the "true") Additional knowledge : Simulated data, m = 3000, Y

msim

= Y

1sim

, ..., Y

msim

from

h(X, θ) = F V C

s

1 θ

1

log 1

1 − θ

2

- Uncertain Inputs X = (F, V ,C

s

)

T

- Parameters θ = (θ

1

, θ

2

) ∈ Θ

Choice of θ ? θ

0

(h) = Argmin

θ∈Θ

Z

R

γ

h,θ

(y ) f (y) dy

γ

h,θ

= − ln(f

h,θ

) f

h,θ

↔ f

h,θm

(Kernel) f ↔ 1 n

n

X

i=1

δ

Yexp i

b θ

n,m

(h) = Argmin

θ∈Θ

− 1 n

n

X

i=1

ln

 1 m

m

X

j=1

K

hm

(Y

iexp

− Y

jsim

)

A posteriori training data : (Y

1exp

, ..., Y

nexp

, Y b

1sim

, ..., Y b

msim

)→ n + m = 3020!

(20)

Feature with a priori (Experimental) and a posteriori (Experimental + Simulated) training data

(21)

Feature with a priori (Experimental) and a posteriori (Experimental + Simulated) training data

(22)

Feature with a priori (Experimental) and a posteriori (Experimental + Simulated) training data

(23)

Work in progress

Non parametric estimation improvement:

- Let ϕ be a non parametric estimator of the feature ρ ϕ : {set of samples} −→ D

Λ 7−→ ϕ(Λ).

- We note ϕ

n

= ϕ(Y

nexp

) and ϕ

m

= ϕ(Y

msim

), and we consider the aggregate estimator ϕ

n,m

= α · ϕ

n

+ (1 − α) · ϕ

m

with 0 < α < 1.

- Estimation of α −→ Simulate or not to simulate ?

- Asymptotic properties

(24)

Thank you for your attention !

Références

Documents relatifs

Dans cet expos´e, nous d´ecrirons dans un premier temps ce ph´enom`ene dans le cadre plus simple de diffusions en dimension finie (avec un mouvement brownien additif) puis dans

z INTER-BASIN WATER TRANSFERS TO RELIEVE GROUNDWATER STRESS..

However, bank branch advisor, bank affiliated broker and an FA not affiliated with a bank or insurance company accounted for 78.5% of all types of financial advisors identified by

provides a go/no go verification of RAM, calendar clock and LTC, the EPROM software monitor has bootstrap routines for RL01 /02, RK05, RX01/02, RM02, TU58 and TM11 storage

Zum Vorschub des Papierstreifens. Hinweis: Man kann das Papier auch mit der aida herausziehen. Papiervorschub-Automatik: Durch kurzes Antippen der [ff] -Taste wird

Network User acknowledges and agrees that, except to the extent provided otherwise in the Attachment A of the Access Code for Transmission, the adherence to and due

© Région Auvergne-Rhône-Alpes, Inventaire général du patrimoine culturel, ADAGP communication libre, reproduction soumise à autorisation. 30 mars 2022 Page

1 I was a resident in family practice at the Ottawa Civic Hospital in the early 1990s, and I am very grateful to Dr Patten for sharing his knowledge and skills in the art