
HAL Id: hal-01634877

https://hal-upec-upem.archives-ouvertes.fr/hal-01634877

Submitted on 14 Nov 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Data-driven kernel representations for sampling with an unknown block dependence structure under correlation constraints

Guillaume Perrin, Christian Soize, N. Ouhbi

To cite this version:
Guillaume Perrin, Christian Soize, N. Ouhbi. Data-driven kernel representations for sampling with an unknown block dependence structure under correlation constraints. Computational Statistics and Data Analysis, Elsevier, 2018, 119, pp. 139-154. 10.1016/j.csda.2017.10.005. hal-01634877

Data-driven kernel representations for sampling with an unknown block dependence structure under correlation constraints

G. Perrin (a), C. Soize (b), N. Ouhbi (c)

(a) CEA/DAM/DIF, F-91297, Arpajon, France
(b) Université Paris-Est, MSME UMR 8208 CNRS, Marne-la-Vallée, France
(c) Innovation and Research Department, SNCF, Paris, France

Abstract

The multidimensional Gaussian kernel-density estimation (G-KDE) is a powerful tool to identify the distribution of random vectors when the maximal information is a set of independent realizations. For these methods, a key issue is the choice of the kernel and the optimization of the bandwidth matrix. To optimize these kernel representations, two adaptations of the classical G-KDE are presented. First, it is proposed to add constraints on the mean and the covariance matrix in the G-KDE formalism. Secondly, it is suggested to separate into different groups the components of the random vector of interest that could reasonably be considered as independent. This block-by-block decomposition is carried out by looking for the maximum of a cross-validation likelihood quantity that is associated with the block formation. This leads to a tensorized version of the classical G-KDE. Finally, it is shown on a series of examples how these two adaptations can improve the nonparametric representations of the densities of random vectors, especially when the number of available realizations is relatively low compared to their dimensions.

Key words: kernel density estimation, optimal bandwidth, nonparametric representation, data-driven sampling

1. Introduction

The generation of independent realizations of a second-order R^d-valued random vector X, whose distribution, P_X(dx), is unknown but can only be approximated from a finite set of N ≥ 1 realizations, is a central issue in uncertainty quantification, signal processing and data analysis. One possible approach to address this problem is to suppose that the searched distribution belongs to an algebraic class of distributions, which can be mapped from a relatively small number of parameters (for instance, the multidimensional Gaussian distribution). Generating new realizations of random vector X therefore amounts to identifying the parameters that best suit the available data and then to sampling independent realizations associated with the identified parametric distribution. However, when the dependence structure associated with the components of X is complex, such that its distribution can be concentrated on an unknown subset of R^d, the definition of a relevant parametric class to represent P_X(dx) can become very difficult. In that case, nonparametric approaches are generally preferred to these parametric constructions [16, 11]. In particular, the multidimensional Gaussian kernel-density estimation (G-KDE) method approximates the probability density function (PDF) of X, if it exists, as a sum of N multidimensional Gaussian PDFs, which are centred at each available independent realization of X. Optimizing the covariance matrices associated with these N PDFs is a central issue, as they control the influence of each realization of X on the final approximation of P_X(dx). Even if there are many contributions on this subject (see for instance [5, 4, 3, 6, 18]), when the dimension d of X is high (d ∼ 10-100), constant covariance matrices parametrized by a unique scaling parameter are generally considered. In particular, the Silverman rule of thumb [12] for choosing this scaling parameter is widely used because of its simplicity and its good asymptotic behaviour when N tends to infinity. However, for fixed values of N, this Silverman choice often overestimates the scattering of P_X(dx), and can have difficulties to correctly concentrate the new generated realizations of X on their regions of high probability.

To overcome this problem, a two-step procedure is introduced. First, we suggest to centre and to uncorrelate the random vector X (using a Principal Component Analysis for instance). Then, based on the maximization of a global "Leave-One-Out" likelihood, the idea is to separate into different blocks the elements of X which could reasonably be considered as statistically independent. For a given number of realizations of X, the fewer elements there are in each group, the more chance we have to correctly infer the multidimensional distribution of each sub-vector constituted of each group's elements, and so the better should be the estimation of the PDF of X. Nevertheless, the identification of this (unknown) block decomposition is a difficult combinatorial problem. This paper therefore presents two algorithms to find relevant block decompositions in a reasonable computational time.

The outline of this work is as follows. Section 2 presents the theoretical framework associated with the G-KDE and the optimization of the covariance matrices on which it is based. The block decomposition we propose is then detailed in Section 3. At last, the efficiency of the method is illustrated on a series of analytic and industrial examples in Section 4.

2. Theoretical framework

Let X := {X(ω), ω ∈ Ω} be a second-order random vector defined on a probability space (Ω, T, P), with values in R^d. We assume that the probability density function (PDF) of X exists. By definition, this PDF, which is denoted by p_X, is an element of M^1(R^d, R^+), the set of positive-valued functions whose integral over R^d is 1. It is assumed that the maximal available information about p_X is a set of N > d independent and distinct realizations of X, which are gathered in the deterministic set S(N) := {X(ω_n), 1 ≤ n ≤ N}. Given these realizations of X, the kernel estimator of p_X is

  p̂_X(x; H, S(N)) = (det(H)^{-1/2} / N) Σ_{n=1}^{N} K( H^{-1/2} (x − X(ω_n)) ),   (1)

where det(·) is the determinant operator, K is any function of M^1(R^d, R^+), and H is a (d × d)-dimensional positive definite symmetric matrix that is generally referred to as the "bandwidth matrix". In the following, we focus on the classical case when K is the Gaussian multidimensional density. Hence, the PDF p_X is approximated by a mixture of N Gaussian PDFs, for which the means are the available realizations of X and the covariance matrices are all equal to H:

  p̂_X(x; H, S(N)) = (1/N) Σ_{n=1}^{N} φ( x; X(ω_n), H ),  x ∈ R^d,   (2)

where for any R^d-dimensional vector µ and for any (R^d × R^d)-dimensional symmetric positive definite matrix C, φ(·; µ, C) is the PDF of an R^d-dimensional Gaussian random vector with mean µ and covariance matrix C:

  φ(x; µ, C) := exp( −(1/2) (x − µ)^T C^{-1} (x − µ) ) / ( (2π)^{d/2} √det(C) ),  x ∈ R^d.   (3)

By construction, the matrix H in Eq. (2) characterizes the local contribution of each realization of X. Thus, its value has to be optimized to minimize the difference between p_X, which is unknown, and p̂_X(·; H, S(N)). The mean integrated squared error (MISE) performance criterion

  MISE(H; d, N) = E[ ∫_{R^d} ( p_X(x) − p̂_X(x; H, S(N)) )² dx ]   (4)

is generally considered to quantify such a difference. Here E[·] is the mathematical expectation. For this criterion, it can be noticed that the set S(N) is random, whereas in the rest of this paper it is deterministic. Given sufficient regularity conditions on p_X, an asymptotic approximation of this criterion can be derived. In low dimension, the value of H that minimizes this asymptotic criterion can be explicitly calculated, but its value depends on the unknown PDF p_X and its derivatives (see [10] for more details). Studies have therefore been conducted to estimate these functions (generally iteratively) from the only available information given by S(N) (see for instance [11, 5]).

However, the convergence of these methods is rather slow in high dimension, such that in practice, a widely used value for H is given by the Silverman bandwidth matrix

  H^Silv(d, N) := (h^Silv(d, N))² diag( σ̂₁², …, σ̂_d² ),   (5)

where for all 1 ≤ i ≤ d, σ̂_i² is the empirical estimation of the variance of X_i, and

  h^Silv(d, N) := ( 4 / (N (d + 2)) )^{1/(d+4)}.   (6)

This expression, which is derived from a Gaussian assumption on p_X, is thought to be a good compromise between complexity and precision.

However, it is generally observed that, for fixed values of N, when the distribution of X is concentrated on an unknown subset of R^d, the more complex and disconnected this subset, the less relevant the value of H^Silv(d, N). To face this problem, the diffusion maps theory [14] can be used to bias the generation of independent realizations under p̂_X(·; H^Silv(d, N), S(N)) and make them closer to the ones we could have got if they had been generated under the true PDF p_X. Indeed, diffusion maps are a very powerful mathematical tool to discover and characterize sets on which the distribution of X is concentrated, and their coupling to nonparametric statistical representations has shown promising results, even when dealing with very high values of d [2].
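As a concrete reading of Eqs. (5)-(6), the Silverman bandwidth only requires the sample size, the dimension, and the empirical component variances. The sketch below (Python/NumPy and the function name are ours, not the paper's; samples are stored as rows) returns the scalar factor h^Silv(d, N) and the diagonal matrix H^Silv(d, N).

```python
import numpy as np

def silverman_bandwidth(X):
    """Scalar factor h_Silv(d, N) of Eq. (6) and the diagonal bandwidth
    matrix H_Silv(d, N) = h^2 * diag(sigma_1^2, ..., sigma_d^2) of Eq. (5)."""
    N, d = X.shape
    h = (4.0 / (N * (d + 2.0))) ** (1.0 / (d + 4.0))
    sigma2 = X.var(axis=0, ddof=1)  # empirical variance of each component X_i
    return h, (h ** 2) * np.diag(sigma2)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))   # N = 200 realizations in dimension d = 2
h, H = silverman_bandwidth(X)
```

For N = 200 and d = 2 this gives h^Silv ≈ 0.41; the factor depends only on (N, d), as Eq. (6) states, while the data enter through the variances on the diagonal of H^Silv.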

From another point of view, the likelihood L(S(N)|H) associated with H can also directly be used to identify relevant values of H. From Eq. (1), it follows that

  L(S(N)|H) := Π_{n=1}^{N} p̂_X( X(ω_n); H, S(N) ) = (1/N^N) Π_{n=1}^{N} Σ_{m=1}^{N} φ_{n,m}(H),   (7)

  φ_{n,m}(H) := φ( X(ω_n); X(ω_m), H ),  1 ≤ n, m ≤ N.   (8)

The function L(S(N)|H) uses twice the same information (to compute p̂_X(·; H, S(N)) and to evaluate it). Hence, it tends to infinity when H tends to zero, which can be seen as an overfitting of the available data. In order to avoid this phenomenon, it is proposed in [15] to consider its "Leave-One-Out" (LOO) expression

  L^LOO(S(N)|H) := Π_{n=1}^{N} (1/(N−1)) Σ_{m=1, m≠n}^{N} φ_{n,m}(H)   (9)

instead. Given this approximate likelihood obtained from an LOO cross-validation, and an a priori density p_H for H, Bayesian approaches can be used to define the posterior density

  p_H(H|S(N)) := c L^LOO(S(N)|H) p_H(H),  H ∈ M^+(d).   (10)

Here, c is a normalizing constant and M^+(d) is the set of all (d × d)-dimensional symmetric positive definite matrices. In particular, the maximum likelihood estimate of H is denoted by

  H^MLE(d, N) := arg max_{H ∈ M^+(d)} L^LOO(S(N)|H).   (11)
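The LOO likelihood of Eq. (9) is easy to evaluate in the isotropic special case H = h² I_d (the paper itself optimizes over all of M^+(d); the restriction and the grid search below are our simplifications, for illustration only):

```python
import numpy as np

def loo_log_likelihood(X, h):
    """log L_LOO of Eq. (9) for the isotropic bandwidth H = h^2 * I_d."""
    N, d = X.shape
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # ||x_n - x_m||^2
    log_phi = -0.5 * D2 / h**2 - 0.5 * d * np.log(2.0 * np.pi * h**2)
    np.fill_diagonal(log_phi, -np.inf)        # m != n: leave each point out
    row_max = log_phi.max(axis=1)
    log_mix = row_max + np.log(np.exp(log_phi - row_max[:, None]).sum(axis=1))
    return (log_mix - np.log(N - 1)).sum()    # product over n, factor 1/(N-1)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
grid = np.linspace(0.05, 1.0, 60)
h_mle = grid[np.argmax([loo_log_likelihood(X, h) for h in grid])]
```

The LOO criterion is maximized by an intermediate h: too small a bandwidth overfits S(N), too large a bandwidth oversmooths it.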

Additionally, considering that the best available approximations of the true mean and covariance matrix of X are given by their empirical estimations

  µ̂_X := (1/N) Σ_{n=1}^{N} X(ω_n),   R̂_X := (1/(N−1)) Σ_{n=1}^{N} ( X(ω_n) − µ̂_X ) ⊗ ( X(ω_n) − µ̂_X ),

the expression given by Eq. (1) can be slightly modified to ensure that the mean and the covariance matrix of the G-KDE approximation of X are equal to these estimations. Following [13], this can be done by considering the subsequent proposition. The proof is given in Appendix.

Proposition 1. If the PDF of X̃ is equal to

  p̃_X(·; H, S(N)) := (1/N) Σ_{n=1}^{N} φ( ·; A X(ω_n) + β, H ),   (12)

  β := (I_d − A) µ̂,   H := R̂_X − ((N−1)/N) A R̂_X A^T,   (13)

where A is any (d × d)-dimensional matrix such that H is positive definite, then the mean and the covariance matrix of X̃ are equal to µ̂ and R̂_X respectively.

Given S(N), the G-KDE of the PDF of X under constraints on its mean and its covariance matrix is denoted by p̃_X(·; H^MLE(d, N), S(N)). Here, H^MLE(d, N) is the argument that maximizes the LOO likelihood of H associated with p̃_X.
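Proposition 1 can be checked numerically: for the mixture of Eq. (12), the mean of the centres A X(ω_n) + β is µ̂, and the total covariance (between-centre scatter plus the common kernel covariance H of Eq. (13)) is exactly R̂_X. A minimal Python/NumPy check, with an arbitrary admissible A (the choice A = 0.5 I_d is ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 50, 3
X = rng.standard_normal((N, d)) @ rng.standard_normal((d, d))  # correlated sample
mu = X.mean(axis=0)                    # empirical mean (mu_hat_X)
R = np.cov(X, rowvar=False)            # empirical covariance (R_hat_X, 1/(N-1))

A = 0.5 * np.eye(d)                    # any A such that H stays positive definite
beta = (np.eye(d) - A) @ mu            # Eq. (13)
H = R - (N - 1) / N * A @ R @ A.T      # Eq. (13)

centres = X @ A.T + beta               # kernel means of Eq. (12)
mix_mean = centres.mean(axis=0)
# mixture covariance = between-centre scatter (1/N normalization) + kernel covariance H
S = (centres - mix_mean).T @ (centres - mix_mean) / N
mix_cov = S + H
```

Both `mix_mean` and `mix_cov` reproduce µ̂ and R̂_X up to floating-point error, as stated by Proposition 1.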

Given µ̂, R̂_X, and H^MLE(d, N), the generation of independent realizations of X̃ ∼ p̃_X(·; H^MLE(d, N), S(N)) is straightforward. Indeed, for any M ≥ 1, Algorithm 1 (defined below) can be used to generate a (d × M)-dimensional matrix Z, whose columns are independent realizations of X̃. There, U{1, …, N} denotes the discrete uniform distribution over {1, …, N} and N(0, 1) denotes the standard Gaussian distribution.

1 Let Q(ω_1), …, Q(ω_M) be M independent realizations that are drawn from U{1, …, N};
2 Let M be a (d × M)-dimensional matrix whose columns are all equal to µ̂;
3 Compute A such that H := R̂_X − ((N−1)/N) A R̂_X A^T;
4 Define X̄ := [ X(ω_{Q(ω_1)}) ⋯ X(ω_{Q(ω_M)}) ];
5 Let Ξ be a (d × M)-dimensional matrix, whose components are dM independent realizations that are drawn from N(0, 1);
6 Assemble Z = M + A(X̄ − M) + H^MLE(d, N)^{1/2} Ξ.

Algorithm 1: Generation of M independent realizations of X̃.
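Algorithm 1 is short to implement. The sketch below (Python/NumPy, ours) specializes it, for brevity, to data that have been centred and standardized (µ̂ = 0), with the isotropic bandwidth H^MLE = h² I_d and the scalar A = √(N/(N−1)) √(1−h²) I_d, which is the parametrization used later in Eq. (26); samples are stored as rows rather than columns.

```python
import numpy as np

def sample_constrained_gkde(X, h2, M, rng):
    """Algorithm 1, specialized to centred/standardized data (mu_hat = 0) and
    to the isotropic bandwidth H_MLE = h2 * I_d with a scalar matrix A."""
    N, d = X.shape
    a = np.sqrt(N / (N - 1.0)) * np.sqrt(1.0 - h2)  # keeps I - (N-1)/N A A^T = h2 I
    idx = rng.integers(0, N, size=M)                # step 1: Q ~ U{1, ..., N}
    Xbar = X[idx]                                   # step 4: selected centres
    Xi = rng.standard_normal((M, d))                # step 5: Gaussian innovations
    return a * Xbar + np.sqrt(h2) * Xi              # step 6, with mu_hat = 0

rng = np.random.default_rng(3)
N, d = 500, 2
X = rng.standard_normal((N, d))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # centre and standardize
Z = sample_constrained_gkde(X, h2=0.2, M=20000, rng=rng)
```

By construction, the empirical mean and variances of Z match those of X (0 and 1) up to Monte Carlo error, which is the point of the constrained formulation.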

Finally, this section has presented the general framework to nonparametrically approximate the PDF of a random vector when the maximal information is a set of N independent realizations. Some adjustments of the classical formulation have been proposed to take into account constraints on the first and second statistical moments of the approximated PDF, and it has been proposed to search the kernel density bandwidth as the solution of a computationally demanding LOO likelihood maximization problem.

However, from the analysis of a series of test cases, it appears that R̂_X is a rather good approximation of H^MLE(d, N) for the nonparametric modelling of high dimensional random vectors (d ∼ 10-100) with limited information (N ∼ 10d for instance). From Eqs. (12) and (13), this means that we are approximating the PDF of X as a unique Gaussian PDF, whose parameters correspond to the empirical mean and covariance matrix of X:

  lim_{H → R̂_X} p̃_X(·; H, S(N)) = φ(·; µ̂, R̂_X).   (14)

This could prevent us from recovering the subset of R^d on which the distribution of X is concentrated, and one could be tempted to impose smaller values for the components of H in the nonparametric model. If all the components of X are actually dependent, there is however no reason to do so without biasing the final constructed distribution in focusing too much on the available data. Thus, instead of artificially decreasing the most likely value of H (according to the available data), the next section proposes several adaptations of this G-KDE formalism.
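The collapse stated by Eq. (14) can be observed directly. Taking A = α I_d in Eqs. (12)-(13) (a convenient special case, our choice), H → R̂_X forces α → 0, and every kernel centre A X(ω_n) + β then shrinks to µ̂:

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 100, 3
X = rng.standard_normal((N, d))
mu = X.mean(axis=0)
R = np.cov(X, rowvar=False)

def centres_and_H(alpha):
    # A = alpha * I_d in Eqs. (12)-(13); H -> R_hat and centres -> mu_hat as alpha -> 0
    A = alpha * np.eye(d)
    beta = (np.eye(d) - A) @ mu
    H = R - (N - 1) / N * A @ R @ A.T
    return X @ A.T + beta, H

C_mid, H_mid = centres_and_H(0.9)     # well-spread kernel centres
C_lim, H_lim = centres_and_H(1e-6)    # centres collapsed onto mu_hat, H close to R_hat

spread_mid = np.abs(C_mid - mu).max()
spread_lim = np.abs(C_lim - mu).max()
```

In the limit, the mixture degenerates into the single Gaussian φ(·; µ̂, R̂_X) of Eq. (14).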

3. Data-driven tensor-product representation

This section presents some adaptations of the classical G-KDE to improve the nonparametric representations of p_X when the number N of available realizations of X is relatively small compared to its dimension d. Following [16] and [17], we first suggest to pre-process the realizations of X (from a Principal Component Analysis for instance) such that X is now supposed to be centred and uncorrelated:

  µ̂_X = 0,   R̂_X = I_d.

Here, I_d is the (d × d)-dimensional identity matrix. This makes independent the components of X that were only linearly dependent. Then, the idea is to identify groups of components of X that can reasonably be considered as statistically independent, if they exist. Instead of using statistical tests, we propose to search these groups by looking for the maximum of a cross-validation likelihood quantity that is associated with each block formation. Thus, given a block-by-block decomposition of the components of X, the PDF p_X is approximated as the product of the nonparametric estimations of the PDFs associated with each sub-vector of X. For instance, if the d components of X are sorted in d distinct groups, the approximation of p_X corresponds to the product of the d nonparametric estimations of the marginal PDFs of X. Indeed, if the identified block decomposition is correctly adapted to the (unknown) dependence structure of X, there are good chances for the nonparametric representation of p_X to be improved.

More details about this block decomposition are presented in the rest of this section. First, we introduce the notations and the formalism on which this decomposition is based. Then, several algorithms are proposed for its identification.

For any b in {1, …, d}^d and for all 1 ≤ i ≤ d, b_i can be used as a block index for the i-th component X_i of X. This means that if b_i = b_j, X_i and X_j are supposed to be dependent and have to belong to the same block. On the contrary, if b_i ≠ b_j, X_i and X_j are supposed to be independent and they can belong to two different blocks. In order to avoid any redundancy in the block-by-block parametrization of X, the following subset of {1, …, d}^d is considered:

  B(d) := { b ∈ {1, …, d}^d | b_1 = 1, 1 ≤ b_j ≤ 1 + max_{1 ≤ i ≤ j−1} b_i, 2 ≤ j ≤ d }.   (15)

Additionally, for any b in B(d) and any block index 1 ≤ ℓ ≤ Max(b), let
• Max(b) be the maximal value of b,
• s^(ℓ)(X; b) be the random vector that gathers all the components of X with a block index equal to ℓ,
• d_ℓ be the number of elements of b that are equal to ℓ,
• S^(ℓ)(N) be the set that gathers the N independent realizations of s^(ℓ)(X; b) that have been extracted from the N independent realizations of X in S(N).

There exists a bijection between B(d) and the set of all block-by-block decompositions of X. For instance, for d = 5, all the elements of {(i, j, i, k, k), 1 ≤ i ≠ j ≠ k ≤ 5} correspond to the same block decomposition of X, but only b = (1, 2, 1, 3, 3) is in B(d). We can also identify

  s^(1)(X; b) = (X_1, X_3),   s^(2)(X; b) = X_2,   s^(3)(X; b) = (X_4, X_5),   (16)

  Max(b) = 3,   d_1 = 2,   d_2 = 1,   d_3 = 2.   (17)
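The constraint of Eq. (15) is a "restricted growth" condition: each new label is at most one more than the largest label seen so far. The bijection above amounts to a canonical relabelling of blocks in order of first appearance, and B(d) can be enumerated recursively. A short Python sketch (function names ours; whether this relabelling coincides with the paper's Appendix pseudo-projection is not shown here):

```python
def canonical(labels):
    """Canonical representative in B(d): renumber blocks 1, 2, ... in order
    of first appearance, e.g. (4, 2, 4, 7, 7) -> (1, 2, 1, 3, 3)."""
    mapping, out = {}, []
    for v in labels:
        mapping.setdefault(v, len(mapping) + 1)
        out.append(mapping[v])
    return tuple(out)

def enumerate_Bd(d):
    """All restricted-growth vectors of Eq. (15)."""
    out = []
    def rec(prefix):
        if len(prefix) == d:
            out.append(tuple(prefix))
            return
        for j in range(1, max(prefix) + 2):   # 1 <= b_j <= 1 + max(b_1..b_{j-1})
            rec(prefix + [j])
    rec([1])                                   # b_1 = 1
    return out

B5 = enumerate_Bd(5)                           # the 52 decompositions for d = 5
```

The counts produced by `enumerate_Bd` match the N_B(d) row of Table 1 given further below.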

According to Eq. (12), for any H_ℓ in M^+(d_ℓ), the PDF of s^(ℓ)(X; b) can be approximated by p̃_{s^(ℓ)(X;b)}(·; H_ℓ, S^(ℓ)(N)). It follows that the PDF of X can be approximated by

  p̃_X(x; H_1, …, H_{Max(b)}, S(N), b) := Π_{ℓ=1}^{Max(b)} p̃_{s^(ℓ)(X;b)}( s^(ℓ)(x; b); H_ℓ, S^(ℓ)(N) ).   (18)

Such a construction for the PDF of X means that the vectors s^(ℓ)(X; b), 1 ≤ ℓ ≤ Max(b), are assumed to be independent. For any b in B(d), let H_1^MLE(b), …, H_{Max(b)}^MLE(b) be the arguments that maximize the LOO likelihood associated with p̃_X. Hence, for a given block-by-block decomposition of X that is characterized by a given value of b, the most likely G-KDE of p_X is given by

  p̃_X(x; H_1^MLE(b), …, H_{Max(b)}^MLE(b), S(N), b).   (19)

Using Eqs. (9), (12) and (18), for any b in B(d) and any (H_1, …, H_{Max(b)}) in M^+(d_1) × ⋯ × M^+(d_{Max(b)}), this LOO likelihood is given by

  L^LOO(S(N)|H_1, …, H_{Max(b)}, b) = Π_{ℓ=1}^{Max(b)} Π_{n=1}^{N} (1/(N−1)) Σ_{m=1, m≠n}^{N} φ̃_{n,m}(H_ℓ, b),   (20)

  φ̃_{n,m}(H_ℓ, b) := φ( s^(ℓ)(X(ω_n); b); A_ℓ s^(ℓ)(X(ω_m); b), H_ℓ ),   (21)

  H_ℓ := I_{d_ℓ} − ((N−1)/N) A_ℓ A_ℓ^T.   (22)

Noticing that

  max_{H_1, …, H_{Max(b)}, b} Π_{ℓ=1}^{Max(b)} Π_{n=1}^{N} (1/(N−1)) Σ_{m=1, m≠n}^{N} φ̃_{n,m}(H_ℓ, b)
    = max_b Π_{ℓ=1}^{Max(b)} max_{H_ℓ} Π_{n=1}^{N} (1/(N−1)) Σ_{m=1, m≠n}^{N} φ̃_{n,m}(H_ℓ, b),   (23)

it follows that for a given block-by-block decomposition of X, the most likely values of H_1, …, H_{Max(b)} can be computed independently, and saved for a possible re-use for another value of b. Indeed, if b^(1) = (1, 1, 2, 2), two values H_1^(1) and H_2^(1) have to be chosen for the bandwidth matrices (one for each block). This means that two independent LOO likelihood maximization problems have to be solved. In the same manner, if b^(2) = (1, 1, 2, 3), three values H_1^(2), H_2^(2) and H_3^(2) have to be chosen. However, given the same set of realizations of X, it is clear that the most likely value of H_1^(1) is equal to the most likely value of H_1^(2). Hence, the most likely value of b, which is denoted by b^MLE, is eventually solution of

  b^MLE := arg max_{b ∈ B(d)} L^LOO(S(N)|H_1^MLE(b), …, H_{Max(b)}^MLE(b), b).   (24)

There, we remind that for any b in B(d) and any 1 ≤ ℓ ≤ Max(b),

  H_ℓ^MLE(b) := arg max_{H_ℓ ∈ M^+(d_ℓ)} Π_{n=1}^{N} (1/(N−1)) Σ_{m=1, m≠n}^{N} φ̃_{n,m}(H_ℓ, b).   (25)
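Thanks to the factorization of Eq. (23), the most likely bandwidths can be searched block by block. The sketch below (Python/NumPy, ours) does this for the scalar parametrization H_ℓ = h_ℓ² I_{d_ℓ} introduced in Section 3.2, using a simple grid search instead of the golden-section/parabolic solver the paper relies on:

```python
import numpy as np

def block_loo_log(Xb, h):
    """LOO log-likelihood of one block (Eqs. (20)-(22)) for H_l = h^2 I and
    the scalar A_l = sqrt(N/(N-1)) sqrt(1-h^2) I, on centred/uncorrelated data."""
    N, dl = Xb.shape
    a = np.sqrt(N / (N - 1.0)) * np.sqrt(1.0 - h * h)
    D2 = ((Xb[:, None, :] - a * Xb[None, :, :]) ** 2).sum(axis=-1)
    lp = -0.5 * D2 / h**2 - 0.5 * dl * np.log(2.0 * np.pi * h**2)
    np.fill_diagonal(lp, -np.inf)              # leave-one-out: m != n
    m = lp.max(axis=1)
    return (m + np.log(np.exp(lp - m[:, None]).sum(axis=1)) - np.log(N - 1)).sum()

def most_likely_h_per_block(X, b, grid):
    """Eq. (23): each block l is optimized independently over the same grid."""
    b = np.asarray(b)
    return {l: max(grid, key=lambda h: block_loo_log(X[:, b == l], h))
            for l in range(1, b.max() + 1)}

rng = np.random.default_rng(4)
X = rng.standard_normal((150, 4))
hs_opt = most_likely_h_per_block(X, (1, 1, 2, 2), np.linspace(0.1, 0.95, 18))
```

Because each block is solved separately, the optimal h_ℓ for a block such as (X_1, X_2) can be cached and re-used for every b that contains that same block, exactly as noted after Eq. (23).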

Analyzing the value of b^MLE can give information on the actual dependence structure for the components of X. Indeed, if b^MLE = (1, …, 1), the most appropriate representation for the PDF of X is its classical multidimensional Gaussian kernel estimation. This would mean that all the components of X are likely to be dependent. On the contrary, if b^MLE = (1, 2, …, d), the most likely representation corresponds to the assumption that all the components of X are independent. Other values of b^MLE can also be used to identify groups of dependent components of X, which are likely to be mutually independent of one another.

3.2. Practical solving of the block-by-block decomposition problem

The optimization problem defined by Eq. (24) being very complex, we suggest to search the most likely block-by-block decomposition of X using very simple parametrizations of the bandwidth matrices. Indeed, once vector X has been centred and uncorrelated, it is reasonable to parametrize each bandwidth matrix H_ℓ by a unique scalar h_ℓ, such that H_ℓ = h_ℓ² I_{d_ℓ}. From Eq. (22), it follows that

  A_ℓ = √( N/(N−1) ) √( 1 − h_ℓ² ) I_{d_ℓ}.   (26)

Hence, for a given precision ε, the complex problem of searching the most likely values of H_1, …, H_{Max(b)} can be reduced to Max(b) one-dimensional minimization problems.

value of d           1   2   3   4    5    6     7     8      9      10
N_B(d)               1   2   5   15   52   203   877   4140   21147  115975
N^max_greedy(d)      1   3   8   17   31   51    78    113    157    211

Table 1: Evolution of N_B(d) and N^max_greedy(d) with respect to d.

These problems can be run in parallel, and each minimization problem can be solved very efficiently using a combination of golden section search and successive parabolic interpolations (see [1] for further details about this method). However, solving the optimization problem defined by Eq. (24) can still be computationally demanding when d increases. Indeed, as can be seen in Table 1, the number of admissible values of b, which is denoted by N_B(d), increases exponentially with respect to d. Hence, a brute force approach, which would consist in testing all the possible values of b, cannot be used to identify b^MLE.
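The first row of Table 1 is the sequence of Bell numbers (the number of set partitions of {1, …, d}), and the second is the greedy bound of Eq. (27) stated below. Both can be reproduced in a few lines (Python, ours; the Bell-triangle recurrence is a standard way to compute Bell numbers):

```python
def bell(d):
    """Bell number = N_B(d), the number of block decompositions of d components,
    computed with the Bell-triangle recurrence."""
    row = [1]
    for _ in range(d - 1):
        nxt = [row[-1]]                 # each row starts with the previous row's end
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[-1]

def n_greedy_max(d):
    """Upper bound of Eq. (27) on the number of greedy evaluations."""
    return 1 + sum((d - i) * (i + 1) for i in range(d - 1))

table = {d: (bell(d), n_greedy_max(d)) for d in range(1, 11)}
```

The computed pairs match Table 1 exactly, and the bound also gives the values N^max_greedy(50) = 22051 and N^max_greedy(100) = 171601 quoted in Eq. (28).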

As an alternative, we propose to consider a greedy algorithm, whose computational cost can be bounded. Starting from a configuration where all the components of X are in the same block, which corresponds to b = (1, …, 1), the idea of this algorithm is to remove iteratively one element of this initial block, and to put it in a block that would be already built, or in a new block where it is the only element. Algorithm 2 provides a more detailed description of this procedure. By construction, the number N_greedy(d) of evaluations of b ↦ max_h L^LOO(S(N)|b, h) verifies

  N_greedy(d) ≤ N^max_greedy(d) := 1 + Σ_{i=0}^{d−2} (d − i)(i + 1) ≤ d³.   (27)

For d > 4, such an algorithm can therefore be used to approximate b^MLE at a computational cost that is much more affordable than a direct identification based on N_B(d) evaluations of b ↦ max_h L^LOO(S(N)|b, h).

When modelling high dimensional random vectors (d ∼ 50-100), the value of N^max_greedy(d), which is definitely much smaller than N_B(d), can also become very high:

  N^max_greedy(d = 50) = 22051,   N^max_greedy(d = 100) = 171601.   (28)

To identify relevant values for b at a lower computational cost in such situations, an adaptation of genetic algorithms is also proposed.

1 Initialization: b = (1, …, 1), ind.blocked = ∅;
2 for k = 1 : d do
3   L^(k) = ∅, b^(k) = ∅, index^(k) = ∅, ℓ = 1;
4   for i ∈ {1, …, d} \ ind.blocked do
5     for j = 2 : min(d, Max(b) + 1) do
6       Adapt the value of the block index: b^temp := b, b_i^temp = j;
7       Compute: L^temp = max_h L^LOO(S(N)|b^temp, h);
8       Save results: L^(k){ℓ} = L^temp, b^(k){ℓ} = b^temp, index^(k){ℓ} = i;
9       Increment: ℓ ← ℓ + 1;
10      end
11    end
12  Find the best block index at iteration k: ℓ* = arg max_ℓ L^(k){ℓ};
13  Actualize: b ← b^(k){ℓ*}, ind.blocked ← ind.blocked ∪ index^(k){ℓ*};
14 end
15 Maximize over all iterations: (ℓ^greedy, k^greedy) := arg max_{ℓ,k} L^(k){ℓ};
16 Approximate b^MLE ≈ b^(k^greedy){ℓ^greedy}.

Algorithm 2: Greedy search of b^MLE.
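A compact sketch of Algorithm 2 follows (Python, ours), with the expensive objective b ↦ max_h L^LOO(S(N)|b, h) abstracted as a pluggable callable `score`; the toy score used at the end is a hypothetical stand-in, only there to exercise the search:

```python
def greedy_block_search(d, score):
    """Sketch of Algorithm 2. `score(b)` stands in for the expensive
    b -> max_h L_LOO(S(N)|b, h); any log-likelihood-like callable works."""
    b = [1] * d
    blocked = set()
    best_b, best_s = tuple(b), score(tuple(b))
    for _ in range(d):
        candidates = []
        for i in set(range(d)) - blocked:                  # components not yet moved
            for j in range(2, min(d, max(b) + 1) + 1):     # existing block or a new one
                b_tmp = list(b)
                b_tmp[i] = j
                candidates.append((score(tuple(b_tmp)), tuple(b_tmp), i))
        if not candidates:
            break
        s, b_new, i_star = max(candidates)                 # best move of this sweep
        b, blocked = list(b_new), blocked | {i_star}
        if s > best_s:
            best_s, best_b = s, b_new
    return best_b                                          # approximates b_MLE

# toy objective (hypothetical): rewards matching a known target decomposition
target = (1, 2, 2, 2)
b_hat = greedy_block_search(4, lambda b: -sum(x != y for x, y in zip(b, target)))
```

Each outer sweep freezes one moved component, so the total number of `score` evaluations stays within the bound N^max_greedy(d) of Eq. (27).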

The adaptation of such genetic algorithms to the identification of the most likely block-by-block decomposition of X is now described. The fusion and the mutation processes on which such algorithms are generally based, as well as a pseudo-projection in B(d), are therefore detailed in Appendix. In these algorithms, for any set S (which can be discrete or continuous), we denote by U(S) the uniform distribution over S. Based on these three functions, Algorithm 3 shows the genetic procedure we suggest for solving Eq. (24). The results given by this genetic algorithm depend on three parameters:
• the maximum number of iterations i_max,
• the probability of mutation p_Mut,
• the size N_pop of the population we are considering in the genetic algorithm.

For this algorithm, the number of evaluations of b ↦ max_h L^LOO(S(N)|b, h) is equal to N_tot = i_max × N_pop. For a given value of N_tot, it is however hard to infer the optimal values for these three parameters, as it depends on d and on the optimal block-by-block structure of the considered random vector of interest. However, from the analysis of a series of numerical examples, it is generally interesting to choose small values for p_Mut to limit the number of spontaneous mutations, and to favour high values for the number of iterations i_max rather than for the population size N_pop.

Once a satisfying value b̂^MLE of b has been identified using the scalar parametrization of the bandwidth matrices, it is possible to enrich the parametrization of the bandwidth matrices to improve the nonparametric representation of the PDF of X. This amounts to solving

  H_ℓ^MLE(b̂^MLE) = arg max_{H_ℓ ∈ M^+(d_ℓ)} Π_{n=1}^{N} (1/(N−1)) Σ_{m=1, m≠n}^{N} φ̃_{n,m}(H_ℓ, b̂^MLE)   (29)

for all 1 ≤ ℓ ≤ Max(b̂^MLE). In practice, we assessed the interest of such an enrichment of the bandwidth matrix on a series of test cases.

1 Choose N_pop ≥ 2, 0 ≤ p_Mut ≤ 1 and i_max ≥ 1;
2 Initialization;
3 Define B = ∅, L = ∅, ind = 1;
4 Choose at random N_pop elements of B(d), {b^(1), …, b^(N_pop)};
5 for n = 1 : N_pop do
6   Compute: L^temp = max_h L^LOO(S(N)|b^(n), h);
7   Save results: L{ind} = L^temp, B{ind} = b^(n), ind = ind + 1;
8 end
9 Iteration;
10 for i = 2 : i_max do
11   Gather in S the N_pop elements of B associated with the N_pop highest values of L;
12   Choose at random N_pop distinct pairs of elements of S: {(b^(n,1), b^(n,2)), 1 ≤ n ≤ N_pop};
13   for n = 1 : N_pop do
14     Fusion: b^Fus = Fusion(b^(n,1), b^(n,2));
15     Mutation: b^Mut = Mutation(b^Fus, p_Mut);
16     Compute: L^temp = max_h L^LOO(S(N)|b^Mut, h);
17     Save results: L{ind} = L^temp, B{ind} = b^Mut, ind = ind + 1;
18   end
19 end
20 Maximize over all iterations: k^gene = arg max_{1 ≤ k ≤ ind−1} L{k};
21 Approximate b^MLE ≈ B{k^gene}.

Algorithm 3: Genetic search of b^MLE. The functions Mutation() and Fusion() are presented in Appendix, and are detailed in Algorithms 4 and 5.

4. Applications

The purpose of this section is to illustrate the interest of the correlation constraints and the tensorized formulation for the nonparametric representation of PDFs when the maximal information is a finite set of independent realizations. To this end, a series of examples will be presented. The first examples will be based on generated data, so that the errors can be controlled, whereas the last example presents an industrial application based on experimental data.

4.1. Monte Carlo simulation studies

4.1.1. Lemniscate function

Let U be a random variable that is uniformly distributed on [−0.85π, 0.85π], ξ = (ξ_1, ξ_2) be a 2-dimensional random vector whose components are two independent standard Gaussian variables, and X^L = (X_1^L, X_2^L) be the random vector so that

  X^L = ( sin(U) / (1 + cos(U)²), sin(U) cos(U) / (1 + cos(U)²) ) + 0.05 ξ.   (30)

We assume that N = 200 independent realizations of X^L have been gathered in S(N). Given this information, we would like to generate additional points that could sensibly be considered as new independent realizations of X^L. Based on the G-KDE formalism presented in Section 2, four kinds of generators are compared in Figure 1, depending on the value of the bandwidth and on the constraints on the statistical moments of X^L.

Case 1: p_{X^L} is approximated by p̂_{X^L}(·; (h^Silv(d, N))² I_d, S(N)), which is defined by Eq. (1) (no constraints).

Case 2: p_{X^L} is approximated by p̃_{X^L}(·; (h^Silv(d, N))² I_d, S(N)), which is defined by Eq. (12) (constraints on the mean and the covariance).

Case 3: p_{X^L} is approximated by p̂_{X^L}(·; (h^MLE(d, N))² I_d, S(N)) (no constraints).

Case 4: p_{X^L} is approximated by p̃_{X^L}(·; (h^MLE(d, N))² I_d, S(N)) (constraints on the mean and the covariance).
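The lemniscate data set of Eq. (30) can be generated in a few lines (Python/NumPy; the seed and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200
U = rng.uniform(-0.85 * np.pi, 0.85 * np.pi, size=N)   # uniform angle of Eq. (30)
xi = rng.standard_normal((N, 2))                       # independent Gaussian noise
# Eq. (30): 2-dimensional sample concentrated around a lemniscate-shaped curve
XL = np.column_stack((np.sin(U) / (1.0 + np.cos(U) ** 2),
                      np.sin(U) * np.cos(U) / (1.0 + np.cos(U) ** 2))) + 0.05 * xi
```

The resulting cloud is concentrated in a thin band around a one-dimensional curve, which is exactly the regime where the Silverman bandwidth is expected to oversmooth.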

The relevance of the different approximations of p_{X^L} can be analysed from a graphical point of view in Figure 1. It is instructive to compare the associated values of the LOO likelihood, which is denoted by L^LOO(S(N)|H), as the higher this value, the more likely the approximation. Hence, for this example, introducing constraints on the mean and the covariance of the G-KDE tends to slightly increase the values of L^LOO(S(N)|H). Moreover, these results are strongly improved when choosing h^MLE(d, N) instead of h^Silv(d, N). Then, for these four cases, Figure 2 compares the evolution of h^Silv(d, N) and h^MLE(d, N) with respect to N, and shows the associated values of the LOO likelihood. For this example, it can therefore be seen that h^Silv(d, N) strongly overestimates the scattering of the distribution of X^L, for any considered values of N. This is not the case when working with h^MLE(d, N). It is also interesting to notice that for values of N lower than 10^4 (which is very high for 2-dimensional cases), the difference between h^MLE(d, N) and h^Silv(d, N) is always important.

4.1.2. Four-branches clover-knot function

In the same manner as in the previous section, let U be a random variable that is uniformly distributed on [−π, π], ξ = (ξ_1, ξ_2, ξ_3) be a 3-dimensional random vector whose components are three independent standard Gaussian variables, and X^FB be the random vector such that

X^FB = (cos(U) + 2 cos(3U), sin(U) − 2 sin(3U), 2 sin(4U)) + ξ.   (31)

Once again, starting from a data set of N = 200 independent realizations, we would like to be able to generate additional realizations of X^FB. For this 3-dimensional case, as in the previous section, Figures 3 and 4 allow us to underline the interest of considering G-KDE representations that are constrained in terms of mean and covariance, and whose bandwidths are optimized from the likelihood-maximization point of view.
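Eq. (31) translates directly into a sampler; the sketch below (function name and signature ours) generates the N = 200 training points, or any number of reference realizations of X^FB.

```python
import numpy as np

def sample_clover_knot(n, rng=None):
    """Draw n realizations of X^FB as in Eq. (31): a four-branches
    clover knot perturbed by additive standard Gaussian noise."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(-np.pi, np.pi, n)
    knot = np.column_stack([np.cos(u) + 2.0 * np.cos(3.0 * u),
                            np.sin(u) - 2.0 * np.sin(3.0 * u),
                            2.0 * np.sin(4.0 * u)])
    return knot + rng.standard_normal((n, 3))

x_fb = sample_clover_knot(200, rng=0)
print(x_fb.shape)  # (200, 3)
```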

4.1.3. Interest of the block-by-block decomposition in higher dimensions

As explained in Section 3, when d is high, the G-KDE of p_X requires very high values of N to be able to identify the manifold on which the distribution of X is concentrated. In other words, if N is fixed, the higher d, the higher h_MLE(d, N) and the more scattered the new realizations of X. As an illustration of this phenomenon, let us consider the two following random vectors:

[Figure 1 near here; its four panels report (a) log(L^LOO(S(N)|h² I_d)) = −147, (b) log(L^LOO(S(N)|h² I_d)) = −142, (c) log(L^LOO(S(N)|h² I_d)) = −39.0, (d) log(L^LOO(S(N)|h² I_d)) = −38.7.]

Figure 1: Lemniscate case: N = 200 given data points (big black squares) and 10^4 additional realizations (small red and green points) generated from a G-KDE approach for h = h_Silv(d, N) (first row) and h = h_MLE(d, N) (second row). The first column corresponds to the case where no constraints on the mean and the covariance of the generated points are introduced, whereas the second column corresponds to the case where the mean and the covariance of the generated points are equal to their empirical estimations computed from the available data. Under each graph is shown the value of the LOO likelihood.

Figure 2: Evolution of the bandwidth (left) and of the LOO likelihood (right) with respect to N for the lemniscate function (2D). The red dotted lines correspond to the Silverman case, h = h_Silv(d, N); the black solid lines correspond to the MLE case, h = h_MLE(d, N). For this 2D example, the distinctions between the cases with and without correlation constraints were negligible compared to the difference between the Silverman and the MLE cases; hence, only the cases where correlation constraints are imposed on the G-KDE are represented. Each curve corresponds to the mean values of h and log(L^LOO(S(N)|h² I_d)), which have been computed from 50 independently generated 200-dimensional sets of independent realizations of X^L.

Case 1: X^(2D) = (X^L, Ξ_3, ..., Ξ_d).

Case 2: X^(3D) = (X^FB, Ξ_4, ..., Ξ_d).

Here, Ξ_3, ..., Ξ_d denote independent standard Gaussian random variables, whereas the random vectors X^L and X^FB have been introduced in Section 4.1. For these two cases, two configurations are compared.

On the one hand, a classical G-KDE of the PDFs of X^(2D) and X^(3D) is computed. In that case, no block decomposition is carried out. The block-by-block vectors associated with these modellings, which are respectively denoted by b^(2D,1) and b^(3D,1), are equal to (1, ..., 1).

On the other hand, we impose b^(2D,2) = (1, 1, 2, ..., d−1) and b^(3D,2) = (1, 1, 1, 2, ..., d−2), and we build the associated tensorized versions of the G-KDE of the PDFs of X^(2D) and X^(3D).
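A tensorized G-KDE evaluates one kernel density estimate per block and multiplies the results. The following is a minimal sketch under the assumption of isotropic per-block bandwidths stored in a dictionary (the paper's formalism allows general bandwidth matrices, and the data set here is a stand-in):

```python
import numpy as np

def block_kde_logpdf(x, samples, blocks, bandwidths):
    """Log-density at point x of a tensorized (block-by-block) Gaussian KDE:
    the joint PDF is the product, over the block labels in `blocks`
    (e.g. blocks = (1, 1, 2, ..., d-1)), of one isotropic G-KDE per block."""
    blocks = np.asarray(blocks)
    log_p = 0.0
    for ell in np.unique(blocks):
        idx = np.where(blocks == ell)[0]
        d_ell, h = idx.size, bandwidths[ell]
        sq = np.sum((x[idx] - samples[:, idx])**2, axis=1)
        k = np.exp(-0.5 * sq / h**2) / ((2.0 * np.pi)**(d_ell / 2) * h**d_ell)
        log_p += np.log(k.mean())
    return log_p

rng = np.random.default_rng(0)
samples = rng.standard_normal((200, 4))   # stand-in data set
blocks = (1, 1, 2, 3)                     # b = (1, 1, 2, 3)
bandwidths = {1: 0.4, 2: 0.3, 3: 0.3}
print(block_kde_logpdf(np.zeros(4), samples, blocks, bandwidths))
```

Each block only faces its own (small) dimension, which is why the per-block bandwidths can stay small even when d grows.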

Hence, when no block decomposition is carried out, we can verify in Figure 5 that h_MLE(d, N) rapidly increases with d in the two

[Figure 3 near here; its four panels report (a) log(L^LOO(S(N)|h² I_d)) = −1.102 × 10³, (b) log(L^LOO(S(N)|h² I_d)) = −1.101 × 10³, (c) log(L^LOO(S(N)|h² I_d)) = −8.00 × 10², (d) log(L^LOO(S(N)|h² I_d)) = −7.98 × 10².]

Figure 3: Four-branches clover-knot case: N = 200 given data points (big black squares) and 10^4 additional realizations (small red and green points) generated from a G-KDE approach for h = h_Silv(d, N) (first row) and h = h_MLE(d, N) (second row). The first column corresponds to the case where no constraints on the mean and the covariance of the generated points are introduced, whereas the second column corresponds to the case where the mean and the covariance of the generated points are equal to their empirical estimations computed from the available data. Under each graph is shown the value of the LOO likelihood.

Figure 4: Evolution of the bandwidth (left) and of the LOO likelihood (right) with respect to N for the four-branches clover-knot function (3D). The red dotted lines correspond to the Silverman case, h = h_Silv(d, N); the black solid lines correspond to the MLE case, h = h_MLE(d, N). For this 3D example, the distinctions between the cases with and without correlation constraints were negligible compared to the difference between the Silverman and the MLE cases; hence, only the cases where correlation constraints are imposed on the G-KDE are represented. Each curve corresponds to the mean values of h and log(L^LOO(S(N)|h² I_d)), which have been computed from 50 independently generated 200-dimensional sets of independent realizations of X^FB.

considered cases. As a consequence, the capacity of the classical G-KDE formalism to concentrate the new realizations of X^(2D) and X^(3D) on the correct subspaces of R^d decreases when d increases. To illustrate this phenomenon, for N = 500, Figure 6 compares the positions of the first components of the available realizations of X^(2D) and X^(3D) with the corresponding positions of 10^4 additional points generated from a G-KDE approach. Hence, just by working on the optimization of the value of the bandwidth, it quickly becomes impossible to recover the subsets of R² and R³ on which the true distributions of X^L and X^FB are concentrated. On the contrary, when the block-by-block decompositions given by b^(2D,2) and b^(3D,2) are considered, the approximation of the PDFs of the first two components of X^(2D) and X^(3D) is not affected by the presence of the additional random variables Ξ_3, ..., Ξ_d. As a consequence, for each considered value of d, the new generated points are concentrated on the correct subspaces, as can be seen in Figure 6.

At last, the high interest of introducing the block-by-block decomposition for these two examples is emphasized by comparing in Figure 5 the values of the LOO likelihood obtained with and without this decomposition.

[Figure 5 near here; panels (a) and (b) show the evolution of h_MLE(d, N) for the lemniscate and the four-branches clover-knot functions, and panels (c) and (d) show the evolution of log(L^LOO(S(N)|h² I_d, b)) for b = b^(2D,1) and b = b^(2D,2).]

Figure 5: Evolutions of h_MLE(d, N) and L^LOO(S(N)|(h_MLE(d, N))² I_d) with respect to d, for N = 500. The left column corresponds to the modelling of X^(2D), whereas the right column corresponds to the modelling of X^(3D).

[Figure 6 near here; panels: (a) b = b^(2D,1), d = 3; (b) b = b^(3D,1), d = 4; (c) b = b^(2D,1), d = 5; (d) b = b^(3D,1), d = 6; (e) b = b^(2D,2), d = 20; (f) b = b^(3D,2), d = 20.]

Figure 6: Comparison between the positions of N = 500 given values of X^(2D) and X^(3D) (big black squares) and the positions of 10^4 additional values (small red points) generated from a G-KDE approach, for several values of d and two configurations of the block-by-block decomposition for the G-KDE of the PDFs of X^(2D) and X^(3D). The left column corresponds to the modelling of X^(2D), whereas the right column corresponds to the modelling of X^(3D).

value of b   | value of h_MLE(d, N)                  | log(L^LOO(S(N)|h_MLE(d, N), b))
(1,1,1,1,1)  | 0.395                                 | −1.21 × 10³
(1,2,3,4,5)  | (0.115, 0.163, 0.0971, 0.108, 0.118)  | −1.15 × 10³
(1,2,1,2,1)  | (0.290, 0.226)                        | −1.19 × 10³
(1,1,2,2,2)  | (0.113, 0.140)                        | −8.35 × 10²
(1,1,2,2,3)  | (0.113, 0.119, 0.118)                 | −9.96 × 10²

Table 2: Influence of the choice of b on the LOO log-likelihood of the G-KDE for the modelling of the PDF of X = (X^L, X^FB) with N = 200 independent realizations.

In the same manner, if we define X as the concatenation of X^L and X^FB, which are chosen independent, the interest of introducing the correct block-by-block decomposition of X in terms of likelihood maximization is shown in Table 2. Indeed, choosing b = (1, 1, 2, 2, 2) instead of the two classical a priori choices b = (1, 1, 1, 1, 1) (all the components are modelled at the same time) and b = (1, 2, 3, 4, 5) (all the components are modelled separately) allows us to strongly increase the likelihood associated with the approximation of the PDF of X. Reciprocally, such an example seems to confirm that maximizing L^LOO(S(N)|h_MLE(d, N), b) should help us to find the dependence structure in the components of X.

4.1.4. Efficiency of the proposed algorithms for the block-by-block decomposition

This section aims at comparing the efficiency of the proposed algorithms for solving the optimization problem given by Eq. (24). To this end, using the same notations as in Section 3.1, we denote by X the random vector such that, for all 1 ≤ ℓ ≤ Max(b),

s^(ℓ)(X, b) = ξ^(ℓ)/‖ξ^(ℓ)‖ + 0.15 Ξ^(ℓ).   (32)

Here, ξ^(ℓ) and Ξ^(ℓ) denote independent standard Gaussian random vectors, and ‖·‖ denotes the classical Euclidean norm. By construction, the random vectors s^(ℓ)(X, b) are concentrated on d-dimensional hyper-spheres, d being the dimension of s^(ℓ)(X, b). Thus, the random vector X presents a known block-by-block structure, and its distribution is concentrated on a subset of R^d.
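A sampler for the block structure of Eq. (32) can be sketched as follows (function name and signature are ours): each block is a uniformly random point on its unit hyper-sphere plus 0.15 times an independent standard Gaussian vector.

```python
import numpy as np

def sample_block_spheres(n, b, noise=0.15, rng=None):
    """Draw n realizations of X whose blocks, defined by the label
    vector b (e.g. b = (1, 2, 2, 1, 3, 4, 1)), follow Eq. (32)."""
    rng = np.random.default_rng(rng)
    b = np.asarray(b)
    x = np.empty((n, b.size))
    for ell in np.unique(b):
        idx = np.where(b == ell)[0]
        xi = rng.standard_normal((n, idx.size))
        xi /= np.linalg.norm(xi, axis=1, keepdims=True)   # on the sphere
        x[:, idx] = xi + noise * rng.standard_normal((n, idx.size))
    return x

x = sample_block_spheres(500, (1, 2, 2, 1, 3, 4, 1), rng=0)
print(x.shape)  # (500, 7)
```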

chosen value of b                          | d  | N_B(d)       | N_greedy(d) | N_b,gene^(10,0.01)(d)
(1,2,2,1,3,4,1)                            | 7  | 877          | 68 (52)     | 32.4
(1,2,2,1,3,4,1,2,4,5)                      | 10 | 115 975      | 174 (141)   | 25.5
(1,2,2,1,3,4,1,2,4,5,5,6,3,4,7,6,8,1,2,7)  | 20 | 5.17 × 10^13 | 968 (879)   | 51.1

Table 3: Comparison of the efficiency of the greedy and the genetic algorithms for the identification of the block-by-block structure of X.

p_Mut \ N_pop |   2  |   5  |  10  |  20  |  50
0             |  ∞   | 13.5 | 16.0 | 35.8 | 58.1
0.005         |  ∞   | 15.9 | 19.5 | 39.7 | 77.7
0.01          | 13.4 | 15.7 | 25.5 | 36.0 | 62.9
0.1           | 22.9 | 44.3 | 41.0 | 64.7 | 78.7

Table 4: Influence of the parameters p_Mut and N_pop on the mean number of tested values of b, which is denoted by N_b,gene^(N_pop,p_Mut)(d).

The algorithms are compared using N = 500 independent realizations of X. For different values of d and b, the ability of the greedy and the genetic algorithms to find back the correct block-by-block structure of X is compared in Table 3. In this table, N_pop = 10 and p_Mut = 0.01, and we denote by N_b,gene^(N_pop,p_Mut)(d) the mean number of distinct values of b that were tested by the genetic algorithm to identify the optimal value of b. These values were computed from 20 runs of the algorithm initialized with 20 different initial populations chosen at random in B(d). For the greedy case, the algorithm, which is deterministic, was run until it stopped, and we indicate in Table 3 two quantities: the total number of iterations N_greedy(d), and, in parentheses, the number of iterations that was actually needed to get the best value of b. Hence, for these particular examples, the genetic algorithm was more efficient than the greedy one.
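The cardinalities N_B(d) reported in Table 3 are Bell numbers, i.e. the number of ways of partitioning d components into blocks: B_7 = 877 and B_10 = 115 975, while B_20 ≈ 5.17 × 10^13. A quick check with the Bell-triangle recurrence:

```python
def bell_number(d):
    """Bell number B_d = number of set partitions of {1, ..., d},
    computed with the Bell-triangle recurrence."""
    row = [1]
    for _ in range(d - 1):
        nxt = [row[-1]]           # new row starts with the last entry
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]

print([bell_number(d) for d in (7, 10, 20)])
# -> [877, 115975, 51724158235372]
```

This super-exponential growth is what makes the exhaustive exploration of B(d) hopeless and motivates the greedy and genetic searches.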

The influence of the parameters N_pop and p_Mut is then analysed in Table 4, for d = 10 and b = (1, 2, 2, 1, 3, 4, 1, 2, 4, 5). A value of N_b,gene^(N_pop,p_Mut)(d) equal to ∞ means that the correct value was never found after 10^5 iterations. Therefore, this example (the same behaviour was observed for the other examples we tried) seems to encourage the use of small (but not zero) values of p_Mut, as well as small values of N_pop.

The mechanical behaviour of the railway track strongly depends on the track superstructure and substructure components. In particular, the mechanical properties of the ballast layer are very important. Therefore, a series of studies is in progress to better analyse the influence of the ballast shape on the railway track performance. In that prospect, the shapes of N = 975 ballast grains have been measured very precisely. As an illustration, Figure 7 shows the scans of three ballast grains. These measurements can be considered as independent realizations of a complex random field. From this finite set of realizations, a Karhunen-Loève expansion (see [9, 8] for more details about this method) has been carried out to reduce the statistical dimension of this random field. Without entering too much into details, we admit in this paper that the random field associated with the varying ballast shape can finally be parametrized by a 117-dimensional random vector, which is denoted by X. As a consequence of the Karhunen-Loève expansion, this random vector is centred and its covariance matrix is equal to the 117-dimensional identity matrix:

E[X] = 0,   E[X ⊗ X] = I_117.   (33)
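As an illustration of the normalization in Eq. (33) (the actual Karhunen-Loève reduction is detailed in [9, 8] and is not reproduced here), empirical projection coefficients can be whitened so that their sample mean is zero and their sample covariance is the identity:

```python
import numpy as np

def whiten(coeffs):
    """Whiten an (N, d) array of empirical projection coefficients so
    that the result is centred with identity sample covariance, as in
    Eq. (33). This is a sketch, not the authors' actual pipeline."""
    c = coeffs - coeffs.mean(axis=0)
    cov = c.T @ c / (c.shape[0] - 1)
    # inverse square root of the covariance via its eigendecomposition
    w, v = np.linalg.eigh(cov)
    return c @ v @ np.diag(1.0 / np.sqrt(w)) @ v.T

rng = np.random.default_rng(1)
# stand-in correlated coefficients (975 realizations, 5 modes)
x = whiten(rng.standard_normal((975, 5)) @ rng.standard_normal((5, 5)))
```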

From the experimental data, we have access to N = 975 independent realizations of X, which are gathered in S(N). Based on this maximal available information, we would like to identify the PDF of X from a G-KDE approach. The results associated with several modellings based on the G-KDE formalism are summarized in Table 5. In this table, we notice the high interest of introducing correlation constraints. Indeed, for such a very high-dimensional problem with relatively little data, if no constraints are introduced, we get very poor models associated with very low values of the LOO likelihood. In that case, assuming that all the components are independent leads to better results than assuming that they are all dependent. This can be explained by the fact that if all the components of X are chosen independent, we impose a diagonal structure for E[X ⊗ X], which is, in that case, very close to imposing that E[X ⊗ X] = I_117.

On the contrary, much higher values of the LOO likelihood are obtained by adding constraints on the mean value and the covariance matrix of the G-KDE of the PDF of X. In both cases, it can be noticed that it is worth working on the values of the bandwidth: passing from h_Silv(d, N) to h_MLE(d, N) increases the LOO likelihood.

value of b | value of h   | correlation constraints | LOO log-likelihood
(1,...,1)  | h_Silv(d, N) | no                      | −179379
(1,...,1)  | h_MLE(d, N)  | no                      | −176886
(1,...,d)  | h_MLE(d, N)  | no                      | −162398
(1,...,1)  | h_Silv(d, N) | yes                     | −161745
(1,...,1)  | h_MLE(d, N)  | yes                     | −161262
(1,...,d)  | h_MLE(d, N)  | yes                     | −161775
b_MLE      | h_MLE(d, N)  | yes                     | −160930

Table 5: Influence of the value of the bandwidth, of the presence of constraints on the covariance, and of the choice of the block-by-block decomposition for the approximation of the PDF of X.

At last, introducing the tensorized representation as is done in Section 3, and working on the value of the block-by-block decomposition of X, leads to another strong increase of the LOO likelihood. For this application, the value of b_MLE has been approximated by coupling the greedy algorithm and the genetic algorithm presented in Section 3. The greedy algorithm was first launched, and stopped after 30000 iterations. Then, based on these results, additional 20000 iterations were performed using the genetic algorithm with N_pop = 500 and p_Mut = 0.005.

Finally, by working on both the correlation constraints and the block-by-block decomposition of X, it is possible to construct, for this example, very interesting statistical models for X. Such models can then be used for the generation of additional realizations of X.

Conclusion

This work considers the challenging problem of identifying complex PDFs when the maximal available information is a set of independent realizations. In that prospect, the multidimensional G-KDE method plays a key role, as it presents a good compromise between complexity and efficiency. Two adaptations of this method have been presented. First, a modified formalism is introduced to make the mean and the covariance matrix of the estimated PDF equal to their empirical estimations. Then, tensorized representations are proposed. These constructions are based on the identification of a block-by-block dependence structure of the random vectors of interest. The interest of these two adaptations has finally been illustrated on a series of analytical examples and on a high-dimensional industrial example.

The identification of the bandwidth matrices and of the block structure is carried out in the frequency domain. Investigating Bayesian sampling for the bandwidth matrices and the block-structure selection could be interesting for future work.

Appendix

A1. Proof of Proposition 1

We can calculate:

E[X̃] = (1/N) Σ_{n=1}^{N} (A X(ω_n) + β) = μ̂,   (34)

Cov(X̃) = ∫_{R^d} x ⊗ x p̃_X(x; H, S(N)) dx − μ̂ ⊗ μ̂
        = (1/N) Σ_{n=1}^{N} { H + (A X(ω_n) + β) ⊗ (A X(ω_n) + β) } − μ̂ ⊗ μ̂
        = H + ((N − 1)/N) A R̂_X Aᵀ = R̂_X.   (35)

This section presents the three algorithms that are used in the genetic algorithm defined in Section 3. Algorithm 4 presents the fusion function, Algorithm 5 describes the mutation function, and Algorithm 6 shows the pseudo-projection on B(d) on which they are based.

Algorithm 4: Fusion of two elements b^(1) and b^(2) of B(d).
1  Let b^(1) and b^(2) be two elements of B(d);
2  Initialization: b = (0, ..., 0), index = {1, ..., d}, n = 1;
3  while index is not empty do
4    Choose i ~ U({index}), j ~ U({1, 2}), k ~ U({1, 2});
5    Find u^(1) = which(b^(1) == b^(1)_i), u^(2) = which(b^(2) == b^(2)_i);
6    if k == 1 then
7      Define v = u^(j) ∩ index;
8    else
9      Define v = (u^(1) ∪ u^(2)) ∩ index;
10   end
11   Fill b[v] = n;
12   Actualize n ← n + 1, index ← index \ v.
13 end
14 Fusion(b^(1), b^(2)) := Π_{B(d)}(b).

Algorithm 5: Mutation of an element b of B(d).
1  Let b be an element of B(d) and 0 ≤ p_Mut ≤ 1;
2  for i = 1 : d do
3    Choose u ~ U([0, 1]);
4    if u < p_Mut then
5      Draw b_i ~ U({1, ..., d} \ {b_i});
6    end
7  end
8  Mutation(b, p_Mut) := Π_{B(d)}(b).

Algorithm 6: Pseudo-projection Π_{B(d)}(b) of any element b of {1, ..., d}^d on B(d).
1  Let b be an element of {1, ..., d}^d, index = (1, ..., 1), n = 1, b* = (0, ..., 0);
2  for i = 1 : d do
3    if sum(index) == 0 then
4      break;
5    else if index_i == 1 then
6      Find u = which(b == b_i);
7      Fill b*[u] = n;
8      Actualize n ← n + 1, index[u] = 0.
9    end
10 end
11 Π_{B(d)}(b) := b*.

References

[1] R. Brent. Algorithms for Minimization without Derivatives. Englewood Cliffs, N.J.: Prentice-Hall, 1973.

[2] R.R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, and S.W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. USA, 2005.

[3] T. Duong, A. Cowling, I. Koch, and M.P. Wand. Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis.
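Under the assumption that B(d) collects the label vectors whose distinct values appear as 1, 2, ... in order of first occurrence, Algorithm 6 amounts to a relabelling by first appearance; a Python sketch:

```python
import numpy as np

def project_to_Bd(b):
    """Pseudo-projection onto B(d): relabel the block indices of b in
    order of first appearance, e.g. (3, 1, 3, 2) -> (1, 2, 1, 3).
    Sketch of Algorithm 6, not the authors' implementation."""
    b = np.asarray(b)
    out = np.zeros_like(b)
    labels = {}
    for i, bi in enumerate(b):
        if bi not in labels:
            labels[bi] = len(labels) + 1   # next unused canonical label
        out[i] = labels[bi]
    return out

print(project_to_Bd((3, 1, 3, 2)))  # -> [1 2 1 3]
```

Applying this projection after every fusion or mutation keeps the genetic algorithm inside B(d) without rejecting candidates.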

