Aggregated tests of independence based on HSIC measures


HAL Id: cea-02617133

https://hal-cea.archives-ouvertes.fr/cea-02617133

Submitted on 25 May 2020


Anouar Meynaoui, Mélisande Albert, Béatrice Laurent, Amandine Marrel

To cite this version:

Anouar Meynaoui, Mélisande Albert, Béatrice Laurent, Amandine Marrel. Aggregated tests of independence based on HSIC measures. EMS 2019 - European Meeting of Statisticians, Bernoulli Society, Jul 2019, Palermo, Italy. cea-02617133


INSA de Toulouse, Institut de Mathématiques de Toulouse, France

CEA, DEN, DER, France

Aggregated tests of independence based on HSIC measures (part 2)

European Meeting of Statisticians, 2019

Anouar Meynaoui, Mélisande Albert, Béatrice Laurent, Amandine Marrel


Outline

Introduction

The aggregated testing procedure

Simulation results

Conclusion and Prospect


Introduction

We recall that we study the independence of two real random vectors $X = (X^{(1)}, \ldots, X^{(p)})$ and $Y = (Y^{(1)}, \ldots, Y^{(q)})$, with marginal densities respectively denoted $f_1$ and $f_2$ and joint density $f$.

We recall that we have an i.i.d. sample $Z_n = (X_i, Y_i)_{1 \le i \le n}$ of $(X, Y)$. We rely on HSIC-based independence tests with Gaussian kernels $k_\lambda$ and $l_\mu$ respectively associated to $X$ and $Y$.

In the previous talk, we first proposed, for each pair of values $(\lambda, \mu)$, a theoretical HSIC test of independence of level $\alpha$ in $(0, 1)$, followed by a non-asymptotic permutation-based test of the same level $\alpha$.

The power of the permuted test is shown to be approximately the same as the theoretical power if enough permutations are used.
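As an illustration of the permutation-based single test recalled above, here is a minimal Python sketch. It rests on several assumptions that are not in the slides: the function names are hypothetical, the Gaussian kernel is left unnormalised, HSIC is estimated by the biased V-statistic $\mathrm{tr}(KHLH)/n^2$ rather than by the estimator used by the authors, and the decision is taken through a permutation p-value.

```python
import numpy as np


def gaussian_gram(x, bandwidth):
    """Gram matrix of an unnormalised Gaussian kernel (assumption of this sketch).

    x has shape (n, d); bandwidth is a scalar or an array of shape (d,).
    """
    z = np.asarray(x, dtype=float) / bandwidth
    sq = np.sum(z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-0.5 * d2)


def hsic_permutation_test(x, y, lam, mu, alpha=0.05, n_perm=200, seed=None):
    """Permutation test of independence based on a V-statistic HSIC estimate."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    centering = np.eye(n) - 1.0 / n              # I - (1/n) 1 1^T
    Kc = centering @ gaussian_gram(x, lam) @ centering
    L = gaussian_gram(y, mu)
    stat = np.sum(Kc * L) / n**2                 # equals tr(K H L H) / n^2
    perm_stats = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(n)                 # permute the Y sample only
        perm_stats[b] = np.sum(Kc * L[np.ix_(idx, idx)]) / n**2
    # The observed statistic is counted among the permuted ones.
    p_value = (1 + np.sum(perm_stats >= stat)) / (n_perm + 1)
    return stat, p_value, bool(p_value <= alpha)


# Usage on a strongly dependent toy sample.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = x + 0.1 * rng.normal(size=(100, 1))
stat, p_value, reject = hsic_permutation_test(x, y, lam=1.0, mu=1.0, seed=1)
```

Permuting the rows and columns of the Gram matrix of $Y$ is equivalent to recomputing the statistic on the permuted sample $(X_i, Y_{\tau(i)})$, which avoids rebuilding the kernels for each permutation.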


When $f - f_1 \otimes f_2$ belongs to a Sobolev ball with regularity $\delta$ in $(0, 2]$, sharp upper bounds of the uniform separation rate w.r.t. the values of $\lambda$ and $\mu$ are provided.

The HSIC test with the optimal upper bound is shown to be minimax over Sobolev balls.

This optimal test is not adaptive, since it depends on the regularity $\delta$.

In this talk, we provide an adaptive procedure for testing independence which does not depend on the regularity $\delta$.

This procedure is based on the aggregation of a collection of HSIC tests with a collection of different bandwidths $\lambda$ and $\mu$.

Numerical studies to assess the performance of the procedure and to compare methodological choices are then provided.


The aggregated testing procedure

A single HSIC-based test raises the question of the choice of the kernel bandwidths $\lambda$ and $\mu$. Heuristic choices are adopted in practice, with no theoretical justification.

We propose here an aggregated testing procedure combining a collection of single tests based on different bandwidths.

We consider a finite or countable collection $\Lambda \times U$ of bandwidths in $(0, +\infty)^p \times (0, +\infty)^q$ and a collection of positive weights $\{\omega_{\lambda,\mu} \,/\, (\lambda, \mu) \in \Lambda \times U\}$.


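For a given $\alpha \in (0, 1)$, we define the aggregated test $\Delta_\alpha$ which rejects $(H_0)$ if there is at least one $(\lambda, \mu) \in \Lambda \times U$ such that

$$\widehat{\mathrm{HSIC}}_{\lambda,\mu} > q^{\lambda,\mu}_{1 - u_\alpha e^{-\omega_{\lambda,\mu}}},$$

where $u_\alpha$ is the least conservative value such that the test is of level $\alpha$, and is defined by

$$u_\alpha = \sup\left\{ u > 0 \; ; \; P_{f_1 \otimes f_2}\left( \sup_{(\lambda,\mu) \in \Lambda \times U} \left( \widehat{\mathrm{HSIC}}_{\lambda,\mu} - q^{\lambda,\mu}_{1 - u e^{-\omega_{\lambda,\mu}}} \right) > 0 \right) \le \alpha \right\}.$$

The test function $\Delta_\alpha$ associated to this aggregated test takes values in $\{0, 1\}$ and is defined by

$$\Delta_\alpha = 1 \iff \sup_{(\lambda,\mu) \in \Lambda \times U} \left( \widehat{\mathrm{HSIC}}_{\lambda,\mu} - q^{\lambda,\mu}_{1 - u_\alpha e^{-\omega_{\lambda,\mu}}} \right) > 0.$$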


Oracle type conditions for the second kind error

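The aggregated testing procedure $\Delta_\alpha$ is of level $\alpha$.

The second kind error of the aggregated testing procedure $\Delta_\alpha$ satisfies the inequality

$$P_f(\Delta_\alpha = 0) \le \inf_{(\lambda,\mu) \in \Lambda \times U} \left\{ P_f\left( \Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}} = 0 \right) \right\},$$

where $\Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}}$ is the single test of level $\alpha e^{-\omega_{\lambda,\mu}}$ associated to the bandwidths $(\lambda, \mu)$.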

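The aggregated testing procedure has a second kind error at most equal to $\beta$ if there exists at least one $(\lambda, \mu) \in \Lambda \times U$ such that the test $\Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}}$ has a second kind error at most equal to $\beta$.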
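The inequality on the second kind error can be recovered in three short steps. The sketch below is a reconstruction of the usual argument, not a quotation from the slides. It assumes that $\sum_{(\lambda,\mu)} e^{-\omega_{\lambda,\mu}} \le 1$ (the condition imposed in the theorem) and that $q^{\lambda,\mu}_{1-a}$ denotes the $(1-a)$-quantile of $\widehat{\mathrm{HSIC}}_{\lambda,\mu}$ under $f_1 \otimes f_2$.

```latex
\begin{align*}
% Step 1 (union bound): u = \alpha is admissible in the definition of u_\alpha.
P_{f_1\otimes f_2}\Big(\sup_{(\lambda,\mu)\in\Lambda\times U}
  \big(\widehat{\mathrm{HSIC}}_{\lambda,\mu}
       - q^{\lambda,\mu}_{1-\alpha e^{-\omega_{\lambda,\mu}}}\big) > 0\Big)
  &\le \sum_{(\lambda,\mu)\in\Lambda\times U} \alpha\, e^{-\omega_{\lambda,\mu}}
   \le \alpha,
  \quad\text{hence } u_\alpha \ge \alpha. \\
% Step 2: quantiles are nondecreasing in their level.
q^{\lambda,\mu}_{1-u_\alpha e^{-\omega_{\lambda,\mu}}}
  &\le q^{\lambda,\mu}_{1-\alpha e^{-\omega_{\lambda,\mu}}}
  \quad\text{for every } (\lambda,\mu)\in\Lambda\times U. \\
% Step 3: if the aggregated test accepts, every single test accepts.
P_f(\Delta_\alpha = 0)
  &\le P_f\big(\widehat{\mathrm{HSIC}}_{\lambda,\mu}
        \le q^{\lambda,\mu}_{1-u_\alpha e^{-\omega_{\lambda,\mu}}}\big)
   \le P_f\big(\widehat{\mathrm{HSIC}}_{\lambda,\mu}
        \le q^{\lambda,\mu}_{1-\alpha e^{-\omega_{\lambda,\mu}}}\big)
   = P_f\big(\Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}} = 0\big).
\end{align*}
```

Since the last line holds for every $(\lambda, \mu)$, taking the infimum over $\Lambda \times U$ gives the stated bound.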


Theorem

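Let $\alpha, \beta \in (0, 1)$, $\{(k_\lambda, l_\mu) \,/\, (\lambda, \mu) \in \Lambda \times U\}$ a collection of Gaussian kernels and $\{\omega_{\lambda,\mu} \,/\, (\lambda, \mu) \in \Lambda \times U\}$ a collection of positive weights, such that $\sum_{(\lambda,\mu) \in \Lambda \times U} e^{-\omega_{\lambda,\mu}} \le 1$.

We assume that $f$, $f_1$ and $f_2$ are bounded. We also assume that all bandwidths $(\lambda, \mu)$ in $\Lambda \times U$ verify the following conditions:

$$\max(\lambda_1 \cdots \lambda_p, \; \mu_1 \cdots \mu_q) < 1 \quad \text{and} \quad n \sqrt{\lambda_1 \cdots \lambda_p \, \mu_1 \cdots \mu_q} > \log\left(\frac{1}{\alpha}\right) > 1.$$

Then, the uniform separation rate $\rho\left(\Delta_\alpha, \mathcal{S}^{\delta}_{p+q}(R), \beta\right)$, where $\delta \in (0, 2]$ and $R > 0$, can be upper bounded as follows: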


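$$\left[ \rho\left(\Delta_\alpha, \mathcal{S}^{\delta}_{p+q}(R), \beta\right) \right]^2 \le C(M_f, p, q, \beta, \delta) \inf_{(\lambda,\mu) \in \Lambda \times U} \left\{ \frac{1}{n \sqrt{\lambda_1 \cdots \lambda_p \, \mu_1 \cdots \mu_q}} \left( \log\left(\frac{1}{\alpha}\right) + \omega_{\lambda,\mu} \right) + \left[ \sum_{i=1}^{p} \lambda_i^{2\delta} + \sum_{j=1}^{q} \mu_j^{2\delta} \right] \right\},$$

where $M_f = \max(\|f\|_\infty, \|f_1\|_\infty, \|f_2\|_\infty)$ and $C(M_f, p, q, \beta, \delta)$ is a positive constant depending only on its arguments.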

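This theorem gives an oracle type condition on the uniform separation rate. Indeed, without knowing the regularity of $f - f_1 \otimes f_2$, we prove that the uniform separation rate of $\Delta_\alpha$ is of the same order as the smallest uniform separation rate of the single tests in the collection, up to the additional term $\omega_{\lambda,\mu}$.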


Adaptive procedure of testing independence

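We consider the bandwidth collections $\Lambda$ and $U$ defined by

$$\Lambda = \left\{ (2^{-m_{1,1}}, \ldots, 2^{-m_{1,p}}) \; ; \; (m_{1,1}, \ldots, m_{1,p}) \in (\mathbb{N}^*)^p \right\}, \tag{1}$$

$$U = \left\{ (2^{-m_{2,1}}, \ldots, 2^{-m_{2,q}}) \; ; \; (m_{2,1}, \ldots, m_{2,q}) \in (\mathbb{N}^*)^q \right\}. \tag{2}$$

We associate to every $\lambda = (2^{-m_{1,1}}, \ldots, 2^{-m_{1,p}})$ in $\Lambda$ and $\mu = (2^{-m_{2,1}}, \ldots, 2^{-m_{2,q}})$ in $U$ the positive weights

$$\omega_{\lambda,\mu} = 2 \sum_{i=1}^{p} \log\left( m_{1,i} \times \frac{\pi}{\sqrt{6}} \right) + 2 \sum_{j=1}^{q} \log\left( m_{2,j} \times \frac{\pi}{\sqrt{6}} \right), \tag{3}$$

so that $\sum_{(\lambda,\mu) \in \Lambda \times U} e^{-\omega_{\lambda,\mu}} = 1$.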
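The normalisation of the weights in (3) comes from $\sum_{m \ge 1} 1/m^2 = \pi^2/6$: each coordinate contributes a factor $\frac{6}{\pi^2} \sum_{m \ge 1} m^{-2} = 1$. The short Python sketch below checks this numerically on a truncated collection. The function names and the truncation level `m_max` are assumptions made for the illustration.

```python
import itertools
import math


def weight(m1, m2):
    """Weight omega of Equation (3) for the exponent vectors m1 (length p), m2 (length q)."""
    c = math.pi / math.sqrt(6.0)
    return (2.0 * sum(math.log(m * c) for m in m1)
            + 2.0 * sum(math.log(m * c) for m in m2))


def truncated_sum(p, q, m_max):
    """Sum of exp(-omega) over all exponents in {1, ..., m_max}^(p+q)."""
    total = 0.0
    for m in itertools.product(range(1, m_max + 1), repeat=p + q):
        total += math.exp(-weight(m[:p], m[p:]))
    return total


# The truncated sum increases towards 1 as m_max grows (case p = q = 1).
for m_max in (10, 50, 200):
    print(m_max, truncated_sum(1, 1, m_max))
```

Every weight is positive because $\pi/\sqrt{6} > 1$ and $m \ge 1$, so small bandwidths (large exponents) receive a stricter individual level $\alpha e^{-\omega_{\lambda,\mu}}$.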


Corollary

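Assume that $\log \log(n) > 1$, let $\alpha, \beta \in (0, 1)$ and let $\Delta_\alpha$ be the aggregated testing procedure with the particular choice of $\Lambda$, $U$ and the weights $(\omega_{\lambda,\mu})_{(\lambda,\mu) \in \Lambda \times U}$ defined in (1), (2) and (3). Then, the uniform separation rate $\rho\left(\Delta_\alpha, \mathcal{S}^{\delta}_{p+q}(R), \beta\right)$ of the aggregated test $\Delta_\alpha$ over Sobolev balls, where $\delta \in (0, 2]$, can be upper bounded as follows:

$$\rho\left(\Delta_\alpha, \mathcal{S}^{\delta}_{p+q}(R), \beta\right) \le C(M_f, p, q, \alpha, \beta, \delta) \left( \frac{\log \log(n)}{n} \right)^{\frac{2\delta}{4\delta + (p+q)}},$$

where $M_f = \max(\|f\|_\infty, \|f_1\|_\infty, \|f_2\|_\infty)$.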

The rate of the aggregation procedure over the classes of Sobolev balls is of the same order as the smallest rate of the single tests, up to a $\log \log(n)$ factor. This, combined with the result on the lower bound over Sobolev balls, shows that the aggregated test is adaptive over Sobolev balls.


Implementation of the aggregated procedure

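The collections $\Lambda$ and $U$ are finite in practice. The correction $u_\alpha$, defined as

$$u_\alpha = \sup\left\{ u > 0 \; ; \; P_{f_1 \otimes f_2}\left( \sup_{(\lambda,\mu) \in \Lambda \times U} \left( \widehat{\mathrm{HSIC}}_{\lambda,\mu} - q^{\lambda,\mu}_{1 - u e^{-\omega_{\lambda,\mu}}} \right) > 0 \right) \le \alpha \right\},$$

can be approximated by a permutation method with Monte Carlo approximation, as done in Albert et al., 2015.

To compute the quantiles $\hat{q}^{\lambda,\mu}_{1 - u e^{-\omega_{\lambda,\mu}}}$, we generate uniformly $B_1$ independent random permutations $\tau_1, \ldots, \tau_{B_1}$, independent of $Z_n$. We then compute, for each $(\lambda, \mu) \in \Lambda \times U$ and each $u > 0$, the permuted quantile with Monte Carlo approximation $\hat{q}^{\lambda,\mu}_{1 - u e^{-\omega_{\lambda,\mu}}}$.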


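To compute the probability $P_{f_1 \otimes f_2}$, we generate uniformly $B_2$ independent random permutations $\kappa_1, \ldots, \kappa_{B_2}$, independent of $Z_n$. Denote, for each permutation $\kappa_b$, the corresponding permuted statistic

$$\widehat{H}^{\kappa_b}_{\lambda,\mu} = \widehat{\mathrm{HSIC}}_{\lambda,\mu}\left(Z_n^{\kappa_b}\right).$$

Then, the correction $u_\alpha$ is approximated by

$$\hat{u}_\alpha = \sup\left\{ u > 0 \; ; \; \frac{1}{B_2} \sum_{b=1}^{B_2} \mathbb{1}_{\left\{ \max_{(\lambda,\mu) \in \Lambda \times U} \left( \widehat{H}^{\kappa_b}_{\lambda,\mu} - \hat{q}^{\lambda,\mu}_{1 - u e^{-\omega_{\lambda,\mu}}} \right) > 0 \right\}} \le \alpha \right\}. \tag{4}$$

The supremum in Equation (4) is estimated by dichotomy.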

Simulation result: the powers of the implemented and the theoretical procedures are approximately the same if enough permutations are used.
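The implementation steps above (permuted quantiles from $B_1$ permutations, estimated level from $B_2$ permutations, dichotomy on $u$) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, HSIC is estimated by the biased V-statistic $\mathrm{tr}(KHLH)/n^2$ with unnormalised Gaussian kernels, and the empirical quantile of level $1 - a$ is taken as the $\lceil (1-a) B_1 \rceil$-th order statistic.

```python
import numpy as np


def gaussian_gram(x, bw):
    """Unnormalised Gaussian Gram matrix; x has shape (n, d)."""
    z = np.asarray(x, dtype=float) / bw
    sq = np.sum(z**2, axis=1)
    return np.exp(-0.5 * np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0))


def permuted_hsic(Kc, L, perms):
    """V-statistic HSIC on the samples (X_i, Y_perm(i)), one value per permutation."""
    n = L.shape[0]
    return np.array([np.sum(Kc * L[np.ix_(p, p)]) for p in perms]) / n**2


def aggregated_hsic_test(x, y, bandwidths, weights, alpha=0.05,
                         B1=200, B2=200, n_bisect=20, seed=None):
    """Return (reject, u_hat) for a finite list of (lambda, mu) pairs and weights omega."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    centering = np.eye(n) - 1.0 / n
    # The same permutations are shared by all bandwidths, as required by the max in (4).
    taus = [rng.permutation(n) for _ in range(B1)]
    kappas = [rng.permutation(n) for _ in range(B2)]
    obs, q_stats, u_stats = [], [], []
    for lam, mu in bandwidths:
        Kc = centering @ gaussian_gram(x, lam) @ centering
        L = gaussian_gram(y, mu)
        obs.append(permuted_hsic(Kc, L, [np.arange(n)])[0])
        q_stats.append(np.sort(permuted_hsic(Kc, L, taus)))
        u_stats.append(permuted_hsic(Kc, L, kappas))
    obs, q_stats, u_stats = np.array(obs), np.array(q_stats), np.array(u_stats)
    w = np.exp(-np.asarray(weights, dtype=float))

    def quantiles(u):
        # Empirical (1 - u * exp(-omega))-quantile for each bandwidth pair.
        levels = 1.0 - u * w
        idx = np.clip(np.ceil(levels * B1).astype(int) - 1, 0, B1 - 1)
        return q_stats[np.arange(len(w)), idx]

    def estimated_level(u):
        # Fraction of kappa-permutations for which at least one single test rejects.
        return np.mean(np.max(u_stats - quantiles(u)[:, None], axis=0) > 0)

    lo, hi = 0.0, 1.0 / np.max(w)        # keeps every level in [0, 1]
    for _ in range(n_bisect):            # dichotomy: the estimated level is nondecreasing in u
        mid = 0.5 * (lo + hi)
        if estimated_level(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    reject = bool(np.max(obs - quantiles(lo)) > 0)
    return reject, lo
```

The dichotomy returns the lower end of the final bracket, so the retained value of $u$ always satisfies the level constraint in (4) on the $B_2$ permutations whenever some $u > 0$ does; otherwise it falls back to $u = 0$, the most conservative choice.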


Analytical examples

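Dependence forms from Berrett and Samworth, 2017:

(i). Defining the joint density $f_l$, $l = 1, \ldots, 10$, of $(X, Y)$ on $[-\pi, \pi]^2$ by

$$f_l(x, y) = \frac{1}{4\pi^2} \left\{ 1 + \sin(lx) \sin(ly) \right\}.$$

(ii). Considering $X$ and $Y$ as

$$X = L \cos\Theta + \frac{\varepsilon_1}{4}, \qquad Y = L \sin\Theta + \frac{\varepsilon_2}{4},$$

where $L$, $\Theta$, $\varepsilon_1$ and $\varepsilon_2$ are independent, with $L \sim \mathcal{U}\{1, \ldots, l\}$ for $l = 1, \ldots, 10$, $\Theta \sim \mathcal{U}[0, 2\pi]$ and $\varepsilon_1, \varepsilon_2 \sim \mathcal{N}(0, 1)$.

(iii). Defining $X \sim \mathcal{U}[-1, 1]$. For a given $\rho = 0.1, 0.2, \ldots, 1$, $Y$ is defined as $Y = |X|^{\rho} \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, 1)$ is independent of $X$.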
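The three dependence mechanisms above can be simulated with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, and example (i) is sampled by rejection from the uniform distribution on $[-\pi, \pi]^2$, which is one possible choice among others.

```python
import numpy as np


def sample_example_i(n, l, rng):
    """Rejection sampling from f_l(x, y) = (1 + sin(lx) sin(ly)) / (4 pi^2)."""
    out = np.empty((0, 2))
    while out.shape[0] < n:
        cand = rng.uniform(-np.pi, np.pi, size=(2 * n + 10, 2))
        # f_l is at most twice the uniform density, so accept with probability f_l / (2 * uniform).
        accept_prob = 0.5 * (1.0 + np.sin(l * cand[:, 0]) * np.sin(l * cand[:, 1]))
        keep = rng.uniform(size=cand.shape[0]) < accept_prob
        out = np.vstack([out, cand[keep]])
    return out[:n, 0], out[:n, 1]


def sample_example_ii(n, l, rng):
    """Noisy circles: X = L cos(Theta) + eps1 / 4, Y = L sin(Theta) + eps2 / 4."""
    radius = rng.integers(1, l + 1, size=n)          # L uniform on {1, ..., l}
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x = radius * np.cos(theta) + rng.normal(size=n) / 4.0
    y = radius * np.sin(theta) + rng.normal(size=n) / 4.0
    return x, y


def sample_example_iii(n, rho, rng):
    """Heteroscedastic noise: X uniform on [-1, 1], Y = |X|^rho * eps."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.abs(x) ** rho * rng.normal(size=n)
    return x, y


rng = np.random.default_rng(0)
x, y = sample_example_ii(200, 2, rng)   # the setting used in Figures 1 and 2, with n = 200
```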


Collection of bandwidths

Choice of collections $\Lambda$ and $U$: we recommend a dyadic collection, made of multiples and divisors by powers of 2 of the standard deviations of $X$ and $Y$ (respectively denoted $s$ and $s'$ in Figure 1).

[Figure 1: three power maps, for n = 50, n = 100 and n = 200, with λ ranging over s/64, s/32, s/16, s/8, s/4, s/2, s, 3s/2, 2s, 3s and µ ranging over s'/64, s'/32, s'/16, s'/8, s'/4, s'/2, s', 3s'/2, 2s', 3s'.]

Figure 1: Analytical example (ii), l = 2. Power map of the single HSIC test w.r.t. the kernel widths λ and µ respectively associated to X and Y, for sample sizes n = 50, 100 and 200.


Weights associated to the collection

Choice of weights: comparison of uniform and exponentially decreasing weights in Figure 2.

[Figure 2: power (vertical axis) against r (horizontal axis, from 1 to 6), with one curve per weighting scheme (uniform or exponential weights) and per sample size (n = 50, 100, 200).]

Figure 2: Analytical example (ii), l = 2. Power of aggregated procedures with uniform and exponential weights, w.r.t. the number r of aggregated widths in each direction, for sample sizes n = 50, 100 and 200.
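A dyadic collection around the empirical standard deviations, paired with uniform weights, can be built as in the Python sketch below. The function name, the default exponents and the choice $\omega_{\lambda,\mu} = \log(\text{number of pairs})$ for the uniform weights are assumptions made for the illustration. The exponentially decreasing weights are not specified on the slides and are therefore not reproduced here.

```python
import itertools

import numpy as np


def dyadic_collection(x, y, exponents=(-2, -1, 0, 1)):
    """Bandwidth pairs (s * 2^k, s' * 2^k') with uniform weights.

    The default exponents are a hypothetical choice, not the authors' setting.
    """
    s = np.std(x, axis=0, ddof=1)          # empirical standard deviation of X
    s_prime = np.std(y, axis=0, ddof=1)    # empirical standard deviation of Y
    lams = [s * 2.0**k for k in exponents]
    mus = [s_prime * 2.0**k for k in exponents]
    bandwidths = list(itertools.product(lams, mus))
    # Uniform weights: exp(-omega) = 1 / card, so the exp(-omega) sum to 1.
    weights = np.full(len(bandwidths), np.log(len(bandwidths)))
    return bandwidths, weights


rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(scale=3.0, size=100)
bandwidths, weights = dyadic_collection(x, y)
```

With r exponents in each direction the collection contains r × r pairs, which matches the "number r of aggregated widths in each direction" used on the horizontal axis of Figure 2.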


Comparison with other independence tests

Comparison with the single HSIC test using the permutation method (De Lozzo and Marrel, 2016; Meynaoui et al., 2019) and the Mutual Information Test (MINT, Berrett and Samworth, 2017).

Figure 3: Power curves of MINT, the single HSIC test and the aggregated procedure for the mechanisms of dependence (i), (ii) and (iii).


Conclusion and Prospect

Proposition of a test procedure based on aggregating single HSIC tests with different choices of bandwidths.

Procedure adaptive over Sobolev balls, i.e. achieving the optimal uniform separation rate without depending on any regularity parameter.

Encouraging results (in terms of test power) on some analytical examples.

Some possible improvements:

Extend the aggregation procedure to other characteristic kernels and other types of random variables (e.g. discrete variables).

Extend the aggregation procedure to other types of experimental designs such as Quasi-Monte Carlo and Space Filling Designs.

An application of the methodology to a real data case is in progress. The data stem from an industrial case simulating a severe nuclear reactor accident scenario.


References

Albert, M., Bouret, Y., Fromont, M., Reynaud-Bouret, P., et al. (2015). Bootstrap and permutation tests of independence for point processes. The Annals of Statistics, 43(6):2537–2564.

Berrett, T. B. and Samworth, R. J. (2017). Nonparametric independence testing via mutual information. arXiv preprint arXiv:1711.06642.

De Lozzo, M. and Marrel, A. (2016b). New improvements in the use of dependence measures for sensitivity analysis and screening. Journal of Statistical Computation and Simulation, 86(15):3038–3058.

Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. ALT.

Meynaoui, A., Albert, M., Laurent, B., and Marrel, A. (2019). Aggregated test of independence based on HSIC measures. arXiv preprint arXiv:1902.06441.


Acknowledgements.

The authors would like to thank the Innovation and Industrial Nuclear Support Division of CEA for funding this CEA PhD work, performed within the framework of code development for Generation IV nuclear reactor safety studies.