On algorithms for computation of the Tukey depth

(1)

Rainer Dyckerhoff (a), Xiaohui Liu (b), Karl Mosler (a), Pavlo Mozharovskyi (c)

(a) Institute of Econometrics and Statistics, University of Cologne

(b) School of Statistics, Jiangxi University of Finance and Economics; Research Center of Applied Statistics, Jiangxi University of Finance and Economics

(c) LTCI, Telecom Paris, Institut Polytechnique de Paris

LTCI Data Science seminar

Paris, June 6, 2019

(2)

- Data depth
  - Tukey depth: definition
  - Tukey trimmed regions
  - Applications
- Computation of the Tukey depth
  - Theoretical background
  - Algorithm and simulations
- Computation of Tukey trimmed regions
  - Existing approaches
  - The proposed algorithm
  - Tukey median
- Outlook for approximations
- Conclusions

(4)

Database

Tables containing data:

- rows = objects
- columns = properties

               Weight (grams)   Age (weeks)
Subject 1                1350            32
Subject 2                1500            32
...
Subject 161              1320            28
Subject 162              1150            27
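The "rows = objects, columns = properties" layout maps directly onto a two-dimensional array. The following is a minimal sketch, assuming NumPy and assuming that the four visible rows pair up as (1350, 32), (1500, 32), (1320, 28) and (1150, 27); the variable names are illustrative only, and the remaining subjects are not shown on the slide, so they are not included here.

```python
import numpy as np

# Each row is one object (subject); each column is one property.
# Column 0: weight in grams, column 1: age in weeks.
# Only the four rows visible on the slide are used (subjects 1, 2, 161, 162).
data = np.array(
    [
        [1350, 32],
        [1500, 32],
        [1320, 28],
        [1150, 27],
    ],
    dtype=float,
)

n, d = data.shape          # n objects, d properties
weights = data[:, 0]       # one property across all objects
first_subject = data[0]    # all properties of one object
```

In this representation a data set of n points in R^d is simply an n-by-d array, which is the input format assumed by most depth computations.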

(5)

[Figure: scatter plot of the data set; axes: weight, in grams (ticks 800 to 1400), and age, in weeks (ticks 20 to 35).]

(8)

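A data depth measures how close a given point is to the “center” of a distribution. For $x \in \mathbb{R}^d$ and a $d$-variate random vector $X$ distributed as $P \in \mathcal{P}$, a data depth is a function

$D : \mathbb{R}^d \times \mathcal{P} \to [0, 1], \quad (x, P) \mapsto D(x \mid P)$

that is:

- affine invariant: $D(Ax + b \mid AX + b) = D(x \mid X)$ for any nonsingular $d \times d$ matrix $A$ and any $b \in \mathbb{R}^d$;

- vanishing at infinity: $\lim_{\|x\| \to \infty} D(x \mid X) = 0$;

- monotone w.r.t. the deepest point: for any $x^* \in \arg\max_{x \in \mathbb{R}^d} D(x \mid X)$, any $x \in \mathbb{R}^d$, and any $0 \le \alpha \le 1$, it holds that $D(x \mid X) \le D(x^* + \alpha(x - x^*) \mid X)$;

- upper semicontinuous in $x$: the upper-level sets $D_\tau(X) = \{x \in \mathbb{R}^d : D(x \mid X) \ge \tau\}$ are closed for all $\tau$;

- (quasiconcave in $x$): the upper-level sets are convex for all $\tau$.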
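These properties can be checked numerically on a simple depth. The sketch below uses the empirical Mahalanobis depth, $1 / (1 + (x - \mu)' \Sigma^{-1} (x - \mu))$ with sample mean $\mu$ and sample covariance $\Sigma$, purely as an illustration; it is not the depth studied in this talk. The function name `mahalanobis_depth`, the simulated data, and the particular matrix `A` and shift `b` are assumptions made for the example.

```python
import numpy as np

def mahalanobis_depth(x, data):
    """Empirical Mahalanobis depth of x w.r.t. the rows of data (illustrative)."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)          # sample covariance matrix
    diff = np.asarray(x, dtype=float) - mu
    # Squared Mahalanobis distance, computed without forming the inverse.
    dist2 = diff @ np.linalg.solve(cov, diff)
    return 1.0 / (1.0 + dist2)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))              # simulated data, assumption
x = np.array([0.5, -1.0])

# An arbitrary nonsingular matrix A and shift b (assumptions for the check).
A = np.array([[2.0, 1.0], [0.5, 3.0]])
b = np.array([10.0, -4.0])

d_before = mahalanobis_depth(x, data)
d_after = mahalanobis_depth(A @ x + b, data @ A.T + b)   # affine invariance

d_far = mahalanobis_depth(np.array([1e6, 1e6]), data)    # vanishing at infinity
d_center = mahalanobis_depth(data.mean(axis=0), data)    # deepest point
```

Here `d_before` and `d_after` agree up to rounding error, `d_far` is close to zero, and `d_center` equals 1, the maximal value, since the sample mean is the deepest point of this particular depth.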

(9)

The Tukey depth of $x \in \mathbb{R}^d$ w.r.t. a $d$-variate random vector $X$ distributed as $P$ is defined as the smallest probability mass of a closed halfspace containing $x$:

$D^{T}(x \mid X) = \inf\{P(H) : H \text{ is a closed halfspace},\ x \in H\},$

and w.r.t. a data set $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$:

$D^{T(n)}(x \mid X) = \frac{1}{n} \min_{u \in S^{d-1}} \#\{i : u' x_i \ge u' x\}.$
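To make the empirical formula concrete, here is a minimal Python sketch for the bivariate case ($d = 2$) only. It is not the algorithm presented in this talk. It relies on the fact that the count $\#\{i : u' x_i \ge u' x\}$ is piecewise constant in the angle of $u$ and can change only where $u$ is orthogonal to some $x_i - x$; because the halfspaces are closed, the minimum is attained strictly between two consecutive critical angles, so one direction per arc suffices. The function name `tukey_depth_2d` and the use of NumPy are assumptions, and the code does not guard against floating-point ties for nearly collinear configurations.

```python
import numpy as np

def tukey_depth_2d(x, data):
    """Empirical Tukey depth of x w.r.t. the rows of data, for d = 2 (sketch)."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    n = len(data)
    diffs = data - x
    # Data points equal to x lie in every closed halfspace containing x.
    moving = diffs[np.any(diffs != 0, axis=1)]
    n_ties = n - len(moving)
    if len(moving) == 0:
        return 1.0
    # Critical angles: directions u orthogonal to some x_i - x.
    theta = np.arctan2(moving[:, 1], moving[:, 0])
    crit = np.concatenate([theta + np.pi / 2, theta - np.pi / 2])
    crit = np.unique(np.mod(crit, 2 * np.pi))
    # One direction strictly inside each arc between consecutive critical angles.
    nxt = np.roll(crit, -1)
    nxt[-1] += 2 * np.pi                      # close the circle
    mids = (crit + nxt) / 2
    U = np.column_stack([np.cos(mids), np.sin(mids)])
    # counts[k] = #{i : u_k'(x_i - x) >= 0}, including points equal to x.
    counts = (moving @ U.T >= 0).sum(axis=0) + n_ties
    return counts.min() / n

# Toy data (assumption): the four corners of a square.
square = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
center_depth = tukey_depth_2d([0, 0], square)
vertex_depth = tukey_depth_2d([1, 1], square)
outside_depth = tukey_depth_2d([10, 10], square)
```

For the square, the center has depth 2/4, a corner has depth 1/4, and a point outside the convex hull has depth 0. The sketch costs O(n^2) operations because it evaluates every arc against every point; a sorted angular sweep would reduce this to O(n log n).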

Other depth notions: Mahalanobis (’36), projection (Stahel, ’81; Donoho, ’82), simplicial volume (Oja, ’83), simplicial (Liu, ’90), zonoid (Koshevoy, Mosler, ’97), spatial (Vardi, Zhang, ’00; Serfling, ’02) depth.

(10)

[Figure: scatter plot of the data set; axes: weight, in grams (ticks 800 to 1400), and age, in weeks (ticks 20 to 35).]

(11)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 120 / 161.]

(12)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 112 / 161.]

(13)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 47 / 161.]

(14)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 26 / 161.]

(15)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 41 / 161.]

(16)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 49 / 161.]

(17)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 114 / 161.]

(18)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 135 / 161.]

(19)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 13 / 161.]

(20)

[Figure: scatter plot of the data set; axes: weight, in grams (ticks 800 to 1400), and age, in weeks (ticks 20 to 35).]

(21)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 152 / 161.]

(22)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 157 / 161.]

(23)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 152 / 161.]

(24)

[Figure: scatter plot of the data set with additional plotted points; axes: weight, in grams, and age, in weeks; annotation: 14 / 161.]
