Segmentation in the mean of heteroscedastic data via resampling

Alain Celisse
AgroParisTech, Paris; University Paris-Sud 11, Orsay; SSB Group, Paris

Change point detection methods and Applications, AgroParisTech, Paris, September 11, 2008

joint work with Sylvain Arlot

Statistical framework: regression on a fixed design

$(t_1, Y_1), \dots, (t_n, Y_n) \in [0,1] \times \mathcal{Y}$ independent, with $Y_i = s(t_i) + \sigma_i \varepsilon_i$ and $\mathcal{Y} = [0,1]$ or $\mathbb{R}$.
Instants $t_i$: deterministic ($t_i = i/n$).
Noise: $\mathbb{E}[\varepsilon_i] = 0$ and $\mathbb{E}[\varepsilon_i^2] = 1$.
Noise level: $\sigma_i$ (heteroscedastic).
Goal: estimate $s$.
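As a concrete illustration of this framework, here is a minimal simulation sketch; the jump locations, the noise profile and the function name are illustrative assumptions, not the settings used in the talk:

```python
import numpy as np

def simulate_data(n=200, seed=0):
    """Simulate Y_i = s(t_i) + sigma_i * eps_i on the fixed design t_i = i/n.

    The breakpoints of s and the noise profile sigma are hypothetical choices
    made only for this sketch.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1) / n
    # Piecewise-constant mean with jumps at t = 0.3 and t = 0.7 (hypothetical).
    s = np.where(t < 0.3, 0.0, np.where(t < 0.7, 2.0, -1.0))
    # Heteroscedastic noise level: larger on the second half of [0, 1].
    sigma = 0.2 + 0.8 * (t > 0.5)
    eps = rng.standard_normal(n)  # centered, unit-variance noise
    return t, s + sigma * eps, s

t, Y, s_true = simulate_data()
```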

Change-point detection framework

$Y_i = s(t_i) + \sigma_i \varepsilon_i$
$s$: piecewise constant with high jumps; heteroscedastic noise.
Purpose and strategy:
Estimate $s$ to recover most of the jumps that are significant w.r.t. the noise level.
We choose our estimator among piecewise constant functions.

Estimation vs. Selection

A change-point in a noisy region: we do not systematically want to recover it.
Using the quadratic risk ensures that the detected change-points are meaningful.

Loss function, least-squares risk and contrast

Loss function:
$$\ell(s, u) = \|s - u\|_n^2 := \frac{1}{n} \sum_{i=1}^{n} \left( s(t_i) - u(t_i) \right)^2$$
Least-squares risk of an estimator $\widehat{s}$: $R_n(\widehat{s}) := \mathbb{E}\left[ \ell(s, \widehat{s}) \right]$.
Empirical risk:
$$P_n \gamma(u) := \frac{1}{n} \sum_{i=1}^{n} \gamma\left(u, (t_i, Y_i)\right), \quad \text{with } \gamma(u, (x, y)) = (u(x) - y)^2.$$

Least-squares estimator

$(I_\lambda)_{\lambda \in \Lambda_m}$: partition of $[0, 1]$.
$S_m$: linear space of piecewise constant functions on $(I_\lambda)_{\lambda \in \Lambda_m}$.
Empirical risk minimizer over $S_m$ (= model):
$$\widehat{s}_m \in \arg\min_{u \in S_m} P_n \gamma(u, \cdot) = \arg\min_{u \in S_m} \frac{1}{n} \sum_{i=1}^{n} (u(t_i) - Y_i)^2.$$
Regressogram:
$$\widehat{s}_m = \sum_{\lambda \in \Lambda_m} \widehat{\beta}_\lambda \mathbf{1}_{I_\lambda}, \qquad \widehat{\beta}_\lambda = \frac{1}{\operatorname{Card}\{t_i \in I_\lambda\}} \sum_{t_i \in I_\lambda} Y_i.$$
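The regressogram and the empirical risk above can be sketched in a few lines; the breakpoint-index interface is a hypothetical choice made for this example:

```python
import numpy as np

def regressogram(Y, breakpoints):
    """Regressogram: on each segment I_lambda of the partition, the fit is the
    mean of the observations falling in that segment (beta_hat_lambda).

    `breakpoints` lists the indices at which a new segment starts.
    """
    n = len(Y)
    edges = [0] + sorted(breakpoints) + [n]
    fit = np.empty(n)
    for a, b in zip(edges[:-1], edges[1:]):
        fit[a:b] = Y[a:b].mean()  # beta_hat_lambda on I_lambda
    return fit

def empirical_risk(Y, fit):
    """P_n gamma(u) = (1/n) * sum_i (u(t_i) - Y_i)^2."""
    return float(np.mean((fit - Y) ** 2))

Y = np.array([0.0, 0.2, -0.1, 3.0, 3.1, 2.9])
fit = regressogram(Y, [3])  # one breakpoint: segments {0,1,2} and {3,4,5}
```

As expected, placing the breakpoint at the jump lowers the empirical risk compared with the one-segment model.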

Data $(t_1, Y_1), \dots, (t_n, Y_n)$ [Figure]

Goal: reconstruct the signal [Figure]

How many breakpoints? [Figure: candidate segmentations with $D = 1$ and $D = 3$]

The oracle [Figure]

Model selection

$$(S_m)_{m \in \mathcal{M}_n} \longrightarrow (\widehat{s}_m)_{m \in \mathcal{M}_n} \longrightarrow \widehat{s}_{\widehat{m}} \ ?$$
Goals:
Oracle inequality (in expectation, or with a large probability):
$$\ell(s, \widehat{s}_{\widehat{m}}) \le C \inf_{m \in \mathcal{M}_n} \left\{ \ell(s, \widehat{s}_m) + R(m, n) \right\}$$
Adaptivity (provided $(S_m)_{m \in \mathcal{M}_n}$ is well chosen).

Collection of models

For any $m \in \mathcal{M}_n$, $(I_\lambda)_{\lambda \in \Lambda_m}$ denotes a partition of $[0, 1]$ such that $I_\lambda = [t_{i_k}, t_{i_{k+1}})$.
$S_m$: linear space of piecewise constant functions on $(I_\lambda)_{\lambda \in \Lambda_m}$, with $\dim(S_m) = D_m$.
$$\forall 1 \le D \le n - 1, \quad \operatorname{Card}\{m \in \mathcal{M}_n \mid D_m = D\} = \binom{n-1}{D-1}.$$
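The binomial count can be checked directly: a partition into $D$ contiguous segments with endpoints on design points amounts to choosing $D - 1$ breakpoints among the $n - 1$ gaps between consecutive design points. A quick sketch:

```python
from math import comb

def n_models(n, D):
    """Number of models of dimension D: choose D - 1 breakpoints among the
    n - 1 gaps between consecutive design points."""
    return comb(n - 1, D - 1)

counts = [n_models(10, D) for D in range(1, 11)]
```

Summing over all dimensions recovers the total size $2^{n-1}$ of the collection, which is why it is called rich.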

The collection complexity idea

Usual approach: bias-variance tradeoff. Mallows' $C_p$:
$$C_p(m) = P_n \gamma(\widehat{s}_m) + 2 \sigma^2 \frac{D_m}{n}$$
This approach is useless here: Mallows' $C_p$ overfits (see figure), and the collection complexity is involved in this phenomenon.
[Figure: $C_p$ criterion; oracle dimension: 5]
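A minimal sketch of the $C_p$ criterion, illustrating why a linear penalty fails on this collection (the toy data and function name are assumptions made for the example):

```python
import numpy as np

def mallows_cp(Y, fit, D, sigma2):
    """C_p(m) = P_n gamma(s_hat_m) + 2 * sigma2 * D / n, where `fit` is the
    least-squares fit of a dimension-D model and sigma2 the noise variance."""
    n = len(Y)
    return float(np.mean((fit - Y) ** 2) + 2.0 * sigma2 * D / n)

# With the saturated model (one segment per point) the empirical risk is 0,
# so C_p reduces to the penalty alone; a penalty linear in D cannot account
# for the binom(n-1, D-1) models competing at each dimension.
Y = np.array([0.0, 0.0, 1.0, 1.0])
cp_full = mallows_cp(Y, Y.copy(), D=len(Y), sigma2=1.0)
cp_const = mallows_cp(Y, np.full(4, Y.mean()), D=1, sigma2=1.0)
```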

Model collection complexity and overfitting

[Figure: reconstructions on regular partitions, and on partitions with at least 15, 10, or 5 points per segment]

Birgé and Massart (2001): homoscedastic case

Algorithm 1:
$$\forall m, \ \widehat{s}_m = \arg\min_{u \in S_m} P_n \gamma(u), \qquad \widehat{m} = \arg\min_{m \in \mathcal{M}} \left\{ P_n \gamma(\widehat{s}_m) + \operatorname{pen}(D_m) \right\}$$
Algorithm 1' (= Algorithm 1):
$$\forall D, \ \widehat{s}_{\widehat{m}(D)} = \arg\min_{u \in \widetilde{S}_D} P_n \gamma(u), \qquad \widehat{D} = \arg\min_{D \in \mathcal{D}} \left\{ P_n \gamma(\widehat{s}_{\widehat{m}(D)}) + \operatorname{pen}(D) \right\},$$
where $\widetilde{S}_D = \bigcup_{m \mid D_m = D} S_m$.

Complexity measure in the homoscedastic setup

[Figure: average estimated loss as a function of the dimension (Reg = 1, S = 2)]
Complexity of $S_m$: of order $D_m / n$.
$\widetilde{S}_D = \bigcup_{m \mid D_m = D} S_m$ is more complex than any single $S_m$, so an effective complexity measure is needed (the curves may be superimposed).
Complexity of $\widetilde{S}_D$:
$$\operatorname{pen}(m) = c_1 \frac{D_m}{n} + c_2 \frac{D_m}{n} \log\left( \frac{n}{D_m} \right)$$
What about the heteroscedastic setting?
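The shape of this penalty can be sketched directly; the constants $c_1, c_2$ must be calibrated in practice, and the defaults below are placeholders, not the calibrated values:

```python
from math import log

def bm_penalty(D, n, c1=2.0, c2=2.0):
    """pen of the form c1 * D/n + c2 * (D/n) * log(n/D) for dimension D.

    The extra log(n/D) term accounts for the roughly D*log(n/D) bits needed
    to index the binom(n-1, D-1) models sharing dimension D; c1 and c2 are
    placeholder constants.
    """
    return c1 * D / n + c2 * (D / n) * log(n / D)
```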

Heteroscedastic data

[Figure: average estimated loss as a function of the dimension (Reg = 1, S = 5)]

Strategy: resampling with rich collections

With NOT TOO RICH collections of models:
Homoscedastic: penalties like $c_1 D_m / n$ are reliable complexity measures.
Heteroscedastic: resampling outperforms penalties $c_1 D_m / n$ (Arlot (2008)).
With RICH collections of models:
Homoscedastic: $c_1 \frac{D_m}{n} + c_2 \frac{D_m}{n} \log\left( \frac{n}{D_m} \right)$ is a reliable complexity measure.
Heteroscedastic: we will use resampling to measure the complexity.

Cross-validation principle [Figure]

The training set: computation of the estimator [Figure]

The test set: quality assessment [Figure]

$V$-fold cross-validation (VFCV)

Randomly split the data into $V$ disjoint subsets $(B_j)$ of size $p \approx n/V$. For each $1 \le j \le V$:
$$\underbrace{(X_1, Y_1), \dots, (X_{n-p}, Y_{n-p})}_{\text{Training set}}, \ \underbrace{(X_{n-p+1}, Y_{n-p+1}), \dots, (X_n, Y_n)}_{\text{Test set } B_j}$$
$$\widehat{s}_m^{(-j)} = \arg\min_{u \in S_m} \left\{ \frac{1}{n-p} \sum_{i=1}^{n-p} \gamma\left(u, (X_i, Y_i)\right) \right\}$$
$$P_n^{(j)} = \frac{1}{p} \sum_{i=n-p+1}^{n} \delta_{(X_i, Y_i)} \longrightarrow P_n^{(j)} \gamma\left( \widehat{s}_m^{(-j)} \right)$$
$$\text{VFCV:} \quad \widehat{m} \in \arg\min_{m \in \mathcal{M}_n} \left\{ \frac{1}{V} \sum_{j=1}^{V} P_n^{(j)} \gamma\left( \widehat{s}_m^{(-j)} \right) \right\}$$
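The VFCV score can be sketched generically; the `fit_on(train_idx, Y)` interface, which must return predictions at all $n$ points using only the training observations, is a hypothetical choice made for this example:

```python
import numpy as np

def vfcv_score(Y, fit_on, V=5, seed=0):
    """V-fold cross-validation estimate of the risk of an estimator.

    The data are split at random into V disjoint blocks B_j of size ~ n/V;
    the score averages the held-out squared errors over the V folds.
    """
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), V)  # blocks of size ~ n/V
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        pred = fit_on(train_idx, Y)
        # P_n^(j) gamma(s_hat^(-j)): squared error on the held-out block B_j
        scores.append(np.mean((pred[test_idx] - Y[test_idx]) ** 2))
    return float(np.mean(scores))

# Example estimator: constant fit (the mean of the training observations).
const_fit = lambda idx, Y: np.full(len(Y), Y[idx].mean())
```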

BM, VFCV: choosing the number of breakpoints

First step:
$$\forall D, \ \widehat{m}(D) = \arg\min_{m \mid D_m = D} P_n \gamma(\widehat{s}_m)$$
Second step:
1. BM (homoscedastic):
$$P_n \gamma(\widehat{s}_{\widehat{m}(D)}) + \operatorname{pen}(D) \approx \mathbb{E}\left[ \left\| s - \widehat{s}_{\widehat{m}(D)} \right\|^2 \right]$$
2. VFCV (homoscedastic and heteroscedastic):
$$\frac{1}{V} \sum_{j=1}^{V} P_n^{(j)} \gamma\left( \widehat{s}^{(-j)}_{\widehat{m}^{(-j)}(D)} \right) \approx \mathbb{E}\left[ \left\| s - \widehat{s}_{\widehat{m}(D)} \right\|^2 \right]$$

BM, VFCV: performance (number of breakpoints)

[Figures: average estimated loss, homoscedastic (Reg = 1, S = 2) and heteroscedastic (Reg = 2, S = 5); curves: Ideal, VFCV, BM]

σ ·   Ideal         VFCV          BM            C_p
2     2.19 ± 0.28   3.35 ± 0.35   2.96 ± 0.36   62.1 ± 7.1
3     4.09 ± 0.24   5.12 ± 0.3    4.84 ± 0.32   19.3 ± 1.2
5     4.04 ± 0.57   7.33 ± 0.9    38.5 ± 2.8    131 ± 14
6     2.13 ± 0.18   3.47 ± 0.28   5.1 ± 0.39    94.7 ± 7.7

Summary of main ideas

Resampling turns out to be reliable with rich collections.
Homoscedastic setup: resampling performs almost as well as BM.
Heteroscedastic setup: resampling strongly outperforms BM.

Overfitting of ERM

ERM only takes the fit to the data into account. Overfitting may occur when
$$\forall D, \ \widehat{m}(D) = \arg\min_{m \mid D_m = D} P_n \gamma(\widehat{s}_m)$$
Overfitting is all the stronger as $\widetilde{S}_D$ is large. In our setting,
$$\operatorname{Card}\left( \{ m \mid D_m = D \} \right) = \binom{n-1}{D-1}$$
Idea: ERM may be replaced by resampling to provide $\{\widehat{m}(D)\}_D$.

Algorithms

Goal: see whether resampling outperforms ERM.
Alternatives:
ERM:
$$\forall D, \ \widehat{m}(D) = \arg\min_{m \mid D_m = D} P_n \gamma(\widehat{s}_m)$$
Leave-one-out (LOO):
$$\forall D, \ \widehat{m}(D) = \arg\min_{m \mid D_m = D} \frac{1}{n} \sum_{i=1}^{n} P_n^{(i)} \gamma\left( \widehat{s}_m^{(-i)} \right)$$
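For segment means, the LOO criterion needs no refitting loop: removing point $i$ from a segment of size $m$ changes the segment mean to $(\text{sum} - Y_i)/(m - 1)$. A closed-form sketch (the breakpoint-index interface is an assumption made for this example):

```python
import numpy as np

def loo_risk(Y, breakpoints):
    """Leave-one-out risk of the regressogram, in closed form.

    When point i is removed from a segment of size m, the leave-one-out
    prediction is (sum - Y_i) / (m - 1); every segment must therefore
    contain at least 2 points.
    """
    n = len(Y)
    edges = [0] + sorted(breakpoints) + [n]
    total = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        seg = Y[a:b]
        m = b - a
        pred = (seg.sum() - seg) / (m - 1)  # mean of the other points in the segment
        total += float(np.sum((seg - pred) ** 2))
    return total / n

risk = loo_risk(np.array([0.0, 1.0, 2.0, 10.0, 11.0]), [3])
```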

Quality of the segmentations: homoscedastic

[Figure: average estimated loss (r = 3, b = 2); curves: ERM, LOO]

Quality of the segmentations: heteroscedastic

[Figures: average estimated loss (r = 3, b = 12) and (r = 3, b = 7); curves: ERM, LOO]

Overfitting with ERM: dimensions 6 to 10

[Figures: segmentations selected by ERM and LOO against the oracle, for dimensions 6, 7, 8, 9 and 10]

Global algorithm description

1. LOO at the first step, to choose the best segmentation for each dimension:
$$\forall D, \ \widehat{m}(D) = \arg\min_{m \mid D_m = D} \frac{1}{n} \sum_{i=1}^{n} P_n^{(i)} \gamma\left( \widehat{s}_m^{(-i)} \right)$$
2. VFCV to choose the number of breakpoints:
$$\widehat{D} = \arg\min_{D} \left\{ \frac{1}{V} \sum_{j=1}^{V} P_n^{(j)} \gamma\left( \widehat{s}^{(-j)}_{\widehat{m}^{(-j)}(D)} \right) \right\}$$

(112)
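The two steps can be sketched as follows. This is a toy illustration under simplifying assumptions, not the authors' implementation: segmentations of each dimension are searched exhaustively (feasible only for small n; a practical implementation would use dynamic programming), the LOO risk uses its closed form for piecewise-constant fits, and the V folds are taken as interleaved subsamples (i mod V = j) so that held-out points can be predicted from the surrounding training segments. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def loo_score(y, bkps):
    """Closed-form leave-one-out risk of the piecewise-constant fit whose
    segments start at the interior breakpoints `bkps`; +inf if some
    segment has fewer than 2 points."""
    segs = np.split(y, list(bkps))
    if any(len(s) < 2 for s in segs):
        return np.inf
    return sum((((s - s.mean()) * len(s) / (len(s) - 1)) ** 2).sum()
               for s in segs) / len(y)

def best_segmentation(train, y, D):
    """Step 1: exhaustive LOO search over all segmentations of the
    training points `train` (sorted indices into y) into D segments."""
    best, best_bkps = np.inf, None
    for bkps in combinations(range(1, len(train)), D - 1):
        score = loo_score(y[train], bkps)
        if score < best:
            best, best_bkps = score, bkps
    return best_bkps

def vfcv_dimension(y, D_max, V=5):
    """Step 2: choose the number of segments by V-fold CV with
    interleaved folds, re-running step 1 on each training set."""
    n = len(y)
    risks = []
    for D in range(1, D_max + 1):
        fold_risks = []
        for j in range(V):
            mask = np.arange(n) % V == j
            train, test = np.flatnonzero(~mask), np.flatnonzero(mask)
            bkps = best_segmentation(train, y, D)
            if bkps is None:  # no admissible segmentation on this fold
                fold_risks.append(np.inf)
                continue
            # segment boundaries, expressed in the original index set
            starts = np.array([0] + [train[b] for b in bkps])
            means = []
            for k, lo in enumerate(starts):
                hi = starts[k + 1] if k + 1 < len(starts) else n
                means.append(y[train[(train >= lo) & (train < hi)]].mean())
            seg_of = np.searchsorted(starts, test, side="right") - 1
            pred = np.array(means)[seg_of]
            fold_risks.append(((y[test] - pred) ** 2).mean())
        risks.append(np.mean(fold_risks))
    return int(np.argmin(risks)) + 1
```

On a toy signal with one clear jump, `vfcv_dimension` recovers two segments; the exhaustive inner search is exponential in D, which is why real change-point procedures compute the best segmentation of each dimension by dynamic programming instead.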


The Bt474 Cell lines

These are epithelial cells

Obtained from human breast cancer tumors

A test genome is compared to a reference male genome

We only consider chromosomes 1 and 9

(114)


Results: Chromosome 9

[Figure: segmentations of chromosome 9, comparing the homoscedastic model (Picard et al. (05)) with LOO + VFCV.]

(118)

Results: Chromosome 1

[Figure: segmentations of chromosome 1, comparing the homoscedastic model (Picard et al. (05)) with LOO + VFCV.]

(119)

Conclusion

We have designed a resampling-based procedure.

It can be applied effectively to the change-point detection problem.

It behaves almost as well as BM in a homoscedastic framework.

It works well in a fully heteroscedastic setup.

This methodology may be extended to other resampling schemes.

The algorithm relies on the dimension as the criterion for gathering models: alternative criteria could be considered.

(120)

