## Segmentation in the mean of heteroscedastic data via resampling

### Alain Celisse

### AgroParisTech, Paris; University Paris-Sud 11, Orsay; SSB Group, Paris

### Change-point detection methods and Applications, AgroParisTech, Paris, September 11, 2008

### Joint work with Sylvain Arlot

## Statistical framework: regression on a fixed design

### Data: $(t_1, Y_1), \ldots, (t_n, Y_n) \in [0,1] \times \mathcal{Y}$ independent, with

$$Y_i = s(t_i) + \sigma_i \varepsilon_i, \qquad \mathcal{Y} = [0,1] \text{ or } \mathbb{R}$$

### Instants $t_i$: deterministic ($t_i = i/n$)

### Noise: $\mathbb{E}[\varepsilon_i] = 0$ and $\mathbb{E}[\varepsilon_i^2] = 1$

### Noise level: $\sigma_i$ (heteroscedastic)

### Goal: estimate $s$

## Change-points detection framework

$$Y_i = s(t_i) + \sigma_i \varepsilon_i$$

### $s$: piecewise constant with high jumps; heteroscedastic noise

### Purpose and strategy: estimate $s$ so as to recover most of the jumps that are significant w.r.t. the noise level

### We choose our estimator among piecewise constant functions

## Estimation vs. Selection

### A change-point located in a noisy region need not be systematically recovered

### Using the quadratic risk ensures that the detected change-points are meaningful

## Loss function, least-squares risk and contrast

### Loss function:

$$\ell(s, u) = \|s - u\|_n^2 := \frac{1}{n} \sum_{i=1}^{n} \left( s(t_i) - u(t_i) \right)^2$$

### Least-squares risk of an estimator $\hat{s}$: $R_n(\hat{s}) := \mathbb{E}\left[ \ell(s, \hat{s}) \right]$

### Empirical risk:

$$P_n \gamma(u) := \frac{1}{n} \sum_{i=1}^{n} \gamma\left(u, (t_i, Y_i)\right), \qquad \text{with } \gamma(u, (x, y)) = (u(x) - y)^2$$

## Least-squares estimator

### $(I_\lambda)_{\lambda \in \Lambda_m}$: partition of $[0, 1]$

### $S_m$: linear space of piecewise constant functions on $(I_\lambda)_{\lambda \in \Lambda_m}$

### Empirical risk minimizer over $S_m$ (= model):

$$\hat{s}_m \in \arg\min_{u \in S_m} P_n \gamma(u, \cdot) = \arg\min_{u \in S_m} \frac{1}{n} \sum_{i=1}^{n} (u(t_i) - Y_i)^2$$

### Regressogram:

$$\hat{s}_m = \sum_{\lambda \in \Lambda_m} \hat{\beta}_\lambda \mathbf{1}_{I_\lambda}, \qquad \hat{\beta}_\lambda = \frac{1}{\operatorname{Card}\{t_i \in I_\lambda\}} \sum_{t_i \in I_\lambda} Y_i$$
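The regressogram has a closed form: on each cell of the partition it is simply the mean of the responses falling in that cell. A minimal numerical sketch (the function name `regressogram` and the toy breakpoints are ours, for illustration only):

```python
import numpy as np

def regressogram(t, y, breakpoints):
    """Least-squares estimator on the partition of [0, 1] cut at `breakpoints`.

    On each cell I_lambda the fitted value is beta_hat_lambda, the mean of
    the Y_i whose design point t_i falls in that cell.  Returns the fitted
    values s_hat_m(t_i).
    """
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    # half-open cells [lo, hi); small fudge so that t = 1 lands in the last cell
    edges = np.concatenate(([0.0], np.asarray(breakpoints, dtype=float),
                            [1.0 + 1e-12]))
    fitted = np.empty_like(y)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (t >= lo) & (t < hi)
        if mask.any():
            fitted[mask] = y[mask].mean()   # beta_hat_lambda
    return fitted

# toy example: noiseless piecewise-constant signal with a jump at t = 0.5
n = 8
t = np.arange(1, n + 1) / n
y = np.where(t < 0.5, 1.0, 3.0)
print(regressogram(t, y, [0.5]))  # [1. 1. 1. 3. 3. 3. 3. 3.]
```

Choosing among candidate partitions is then exactly the model-selection problem of the following slides.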

## Data $(t_1, Y_1), \ldots, (t_n, Y_n)$

## Goal: reconstruct the signal

## How many breakpoints?

### (Figure: candidate segmentations with $D = 1$ and $D = 3$ segments)

## The oracle

## Model selection

$$(S_m)_{m \in \mathcal{M}_n} \longrightarrow (\hat{s}_m)_{m \in \mathcal{M}_n} \longrightarrow \hat{s}_{\hat{m}} \;\; ???$$

### Goals:

### Oracle inequality (in expectation, or with a large probability):

$$\ell(s, \hat{s}_{\hat{m}}) \leq C \inf_{m \in \mathcal{M}_n} \left\{ \ell(s, \hat{s}_m) + R(m, n) \right\}$$

### Adaptivity (provided $(S_m)_{m \in \mathcal{M}_n}$ is well chosen)

## Collection of models

### For any $m \in \mathcal{M}_n$, $(I_\lambda)_{\lambda \in \Lambda_m}$ denotes a partition of $[0, 1]$ such that $I_\lambda = [t_{i_k}, t_{i_{k+1}})$

### $S_m$: linear space of piecewise constant functions on $(I_\lambda)_{\lambda \in \Lambda_m}$, with $\dim(S_m) = D_m$

$$\forall\, 1 \leq D \leq n - 1, \qquad \operatorname{Card}\{m \in \mathcal{M}_n \mid D_m = D\} = \binom{n - 1}{D - 1}$$
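The cardinality above can be checked directly: a partition into $D$ contiguous segments is determined by its $D - 1$ interior change points, chosen among the $n - 1$ interior positions. A short illustration (the function name `n_models` is ours):

```python
from math import comb

def n_models(n, D):
    """Number of partitions of the n design points into D contiguous
    segments: choose the D - 1 change points among the n - 1 interior
    positions, i.e. binom(n - 1, D - 1)."""
    return comb(n - 1, D - 1)

print(n_models(100, 1))  # 1: the single trivial partition
print(n_models(100, 5))  # 3764376: the collection is combinatorially rich
```

This combinatorial growth with $D$ is what makes the collection "rich" in the sense used below.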

## The collection complexity idea

### Usual approach: bias-variance tradeoff via Mallows' $C_p$:

$$C_p(m) = P_n \gamma(\hat{s}_m) + \frac{2 \sigma^2 D_m}{n}$$

### This approach fails here: Mallows' $C_p$ overfits (see figure)

### The collection complexity is involved in this phenomenon

### (Figure: $C_p$ criterion; oracle dimension: 5)
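For reference, Mallows' $C_p$ is straightforward to compute once a fit and the noise variance are given. A hedged sketch, assuming a known, constant $\sigma^2$ (which is exactly what the heteroscedastic setting lacks; the function name is ours):

```python
import numpy as np

def mallows_cp(y, fitted, sigma2, D):
    """Mallows' C_p: empirical risk P_n gamma(s_hat_m) plus 2*sigma2*D/n.

    `sigma2` is the noise variance, assumed known and constant; in the
    heteroscedastic case no single sigma2 exists, one reason the criterion
    breaks down there.
    """
    y, fitted = np.asarray(y, dtype=float), np.asarray(fitted, dtype=float)
    n = len(y)
    return np.mean((y - fitted) ** 2) + 2.0 * sigma2 * D / n

# toy check: constant fit (D = 1) of a two-level signal
y = np.array([1.0, 1.0, 3.0, 3.0])
print(mallows_cp(y, np.full(4, 2.0), sigma2=1.0, D=1))  # 1.5
```

In practice one would minimize this criterion over the candidate models; the slides show why that minimization overfits on rich collections.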

## Model collection complexity and overfitting

### (Figures: regular partitions; partitions with at least 15, 10, and 5 points per segment)

## Birgé and Massart (2001): Homoscedastic

### Algorithm 1:

$$\forall m, \quad \hat{s}_m = \arg\min_{u \in S_m} P_n \gamma(u), \qquad \hat{m} = \arg\min_{m \in \mathcal{M}} \left\{ P_n \gamma(\hat{s}_m) + \operatorname{pen}(D_m) \right\}$$

### Algorithm 1' (= Algorithm 1):

$$\forall D, \quad \hat{s}_{\hat{m}(D)} = \arg\min_{u \in \tilde{S}_D} P_n \gamma(u), \qquad \hat{D} = \arg\min_{D \in \mathcal{D}} \left\{ P_n \gamma(\hat{s}_{\hat{m}(D)}) + \operatorname{pen}(D) \right\}$$

### where $\tilde{S}_D = \bigcup_{m \mid D_m = D} S_m$

## Complexity measure in the homoscedastic setup

### (Figure: average estimated loss vs. dimension, Reg = 1, S = 2)

### Complexity of $S_m$: of order $D_m / n$

### $\tilde{S}_D = \bigcup_{m \mid D_m = D} S_m$: more complex than any $S_m$, which calls for an effective complexity measure

### Curves may be superimposed

### Complexity of $\tilde{S}_D$:

$$\operatorname{pen}(m) = c_1 \frac{D_m}{n} + c_2 \frac{D_m}{n} \log\left(\frac{n}{D_m}\right)$$

### What about the heteroscedastic setting?

## Heteroscedastic data

### (Figure: average estimated loss vs. dimension, Reg = 1, S = 5)

## Strategy: Resampling with rich collections

### With NOT TOO RICH collections of models:

### Homoscedastic: penalties like $c_1 \frac{D_m}{n}$ are reliable complexity measures

### Heteroscedastic: resampling outperforms penalties $c_1 \frac{D_m}{n}$ (Arlot (2008))

### With RICH collections of models:

### Homoscedastic: $c_1 \frac{D_m}{n} + c_2 \frac{D_m}{n} \log\left(\frac{n}{D_m}\right)$ is a reliable complexity measure

### Heteroscedastic: we will use resampling to measure the complexity

## Cross-validation principle

## The training set: Computation of the estimator

## The test set: Quality assessment

## V-Fold cross-validation (VFCV)

### Randomly split the data into $V$ disjoint subsets $(B_j)$ of size $p \approx n/V$

### For each $1 \leq j \leq V$:

$$\underbrace{(X_1, Y_1), \ldots, (X_{n-p}, Y_{n-p})}_{\text{Training set}}, \quad \underbrace{(X_{n-p+1}, Y_{n-p+1}), \ldots, (X_n, Y_n)}_{\text{Test set } B_j}$$

$$\hat{s}_m^{(-j)} = \arg\min_{u \in S_m} \left\{ \frac{1}{n-p} \sum_{i=1}^{n-p} \gamma(u, (X_i, Y_i)) \right\}$$

$$P_n^{(j)} = \frac{1}{p} \sum_{i=n-p+1}^{n} \delta_{(X_i, Y_i)} \longrightarrow P_n^{(j)} \gamma\left(\hat{s}_m^{(-j)}\right)$$

### VFCV:

$$\hat{m} \in \arg\min_{m \in \mathcal{M}_n} \left\{ \frac{1}{V} \sum_{j=1}^{V} P_n^{(j)} \gamma\left(\hat{s}_m^{(-j)}\right) \right\}$$
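The VFCV rule above can be sketched for regressograms. The helper below is an illustrative implementation, not the authors' code: it splits the indices into $V$ folds, fits the cell means on each training part, and averages the held-out errors $P_n^{(j)} \gamma(\hat{s}_m^{(-j)})$:

```python
import numpy as np

def vfold_risk(t, y, breakpoints, V=5, rng=None):
    """V-fold cross-validation estimate of the risk of the regressogram
    on the partition of [0, 1] cut at `breakpoints`."""
    rng = np.random.default_rng(rng)
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    folds = np.array_split(rng.permutation(n), V)        # the blocks B_j
    edges = np.concatenate(([0.0], np.asarray(breakpoints, dtype=float),
                            [1.0 + 1e-12]))
    risks = []
    for test_idx in folds:
        train = np.setdiff1d(np.arange(n), test_idx)
        pred = np.empty(len(test_idx))
        for lo, hi in zip(edges[:-1], edges[1:]):
            tr = (t[train] >= lo) & (t[train] < hi)
            te = (t[test_idx] >= lo) & (t[test_idx] < hi)
            if te.any():
                # cell mean from the training data; fall back to the global
                # training mean if the cell received no training point
                mu = y[train][tr].mean() if tr.any() else y[train].mean()
                pred[te] = mu
        risks.append(np.mean((y[test_idx] - pred) ** 2))
    return np.mean(risks)    # (1/V) * sum_j P_n^(j) gamma(s_hat_m^(-j))

# noiseless two-level signal: the correct breakpoint gives zero CV risk
t = np.arange(1, 21) / 20
y = np.where(t < 0.5, 1.0, 3.0)
print(vfold_risk(t, y, [0.5], V=5, rng=0))
```

Model selection then amounts to computing this quantity for each candidate partition and keeping the minimizer.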

## BM, VFCV: choosing the number of breakpoints

### First step:

$$\forall D, \quad \hat{m}(D) = \operatorname{Argmin}_{m \mid D_m = D}\, P_n \gamma(\hat{s}_m)$$

### Second step:

### 1. BM (Homoscedastic):

$$P_n \gamma\left(\hat{s}_{\hat{m}(D)}\right) + \operatorname{pen}(D) \approx \mathbb{E}\left[ \left\| s - \hat{s}_{\hat{m}(D)} \right\|^2 \right]$$

### 2. VFCV (Homoscedastic and Heteroscedastic):

$$\frac{1}{V} \sum_{j=1}^{V} P_n^{(j)} \gamma\left(\hat{s}^{(-j)}_{\hat{m}^{(-j)}(D)}\right) \approx \mathbb{E}\left[ \left\| s - \hat{s}_{\hat{m}(D)} \right\|^2 \right]$$
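The first step, computing $\hat{m}(D)$ for every $D$, is classically done by dynamic programming over segment ends, since the residual sum of squares is additive across segments. A minimal sketch (our own, not from the talk), with $O(D_{\max} n^2)$ cost:

```python
import numpy as np

def best_segmentations(y, Dmax):
    """For each D = 1..Dmax, find the partition into D contiguous segments
    minimising the residual sum of squares, i.e. the empirical risk of the
    regressogram (the first step m_hat(D) above)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    csum2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def seg_cost(i, j):
        # RSS of fitting one constant (the mean) on y[i:j]
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    cost = np.full((Dmax + 1, n + 1), np.inf)
    arg = np.zeros((Dmax + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for D in range(1, Dmax + 1):
        for j in range(D, n + 1):
            cands = [cost[D - 1, i] + seg_cost(i, j) for i in range(D - 1, j)]
            k = int(np.argmin(cands))
            cost[D, j] = cands[k]
            arg[D, j] = k + (D - 1)

    # backtrack the interior change points for each D
    segmentations = {}
    for D in range(1, Dmax + 1):
        cuts, j = [], n
        for d in range(D, 0, -1):
            j = int(arg[d, j])
            cuts.append(j)
        segmentations[D] = cuts[::-1][1:]   # drop the leading 0
    return cost[:, n], segmentations

costs, segs = best_segmentations([1.0, 1.0, 1.0, 5.0, 5.0, 5.0], 2)
print(segs[2])  # [3]: the optimal cut for D = 2 sits between the two levels
```

The second step then compares the dimensions $D$ through the penalized criterion (BM) or the cross-validated risk (VFCV).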

## BM, VFCV: performance (number of breakpoints)

### (Figures: average estimated loss vs. dimension for Ideal, VFCV and BM, in a homoscedastic setting (Reg = 1, S = 2) and a heteroscedastic setting (Reg = 2, S = 5))

| $\sigma(\cdot)$ | Ideal | VFCV | BM | $C_p$ |
|---|---|---|---|---|
| 2 | 2.19 ± 0.28 | 3.35 ± 0.35 | 2.96 ± 0.36 | 62.1 ± 7.1 |
| 3 | 4.09 ± 0.24 | 5.12 ± 0.3 | 4.84 ± 0.32 | 19.3 ± 1.2 |
| 5 | 4.04 ± 0.57 | 7.33 ± 0.9 | 38.5 ± 2.8 | 131 ± 14 |
| 6 | 2.13 ± 0.18 | 3.47 ± 0.28 | 5.1 ± 0.39 | 94.7 ± 7.7 |

## Summary of main ideas

### Resampling turns out to be reliable with rich collections of models

### Homoscedastic setup: resampling performs almost as well as BM

### Heteroscedastic setup: resampling strongly outperforms BM


## Overfitting of ERM

### ERM only takes the fit to the data into account

### Overfitting may occur when

$$\forall D, \quad \widehat{m}(D) = \operatorname*{Argmin}_{m \,:\, D_m = D} \; P_n \gamma(\widehat{s}_m)$$

### Overfitting is all the stronger as $\widetilde{S}_D$ is large

### In our setting,

$$\operatorname{Card}\big(\{ m \mid D_m = D \}\big) = \binom{n-1}{D-1}$$

### Idea: ERM may be replaced by resampling to provide $\{\widehat{m}(D)\}_D$
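The count above is just the number of ways to place $D-1$ interior breakpoints among the $n-1$ gaps between observations, and it grows combinatorially — which is what makes ERM overfit-prone here. A quick numeric check (the helper name is mine):

```python
from math import comb

def n_segmentations(n, D):
    """Number of segmentations of n points into D non-empty segments:
    choose D - 1 interior breakpoints among the n - 1 gaps."""
    return comb(n - 1, D - 1)

# The collection quickly becomes huge:
for D in (2, 5, 10):
    print(D, n_segmentations(100, D))
```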


## Algorithms

### Goal: see whether resampling outperforms ERM

### Alternatives:

ERM:

$$\forall D, \quad \widehat{m}(D) = \arg\min_{m \,:\, D_m = D} \; P_n \gamma(\widehat{s}_m)$$

Leave-one-out (LOO):

$$\forall D, \quad \widehat{m}(D) = \arg\min_{m \,:\, D_m = D} \; \frac{1}{n} \sum_{i=1}^{n} P_n^{(i)} \gamma\big(\widehat{s}_m^{(-i)}\big)$$
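For a fixed segmentation $m$, the LOO criterion can be computed naively by refitting each segment mean without the held-out point (for the regressogram a closed form also exists, but the direct loop is clearer). A sketch under that reading, with hypothetical names:

```python
import numpy as np

def loo_risk(y, breakpoints):
    """Leave-one-out risk estimate for the piecewise-constant (regressogram)
    estimator defined by the given interior breakpoints."""
    y = np.asarray(y, dtype=float)
    edges = [0] + list(breakpoints) + [len(y)]
    errs = []
    for a, b in zip(edges[:-1], edges[1:]):
        seg = y[a:b]
        if len(seg) < 2:           # holding out the point empties the segment
            return np.inf
        for i in range(len(seg)):
            mean_wo_i = (seg.sum() - seg[i]) / (len(seg) - 1)
            errs.append((seg[i] - mean_wo_i) ** 2)
    return float(np.mean(errs))
```

Segmentations with single-point segments get an infinite score, which is one way the LOO criterion penalizes the degenerate, overfitting segmentations that ERM can pick.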


## Quality of the segmentations: Homoscedastic

[Figure: average estimated loss vs. dimension (r = 3, b = 2), comparing ERM and LOO.]

## Quality of the segmentations: Heteroscedastic

[Figure: average estimated loss vs. dimension, comparing ERM and LOO; left panel: r = 3, b = 12; right panel: r = 3, b = 7.]

## Overfitting with ERM: dimension 6

[Figure: data on [0, 100] with the segmentations selected by ERM, LOO, and the Oracle at dimension 6.]

## Overfitting with ERM: dimension 7

[Figure: same comparison at dimension 7.]

## Overfitting with ERM: dimension 8

[Figure: same comparison at dimension 8.]

## Overfitting with ERM: dimension 9

[Figure: same comparison at dimension 9.]

## Overfitting with ERM: dimension 10

[Figure: same comparison at dimension 10.]

## Global algorithm description

1. LOO at the first step, to choose the best segmentation for each dimension:

$$\forall D, \quad \widehat{m}(D) = \arg\min_{m \,:\, D_m = D} \; \frac{1}{n} \sum_{i=1}^{n} P_n^{(i)} \gamma\big(\widehat{s}_m^{(-i)}\big)$$

2. VFCV at the second step, to choose the number of breakpoints:

$$\widehat{D} = \arg\min_{D} \; \frac{1}{V} \sum_{j=1}^{V} P_n^{(j)} \gamma\big(\widehat{s}^{(-j)}_{\widehat{m}^{(-j)}(D)}\big)$$
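A compact, self-contained sketch of the whole two-step pipeline. To keep it short it uses plain ERM rather than LOO at the first step and exhaustive search instead of dynamic programming (so it is only usable for tiny n); all names are mine, not the authors':

```python
import numpy as np
from itertools import combinations

def segmentations(n, D):
    """All segmentations of {0,...,n-1} into D segments, as edge tuples."""
    for bps in combinations(range(1, n), D - 1):
        yield (0,) + bps + (n,)

def fit(y, idx, edges):
    """Segment means fitted on the training indices idx only."""
    means = []
    for a, b in zip(edges[:-1], edges[1:]):
        pts = [y[i] for i in idx if a <= i < b]
        means.append(float(np.mean(pts)) if pts else 0.0)
    return means

def risk(y, idx, edges, means):
    """Empirical squared risk of the piecewise-constant fit, on indices idx."""
    r = sum((y[i] - m) ** 2
            for a, b, m in zip(edges[:-1], edges[1:], means)
            for i in idx if a <= i < b)
    return r / len(idx)

def choose_D_vfcv(y, D_max, V=5, seed=0):
    """Step 1 on each training set: best segmentation per dimension.
    Step 2: V-fold CV score per dimension; return the minimizing D."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), V)
    crit = {}
    for D in range(1, D_max + 1):
        score = 0.0
        for fold in folds:
            held = set(int(i) for i in fold)
            train = [i for i in range(n) if i not in held]
            best = min(segmentations(n, D),
                       key=lambda e: risk(y, train, e, fit(y, train, e)))
            score += risk(y, list(held), best, fit(y, train, best))
        crit[D] = score / V
    return min(crit, key=crit.get), crit
```

On a noiseless two-level signal the CV criterion is minimized at D = 2, the true number of segments.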


## The Bt474 Cell lines

### Epithelial cells obtained from human breast cancer tumors

### A test genome is compared to a reference male genome

### We only consider chromosomes 1 and 9


## Results: Chromosome 9

Homoscedastic model (Picard et al., 2005)

### LOO+VFCV

## Results: Chromosome 1

Homoscedastic model (Picard et al., 2005)