**HAL Id: hal-02816806**

**https://hal.inrae.fr/hal-02816806**

### Submitted on 6 Jun 2020

**HAL** is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


**Segmentation in the mean of heteroscedastic data via resampling or cross-validation**

### Alain Celisse, Sylvain Arlot

**To cite this version:**

### Alain Celisse, Sylvain Arlot. Segmentation in the mean of heteroscedastic data via resampling or cross-validation. Workshop Change-Point Detection Methods and Applications, Sep 2008, Paris, France.


### Final version of the manuscript published in:

**Statistics and Computing, 2010, vol. 21, n° 4, 613-632, 10.1007/s11222-010-9196-x**

Segmentation of the mean of heteroscedastic data via cross-validation

Sylvain Arlot and Alain Celisse

April 8, 2009

Abstract

This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge on the variations of the noise. A new family of change-point detection procedures is proposed, showing that cross-validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The robustness to heteroscedasticity of the proposed procedures is supported by an extensive simulation study, together with recent theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for their analysis.

1 Introduction

The problem tackled in the paper is the detection of abrupt changes in the mean of a signal without assuming its variance is constant. Model selection and cross-validation techniques are used for building change-point detection procedures that significantly improve on existing procedures when the variance of the signal is not constant. Before detailing the approach and the main contributions of the paper, let us motivate the problem and briefly recall some related works in the change-point detection literature.

1.1 Change-point detection

The change-point detection problem, also called one-dimensional segmentation, deals with a stochastic process the distribution of which abruptly changes at some unknown instants. The purpose is to recover the location of these changes and their number. This problem is motivated by a wide range of applications, such as voice recognition, financial time-series analysis [29] and Comparative Genomic Hybridization (CGH) data analysis [35]. A large literature exists about change-point detection in many frameworks [see 12, 17, for a complete bibliography].

The first papers on change-point detection were devoted to the search for the location of a unique change-point, also named breakpoint [see 34, for instance]. Looking for multiple change-points is a harder task and has been studied later. For instance, Yao [49] used the BIC criterion for detecting multiple change-points in a Gaussian signal, and Miao and Zhao [33] proposed an approach relying on rank statistics.


The setting of the paper is the following. The values $Y_1, \ldots, Y_n \in \mathbb{R}$ of a noisy signal at points $t_1, \ldots, t_n$ are observed, with

$$Y_i = s(t_i) + \sigma(t_i)\epsilon_i \, , \qquad \mathbb{E}[\epsilon_i] = 0 \quad \text{and} \quad \operatorname{Var}(\epsilon_i) = 1 \, . \tag{1}$$

The function $s$ is called the regression function and is assumed to be piecewise-constant, or at least well approximated by piecewise constant functions, that is, $s$ is smooth everywhere except at a few breakpoints. The noise terms $\epsilon_1, \ldots, \epsilon_n$ are assumed to be independent and identically distributed. No assumption is made on $\sigma : [0,1] \to [0,\infty)$. Note that all data $(t_i, Y_i)_{1 \leq i \leq n}$ are observed before detecting the change-points, a setting which is called off-line.
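Model (1) is straightforward to simulate; the sketch below draws one heteroscedastic sample with a piecewise-constant mean and an increasing noise level. All numerical choices (breakpoints, levels, noise profile) are ours, purely for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(1, n + 1) / n  # regular design points in (0, 1]

# Piecewise-constant regression function s (breakpoints and levels are arbitrary)
s = np.where(t < 0.3, 0.0, np.where(t < 0.7, 2.0, 0.5))

# Non-constant standard deviation: the data are heteroscedastic
sigma = 0.2 + 0.8 * t

# Model (1): Y_i = s(t_i) + sigma(t_i) * eps_i with E[eps_i] = 0, Var(eps_i) = 1
eps = rng.standard_normal(n)
Y = s + sigma * eps
```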

As pointed out by Lavielle [28], multiple change-point detection procedures generally tackle one among the following three problems:

1. Detecting changes in the mean $s$, assuming the standard deviation $\sigma$ is constant;

2. Detecting changes in the standard deviation $\sigma$, assuming the mean $s$ is constant;

3. Detecting changes in the whole distribution of $Y$, with no distinction between changes in the mean $s$, changes in the standard deviation $\sigma$ and changes in the distribution of $\epsilon$.

In applications such as CGH data analysis, changes in the mean $s$ have an important biological meaning, since they correspond to the limits of amplified or deleted areas of chromosomes. However, in the CGH setting the standard deviation $\sigma$ is not always constant, as assumed in problem 1. See Section 6 for more details on CGH data, for which heteroscedasticity (that is, variations of $\sigma$) corresponds to experimental artefacts or biological nuisance that should be removed.

Therefore, CGH data analysis requires solving a fourth problem, which is the purpose of the present article:

4. Detecting changes in the mean $s$ with no constraint on the standard deviation $\sigma : [0,1] \to [0,\infty)$.

Compared to problem 1, the difference is the presence of an additional nuisance parameter $\sigma$, which makes problem 4 harder. To the best of our knowledge, no change-point detection procedure has ever been proposed for solving problem 4 with no prior information on $\sigma$.

1.2 Model selection

Model selection is a successful approach for multiple change-point detection, as shown by Lavielle [28] and by Lebarbier [30] for instance. Indeed, a set of change-points (called a segmentation) is naturally associated with the set of piecewise-constant functions that may only jump at these change-points. Given a set of functions (called a model), estimation can be performed by minimizing the least-squares criterion (or other criteria, see Section 3). Therefore, detecting changes in the mean of a signal, that is, choosing a segmentation, amounts to selecting such a model.

More precisely, given a collection of models $\{S_m\}_{m \in \mathcal{M}_n}$ and the associated collection of least-squares estimators $\{\widehat{s}_m\}_{m \in \mathcal{M}_n}$, the purpose of model selection is to provide a model index $\widehat{m}$ such that $\widehat{s}_{\widehat{m}}$ reaches the best performance among all estimators $\{\widehat{s}_m\}_{m \in \mathcal{M}_n}$.


Model selection can target two different goals. On the one hand, a model selection procedure is efficient when its quadratic risk is smaller than the smallest quadratic risk of the estimators $\{\widehat{s}_m\}_{m \in \mathcal{M}_n}$, up to a constant factor $C_n \geq 1$. Such a property is called an oracle inequality when it holds for every finite sample size. The procedure is said to be asymptotically efficient when the previous property holds with $C_n \to 1$ as $n$ tends to infinity. Asymptotic efficiency is the goal of AIC [2, 3] and Mallows' $C_p$ [32], among many others.

On the other hand, assuming that $s$ belongs to one of the models $\{S_m\}_{m \in \mathcal{M}_n}$, a procedure is model consistent when it chooses the smallest model containing $s$ asymptotically with probability one. Model consistency is the goal of BIC [39] for instance. See also the article by Yang [46] about the distinction between efficiency and model consistency.

In the present paper, as in [30], the quality of a multiple change-point detection procedure is assessed by the quadratic risk; hence, a change in the mean hidden by the noise should not be detected. This choice is motivated by applications where the signal-to-noise ratio may be small, so that exactly recovering every true change-point is hopeless. Therefore, efficient model selection procedures will be used in order to detect the change-points.

Without prior information on the locations of the change-points, the natural collection of models for change-point detection depends on the sample size $n$. Indeed, there exist $\binom{n-1}{D-1}$ different partitions of the $n$ design points into $D$ intervals, each partition corresponding to a set of $(D-1)$ change-points. Since $D$ can take any value between $1$ and $n$, $2^{n-1}$ models can be considered. Therefore, model selection procedures used for multiple change-point detection have to satisfy non-asymptotic oracle inequalities: the collection of models cannot be assumed to be fixed while the sample size $n$ tends to infinity. (See Section 2.3 for a precise definition of the collection $\{S_m\}_{m \in \mathcal{M}_n}$ used for change-point detection.)

Most model selection results consider polynomial collections of models $\{S_m\}_{m \in \mathcal{M}_n}$, that is, $\operatorname{Card}(\mathcal{M}_n) \leq C n^{\alpha}$ for some constants $C, \alpha \geq 0$. For polynomial collections, procedures like AIC or Mallows' $C_p$ are proved to satisfy oracle inequalities in various frameworks [9, 15, 10, 16], assuming that data are homoscedastic, that is, $\sigma(t_i)$ does not depend on $t_i$. However, as shown in [6], Mallows' $C_p$ is suboptimal when data are heteroscedastic, that is, when the variance is non-constant. Therefore, other procedures must be used. For instance, resampling penalization is optimal with heteroscedastic data [5]. Another approach has been explored by Gendre [25], which consists in simultaneously estimating the mean and the variance, using a particular polynomial collection of models.
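The model-counting argument above can be checked numerically: summing the $\binom{n-1}{D-1}$ partitions over all dimensions $D \in \{1, \ldots, n\}$ does give $2^{n-1}$ models (the value of $n$ below is an arbitrary example).

```python
from math import comb

n = 20
# Number of partitions of the n design points into D intervals: C(n-1, D-1)
counts = [comb(n - 1, D - 1) for D in range(1, n + 1)]

# Summing over all possible dimensions D gives the size of the whole collection
total = sum(counts)
assert total == 2 ** (n - 1)
```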

However, in change-point detection the collection of models is exponential, that is, $\operatorname{Card}(\mathcal{M}_n)$ is of order $\exp(\alpha n)$ for some $\alpha > 0$. For such large collections, especially larger than polynomial, the above penalization procedures fail. Indeed, Birgé and Massart [16] proved that the minimal amount of penalization required for a procedure to satisfy an oracle inequality is of the form

$$\operatorname{pen}(m) = c_1 \sigma^2 \frac{D_m}{n} + c_2 \sigma^2 \frac{D_m}{n} \log\left(\frac{n}{D_m}\right), \tag{2}$$

where $c_1$ and $c_2$ are positive constants and $\sigma^2$ is the variance of the noise, assumed to be constant. Lebarbier [30] proposed $c_1 = 5$ and $c_2 = 2$ for optimizing the penalty (2) in the context of change-point detection. Penalties similar to (2) have been introduced independently by other authors [38, 1, 11, 45] and are shown to provide satisfactory results.
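As a quick numerical sketch, penalty (2) with Lebarbier's constants $c_1 = 5$ and $c_2 = 2$ can be written as follows. The function name and the choice $\sigma^2 = 1$ are ours; in practice $\sigma^2$ is unknown and must be estimated.

```python
import numpy as np

def birge_massart_penalty(D, n, sigma2, c1=5.0, c2=2.0):
    """Penalty (2): c1*sigma^2*D/n + c2*sigma^2*(D/n)*log(n/D),
    with Lebarbier's constants c1 = 5 and c2 = 2 by default."""
    D = np.asarray(D, dtype=float)
    return c1 * sigma2 * D / n + c2 * sigma2 * (D / n) * np.log(n / D)

n, sigma2 = 500, 1.0
dims = np.arange(1, n + 1)
pen = birge_massart_penalty(dims, n, sigma2)
# The selected model minimizes: empirical risk of the estimator + pen(m)
```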


Nevertheless, all these results assume that data are homoscedastic. Actually, the model selection problem with heteroscedastic data and an exponential collection of models has, to the best of our knowledge, never been considered in the literature.

Furthermore, penalties of the form (2) are very close to being proportional to $D_m$, at least for small values of $D_m$. Therefore, the results of [6] lead to the conjecture that the penalty (2) is suboptimal for model selection over an exponential collection of models when data are heteroscedastic. The suggestion of this paper is to use cross-validation methods instead.

1.3 Cross-validation

Cross-validation (CV) methods provide (almost) unbiased estimates of the quadratic risk of any estimator, such as $\widehat{s}_m$ (see Section 3.2 about the heuristics underlying CV). Classical examples of CV methods are the leave-one-out [Loo, 27, 43] and $V$-fold cross-validation [VFCV, 23, 24]. More references on cross-validation can be found in [7, 19] for instance.

CV can be used for model selection, by choosing the model $S_m$ for which the CV estimate of the risk of $\widehat{s}_m$ is minimal. The properties of CV for model selection with a polynomial collection of models and homoscedastic data have been widely studied. In short, CV is known to adapt to a wide range of statistical settings, from density estimation [42, 20] to regression [44, 48] and classification [26, 47]. In particular, Loo is asymptotically equivalent to AIC or Mallows' $C_p$ in several frameworks where they are asymptotically optimal, and other CV methods have similar performances, provided the size of the training sample is close enough to the sample size [see for instance 31, 40, 22]. In addition, CV methods are robust to heteroscedasticity of the data [5, 7], as are several other resampling methods [6]. Therefore, CV is a natural alternative to penalization procedures assuming homoscedasticity.

Nevertheless, nearly nothing is known about CV for model selection with an exponential collection of models, such as in the change-point detection setting. The literature on model selection and CV [14, 40, 16, 21] only suggests that directly minimizing the Loo estimate of the risk over $2^{n-1}$ models would lead to overfitting.

In this paper, a remark made by Birgé and Massart [16] about penalization procedures is used for solving this issue in the context of change-point detection. Model selection is performed in two steps: first, choose a segmentation given the number of change-points; second, choose the number of change-points. CV methods can be used at each step, leading to Procedure 6 (Section 5). The paper shows that such an approach is indeed successful for detecting changes in the mean of a heteroscedastic signal.
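As a rough, self-contained illustration of this two-step idea (this is not the paper's Procedure 6: step 1 below is a brute-force search over segmentations, tractable only for small $n$, and plain $V$-fold CV with interleaved folds stands in for the closed-form Lpo estimators; all function names and fold choices are ours):

```python
import numpy as np
from itertools import combinations

def cv_risk(Y, bps, V=5):
    """V-fold CV estimate of the risk of the segment-means estimator
    associated with the interior breakpoints `bps` (indices in 1..n-1)."""
    n = len(Y)
    edges = [0, *bps, n]
    folds = [np.arange(v, n, V) for v in range(V)]  # interleaved folds
    risk = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        err = 0.0
        for a, b in zip(edges[:-1], edges[1:]):
            tr = train_idx[(train_idx >= a) & (train_idx < b)]
            te = test_idx[(test_idx >= a) & (test_idx < b)]
            mu = Y[tr].mean() if len(tr) else 0.0  # fallback for empty segments
            err += ((Y[te] - mu) ** 2).sum()
        risk += err / len(test_idx)
    return risk / V

def two_step_selection(Y, Kmax, V=5):
    """Step 1: for each number K of change-points, keep the segmentation
    with smallest CV risk. Step 2: choose K minimizing the CV risk."""
    n = len(Y)
    best = {K: min(combinations(range(1, n), K), key=lambda b: cv_risk(Y, b, V))
            for K in range(Kmax + 1)}
    Khat = min(best, key=lambda K: cv_risk(Y, best[K], V))
    return Khat, best[Khat]
```

On a noiseless toy signal this recovers the single true change-point; the paper's actual implementation replaces the brute-force search of step 1 with dynamic programming (Section 5).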

1.4 Contributions of the paper

The main purpose of the present work is to design a CV-based model selection procedure (Procedure 6) that can be used for detecting multiple changes in the mean of a heteroscedastic signal. Such a procedure experimentally adapts to heteroscedasticity when the collection of models is exponential, which has never been obtained before. In particular, Procedure 6 is a reliable alternative to Birgé and Massart's penalization procedure [15] when data can be heteroscedastic.

Another major difficulty tackled in this paper is the computational cost of resampling methods when selecting among $2^n$ models. Even when the number $(D-1)$ of change-points is given, exploring the $\binom{n-1}{D-1}$ partitions of $[0,1]$ into $D$ intervals and performing a resampling algorithm for each partition is not feasible when $n$ is large and $D > 0$. An implementation of Procedure 6 with a tractable computational complexity is proposed in the paper, using closed-form formulas for Leave-$p$-out (Lpo) estimators of the risk, dynamic programming, and $V$-fold cross-validation.
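The classical way to avoid exploring all $\binom{n-1}{D-1}$ partitions is dynamic programming, which finds the best segmentation for every dimension $D \leq D_{\max}$ in $O(D_{\max} n^2)$ operations. The sketch below does this for the least-squares criterion only; the paper's implementation applies the same kind of recursion to closed-form Lpo risk estimates, which is not reproduced here.

```python
import numpy as np

def best_segmentations(Y, Dmax):
    """For each D = 1..Dmax, find the segmentation of Y into D segments
    minimizing the residual sum of squares, by dynamic programming."""
    n = len(Y)
    cum = np.concatenate(([0.0], np.cumsum(Y)))
    cum2 = np.concatenate(([0.0], np.cumsum(np.asarray(Y, float) ** 2)))

    def sse(i, j):  # sum of squared residuals of Y[i:j] around its mean
        s, s2, m = cum[j] - cum[i], cum2[j] - cum2[i], j - i
        return s2 - s * s / m

    # cost[D][j] = minimal SSE of Y[:j] split into D segments
    cost = np.full((Dmax + 1, n + 1), np.inf)
    arg = np.zeros((Dmax + 1, n + 1), dtype=int)
    cost[0][0] = 0.0
    for D in range(1, Dmax + 1):
        for j in range(D, n + 1):
            cands = [cost[D - 1][i] + sse(i, j) for i in range(D - 1, j)]
            best = int(np.argmin(cands))
            cost[D][j] = cands[best]
            arg[D][j] = best + (D - 1)  # start index of the last segment

    # Backtrack the interior breakpoints for each dimension D
    segs = {}
    for D in range(1, Dmax + 1):
        bps, j = [], n
        for d in range(D, 0, -1):
            j = int(arg[d][j])
            bps.append(j)
        segs[D] = sorted(bps[:-1])  # drop the leading index 0
    return cost[1:, n], segs

# Noiseless toy signal: the best 2-segment fit recovers the true breakpoint
Y = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
costs, segs = best_segmentations(Y, 3)
```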

The paper also points out that least-squares estimators are not reliable for change-point detection when the number of breakpoints is given, although they are widely used for this purpose in the literature. Indeed, experimental and theoretical results detailed in Section 3.1 show that least-squares estimators suffer from local overfitting when the variance of the signal varies over the sequence of observations. On the contrary, minimizers of the Lpo estimator of the risk do not suffer from this drawback, which emphasizes the interest of using cross-validation methods in the context of change-point detection.

The paper is organized as follows. The statistical framework is described in Section 2. First, the problem of selecting the best segmentation given the number of change-points is tackled in Section 3. Theoretical results and an extensive simulation study show that the usual minimization of the least-squares criterion can be misleading when data are heteroscedastic, whereas cross-validation-based procedures provide satisfactory results in the same framework.

Then, the problem of choosing the number of breakpoints from data is addressed in Section 4. As supported by an extensive simulation study, $V$-fold cross-validation (VFCV) leads to a computationally feasible and statistically efficient model selection procedure when data are heteroscedastic, contrary to procedures implicitly assuming homoscedasticity.

The resampling methods of Sections 3 and 4 are combined in Section 5, leading to a family of resampling-based procedures for detecting changes in the mean of a heteroscedastic signal. A wide simulation study shows they perform well with both homoscedastic and heteroscedastic data, significantly improving the performance of procedures which implicitly assume homoscedasticity.

Finally, Section 6 illustrates on a real data set the promising behaviour of the proposed procedures for analyzing CGH microarray data, compared to procedures previously used in this setting.

2 Statistical framework

In this section, the statistical framework of change-point detection via model selection is introduced, as well as some notation.

2.1 Regression on a fixed design

Let $\mathcal{S}^*$ denote the set of measurable functions $[0,1] \to \mathbb{R}$. Let $t_1 < \cdots < t_n \in [0,1]$ be some deterministic design points, let $s \in \mathcal{S}^*$ and $\sigma : [0,1] \to [0,\infty)$ be some functions, and define

$$\forall i \in \{1, \ldots, n\}, \qquad Y_i = s(t_i) + \sigma(t_i)\epsilon_i \, , \tag{3}$$

where $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed random variables with $\mathbb{E}[\epsilon_i] = 0$ and $\mathbb{E}[\epsilon_i^2] = 1$.


As explained in Section 1.1, the goal is to find from $(t_i, Y_i)_{1 \leq i \leq n}$ a piecewise-constant function $f \in \mathcal{S}^*$ close to $s$ in terms of the quadratic loss

$$\| s - f \|_n^2 := \frac{1}{n} \sum_{i=1}^n \left( f(t_i) - s(t_i) \right)^2 .$$

2.2 Least-squares estimator

A classical estimator of $s$ is the least-squares estimator, defined as follows. For every $f \in \mathcal{S}^*$, the least-squares criterion at $f$ is defined by

$$P_n \gamma(f) := \frac{1}{n} \sum_{i=1}^n \left( Y_i - f(t_i) \right)^2 .$$

The notation $P_n \gamma(f)$ means that the function $(t, Y) \mapsto \gamma(f; (t, Y)) := (Y - f(t))^2$ is integrated with respect to the empirical distribution $P_n := n^{-1} \sum_{i=1}^n \delta_{(t_i, Y_i)}$. $P_n \gamma(f)$ is also called the empirical risk of $f$.

Then, given a set $S \subset \mathcal{S}^*$ of functions $[0,1] \to \mathbb{R}$ (called a model), the least-squares estimator on model $S$ is

$$\operatorname{ERM}(S; P_n) := \arg\min_{f \in S} \{ P_n \gamma(f) \} .$$

The notation $\operatorname{ERM}(S; P_n)$ stresses that the least-squares estimator is the output of the empirical risk minimization algorithm over $S$, which takes a model $S$ and a data sample as inputs. When a collection of models $\{S_m\}_{m \in \mathcal{M}_n}$ is given, $\widehat{s}_m(P_n)$ or $\widehat{s}_m$ are shortcuts for $\operatorname{ERM}(S_m; P_n)$.
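On a model of piecewise-constant functions (the case used throughout the paper, see Section 2.3), the empirical risk minimizer has a closed form: on each segment of the partition, the least-squares fit is the mean of the observations falling in that segment. A minimal sketch, with function names of our choosing:

```python
import numpy as np

def least_squares_estimator(Y, bps):
    """ERM over the model of piecewise-constant functions with interior
    breakpoints `bps` (indices in 1..n-1): the fit on each segment is the
    mean of the observations it contains."""
    n = len(Y)
    fit = np.empty(n)
    edges = [0, *bps, n]
    for a, b in zip(edges[:-1], edges[1:]):
        fit[a:b] = Y[a:b].mean()
    return fit

def empirical_risk(Y, fit):
    """P_n gamma(f) = (1/n) * sum_i (Y_i - f(t_i))^2."""
    return float(np.mean((Y - fit) ** 2))
```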

2.3 Collection of models

Since the goal is to detect jumps of $s$, every model considered in this article is the set of piecewise constant functions with respect to some partition of $[0,1]$.

For every $K \in \{1, \ldots, n-1\}$ and every sequence of integers $\alpha_0 = 1 < \alpha_1 < \alpha_2 < \cdots < \alpha_K \leq n$ (the breakpoints), $(I_\lambda)_{\lambda \in \Lambda_{(\alpha_1, \ldots, \alpha_K)}}$ denotes the partition

$$\left\{ [t_{\alpha_0}; t_{\alpha_1}), \ldots, [t_{\alpha_{K-1}}; t_{\alpha_K}), [t_{\alpha_K}; 1] \right\}$$

of $[0,1]$ into $(K+1)$ intervals. Then, the model $S_{(\alpha_1, \ldots, \alpha_K)}$ is defined as the set of piecewise constant functions that can only jump at $t = t_{\alpha_j}$ for some $j \in \{1, \ldots, K\}$.

For every $K \in \{1, \ldots, n-1\}$, let $\widetilde{\mathcal{M}}_n(K+1)$ denote the set of such sequences $(\alpha_1, \ldots, \alpha_K)$ of length $K$, so that $\{S_m\}_{m \in \widetilde{\mathcal{M}}_n(K+1)}$ is the collection of models of piecewise constant functions with $K$ breakpoints. When $K = 0$, $\widetilde{\mathcal{M}}_n(1) := \{\emptyset\}$ and the model $S_\emptyset$ is the linear space of constant functions on $[0,1]$. Remark that for every $K$ and $m \in \widetilde{\mathcal{M}}_n(K+1)$, $S_m$ is a vector space of dimension $D_m = K + 1$. In the rest of the paper, the relationship between the number of breakpoints $K$ and the dimension $D = K + 1$ of the model $S_{(\alpha_1, \ldots, \alpha_K)}$ is used repeatedly; in particular, estimating the number of breakpoints (Section 4) is equivalent to choosing the dimension of a model. In addition, since a model $S_m$ is uniquely defined by $m$, the index $m$ is also called a model.
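The set $\widetilde{\mathcal{M}}_n(K+1)$ can be enumerated directly: a sequence $1 = \alpha_0 < \alpha_1 < \cdots < \alpha_K \leq n$ amounts to choosing $K$ distinct values in $\{2, \ldots, n\}$, so there are $\binom{n-1}{K}$ models with $K$ breakpoints, consistently with the count of Section 1.2. A small sketch (the function name is ours):

```python
from itertools import combinations
from math import comb

def breakpoint_sequences(n, K):
    """Enumerate M~_n(K+1): increasing sequences 1 < alpha_1 < ... < alpha_K <= n,
    i.e. all choices of K distinct breakpoints in {2, ..., n}."""
    return list(combinations(range(2, n + 1), K))

n = 8
for K in range(n):
    # The collection of models with K breakpoints has C(n-1, K) elements
    assert len(breakpoint_sequences(n, K)) == comb(n - 1, K)
```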