
HAL Id: hal-00417281

https://hal.archives-ouvertes.fr/hal-00417281

Preprint submitted on 15 Sep 2009


Least Squares estimation of two ordered monotone regression curves

Fadoua Balabdaoui, Kaspar Rufibach, Filippo Santambrogio

To cite this version:

Fadoua Balabdaoui, Kaspar Rufibach, Filippo Santambrogio. Least Squares estimation of two ordered monotone regression curves. 2009. ⟨hal-00417281⟩


Least Squares estimation of two ordered monotone regression curves

running headline: ordered monotone regression

Fadoua Balabdaoui(1,2), Kaspar Rufibach(3) and Filippo Santambrogio(1)

(1) CEREMADE, Université de Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris CEDEX 16, France

(2) Universität Göttingen, Institut für Mathematische Stochastik, Goldschmidtstrasse 7, 37077 Göttingen

(3) Universität Zürich, Institut für Sozial- und Präventivmedizin, Abteilung Biostatistik, Hirschengraben 84, 8001 Zürich

fadoua@ceremade.dauphine.fr
kaspar.rufibach@ifspm.uzh.ch (corresponding author)
filippo@ceremade.dauphine.fr

Abstract

In this paper, we consider the problem of finding the Least Squares estimators of two isotonic regression curves $g_1$ and $g_2$ under the additional constraint that they are ordered; e.g., $g_1 \leq g_2$. Given two sets of $n$ data points $y_1, \ldots, y_n$ and $z_1, \ldots, z_n$ observed at (the same) design points, the estimates of the true curves are obtained by minimizing the weighted Least Squares criterion $L_2(a, b) = \sum_{j=1}^{n} (y_j - a_j)^2 w_{1j} + \sum_{j=1}^{n} (z_j - b_j)^2 w_{2j}$ over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a_1 \leq a_2 \leq \ldots \leq a_n$, $b_1 \leq b_2 \leq \ldots \leq b_n$, and $a_i \leq b_i$ for $i = 1, \ldots, n$. The characterization of the estimators is established. To compute these estimators, we use an iterative projected subgradient algorithm, where the projection is performed with a "generalized" pool-adjacent-violaters algorithm (PAVA), a byproduct of this work. We then apply the estimation method to real data from mechanical engineering.

Keywords: least squares, monotone regression, pool-adjacent-violaters algorithm, shape constraint estimation, subgradient algorithm

1 Introduction and motivation

Estimating a monotone regression curve is one of the most classical estimation problems under shape restrictions, see e.g. Brunk (1958). A regression curve is said to be isotonic if it is monotone nondecreasing. We chose in this paper to look at the class of isotonic regression functions. The simple transformation $g \to -g$ suffices for the results of this paper to carry over to the antitonic class.

Given $n$ fixed points $x_1, \ldots, x_n$, assume that we observe $y_i$ at $x_i$ for $i = 1, \ldots, n$. When the points $(x_i, y_i)$ are joined, the shape of the obtained graph can hint at increasing monotonicity of the true regression curve, $g$ say, assuming the model $y_i = g(x_i) + \varepsilon_i$, with $\varepsilon_i$ the unobserved errors. This shape restriction can also be a feature of the scientific problem at hand, and hence the need for estimating the true curve in the class of isotonic functions.

We refer to Barlow et al. (1972) and Robertson et al. (1988) for examples. The weighted Least Squares estimate of $g$ in the class of isotonic functions, based on the observations $y_i$ at $x_i$, is the unique minimizer of the criterion
$$L(a) = \sum_{i=1}^{n} w_i (y_i - a_i)^2 \qquad (1)$$
over the class of vectors $a \in \mathbb{R}^n$ such that $a_1 \leq a_2 \leq \ldots \leq a_n$, where $w_1 > 0, w_2 > 0, \ldots, w_n > 0$ are given positive weights. In what follows, we will say that a vector $v \in \mathbb{R}^n$ is increasing or isotonic if $v_1 \leq \ldots \leq v_n$, and use the notation $v \leq w$ for $v, w \in \mathbb{R}^n$ if the inequality holds componentwise.

It is well known that the solution $a^*$ of the Least Squares problem in (1) is given by the so-called min-max formula; i.e.,
$$a^*_i = \max_{s \leq i} \min_{t \geq i} \mathrm{Av}(\{s, \ldots, t\}) \qquad (2)$$
where $\mathrm{Av}(\{s, \ldots, t\}) = \sum_{i=s}^{t} y_i w_i \big/ \sum_{i=s}^{t} w_i$ (see e.g. Barlow et al., 1972).
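To make the min-max representation (2) concrete, here is a small R sketch (ours, purely illustrative, and not the implementation referred to later in the paper) that evaluates formula (2) by brute force; the function name and the toy data are assumptions for the example.

```r
## Brute-force evaluation of the min-max formula (2) for weighted isotonic
## regression: a_i = max_{s <= i} min_{t >= i} Av({s, ..., t}).
minmax_iso <- function(y, w = rep(1, length(y))) {
  n <- length(y)
  Av <- function(s, t) sum(y[s:t] * w[s:t]) / sum(w[s:t])
  sapply(1:n, function(i)
    max(sapply(1:i, function(s) min(sapply(i:n, function(t) Av(s, t))))))
}

## Toy example: a noisy increasing sequence; the output is nondecreasing.
set.seed(1)
y <- sort(runif(10)) + rnorm(10, sd = 0.1)
minmax_iso(y)
```

The cost of this direct evaluation grows quickly with $n$, which is what the PAVA discussed in Section 3 avoids.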

van Eeden (1957a,b) has generalized this problem to incorporate known bounds on the regression function to be estimated; i.e., she considered minimization of $L$ under the constraint
$$a^L \leq a \leq a^U, \qquad (3)$$
for two increasing vectors $a^L$ and $a^U$. As in the classical setting, the solution of this problem also admits a min-max representation. The PAVA can be generalized to efficiently compute this solution and has been implemented in the R package OrdMonReg (Balabdaoui et al., 2009). Computation relies on a suitable functional $M$ defined on the sets $A \subseteq \{1, \ldots, n\}$ which generalizes the function $\mathrm{Av}$ in (2). This functional for the bounded monotone regression in (3) is given by
$$M(A) = \left( \mathrm{Av}(A) \vee \max_{A} a^L \right) \wedge \min_{A} a^U$$
where $\min_A v = \min_{i \in A} v_i$ and $\max_A v = \max_{i \in A} v_i$. Compare Barlow et al. (1972, p. 57), where a functional notation is used. However, in the latter reference no formal justification was given for the form of the functional $M$ nor for the validity of (the modified version of) the PAVA; see the discussion after Theorem 2.1.

Chakravarti (1989) discusses the bounded isotonic regression problem for the absolute value criterion function, yielding the bounded isotonic median regressor. Chakravarti (1989) proposes a PAVA-like algorithm as well, and establishes some connections to linear programming theory. Unbounded isotonic median regression was first considered by Robertson and Waltman (1968), who provided a min-max formula for the estimator and a PAVA-like algorithm to compute it. They also studied its consistency.

Now suppose that instead of having only one set of observations $y_1, \ldots, y_n$ at the design points $x_1, \ldots, x_n$, we are interested in analyzing two sets of data $y_1, \ldots, y_n$ and $z_1, \ldots, z_n$ observed at the same design points. Furthermore, if we have the information that the underlying true regression curves are increasing and ordered, it is natural to try to construct estimators that fulfill the same constraints.

The current paper presents a solution to this problem of estimating two isotonic regression curves under the additional constraint that they are ordered. This solution is the unique minimizer $(a^*, b^*)$ of the criterion
$$L_2(a, b) = \sum_{i=1}^{n} w_{1i}(y_i - a_i)^2 + \sum_{i=1}^{n} w_{2i}(z_i - b_i)^2 \qquad (4)$$
over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a$ and $b$ are increasing and $a \leq b$, with $w_1$ and $w_2$ given vectors of positive weights in $\mathbb{R}^n$.

The problem was motivated by an application from mechanical engineering. We will make use of experimental data obtained from dynamic material tests (see Shim and Mohr, 2009) to illustrate our estimation method. In engineering mechanics, it is common practice to determine the deformation resistance and strength of materials from uniaxial compression tests at different loading velocities. The experimental results are the so-called stress-strain curves (see Figure 1), and these may be used to determine the deformation resistance as a function of the applied deformation. The recorded signals contain substantial noise which is mostly due to variations in the loading velocity and electrical noise in the data acquisition system.

The data in this example consist of 1495 distinct pairs $(x_i, y_i)$ and $(x_i, z_i)$, where $x_i$ is the measured strain, while $y_i$ (gray curve) and $z_i$ (black curve) correspond to the experimental stress results for two different loading velocities. The true regression curves are expected to be (a) monotone increasing, as the stress is known to be an increasing function of the strain (for a given constant loading velocity), and (b) ordered, as the deformation resistance typically increases as the loading velocity increases. In Section 3, we show the resulting estimates as well as a smoothed version thereof.

We will show that minimizing $L_2$ is equivalent to minimizing another convex functional over the class of isotonic vectors $a \in \mathbb{R}^n$. By doing so, we reduce a two-curve problem under the constraints of monotonicity and ordering to a one-curve problem under the constraint of monotonicity and boundedness. Actually, we can even perform the minimization over the class of isotonic vectors $(a_1, \ldots, a_{n-1})$ of dimension $n-1$ satisfying the constraint $a_1 \leq \ldots \leq a_{n-1} \leq a_n$, as we can explicitly determine $a_n$ by a generalized min-max formula (see Proposition 2.3). The solution of this equivalent minimization problem, which gives the solution $a^*$ (and also $b^*$ because it is a function of $a^*$), is computed using a projected subgradient algorithm where the projection step is performed using a suitable generalization of the PAVA.

Figure 1: Original observations (measured strain $x$ on the horizontal axis, stress on the vertical axis).

We would like to note that Brunk et al. (1966) considered a related problem, that of nonparametric maximum likelihood estimation of two ordered cumulative distribution functions. In the same class of problems, Dykstra (1982) treated estimation of survival functions of two stochastically ordered random variables in the presence of censoring, which was extended by Feltz and Dykstra (1985) to $N \geq 2$ stochastically ordered random variables. The theoretical solution can be related to the well-known Kaplan-Meier estimator and can be computed using an iterative algorithmic procedure for $N \geq 3$ (see Feltz and Dykstra, 1985, p. 1016). The $\sqrt{n}$-asymptotics of the estimators for $N = 2$, whether there is censoring or not, were established by Præstgaard and Huang (1996).

The paper is organized as follows. In Section 2, we give the characterization of the ordered isotonic estimates. We also provide the explicit form of the solution of the related bounded isotonic regression problem where the upper of the two isotonic curves is assumed to be fully known.

In Section 3 we describe the projected subgradient algorithm that we use to compute the Least Squares estimators of the ordered isotonic regression curves, and apply the method to real data from mechanical engineering. The technical proofs are deferred to Appendices A and B.

2 Estimation of two ordered isotonic regression curves

If the larger of the two isotonic curves were known, then there would of course be no need to estimate it. If we put $a^U = a^0$, the weighted Least Squares estimate $a^*$ of the smaller isotonic curve is the minimizer of
$$L(a) = \sum_{i=1}^{n} w_i (y_i - a_i)^2,$$
where $w \in \mathbb{R}^n$ is a vector of given positive weights, and $a \in \mathcal{I}_n^{a^0}$, the class of isotonic vectors $a \in \mathbb{R}^n$ such that $a \leq a^0$, for $a^0 \in \mathbb{R}^n$. When the components of $a^0$ are all equal, the vector $a^0$ will be identified with the common value of its components, as done in Proposition 3.4 below.

The notation $\mathcal{I}_n^{w}$ will be used again hereafter to denote the class of isotonic vectors $v \in \mathbb{R}^n$ such that $v \leq w$.

The statement of Barlow et al. (1972, p. 57) implies that if we define
$$M(A) = \mathrm{Av}(A) \wedge \min_{A} a^0$$
for a subset $A \subseteq \{1, \ldots, n\}$, then the solution $a^*$ can be computed using an appropriately modified version of the PAVA.

Theorem 2.1. For $i = 1, \ldots, n$, we have
$$a^*_i = \max_{s \leq i} \min_{t \geq i} M(\{s, \ldots, t\}) = \max_{s \leq i} \min_{t \geq i} \left( \mathrm{Av}(\{s, \ldots, t\}) \wedge a^0_s \right).$$

To keep this paper at a reasonable length, the proof of Theorem 2.1 is omitted. A short note containing a more thorough discussion of the one-curve problem and a proof of Theorem 2.1 can be obtained from the authors upon request. A general description of the modified PAVA and a proof that it works whenever the functional $M$ satisfies the so-called Averaging Property can be found in Section 3.
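The min-max formula of Theorem 2.1 can also be evaluated directly. The following R sketch is ours and only illustrative (in practice one would use the generalized PAVA of Section 3); it assumes, as in the theorem, that the upper bound `a0` is itself isotonic.

```r
## Brute-force evaluation of Theorem 2.1: weighted isotonic regression of y
## under the isotonic upper bound a0, via
##   a_i = max_{s <= i} min_{t >= i} ( Av({s,...,t}) ^ a0_s ).
minmax_iso_upper <- function(y, w, a0) {
  n <- length(y)
  Av <- function(s, t) sum(y[s:t] * w[s:t]) / sum(w[s:t])
  sapply(1:n, function(i)
    max(sapply(1:i, function(s)
      min(sapply(i:n, function(t) min(Av(s, t), a0[s]))))))
}
```

This little function is reused below to illustrate both the criterion $\Psi$ of Proposition 3.4 and the projection step of the subgradient algorithm.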

We now return to the main subject of this paper. Theorem 2.1 is crucial for finding the Least Squares estimates of two ordered isotonic regression curves. In particular, the result will be used to develop an appropriate algorithm to compute the solution.

Let $y_1, \ldots, y_n$ and $z_1, \ldots, z_n$ be the observed data from two unknown isotonic curves $g_1$ and $g_2$ such that $g_1 \leq g_2$. Given two vectors of positive weights $w_1$ and $w_2$ in $\mathbb{R}^n$, we would like to minimize
$$L_2(a, b) = \sum_{i=1}^{n} (y_i - a_i)^2 w_{1i} + \sum_{i=1}^{n} (z_i - b_i)^2 w_{2i} \qquad (5)$$
over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a$ and $b$ are isotonic and $a \leq b$. Call this class $\mathcal{I}_n$.

Existence and uniqueness of the solution. These follow from convexity and closedness of $\mathcal{I}_n$ and strict convexity of $L_2$.

Characterization of the solution. For completeness, we give the characterization of the solution of minimizing (5) over $\mathcal{I}_n$; i.e., a necessary and sufficient condition for $(a^*, b^*) \in \mathcal{I}_n$ to be equal to this solution. Let $i_1 < \ldots < i_k$ be such that $i_1 = 1$, $i_k = n$ and
$$a^*_1 = \ldots = a^*_{i_1} < a^*_{i_1+1} = \ldots = a^*_{i_2-1} < \ldots < a^*_{i_k} = \ldots = a^*_n.$$
We call $B^0_{i_j}$ (resp. $B^1_{i_j}$) a set of indices $\{i_j, \ldots, i_{j+1}-1\}$, $j = 1, \ldots, k-1$, such that $a^*_{i_j} = b^*_{i_j}$ (resp. $a^*_{i_j} < b^*_{i_j}$). Similarly, let $l_1 < \ldots < l_r$ be such that $l_1 = 1$, $l_r = n$,
$$b^*_1 = \ldots = b^*_{l_1} < b^*_{l_1+1} = \ldots = b^*_{l_2-1} < \ldots < b^*_{l_r} = \ldots = b^*_n,$$
and call $C^0_{l_j}$ (resp. $C^1_{l_j}$) a set of indices $\{l_j, \ldots, l_{j+1}-1\}$, $j = 1, \ldots, r-1$, such that $b^*_{l_j} = a^*_{l_j}$ (resp. $b^*_{l_j} > a^*_{l_j}$).

Theorem 2.2. The pair $(a^*, b^*) \in \mathcal{I}_n$ is the minimizer of (5) if and only if
$$\sum_{i=1}^{n} (a^*_i - y_i)(a_i - a^*_i)\, w_{1i} + \sum_{i=1}^{n} (b^*_i - z_i)(b_i - b^*_i)\, w_{2i} \geq 0, \quad \forall (a, b) \in \mathcal{I}_n, \qquad (6)$$
$$\sum_{s \in \cup_j B^1_{i_j}} (a^*_s - y_s)\, a^*_s\, w_{1s} = 0, \quad \text{and} \qquad (7)$$
$$\sum_{s \in \cup_j C^1_{l_j}} (b^*_s - z_s)\, b^*_s\, w_{2s} = 0. \qquad (8)$$

Proof. See Appendix A.

An explicit formula in the sense of a min-max representation similar to (2) for $(a^*, b^*)$ turned out to be hard to find. However, since $a^*$ (resp. $b^*$) is also the minimizer of
$$\sum_{i=1}^{n} (a_i - y_i)^2 w_{1i} \quad \text{resp.} \quad \sum_{i=1}^{n} (b_i - z_i)^2 w_{2i}$$
over the class $\mathcal{I}_n^{b^*}$ (resp. the class of isotonic vectors $b \in \mathbb{R}^n$ such that $b \geq a^*$), Theorem 2.1 implies that
$$a^*_i = \max_{s \leq i} \min_{t \geq i} \left( \mathrm{Av}_1(\{s, \ldots, t\}) \wedge b^*_s \right) \qquad (9)$$
$$b^*_i = \max_{s \leq i} \min_{t \geq i} \left( \mathrm{Av}_2(\{s, \ldots, t\}) \vee a^*_t \right) \qquad (10)$$
for $i = 1, \ldots, n$, where
$$\mathrm{Av}_1(A) = \frac{\sum_{i \in A} y_i w_{1i}}{\sum_{i \in A} w_{1i}} \quad \text{and} \quad \mathrm{Av}_2(A) = \frac{\sum_{i \in A} z_i w_{2i}}{\sum_{i \in A} w_{2i}}$$
for $A \subseteq \{1, \ldots, n\}$.

Thus, the solution $(a^*, b^*)$ is a fixed point of the operator $\mathcal{P} : \mathcal{I}_n \to \mathcal{I}_n$ defined as
$$\mathcal{P}((a, b)) = (\mathcal{P}_1(b), \mathcal{P}_2(a)) \qquad (11)$$
$$= \left( \max_{s \leq i} \min_{t \geq i} \left( \mathrm{Av}_1(\{s, \ldots, t\}) \wedge b_s \right),\ \max_{s \leq i} \min_{t \geq i} \left( \mathrm{Av}_2(\{s, \ldots, t\}) \vee a_t \right) \right)_{i = 1, \ldots, n}.$$

However, this fixed point problem does not admit a unique solution. Therefore, there is no guarantee that an algorithm based on the above min-max formulas yields the solution, except in the unrealistic and uninteresting case where the starting point of the algorithm is the solution itself. To see that $\mathcal{P}$ does not admit a unique fixed point, note that the minimizer of the criterion
$$\sum_{i=1}^{n} (a_i - y_i)^2 w_{1i} + B \sum_{i=1}^{n} (b_i - z_i)^2 w_{2i}$$
is a fixed point of $\mathcal{P}$ for any $B > 0$. Therefore, a computational method based on starting from an initial candidate and then alternating between (9) and (10) cannot be successful. In parallel, we have invested a substantial effort in trying to get a closed form for the estimators. Although we did not succeed, we were able to obtain a closed form for $a^*_1$ (and, by symmetry, for $b^*_n$).

Proposition 2.3. We have that
$$a^*_1 = \min_{t \geq 1} \mathrm{Av}_1(\{1, \ldots, t\}) \ \wedge \min_{t \geq t' \geq 1} \tilde{M}(\{1, \ldots, t\}, \{1, \ldots, t'\}) \qquad (12)$$
where
$$\tilde{M}(A, B) = \frac{\mathrm{Av}_1(A) \sum_{i \in A} w_{1i} + \mathrm{Av}_2(B) \sum_{j \in B} w_{2j}}{\sum_{i \in A} w_{1i} + \sum_{j \in B} w_{2j}}.$$
By symmetry, we also have that
$$b^*_n = \max_{t \leq n} \mathrm{Av}_2(\{t, \ldots, n\}) \ \vee \max_{t \leq t' \leq n} \tilde{M}(\{t', \ldots, n\}, \{t, \ldots, n\}). \qquad (13)$$

Some remarks are in order. On the one hand, the expressions obtained above indicate that the Least Squares estimator must depend, as expected, on the relative ratio of the weights $w_1$ and $w_2$. In particular, if $w_2 = 0$ (resp. $w_1 = 0$), the expression of $a^*_1$ (resp. $b^*_n$) specializes to the well-known min-max formula in the classical Least Squares estimation of an (unbounded) isotonic curve. On the other hand, the expression of $b^*_n$ is essential for our subgradient algorithm below.

Proof of Proposition 2.3. See Appendix A.

In the next section, we describe how we can make use of the min-max formula in (9) to compute the estimators using a projected subgradient algorithm. As mentioned above, we use in this algorithm the identity (13) given in the previous proposition.
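Since the identity (13) supplies the value $b^*_n$ that the algorithm of Section 3 keeps fixed, it may help to see it spelled out in code. The following brute-force R sketch is ours; in particular, the assignment of the two index ranges to the two arguments of $\tilde{M}$ follows the reading of Proposition 2.3 given above and should be checked against Appendix A.

```r
## Brute-force evaluation of formula (13):
##   b*_n = max_t Av2({t,...,n})  v  max_{t <= t' <= n} Mtilde({t',...,n}, {t,...,n}),
## where Mtilde(A, B) pools the y-data over A and the z-data over B.
bn_star <- function(y, z, w1, w2) {
  n <- length(y)
  Av2 <- function(s, t) sum(z[s:t] * w2[s:t]) / sum(w2[s:t])
  Mtilde <- function(A, B)
    (sum(y[A] * w1[A]) + sum(z[B] * w2[B])) / (sum(w1[A]) + sum(w2[B]))
  part1 <- max(sapply(1:n, function(t) Av2(t, n)))
  part2 <- max(unlist(lapply(1:n, function(t)
    sapply(t:n, function(tp) Mtilde(tp:n, t:n)))))
  max(part1, part2)
}
```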

3 Algorithms and Application to real data

In this section, we show that the bounded isotonic estimator can be computed using the well-known PAVA, or, to be more precise, a modified version of it. Recall that the bounded isotonic estimator in the one-curve problem is given by
$$a^*_i = \max_{s \leq i} \min_{t \geq i} M(\{s, \ldots, t\})$$
where $M(A) = \mathrm{Av}(A) \vee \max_{A} a^0$, $A \subseteq \{1, \ldots, n\}$. That $a^*$ can be computed using a PAVA is a consequence of a more general result: this computational fact is true provided that a functional $M$ of sets $A \subseteq \{1, \ldots, n\}$ satisfies what is referred to as the Averaging Property (see Chakravarti, 1989, p. 138), also called the Cauchy Mean Value Property by Leurgans (1981, Section 1). See also Robertson et al. (1988, p. 390). Note that in the classical unconstrained monotone regression problem, the min-max expression of the Least Squares estimator follows from Theorem 2.8 in Barlow et al. (1972, p. 80).

3.1 Getting the min-max solution by the PAVA

First, let us describe how the PAVA works for a given set functional $M$.

• At every step the current configuration is given by a subdivision of $\{1, \ldots, n\}$ into $k$ subsets $S_1 = \{1, \ldots, i_1\}$, $S_2 = \{i_1 + 1, \ldots, i_2\}$, ..., $S_k = \{i_{k-1} + 1, \ldots, n\}$, for some indices $0 = i_0 < i_1 < i_2 < \cdots < i_{k-1} < i_k = n$.

• The initial configuration is given by the finest subdivision; i.e., $S_j = \{j\}$ for $j = 1, \ldots, n$.

• At every step we look at the values of $M$ on the sets of the subdivision. A violation is noted each time there exists an index $j$ such that $M(S_j) > M(S_{j+1})$. We consider the first violation (the one corresponding to the smallest $j$) and then merge the subsets $S_j$ and $S_{j+1}$ into one interval.

• Given the new subdivision (which has one subset less than the previous one), we look again for possible violations.

• The algorithm stops when there are no violations left.

Since for any violation a merging is performed (thus reducing the number of subsets), it is clear that the algorithm stops after a finite number of iterations. A code sketch of these steps is given below.
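The following R sketch (ours, for exposition only; the actual implementation is in the OrdMonReg package) carries out exactly these steps for an arbitrary set functional $M$, passed as an R function of an index vector.

```r
## Generic PAVA driven by a set functional M, as described above.  At
## termination, the fitted value at index i is M(S_j) for the block S_j
## containing i (Theorem 3.3).
pava_generic <- function(n, M) {
  blocks <- as.list(1:n)                 # finest subdivision: S_j = {j}
  repeat {
    vals <- sapply(blocks, M)
    viol <- which(diff(vals) < 0)        # indices j with M(S_j) > M(S_{j+1})
    if (length(viol) == 0) break         # no violations left: stop
    j <- viol[1]                         # first violation
    blocks[[j]] <- c(blocks[[j]], blocks[[j + 1]])   # merge S_j and S_{j+1}
    blocks[[j + 1]] <- NULL
  }
  fit <- numeric(n)
  for (B in blocks) fit[B] <- M(B)
  fit
}

## Hypothetical usage: bounded isotonic regression of y between aL and aU via
## the functional M(A) = (Av(A) v max_A aL) ^ min_A aU from Section 1.
# M <- function(A) min(max(sum(y[A] * w[A]) / sum(w[A]), max(aL[A])), min(aU[A]))
# pava_generic(length(y), M)
```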

We now require the set functional $M$ to satisfy the following property. See Leurgans (1981, Section 1), Robertson et al. (1988, p. 390) and Chakravarti (1989, p. 138).

Definition 3.1. We say that the functional $M$ satisfies the Averaging Property if for any sets $A$ and $B$ such that $A \cap B = \emptyset$ we have
$$\min\{M(A), M(B)\} \leq M(A \cup B) \leq \max\{M(A), M(B)\}.$$

If $h$ and $w > 0$ are given vectors in $\mathbb{R}^n$, then besides
$$A \mapsto \mathrm{Av}(A) = \sum_{i \in A} w_i h_i \Big/ \sum_{i \in A} w_i,$$
the following examples of functionals also satisfy the Averaging Property:
$$A \mapsto \left( \mathrm{Av}(A) \vee \max_{A} h^1 \right) \wedge \min_{A} h^0, \quad \text{with } h^0, h^1 \text{ two vectors in } \mathbb{R}^n,$$
$$A \mapsto \min_{A} h = \min_{i \in A} h_i,$$
$$A \mapsto \mathrm{med}_A\, h = \arg\min_{m \in \mathbb{R}} \sum_{i \in A} |h_i - m|\, w_i,$$
where the $\arg\min$ is taken to be the smallest $m$ in case non-uniqueness occurs, and
$$A \mapsto \max_{A} h = \max_{i \in A} h_i.$$

Note that the maximum, the minimum and the sum of two functionals satisfying the Averaging Property satisfy the same property as well.

Theorem 3.2. The final configuration obtained by the PAVA is such that the two following properties are satisfied.

1. The functional $M$ is increasing on the sets of the subdivision.

2. If one of the sets $S_j = C \cup D$ is the disjoint union of two subsets $C = \{i_{j-1} + 1, \ldots, k\}$ and $D = \{k + 1, \ldots, i_j\}$, then $M(C) > M(D)$; i.e., a finer subdivision would necessarily cause a violation.

Proof. The fact that $M$ is increasing on the final configuration is an easy consequence of the absence of violations (otherwise the algorithm would not have stopped).

As for the second part of the property, note that it is satisfied by the initial configuration (since no set is the disjoint union of two non-trivial subsets), as well as by any configuration that one could obtain after the first merging (since a merging occurs only because of a violation). We now proceed by induction.

To this end, we have to check two situations. Suppose we merge two subsequent sets $A$ and $B$ and want to check whether there is a violation on $C$ and $D$, with $A \cup B = C \cup D$. We are in one of the two following cases: either $A = A_1 \cup A_2$, $C = A_1$ and $D = A_2 \cup B$, or $B = B_1 \cup B_2$, $C = A \cup B_1$ and $D = B_2$ (the case $C = A$ and $D = B$ is trivial).

In the first case, if we suppose $M(D) \geq M(C)$, we get
$$M(A_2 \cup B) \geq M(A_1), \qquad M(A_2) < M(A_1), \qquad M(B) < M(A) = M(A_1 \cup A_2)$$
(the first inequality follows by assumption, the second by induction, and the third is true since $A$ and $B$ have been merged), and this is impossible: one would conclude that
$$\max\{M(A_2), M(B)\} \geq M(A_1) > M(A_2),$$
and hence $M(A) > M(B) \geq M(A_1) > M(A_2)$, which implies $M(A) > \max\{M(A_1), M(A_2)\}$ and contradicts the Averaging Property.

In the second case we would have
$$M(A \cup B_1) \leq M(B_2), \qquad M(B_2) < M(B_1), \qquad M(A) > M(B) = M(B_1 \cup B_2),$$
which implies
$$\min\{M(A), M(B_1)\} \leq M(B_2) < M(B_1),$$
and then $\min\{M(A), M(B_1)\} = M(A)$ and $M(A) \leq M(B_2) < M(B_1)$, which contradicts either $M(A) > M(B)$ or the Averaging Property. □

Theorem 3.3. If $(S_j)_j$ is the partition obtained at the end of the PAVA described above, then $m_i = M(S_{j_i})$, where $j_i$ is such that $i \in S_{j_i}$, equals the value given by the min-max formula for the index $i$.

Proof. See Appendix A.

3.2 Preparing for a projected subgradient algorithm

The following proposition is crucial for computing the ordered isotonic estimators via a projected subgradient algorithm.

Proposition 3.4. Let $\Psi$ be the criterion
$$\Psi(b_1, \ldots, b_{n-1}) = \sum_{i=1}^{n} \Big( \max_{s \leq i} (G_{s,i} \wedge b_s) - y_i \Big)^2 w_{1i} + \sum_{i=1}^{n-1} (b_i - z_i)^2 w_{2i} \qquad (14)$$
which is to be minimized on the convex set
$$\mathcal{I}_{n-1}^{b^*_n} = \left\{ (b_1, \ldots, b_{n-1}) \in \mathbb{R}^{n-1} : b_1 \leq b_2 \leq \ldots \leq b_{n-1} \leq b^*_n \right\},$$
where
$$G_{s,i} = \min_{t \geq i} \mathrm{Av}_1(\{s, \ldots, t\})$$
and $b_n = b^*_n$ in the term $G_{n,n} \wedge b_n$ of (14).

The criterion $\Psi$ is convex. Furthermore, its unique minimizer $(b^{**}_1, \ldots, b^{**}_{n-1})$ equals $(b^*_1, \ldots, b^*_{n-1})$.

Proof. Let us write
$$\mathcal{I} = \{a = (a_1, \ldots, a_n) \in \mathbb{R}^n : a_1 \leq \ldots \leq a_n\},$$
$$\overline{\mathcal{I}}_n = \left\{ b = (b_1, \ldots, b_n) : (b_1, \ldots, b_{n-1}) \in \mathcal{I}_{n-1}^{b^*_n} \text{ and } b_n = b^*_n \right\}$$
and consider
$$\mathcal{I}_n^{b} = \{a : a \in \mathcal{I} \text{ and } a \leq b\}, \quad \text{for } b \in \overline{\mathcal{I}}_n.$$
Now note that the min-max formula in (9) allows us to write
$$\sum_{j=1}^{n} \Big( \max_{s \leq j} (G_{s,j} \wedge b_s) - y_j \Big)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j} = \min_{a \in \mathcal{I}_n^{b}} \sum_{j=1}^{n} (a_j - y_j)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j}.$$
Hence, for $b \in \overline{\mathcal{I}}_n$ we have
$$\Psi(b_1, \ldots, b_{n-1}) = \min_{a \in \mathcal{I}_n^{b}} \sum_{j=1}^{n} (a_j - y_j)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j} = \sum_{j=1}^{n} (\tilde{a}_j(b) - y_j)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j},$$
where $\tilde{a}_j(b) = \max_{s \leq j} (G_{s,j} \wedge b_s)$ is the $j$-th component of the minimizer of the function $\sum_{j=1}^{n} (a_j - y_j)^2 w_{1j}$ over $\mathcal{I}_n^{b}$. Let $\lambda \in [0, 1]$, and $b$ and $b'$ in $\overline{\mathcal{I}}_n$. By definition of $\mathcal{I}_n^{b}$ and $\mathcal{I}_n^{b'}$, we have that
$$\lambda\, \tilde{a}(b) + (1 - \lambda)\, \tilde{a}(b') \leq \lambda\, b + (1 - \lambda)\, b',$$
so that $\lambda \tilde{a}(b) + (1 - \lambda) \tilde{a}(b')$ belongs to $\mathcal{I}_n^{\lambda b + (1 - \lambda) b'}$, and hence
$$\sum_{j=1}^{n} \Big( \tilde{a}_j(\lambda b + (1 - \lambda) b') - y_j \Big)^2 w_{1j} \leq \sum_{j=1}^{n} \Big( \lambda\, \tilde{a}_j(b) + (1 - \lambda)\, \tilde{a}_j(b') - y_j \Big)^2 w_{1j} \leq \lambda \sum_{j=1}^{n} \big( \tilde{a}_j(b) - y_j \big)^2 w_{1j} + (1 - \lambda) \sum_{j=1}^{n} \big( \tilde{a}_j(b') - y_j \big)^2 w_{1j},$$
where the first inequality holds because $\tilde{a}(\lambda b + (1 - \lambda) b')$ minimizes the left-hand criterion over $\mathcal{I}_n^{\lambda b + (1 - \lambda) b'}$, and the second follows from convexity of $x \mapsto (x - y_j)^2$. This shows convexity of the first term of $\Psi$. Convexity of $\Psi$ now follows from convexity of the function $\sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j}$ and the fact that the sum of two convex functions defined on the same domain is also convex. □

The idea behind considering the convex functional $\Psi$ is to reduce the dimensionality of the problem as well as the number of constraints (from $3n - 2$ to $n - 1$ constraints). Once $\Psi$ is minimized, i.e., the isotonic estimate $b^*$ is computed, $a^*$ can be obtained using the min-max formula given in (9). However, the convex functional $\Psi$ is not continuously differentiable, hence the need for an optimization algorithm that uses the subgradient instead of the gradient, as the latter is not defined everywhere.
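As an illustration of this reduction, the reduced criterion $\Psi$ can be evaluated by reusing the upper-bounded min-max sketch from Section 2: the inner minimization over $a$ is precisely an isotonic fit of $y$ bounded above by $b$. The R code below is our own sketch; the argument `b` is assumed to be the full vector $(b_1, \ldots, b_{n-1}, b^*_n)$, with its last entry fixed at $b^*_n$ as in Proposition 3.4.

```r
## Evaluate Psi(b_1, ..., b_{n-1}) of (14), with b carrying b*_n as last entry.
## Reuses minmax_iso_upper() from the sketch following Theorem 2.1.
Psi <- function(b, y, z, w1, w2) {
  n <- length(y)
  a_tilde <- minmax_iso_upper(y, w1, b)   # inner minimizer \tilde a(b), formula (9)
  sum(w1 * (a_tilde - y)^2) +
    sum(w2[1:(n - 1)] * (b[1:(n - 1)] - z[1:(n - 1)])^2)
}
```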

3.3 A projected subgradient algorithm to compute $b^*_1, \ldots, b^*_{n-1}$

To minimize the non-smooth convex function $\Psi$ we use a projected subgradient algorithm. Since the gradient does not exist on the entire domain of the function, one has to resort to computation of a subgradient, the analogue of the gradient at points where the latter does not exist. As opposed to classical methods developed for minimizing smooth functions, the procedure for choosing the direction of descent and the steplengths is entirely different. The classical reference for subgradient algorithms is Shor (1985); Boyd et al. (2003) provide a nice summary of the topic, including the projected variant. Note that a recent application of subgradient algorithms in statistics is the computation of the log-concave density estimator in high dimensions; see Cule et al. (2008).

The main steps of the algorithm. Recall that the functional $\Psi$ is to be minimized over the $(n-1)$-dimensional convex set $\mathcal{I}_{n-1}^{b^*_n}$ given in Proposition 3.4. Of course, this is the same as minimizing $\Psi$ over the $n$-dimensional convex set $\{(b_1, \ldots, b_n) : b_1 \leq \ldots \leq b_{n-1} \leq b_n\}$, starting with an initial vector $(b^{(0)}_1, \ldots, b^{(0)}_n)$ such that $b^{(0)}_n = b^*_n$ and constraining the $n$-th component of the subgradient of $\Psi$ to be equal to 0.

Given a steplength $\tau_k$, the new candidate at the $k$-th iteration of a subgradient algorithm is given by
$$v^{k+1} = b^k - \tau_k D_k,$$
where $D_k$ is the subgradient calculated at the current iterate; i.e., $D_k = \tilde{\nabla} \Psi(b^k)$ (see Appendix B). However, it may happen that $v^{k+1}$ is not admissible; i.e., $(v^{k+1}_1, \ldots, v^{k+1}_{n-1})$ does not belong to $\mathcal{I}_{n-1}^{b^*_n}$. When this occurs, an $L_2$ projection of this candidate onto $\mathcal{I}_{n-1}^{b^*_n}$ is performed, which yields the next iterate $b^{k+1}$. This is equivalent to finding the minimizer of
$$\sum_{i=1}^{n} (a_i - v^{k+1}_i)^2$$
over the set $\mathcal{I}_n^{b^*_n}$. The latter problem can be solved using the generalized PAVA for bounded isotonic regression as described above.

The computation of the subgradient $D_k$ is described in detail in Appendix B. As for the steplength $\tau_k$, we start the algorithm with a constant steplength. Once a pre-specified number of iterations has been reached, we switch to
$$\tau_{k+1} = \left( h_k^{0.1}\, \|D_k\|_2 \right)^{-1}$$
where $\gamma_k := h_k^{-0.1}$ is such that $0 \leq \gamma_k \to 0$ as $k \to \infty$ and $\sum_{k=1}^{\infty} \gamma_k = \infty$. Here, $\|\cdot\|_2$ denotes the $L_2$-norm of a vector in $\mathbb{R}^n$. This combination of constant and non-summable diminishing steplengths showed good performance in our implementation of the algorithm compared to other classical choices of $(\gamma_k)_k$. Furthermore, convergence is ensured by the following theorem.

Theorem 3.5. (Boyd et al., 2003) A subgradient algorithm complemented with least-squares projection and using non-summable diminishing steplengths yields, for any $\eta > 0$, after $k = k(\eta)$ iterations a vector $b^k := (b^k_1, \ldots, b^k_n)$ such that
$$\min_{i = 1, \ldots, k} \Psi(b^i) - \Psi(b^*) \leq \eta,$$
where $b^* = (b^*_1, \ldots, b^*_n)$ is the vector given in Proposition 3.4.

The proof can be found in Boyd et al. (2003) by combining their arguments in Sections 2 and 3. Note that in our implementation we do not keep track of the iterate that yielded the minimal value of $\Psi$, since we apply a problem-motivated stopping criterion that guarantees that we have reached an iterate that is sufficiently close to $b^* = (b^*_1, \ldots, b^*_n)$.

Choice of stopping rule. Since in subgradient algorithms the convex target functional does not necessarily monotonically decrease with increasing number of iterations, the choice of a suitable stopping criterion is delicate. However, in our specific setting we use the fact that $(a^*, b^*)$ is a fixed point of the operator $\mathcal{P}$ defined in (11), where $a^* = \mathcal{P}_1(b^*)$, the solution of (1) with upper bound $b^*$. This motivates iterating the algorithm until the difference between the entries of the two vectors $b^k$ and $b^k_{\#}$, where
$$b^k_{\#} = \mathcal{P}_2 \circ \mathcal{P}_1(b^k),$$
is below a pre-specified positive constant $\delta$.
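Putting these ingredients together, a bare-bones version of the iteration might look as follows. This is an illustrative sketch only: the routine `subgrad_Psi` stands for the subgradient described in Appendix B and is left as a placeholder, the diminishing steplength takes $h_k = k$ (an assumption on our part), the projection reuses `minmax_iso_upper` with a constant upper bound $b^*_n$, and the fixed-point stopping rule is replaced by a fixed iteration budget for brevity.

```r
## Sketch of the projected subgradient iteration of Section 3.3.
## b0:       feasible starting value of length n - 1 (nondecreasing, <= bn_star)
## bn_star:  the fixed upper bound b*_n from Proposition 2.3
project_subgradient <- function(b0, bn_star, subgrad_Psi,
                                n_const = 50, tau_const = 1e-3, maxit = 500) {
  b <- b0
  for (k in 1:maxit) {
    D <- subgrad_Psi(b)                   # subgradient of Psi at b (Appendix B)
    if (k <= n_const) {
      tau <- tau_const                    # constant steplength first, ...
    } else {
      tau <- 1 / (k^0.1 * (sqrt(sum(D^2)) + 1e-12))  # ... then diminishing
    }
    v <- b - tau * D
    ## L2 projection onto {b_1 <= ... <= b_{n-1} <= bn_star}: bounded isotonic
    ## regression of v with unit weights and constant upper bound bn_star
    b <- minmax_iso_upper(v, rep(1, length(v)), rep(bn_star, length(v)))
  }
  b
}
```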

The implementation. The projected subgradient algorithm for the two-curve problem as well as the generalized PAVA computing the solution for one curve under the constraints (3) were implemented in R (R Development Core Team, 2008). The corresponding package OrdMonReg (Balabdaoui et al., 2009) is available on CRAN. Note that the data analyzed in Section 3.4 is made available as a dataset in OrdMonReg.

To conclude this section on the algorithmic aspects of our work, we would like to mention the work by Beran and Dümbgen (2009), who propose an active set algorithm which can be tailored to solve the problem given in (5) for an arbitrary number of ordered monotone curves. However, Beran and Dümbgen (2009) do not provide an analysis of the structure of the estimated curves, such as characterizations, and rather put their emphasis on the algorithmic developments of the problem.

3.4 Real data example from mechanical engineering

We would like to estimate the stress-strain curves based on the available experimental data for two different velocity levels (see Figure 1). The expected curves have to be isotonic and ordered. The data consist of 1495 pairs $(x_i, y_i)$ and $(x_i, z_i)$. The values of the measured strain of the material (on the $x$-axis) are defined as minus the logarithm of the ratio of the current over the initial specimen length. The values are positive and take the maximal value 1, which corresponds to a maximum shortening of 63%.

Furthermore, since the stress measurements for different velocities are not performed exactly at the same strain, the values of the stress have been interpolated at equally spaced values of the strain. As pointed out by a referee, this will induce correlation between the strain data. Even if the strain measurements were not interpolated, having correlated stress measurements is rather inevitable in this particular application because of the data processing procedures associated with the measurement technique (see Shim and Mohr, 2009). The estimation method is, however, still applicable. When studying statistical properties of the isotonic estimators, such as consistency and convergence, the correlation between the data should of course be taken into account.

In such problems, practitioners usually fit parametric models using a trial-and-error approach in an attempt to capture monotonicity of the stress-strain curves as well as their ordering. The methods used are rather arbitrary and can also be time consuming, hence the need for an alternative estimation approach. Our main goal is to provide those practitioners with a rigorous way of estimating the ordered stress-strain curves.

In Figure 2 (upper plot) we provide the original data (black and gray dots) and the proposed ordered isotonic estimates $a^*$ and $b^*$ as described above. Being step functions, the estimated isotonic curves are non-smooth, a well-known drawback of isotonic regression; see among others Wright (1978) and Mukerjee (1988). The latter author pioneered the combination of isotonization followed by kernel smoothing. A thorough asymptotic analysis of the smoothed isotonized and the isotonic smoothed estimators was given by Mammen (1991). Mukerjee (1988, p. 743) shows that monotonicity of the regression function is preserved by the smoothing operation if the kernel used is log-concave. Thus, we define our smoothed ordered monotone estimators by
$$\tilde{a}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, a^*_i}{\sum_{i=1}^{n} K_h(x - x_i)}, \qquad \tilde{b}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, b^*_i}{\sum_{i=1}^{n} K_h(x - x_i)}$$
for $0 \leq x \leq 1$. For simplicity, we used the kernel $K_h(x) = \phi(x/h)$, where $\phi$ is the density function of a standard normal distribution, which is clearly log-concave. Figure 2 (lower plot) depicts the smoothed isotonic estimates. We set the bandwidth to $h = 0.1\, n^{-1/5} \approx 0.023$.
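The smoothing step itself takes only a few lines; the R sketch below (ours) evaluates the Gaussian-kernel weighted averages above on a grid, for either of the two isotonic fits.

```r
## Kernel smoothing of an isotonic fit with the Gaussian kernel K_h(x) = phi(x/h);
## the normalizing constant of phi cancels in the ratio.
smooth_iso <- function(x, fit, h, grid = seq(min(x), max(x), length.out = 200)) {
  sapply(grid, function(u) {
    K <- dnorm((u - x) / h)
    sum(K * fit) / sum(K)
  })
}
## e.g. a_smooth <- smooth_iso(x, a_star, h = 0.1 * length(x)^(-1/5))
```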

Motivated by estimation of stress-strain curves, an application from mechanical engineering, we consider in this paper weighted Least Squares estimators in the problem of estimating two ordered isotonic regression curves. We provide characterizations of the solution and describe a projected subgradient algorithm which can be used to compute this solution. As a by-product, we show how an adaptation of the well-known PAVA can be used to compute min-max estimators for any set functional satisfying the Averaging Property.

Acknowledgements. The first author would like to thank Cécile Durot for some interesting discussions around the subject. We also thank JongMin Shim for having made the data available to us.

A Proofs

Proof of Theorem 2.2. Suppose that $(a^*, b^*)$ is the solution. For $\epsilon \in (0, 1)$ and $(a, b) \in \mathcal{I}_n$, consider the pair $(a^\epsilon, b^\epsilon) \in \mathbb{R}^n \times \mathbb{R}^n$ defined as
$$a^\epsilon = a^* + \epsilon(a - a^*), \qquad b^\epsilon = b^* + \epsilon(b - b^*).$$
For $i \leq j \in \{1, \ldots, n\}$, we have
$$a^\epsilon_j - a^\epsilon_i = (1 - \epsilon)(a^*_j - a^*_i) + \epsilon(a_j - a_i) \geq 0, \qquad b^\epsilon_j - b^\epsilon_i = (1 - \epsilon)(b^*_j - b^*_i) + \epsilon(b_j - b_i) \geq 0.$$
Also, for $i \in \{1, \ldots, n\}$ we have
$$a^\epsilon_i - b^\epsilon_i = (1 - \epsilon)(a^*_i - b^*_i) + \epsilon(a_i - b_i) \leq 0.$$

Figure 2: Original observations, isotonic and isotonic smoothed estimates (upper plot: lower isotonic estimate $a^*$ and upper isotonic estimate $b^*$; lower plot: smoothed estimates $\tilde{a}^*$ and $\tilde{b}^*$; measured strain $x$ versus stress).

Hence, $(a^\epsilon, b^\epsilon) \in \mathcal{I}_n$, and
$$0 \leq \lim_{\epsilon \searrow 0} \frac{1}{2\epsilon}\big( L_2(a^\epsilon, b^\epsilon) - L_2(a^*, b^*) \big) = \sum_{i=1}^{n} (a^*_i - y_i)(a_i - a^*_i)\, w_{1i} + \sum_{i=1}^{n} (b^*_i - z_i)(b_i - b^*_i)\, w_{2i},$$
yielding the inequality in (6).

Now consider the vectors $a^\epsilon$ and $b^\epsilon$ defined, for $l = 1, \ldots, n$, by
$$a^\epsilon_l = a^*_l + \epsilon\, a^*_l\, 1_{l \in B^1_{i_j}}, \qquad b^\epsilon_l = b^*_l.$$
Let $r \leq s \in \{1, \ldots, n\}$. If $r \notin B^1_{i_j}$ and $s \notin B^1_{i_j}$, then $a^\epsilon_s - a^\epsilon_r = a^*_s - a^*_r \geq 0$. If $r \in B^1_{i_j}$ and $s \notin B^1_{i_j}$, then $a^*_s > a^*_r$ and $a^\epsilon_s - a^\epsilon_r = a^*_s - a^*_r - \epsilon a^*_r > 0$ for $|\epsilon|$ small enough. The same reasoning applies if $r \notin B^1_{i_j}$ and $s \in B^1_{i_j}$. Finally, if $r, s \in B^1_{i_j}$, then $a^\epsilon_s - a^\epsilon_r = 0$.

Now, for $r \in \{1, \ldots, n\}$, we have $a^\epsilon_r = a^*_r \leq b^*_r$ if $r \notin B^1_{i_j}$. Otherwise, $a^\epsilon_r = a^*_r(1 + \epsilon) < b^*_r$ if $|\epsilon|$ is small enough. Hence, $(a^\epsilon, b^\epsilon) \in \mathcal{I}_n$, and
$$0 = \lim_{\epsilon \to 0} \frac{1}{2\epsilon}\big( L_2(a^\epsilon, b^\epsilon) - L_2(a^*, b^*) \big) = \sum_{r=1}^{n} (a^*_r - y_r)\, 1_{r \in B^1_{i_j}}\, a^*_r\, w_{1r}.$$
Summing up over all the sets $B^1_{i_j}$ yields the identity in (7). The identity in (8) can be proved very similarly.

Conversely, suppose that $(a^*, b^*) \in \mathcal{I}_n$ satisfies the inequality in (6). For any $(a, b) \in \mathcal{I}_n$, we have
$$\frac{1}{2}\big( L_2(a, b) - L_2(a^*, b^*) \big) = \frac{1}{2}\sum_{i=1}^{n} (a_i - a^*_i)^2 w_{1i} + \frac{1}{2}\sum_{i=1}^{n} (b_i - b^*_i)^2 w_{2i} + \sum_{i=1}^{n} (a^*_i - y_i)(a_i - a^*_i)\, w_{1i} + \sum_{i=1}^{n} (b^*_i - z_i)(b_i - b^*_i)\, w_{2i} \geq 0.$$
We conclude that $(a^*, b^*)$ is the solution of the minimization problem. □

Proof of Proposition 2.3. Let $\epsilon > 0$ and consider $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that
$$a_i = a^*_i - \epsilon\, 1_{i \in \{1, \ldots, t\}}, \quad t \in \{1, \ldots, n\}, \qquad b_i = b^*_i$$
for $i = 1, \ldots, n$. For small $\epsilon$, $(a, b) \in \mathcal{I}_n$. Using the characterization in Theorem 2.2, it follows that
$$\sum_{j=1}^{t} (a^*_j - y_j)\, w_{1j} \leq 0,$$
implying that
$$\sum_{j=1}^{t} (a^*_1 - y_j)\, w_{1j} \leq 0 \quad \text{for } t \in \{1, \ldots, n\},$$
or equivalently
$$a^*_1 \leq \min_{t \geq 1} \mathrm{Av}_1(\{1, \ldots, t\}).$$

Now, consider $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that
$$a_j = a^*_j - \epsilon\, 1_{j \in \{1, \ldots, t\}}, \quad t \in \{1, \ldots, n\}, \qquad b_j = b^*_j - \epsilon\, 1_{j \in \{1, \ldots, t'\}}, \quad 1 \leq t' \leq t,$$
for $j = 1, \ldots, n$, with $\epsilon > 0$. For small $\epsilon$, we have that $(a, b) \in \mathcal{I}_n$, and hence
$$\sum_{j=1}^{t} (a^*_j - y_j)\, w_{1j} + \sum_{j=1}^{t'} (b^*_j - z_j)\, w_{2j} \leq 0.$$
It follows that
$$\sum_{j=1}^{t} (a^*_1 - y_j)\, w_{1j} + \sum_{j=1}^{t'} (a^*_1 - z_j)\, w_{2j} \leq 0,$$
that is,
$$a^*_1 \leq \min_{1 \leq t' \leq t \leq n} \tilde{M}(\{1, \ldots, t\}, \{1, \ldots, t'\}).$$
We conclude that
$$a^*_1 \leq \min_{t \geq 1} \mathrm{Av}_1(\{1, \ldots, t\}) \ \wedge \min_{t \geq t' \geq 1} \tilde{M}(\{1, \ldots, t\}, \{1, \ldots, t'\}).$$

Now if $a^*_1 < b^*_1$, let $i_1 \in \{1, \ldots, n\}$ be the largest index such that $a^*_1 = \ldots = a^*_{i_1}$. Then $(a, b)$ defined by
$$a_j = a^*_j + \epsilon\, 1_{j \in \{1, \ldots, i_1\}}, \qquad b_j = b^*_j$$
for $j = 1, \ldots, n$ is in $\mathcal{I}_n$ when $|\epsilon|$ is small enough. It follows that
$$\mathrm{Av}_1(\{1, \ldots, i_1\}) = a^*_1.$$
If $a^*_1 = b^*_1$, and $i'_1$ and $i''_1$ are the largest indices such that $a^*_1 = \ldots = a^*_{i'_1}$ and $b^*_1 = \ldots = b^*_{i''_1}$, then $(a, b)$ defined by
$$a_j = a^*_j + \epsilon\, 1_{j \in \{1, \ldots, i'_1\}}, \qquad b_j = b^*_j + \epsilon\, 1_{j \in \{1, \ldots, i''_1\}}$$
for $j = 1, \ldots, n$ is in $\mathcal{I}_n$ for $|\epsilon|$ small enough. Hence,
$$a^*_1 = \tilde{M}(\{1, \ldots, i'_1\}, \{1, \ldots, i''_1\})$$
(note that $i''_1 \leq i'_1$). Therefore,
$$a^*_1 = \min_{t \geq 1} \mathrm{Av}_1(\{1, \ldots, t\}) \ \wedge \min_{t \geq t' \geq 1} \tilde{M}(\{1, \ldots, t\}, \{1, \ldots, t'\}).$$
The expression of $b^*_n$ follows easily by replacing $y_i$ and $z_i$ by $-z_{n-i+1}$ and $-y_{n-i+1}$, respectively, for $i = 1, \ldots, n$. □

Proof of Theorem 3.3. Consider $a \in \mathbb{R}^n$ given by
$$a_i = \max_{s \leq i} \min_{t \geq i} M(\{s, \ldots, t\})$$
and also the subdivision into subsets $S_j = \{i_{j-1} + 1, \ldots, i_j\}$ obtained by the PAVA. Let us denote by $G$ (resp. $G^+$) the grid set of indices which correspond to points at the beginning (resp. end) of those subsets; i.e., of the form $i_j + 1$ (resp. $i_j$).

We obviously have
$$a_i \leq \max_{s \leq i} \min_{t \geq i,\ t \in G^+} M(\{s, \ldots, t\}).$$
Then, consider $s \notin G$. This means that we have a set $\{s, \ldots, t\}$ of the form $B \cup C$, $C$ being a union of subsets in the subdivision and $B$ a right subset of a set of the partition of the form $A \cup B$. We want to prove that $M(\{s, \ldots, t\}) = M(B \cup C)$ is smaller than either $M(C)$ or $M(A \cup B \cup C)$. Suppose this is not the case. Then we would have
$$M(B \cup C) > M(C), \qquad M(B \cup C) > M(A \cup B \cup C), \qquad M(A) > M(B),$$
where the last inequality is implied by the second property in Theorem 3.2. Yet, the second inequality, together with the Averaging Property, implies that $M(A) < M(B \cup C)$. In the end we get
$$M(B \cup C) > M(C), \qquad M(B \cup C) > M(A) > M(B),$$
which contradicts the Averaging Property.

We conclude that $M(\{s, \ldots, t\})$ is smaller than the value of $M$ at a set which is a union of sets of the subdivision; i.e., either $A \cup B \cup C$ or $C$ itself. But on sets of this kind it is obvious, by the Averaging Property, that $M$ is smaller than the value $m_t$, since this is the
