Least Squares estimation of two ordered monotone regression curves


HAL Id: hal-00417281

https://hal.archives-ouvertes.fr/hal-00417281

Preprint submitted on 15 Sep 2009



Fadoua Balabdaoui, Kaspar Rufibach, Filippo Santambrogio

To cite this version:

Fadoua Balabdaoui, Kaspar Rufibach, Filippo Santambrogio. Least Squares estimation of two ordered monotone regression curves. 2009. ⟨hal-00417281⟩



running headline: ordered monotone regression

Fadoua Balabdaoui (1,2), Kaspar Rufibach (3) and Filippo Santambrogio (1)

(1) CEREMADE, Université de Paris-Dauphine, Place du Maréchal de Lattre de Tassigny

75775 Paris CEDEX 16, France

(2) Universität Göttingen, Institut für Mathematische Stochastik

Goldschmidtstrasse 7, 37077 Göttingen

(3) Universität Zürich

Institut für Sozial- und Präventivmedizin, Abteilung Biostatistik

Hirschengraben 84, 8001 Zürich

fadoua@ceremade.dauphine.fr

kaspar.rufibach@ifspm.uzh.ch (corresponding author), filippo@ceremade.dauphine.fr

Abstract

In this paper, we consider the problem of finding the Least Squares estimators of two isotonic regression curves $g^\circ_1$ and $g^\circ_2$ under the additional constraint that they are ordered; e.g., $g^\circ_1 \le g^\circ_2$. Given two sets of $n$ data points $y_1, \dots, y_n$ and $z_1, \dots, z_n$ observed at (the same) design points, the estimates of the true curves are obtained by minimizing the weighted Least Squares criterion $L_2(a, b) = \sum_{j=1}^n (y_j - a_j)^2 w^1_j + \sum_{j=1}^n (z_j - b_j)^2 w^2_j$ over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a_1 \le a_2 \le \dots \le a_n$, $b_1 \le b_2 \le \dots \le b_n$, and $a_i \le b_i$, $i = 1, \dots, n$. The characterization of the estimators is established. To compute these estimators, we use an iterative projected subgradient algorithm, where the projection is performed with a "generalized" pool-adjacent-violators algorithm (PAVA), a byproduct of this work. Then, we apply the estimation method to real data from mechanical engineering.

Keywords: least squares, monotone regression, pool-adjacent-violators algorithm, shape constraint estimation, subgradient algorithm

1 Introduction and motivation

Estimating a monotone regression curve is one of the most classical estimation problems under shape restrictions; see e.g. Brunk (1958). A regression curve is said to be isotonic if it is monotone nondecreasing. In this paper we consider the class of isotonic regression functions; the simple transformation $g \mapsto -g$ suffices for the results of this paper to carry over to the antitonic class.

Given $n$ fixed points $x_1, \dots, x_n$, assume that we observe $y_i$ at $x_i$ for $i = 1, \dots, n$. When the points $(x_i, y_i)$ are joined, the shape of the obtained graph can hint that the true regression curve, $g^\circ$ say, is increasing, assuming the model $y_i = g^\circ(x_i) + \varepsilon_i$ with $\varepsilon_i$ the unobserved errors. This shape restriction can also be a feature of the scientific problem at hand, hence the need for estimating the true curve in the class of isotonic functions.

We refer to Barlow et al. (1972) and Robertson et al. (1988) for examples. The weighted Least Squares estimate of $g^\circ$ in the class of isotonic functions, based on the observations $y_i$ at $x_i$, is the unique minimizer of the criterion

$$L(a) = \sum_{i=1}^n w_i (y_i - a_i)^2 \tag{1}$$

over the class of vectors $a \in \mathbb{R}^n$ such that $a_1 \le a_2 \le \dots \le a_n$, where $w_1 > 0, w_2 > 0, \dots, w_n > 0$ are given positive weights. In what follows, we will say that a vector $v \in \mathbb{R}^n$ is increasing or isotonic if $v_1 \le \dots \le v_n$, and use the notation $v \le w$ for $v, w \in \mathbb{R}^n$ if the inequality holds componentwise.

It is well known that the solution $a^*$ of the Least Squares problem in (1) is given by the so-called min-max formula; i.e.,

$$a^*_i = \max_{s \le i} \min_{t \ge i} \mathrm{Av}(\{s, \dots, t\}) \tag{2}$$

where $\mathrm{Av}(\{s, \dots, t\}) = \sum_{j=s}^t y_j w_j \big/ \sum_{j=s}^t w_j$ (see e.g. Barlow et al., 1972).
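Formula (2) can be transcribed directly into code. The Python sketch below is a naive $O(n^3)$ evaluation meant only for checking small examples (the PAVA described in Section 3 is the efficient route); indices are 0-based and the function names are our own.

```python
def weighted_av(y, w, s, t):
    """Av({s, ..., t}) with 0-based, inclusive indices."""
    num = sum(w[j] * y[j] for j in range(s, t + 1))
    den = sum(w[j] for j in range(s, t + 1))
    return num / den


def isotonic_minmax(y, w):
    """a*_i = max_{s <= i} min_{t >= i} Av({s, ..., t}), as in (2)."""
    n = len(y)
    return [
        max(min(weighted_av(y, w, s, t) for t in range(i, n)) for s in range(i + 1))
        for i in range(n)
    ]


print(isotonic_minmax([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0]))  # [1.0, 2.5, 2.5, 4.0]
print(isotonic_minmax([3.0, 1.0], [1.0, 3.0]))                      # [1.5, 1.5]
```

The violating pair $(3, 2)$ in the first example is pooled into its average $2.5$; in the second, the weights pull the pooled value towards the heavier observation.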

van Eeden (1957a,b) generalized this problem to incorporate known bounds on the regression function to be estimated; i.e., she considered minimization of $L$ under the constraint

$$a_L \le a \le a_U, \tag{3}$$

for two increasing vectors $a_L$ and $a_U$. As in the classical setting, the solution of this problem also admits a min-max representation. The PAVA can be generalized to compute this solution efficiently, and this has been implemented in the R package OrdMonReg (Balabdaoui et al., 2009). Computation relies on a suitable functional $M$ defined on the sets $A \subseteq \{1, \dots, n\}$ which generalizes the function $\mathrm{Av}$ in (2). For the bounded monotone regression in (3), this functional is given by

$$M(A) = \Big(\mathrm{Av}(A) \vee \max_A a_L\Big) \wedge \min_A a_U,$$

where $\min_A v = \min_{i \in A} v_i$ and $\max_A v = \max_{i \in A} v_i$. Compare Barlow et al. (1972, p. 57), where a functional notation is used. However, that reference gives no formal justification for the form of the functional $M$, nor for the validity of (the modified version of) the PAVA; see the discussion after Theorem 2.1.
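A sketch of the bounded min-max solution, assuming, as stated above, that the functional $M$ is simply plugged into the max-min of (2). Missing bounds are represented by $\pm\infty$; the function names are hypothetical. The example shows that the bounded solution is in general not obtained by clipping the unconstrained isotonic fit at $a_U$: clipping gives a feasible vector with a larger criterion value.

```python
import math


def bounded_M(y, w, lower, upper, s, t):
    """(Av(A) ∨ max_A a_L) ∧ min_A a_U for A = {s, ..., t} (0-based, inclusive)."""
    idx = range(s, t + 1)
    av = sum(w[j] * y[j] for j in idx) / sum(w[j] for j in idx)
    return min(max(av, max(lower[j] for j in idx)), min(upper[j] for j in idx))


def bounded_isotonic_minmax(y, w, lower, upper):
    """a*_i = max_{s <= i} min_{t >= i} M({s, ..., t})."""
    n = len(y)
    return [
        max(min(bounded_M(y, w, lower, upper, s, t) for t in range(i, n))
            for s in range(i + 1))
        for i in range(n)
    ]


def weighted_sse(y, w, a):
    return sum(wi * (yi - ai) ** 2 for yi, wi, ai in zip(y, w, a))


y, w = [10.0, 0.0], [1.0, 1.0]
lower, upper = [-math.inf, -math.inf], [0.0, 100.0]
fit = bounded_isotonic_minmax(y, w, lower, upper)
clipped = [min(5.0, u) for u in upper]  # unconstrained isotonic fit (5, 5) clipped at a_U
print(fit, weighted_sse(y, w, fit))          # [0.0, 0.0] 100.0
print(clipped, weighted_sse(y, w, clipped))  # [0.0, 5.0] 125.0
```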


Chakravarti (1989) discusses the bounded isotonic regression problem for the absolute value criterion function, yielding the bounded isotonic median regressor. Chakravarti (1989) also proposes a PAVA-like algorithm and establishes some connections to linear programming theory. Unbounded isotonic median regression was first considered by Robertson and Waltman (1968), who provided a min-max formula for the estimator and a PAVA-like algorithm to compute it. They also studied its consistency.

Now suppose that instead of having only one set of observations $y_1, \dots, y_n$ at the design points $x_1, \dots, x_n$, we are interested in analyzing two sets of data $y_1, \dots, y_n$ and $z_1, \dots, z_n$ observed at the same design points. Furthermore, if we know that the underlying true regression curves are increasing and ordered, it is natural to try to construct estimators that fulfill the same constraints.

The current paper presents a solution to this problem of estimating two isotonic regression curves under the additional constraint that they are ordered. This solution is the unique minimizer $(a^*, b^*)$ of the criterion

$$L_2(a, b) = \sum_{i=1}^n w^1_i (y_i - a_i)^2 + \sum_{i=1}^n w^2_i (z_i - b_i)^2 \tag{4}$$

over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a$ and $b$ are increasing and $a \le b$, with $w^1$ and $w^2$ given vectors of positive weights in $\mathbb{R}^n$.

The problem was motivated by an application from mechanical engineering. We will make use of experimental data obtained from dynamic material tests (see Shim and Mohr, 2009) to illustrate our estimation method. In engineering mechanics, it is common practice to determine the deformation resistance and strength of materials from uniaxial compression tests at different loading velocities. The experimental results are the so-called stress-strain curves (see Figure 1), and these may be used to determine the deformation resistance as a function of the applied deformation. The recorded signals contain substantial noise which is mostly due to variations in the loading velocity and electrical noise in the data acquisition system.

The data in this example consist of 1495 distinct pairs $(x_i, y_i)$ and $(x_i, z_i)$, where $x_i$ is the measured strain, while $y_i$ (gray curve) and $z_i$ (black curve) correspond to the experimental stress results for two different loading velocities. The true regression curves are expected to be (a) monotone increasing, as the stress is known to be an increasing function of the strain (for a given constant loading velocity), and (b) ordered, as the deformation resistance typically increases with the loading velocity. In Section 3, we show the resulting estimates as well as a smoothed version thereof.

We will show that minimizing $L_2$ is equivalent to minimizing another convex functional over the class of isotonic vectors $b \in \mathbb{R}^n$. By doing so, we reduce a two-curve problem under the constraints of monotonicity and ordering to a one-curve problem under the constraints of monotonicity and boundedness. Actually, we can even perform the minimization over the class of isotonic vectors $(b_1, \dots, b_{n-1})$ of dimension $n - 1$ satisfying the constraint $b_1 \le \dots \le b_{n-1} \le b^*_n$, as we can explicitly determine $b^*_n$ by a generalized min-max formula (see Proposition 2.3). The solution of this equivalent minimization problem, which gives the solution $b^*$ (and also $a^*$, because it is a function of $b^*$), is computed using a projected subgradient algorithm where the projection step is performed using a suitable generalization of the PAVA.

Figure 1: Original observations (stress versus measured strain $x$).

We would like to note that Brunk et al. (1966) considered a related problem, that of nonparametric Maximum Likelihood estimation of two ordered cumulative distribution functions.

In the same class of problems, Dykstra (1982) treated estimation of survival functions of two stochastically ordered random variables in the presence of censoring, which was extended by Feltz and Dykstra (1985) to $N \ge 2$ stochastically ordered random variables. The theoretical solution can be related to the well-known Kaplan-Meier estimator and can be computed using an iterative algorithmic procedure for $N \ge 3$ (see Feltz and Dykstra, 1985, p. 1016). The $\sqrt{n}$-asymptotics of the estimators for $N = 2$, whether there is censoring or not, were established by Præstgaard and Huang (1996).

The paper is organized as follows. In Section 2, we give the characterization of the ordered isotonic estimates. We also provide the explicit form of the solution of the related bounded isotonic regression problem where the upper of the two isotonic curves is assumed to be fully known.

In Section 3 we describe the projected subgradient algorithm that we use to compute the Least Squares estimators of the ordered isotonic regression curves, and apply the method to real data from mechanical engineering. The technical proofs are deferred to appendices A and B.

2 Estimation of two ordered isotonic regression curves

If the larger of the two isotonic curves were known, then there would of course be no need to estimate it. If we put $a_U = a^0$, the weighted Least Squares estimate $a^*$ of the smaller isotonic curve is the minimizer of

$$L(a) = \sum_{i=1}^n w_i (y_i - a_i)^2,$$

where $w \in \mathbb{R}^n$ is a vector of given positive weights, over $a \in \mathcal{I}_n^{a^0}$, the class of isotonic vectors $a \in \mathbb{R}^n$ such that $a \le a^0$, for a given $a^0 \in \mathbb{R}^n$. When the components of $a^0$ are all equal, the vector $a^0$ will be identified with the common value of its components, as done in Proposition 3.4 below.

The notation $\mathcal{I}_n^w$ will be used again hereafter to denote the class of isotonic vectors $v \in \mathbb{R}^n$ such that $v \le w$.

The statement of Barlow et al. (1972, p. 57) implies that if we define $M(A) = \mathrm{Av}(A) \wedge \min_A a^0$ for a subset $A \subseteq \{1, \dots, n\}$, then the solution $a^*$ can be computed using an appropriately modified version of the PAVA.

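Theorem 2.1. For $i = 1, \dots, n$, we have

$$a^*_i = \max_{s \le i} \min_{t \ge i} M(\{s, \dots, t\}) = \max_{s \le i} \min_{t \ge i} \Big(\mathrm{Av}(\{s, \dots, t\}) \wedge a^0_s\Big),$$

where the second equality uses that $a^0$ is isotonic, so that $\min_{\{s, \dots, t\}} a^0 = a^0_s$.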

To keep this paper at a reasonable length, the proof of Theorem 2.1 is omitted. A short note containing a more thorough discussion of the one-curve problem and a proof of Theorem 2.1 can be obtained from the authors upon request. A general description of the modified PAVA, and a proof that it works whenever the functional $M$ satisfies the so-called Averaging Property, can be found in Section 3.

We now return to the main subject of this paper. Theorem 2.1 is crucial for finding the Least Squares estimates of two ordered isotonic regression curves. In particular, the result will be used to develop an appropriate algorithm to compute the solution.


Let $y_1, \dots, y_n$ and $z_1, \dots, z_n$ be the observed data from two unknown isotonic curves $g^\circ_1$ and $g^\circ_2$ such that $g^\circ_1 \le g^\circ_2$. Given two vectors $w^1, w^2 \in \mathbb{R}^n$ of positive weights, we would like to minimize

$$L_2(a, b) = \sum_{i=1}^n (y_i - a_i)^2 w^1_i + \sum_{i=1}^n (z_i - b_i)^2 w^2_i \tag{5}$$

over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a$ and $b$ are isotonic and $a \le b$. Call this class $\mathcal{I}_n$.

Existence and uniqueness of the solution. These follow from the convexity and closedness of $\mathcal{I}_n$ and the strict convexity of $L_2$.

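Characterization of the solution. For completeness, we give the characterization of the solution of minimizing (5) over $\mathcal{I}_n$; i.e., a necessary and sufficient condition for $(a, b) \in \mathcal{I}_n$ to be equal to this solution. Let $1 = i_1 < i_2 < \dots < i_k \le n$ be the indices at which the blocks of constancy of $a^*$ start, so that

$$a^*_{i_1} = \dots = a^*_{i_2 - 1} < a^*_{i_2} = \dots = a^*_{i_3 - 1} < \dots < a^*_{i_k} = \dots = a^*_n,$$

and put $i_{k+1} = n + 1$. We call $B^0_{i_j}$ (resp. $B^1_{i_j}$) the set of indices $\{i_j, \dots, i_{j+1} - 1\}$, $j = 1, \dots, k$, if $a^*_{i_j} = b^*_{i_j}$ (resp. $a^*_{i_j} < b^*_{i_j}$; since $a^*$ is constant and $b^*$ is nondecreasing on the block, $a^* < b^*$ then holds on the whole block). Similarly, let $1 = l_1 < l_2 < \dots < l_r \le n$ be such that

$$b^*_{l_1} = \dots = b^*_{l_2 - 1} < b^*_{l_2} = \dots = b^*_{l_3 - 1} < \dots < b^*_{l_r} = \dots = b^*_n,$$

put $l_{r+1} = n + 1$, and call $C^0_{l_j}$ (resp. $C^1_{l_j}$) the set of indices $\{l_j, \dots, l_{j+1} - 1\}$, $j = 1, \dots, r$, if $b^*_l = a^*_l$ for some $l$ in this set (resp. $b^*_l > a^*_l$ for all $l$ in this set).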

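Theorem 2.2. The pair $(a^*, b^*) \in \mathcal{I}_n$ is the minimizer of (5) if and only if

$$\sum_{i=1}^n (a^*_i - y_i)(a_i - a^*_i) w^1_i + \sum_{i=1}^n (b^*_i - z_i)(b_i - b^*_i) w^2_i \ge 0 \quad \text{for all } (a, b) \in \mathcal{I}_n, \tag{6}$$

$$\sum_{s \in \cup_j B^1_{i_j}} (a^*_s - y_s)\, a^*_s\, w^1_s = 0, \quad \text{and} \tag{7}$$

$$\sum_{s \in \cup_j C^1_{l_j}} (b^*_s - z_s)\, b^*_s\, w^2_s = 0. \tag{8}$$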

Proof. See Appendix A.
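As a numerical sanity check of the first-order condition derived in the proof of Theorem 2.2 (Appendix A), namely $\sum_i (a^*_i - y_i)(a_i - a^*_i) w^1_i + \sum_i (b^*_i - z_i)(b_i - b^*_i) w^2_i \ge 0$ for all $(a, b) \in \mathcal{I}_n$, the Python sketch below uses a small hand-solved example with unit weights: for $y = (2, 3)$ and $z = (0, 5)$ the solution is $a^* = (1, 3)$, $b^* = (1, 5)$ (pooling $y_1$ and $z_1$ removes the only violation of $a \le b$). The example, the sampling scheme and all names are our own illustrative assumptions; random sampling can only detect violations, it proves nothing.

```python
import random


def first_order_gap(a_star, b_star, a, b, y, z, w1, w2):
    """sum_i (a*_i - y_i)(a_i - a*_i) w1_i + sum_i (b*_i - z_i)(b_i - b*_i) w2_i."""
    n = len(y)
    return (sum((a_star[i] - y[i]) * (a[i] - a_star[i]) * w1[i] for i in range(n))
            + sum((b_star[i] - z[i]) * (b[i] - b_star[i]) * w2[i] for i in range(n)))


def random_feasible_pair(n, rng):
    """Random (a, b) with a, b isotonic and a <= b (b = componentwise max of a and another isotonic vector)."""
    a = sorted(rng.uniform(-10, 10) for _ in range(n))
    c = sorted(rng.uniform(-10, 10) for _ in range(n))
    return a, [max(ai, ci) for ai, ci in zip(a, c)]


y, z = [2.0, 3.0], [0.0, 5.0]
w1 = w2 = [1.0, 1.0]
a_star, b_star = [1.0, 3.0], [1.0, 5.0]  # hand-solved minimiser of (5) for these data

rng = random.Random(1)
gaps = []
for _ in range(1000):
    a, b = random_feasible_pair(2, rng)
    gaps.append(first_order_gap(a_star, b_star, a, b, y, z, w1, w2))
print(min(gaps) >= 0.0)  # True: the condition holds at the minimiser

# At the feasible but non-optimal pair ((2, 3), (2, 5)) the condition fails, e.g. towards ((0, 0), (0, 0)):
print(first_order_gap([2.0, 3.0], [2.0, 5.0], [0.0, 0.0], [0.0, 0.0], y, z, w1, w2))  # -4.0
```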

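An explicit formula for $(a^*, b^*)$ in the sense of a min-max representation similar to (2) turned out to be hard to find. However, since $a^*$ (resp. $b^*$) is also the minimizer of

$$\sum_{i=1}^n (a_i - y_i)^2 w^1_i \quad \Big(\text{resp. } \sum_{i=1}^n (b_i - z_i)^2 w^2_i\Big)$$

over the class $\mathcal{I}_n^{b^*}$ (resp. the class of isotonic vectors $b \in \mathbb{R}^n$ such that $b \ge a^*$), Theorem 2.1 (and its analogue for a lower bound) implies that

$$a^*_i = \max_{s \le i} \min_{t \ge i} \Big(\mathrm{Av}_1(\{s, \dots, t\}) \wedge b^*_s\Big) \tag{9}$$

$$b^*_i = \max_{s \le i} \min_{t \ge i} \Big(\mathrm{Av}_2(\{s, \dots, t\}) \vee a^*_t\Big) \tag{10}$$

for $i = 1, \dots, n$, where

$$\mathrm{Av}_1(A) = \frac{\sum_{i \in A} y_i w^1_i}{\sum_{i \in A} w^1_i} \quad \text{and} \quad \mathrm{Av}_2(A) = \frac{\sum_{i \in A} z_i w^2_i}{\sum_{i \in A} w^2_i} \quad \text{for } A \subseteq \{1, \dots, n\}.$$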

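Thus, the solution $(a^*, b^*)$ is a fixed point of the operator $\mathcal{P}: \mathcal{I}_n \to \mathcal{I}_n$ defined as

$$\mathcal{P}((a, b)) = \big(\mathcal{P}_1(b), \mathcal{P}_2(a)\big) = \Big(\max_{s \le i} \min_{t \ge i} \big(\mathrm{Av}_1(\{s, \dots, t\}) \wedge b_s\big),\ \max_{s \le i} \min_{t \ge i} \big(\mathrm{Av}_2(\{s, \dots, t\}) \vee a_t\big)\Big)_{i = 1, \dots, n}. \tag{11}$$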

However, this fixed point problem does not admit a unique solution. Therefore, there is no guarantee that an algorithm based on the above min-max formulas yields the solution, except in the unrealistic and uninteresting case where the starting point of the algorithm is the solution itself. To see that $\mathcal{P}$ does not admit a unique fixed point, note that the minimizer of the criterion

$$\sum_{i=1}^n (a_i - y_i)^2 w^1_i + B \sum_{i=1}^n (b_i - z_i)^2 w^2_i$$

is a fixed point of $\mathcal{P}$ for any $B > 0$: (9) and (10) involve $w^1$ and $w^2$ only through $\mathrm{Av}_1$ and $\mathrm{Av}_2$, which are unchanged when $w^2$ is multiplied by $B$. Therefore, a computational method based on starting from an initial candidate and then alternating between (9) and (10) cannot be successful. In parallel, we have invested a substantial effort in trying to get a closed form for the estimators. Although we did not succeed, we were able to obtain a closed form for $a^*_1$ (and by symmetry for $b^*_n$).
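The non-uniqueness of the fixed points of $\mathcal{P}$ is already visible on tiny examples. The Python sketch below implements $\mathcal{P}_1$ and $\mathcal{P}_2$ from (9) and (10) with 0-based indices (the function names are hypothetical). For $n = 1$, $y = 2$, $z = 0$ and unit weights, every pair $a = b = c$ with $0 \le c \le 2$ is a fixed point, whereas only $c = 1$ minimizes $L_2$. For $n = 2$, $y = (2, 3)$, $z = (0, 5)$, both the solution $((1, 3), (1, 5))$ and the non-optimal pair $((2, 3), (2, 5))$ are fixed points.

```python
def av(v, w, s, t):
    """Weighted average of v over {s, ..., t} (0-based, inclusive)."""
    return sum(w[j] * v[j] for j in range(s, t + 1)) / sum(w[j] for j in range(s, t + 1))


def P1(b, y, w1):
    """(9): a_i = max_{s <= i} min_{t >= i} (Av1({s..t}) ∧ b_s); b is assumed isotonic."""
    n = len(y)
    return [max(min(min(av(y, w1, s, t), b[s]) for t in range(i, n)) for s in range(i + 1))
            for i in range(n)]


def P2(a, z, w2):
    """(10): b_i = max_{s <= i} min_{t >= i} (Av2({s..t}) ∨ a_t); a is assumed isotonic."""
    n = len(z)
    return [max(min(max(av(z, w2, s, t), a[t]) for t in range(i, n)) for s in range(i + 1))
            for i in range(n)]


def is_fixed_point(a, b, y, z, w1, w2):
    return P1(b, y, w1) == a and P2(a, z, w2) == b


def L2(a, b, y, z, w1, w2):
    return (sum(w * (yi - ai) ** 2 for yi, w, ai in zip(y, w1, a))
            + sum(w * (zi - bi) ** 2 for zi, w, bi in zip(z, w2, b)))


# n = 1, y = 2, z = 0: every a = b = c with 0 <= c <= 2 is a fixed point.
for c in (0.0, 1.0, 2.0):
    print(c, is_fixed_point([c], [c], [2.0], [0.0], [1.0], [1.0]),
          L2([c], [c], [2.0], [0.0], [1.0], [1.0]))
# 0.0 True 4.0 / 1.0 True 2.0 / 2.0 True 4.0

# n = 2: the solution and a non-optimal pair are both fixed points.
y2, z2, ones = [2.0, 3.0], [0.0, 5.0], [1.0, 1.0]
for a, b in (([1.0, 3.0], [1.0, 5.0]), ([2.0, 3.0], [2.0, 5.0])):
    print(is_fixed_point(a, b, y2, z2, ones, ones), L2(a, b, y2, z2, ones, ones))
# True 2.0 / True 4.0
```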

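Proposition 2.3. We have that

$$a^*_1 = \min_{t \ge 1} \mathrm{Av}_1(\{1, \dots, t\}) \wedge \min_{t \ge t' \ge 1} \tilde{M}(\{1, \dots, t\}, \{1, \dots, t'\}) \tag{12}$$

where

$$\tilde{M}(A, B) = \frac{\mathrm{Av}_1(A) \sum_{i \in A} w^1_i + \mathrm{Av}_2(B) \sum_{j \in B} w^2_j}{\sum_{i \in A} w^1_i + \sum_{j \in B} w^2_j}.$$

By symmetry, we also have that

$$b^*_n = \max_{t \le n} \mathrm{Av}_2(\{t, \dots, n\}) \vee \max_{t \le t' \le n} \tilde{M}(\{t', \dots, n\}, \{t, \dots, n\}). \tag{13}$$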


Some remarks are in order. On the one hand, the expressions obtained above indicate that the Least Squares estimator must depend, as expected, on the relative ratio of the weights $w^1$ and $w^2$. In particular, if $w^2 = 0$ (resp. $w^1 = 0$), the expression of $a^*_1$ (resp. $b^*_n$) specializes to the well-known min-max formula in the classical Least Squares estimation of an (unbounded) isotonic curve. On the other hand, the expression of $b^*_n$ is essential for our subgradient algorithm below.

Proof of Proposition 2.3. See Appendix A.
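A direct Python transcription of (12) and (13), with 0-based indices and names of our own choosing. On the hand-solved example $y = (2, 3)$, $z = (0, 5)$ with unit weights, whose solution is $a^* = (1, 3)$, $b^* = (1, 5)$, it returns $a^*_1 = 1$ and $b^*_n = 5$; for $n = 1$, $y = 2$, $z = 0$ both formulas give the pooled value $1$.

```python
def wsum(v, w, idx):
    return sum(w[i] * v[i] for i in idx)


def av(v, w, idx):
    return wsum(v, w, idx) / sum(w[i] for i in idx)


def M_tilde(y, z, w1, w2, A, B):
    """Pooled weighted mean of y over A (weights w1) and z over B (weights w2)."""
    return (wsum(y, w1, A) + wsum(z, w2, B)) / (sum(w1[i] for i in A) + sum(w2[j] for j in B))


def a1_star(y, z, w1, w2):
    """Formula (12), 0-based: A = {0..t}, B = {0..t'} with t' <= t."""
    n = len(y)
    first = min(av(y, w1, range(t + 1)) for t in range(n))
    second = min(M_tilde(y, z, w1, w2, range(t + 1), range(tp + 1))
                 for t in range(n) for tp in range(t + 1))
    return min(first, second)


def bn_star(y, z, w1, w2):
    """Formula (13), 0-based: A = {t'..n-1}, B = {t..n-1} with t <= t'."""
    n = len(y)
    first = max(av(z, w2, range(t, n)) for t in range(n))
    second = max(M_tilde(y, z, w1, w2, range(tp, n), range(t, n))
                 for t in range(n) for tp in range(t, n))
    return max(first, second)


y, z, ones = [2.0, 3.0], [0.0, 5.0], [1.0, 1.0]
print(a1_star(y, z, ones, ones), bn_star(y, z, ones, ones))  # 1.0 5.0
```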

In the next section, we describe how we can make use of the min-max formula in (9) to compute the estimators using a projected subgradient algorithm. As mentioned above, we use in this algorithm the identity (13) given in the previous proposition.

3 Algorithms and Application to real data

In this section, we show that the bounded isotonic estimator can be computed using the well-known PAVA, or, to be more precise, a modified version of it. Recall that the bounded isotonic estimator in the one-curve problem is given by

$$a^*_i = \max_{s \le i} \min_{t \ge i} M(\{s, \dots, t\})$$

where $M(A) = \mathrm{Av}(A) \wedge \min_A a^0$, $A \subseteq \{1, \dots, n\}$. That $a^*$ can be computed using a PAVA is a consequence of a more general result: this computational fact holds provided that the functional $M$ of sets $A \subseteq \{1, \dots, n\}$ satisfies what is referred to as the Averaging Property (see Chakravarti, 1989, p. 138), also called the Cauchy Mean Value Property by Leurgans (1981, Section 1); see also Robertson et al. (1988, p. 390). Note that in the classical unconstrained monotone regression problem, the min-max expression of the Least Squares estimator follows from Theorem 2.8 in Barlow et al. (1972, p. 80).

3.1 Getting the min-max solution by the PAVA

First, let us describe how the PAVA works for some set functional $M$.

• At every step the current configuration is given by a subdivision of $\{1, \dots, n\}$ into $k$ subsets $S_1 = \{1, \dots, i_1\}$, $S_2 = \{i_1 + 1, \dots, i_2\}$, ..., $S_k = \{i_{k-1} + 1, \dots, n\}$ for some indices $0 = i_0 < i_1 < i_2 < \dots < i_{k-1} < i_k = n$.

• The initial configuration is given by the finest subdivision; i.e., $S_j = \{j\}$.

• At every step we look at the values of $M$ on the sets of the subdivision. A violation is noted each time there exists a value $j$ such that $M(S_j) > M(S_{j+1})$. We consider the first violation (the one corresponding to the smallest $j$) and then merge the subsets $S_j$ and $S_{j+1}$ into one interval.

• Given a new subdivision (which has one subset less than the previous one), we look for possible violations.

• The algorithm stops when there are no violations left.

Since for any violation a merging is performed (thus reducing the number of subsets), it is clear that the algorithm stops after a finite number of iterations.
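The procedure above translates almost line by line into Python. In the sketch below (our own naming; 0-based inclusive index pairs; $M$ is passed as a function of the two endpoints of an interval), the first violating pair is merged at every step, exactly as described, and each index receives the value of $M$ on its final block, as in Theorem 3.3. A random comparison against the max-min formula is included for the weighted average and for the bounded functional of van Eeden's problem; such a comparison illustrates Theorem 3.3 but does not prove it.

```python
import random


def pava(M, n):
    """Generic PAVA: M(s, t) is the set functional on {s, ..., t} (0-based, inclusive)."""
    blocks = [(i, i) for i in range(n)]  # finest subdivision S_j = {j}
    while True:
        for j in range(len(blocks) - 1):
            if M(*blocks[j]) > M(*blocks[j + 1]):  # first violation
                blocks[j:j + 2] = [(blocks[j][0], blocks[j + 1][1])]
                break
        else:  # no violation left
            break
    fit = [0.0] * n
    for s, t in blocks:
        fit[s:t + 1] = [M(s, t)] * (t - s + 1)
    return fit, blocks


def minmax(M, n):
    """max_{s <= i} min_{t >= i} M(s, t), for comparison with the PAVA output."""
    return [max(min(M(s, t) for t in range(i, n)) for s in range(i + 1)) for i in range(n)]


def make_av(y, w):
    return lambda s, t: sum(w[j] * y[j] for j in range(s, t + 1)) / sum(w[j] for j in range(s, t + 1))


def make_bounded(y, w, lower, upper):
    av = make_av(y, w)
    return lambda s, t: min(max(av(s, t), max(lower[s:t + 1])), min(upper[s:t + 1]))


fit, blocks = pava(make_av([1.0, 3.0, 2.0, 4.0], [1.0] * 4), 4)
print(fit, blocks)  # [1.0, 2.5, 2.5, 4.0] [(0, 0), (1, 2), (3, 3)]

rng = random.Random(0)
for _ in range(200):
    n = rng.randint(1, 8)
    y = [rng.gauss(0, 1) for _ in range(n)]
    w = [rng.uniform(0.5, 2) for _ in range(n)]
    lower = sorted(rng.uniform(-2, 0) for _ in range(n))
    upper = sorted(rng.uniform(0, 2) for _ in range(n))
    for M in (make_av(y, w), make_bounded(y, w, lower, upper)):
        assert all(abs(p - q) < 1e-9 for p, q in zip(pava(M, n)[0], minmax(M, n)))
```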

We now require the set functional $M$ to satisfy the following property; see Leurgans (1981, Section 1), Robertson et al. (1988, p. 390) and Chakravarti (1989, p. 138).

Definition 3.1. We say that the functional $M$ satisfies the Averaging Property if for any sets $A$ and $B$ such that $A \cap B = \emptyset$ we have that

$$\min\{M(A), M(B)\} \le M(A \cup B) \le \max\{M(A), M(B)\}.$$

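If $h$ and $w > 0$ are given vectors in $\mathbb{R}^n$, then besides

$$A \mapsto \mathrm{Av}(A) = \sum_{i \in A} w_i h_i \Big/ \sum_{i \in A} w_i,$$

the following functionals also satisfy the Averaging Property:

$$A \mapsto \Big(\mathrm{Av}(A) \vee \max_A h^1\Big) \wedge \min_A h^0, \quad \text{with } h^0, h^1 \text{ two vectors in } \mathbb{R}^n,$$

$$A \mapsto \min_A h = \min_{i \in A} h_i, \qquad A \mapsto \mathrm{med}_A h = \arg\min_{m \in \mathbb{R}} \sum_{i \in A} |h_i - m|\, w_i,$$

where the $\arg\min$ is taken to be the smallest $m$ in case of non-uniqueness, and

$$A \mapsto \max_A h = \max_{i \in A} h_i.$$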

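Note, however, that the pointwise maximum, minimum or sum of two functionals satisfying the Averaging Property need not satisfy it: with $h = (1, 0)$ and $h' = (0, 1)$, the functional $A \mapsto \max\{\min_A h, \min_A h'\}$ equals $1$ on $\{1\}$ and on $\{2\}$ but $0$ on $\{1, 2\}$. What does hold is that if $M$ satisfies the property, then so do $A \mapsto M(A) \vee \max_A h$ and $A \mapsto M(A) \wedge \min_A h$; this is how the first example above is obtained from $\mathrm{Av}$.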
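The Averaging Property is easy to probe numerically. The Python sketch below (our own helper names, 0-based indices, arbitrary seeded test data) draws random disjoint sets $A$, $B$ and checks $\min\{M(A), M(B)\} \le M(A \cup B) \le \max\{M(A), M(B)\}$ for $\mathrm{Av}$, $\min_A h$, $\max_A h$, the weighted median and the bounded functional. Integer data are used so that the weighted median, computed as the smallest minimizing data point, is free of rounding ties. A random search of this kind can only find counterexamples; it does not prove the property.

```python
import random

rng = random.Random(42)
n = 8
h = [rng.randint(-5, 5) for _ in range(n)]
w = [rng.randint(1, 4) for _ in range(n)]
h0 = [rng.randint(-2, 6) for _ in range(n)]  # upper-bound vector of the bounded functional
h1 = [rng.randint(-6, 2) for _ in range(n)]  # lower-bound vector of the bounded functional


def av(A):
    return sum(w[i] * h[i] for i in A) / sum(w[i] for i in A)


def wmedian(A):
    """Smallest minimiser of m -> sum_{i in A} |h_i - m| w_i (attained at a data point)."""
    cost = lambda m: sum(abs(h[i] - m) * w[i] for i in A)
    candidates = sorted(h[i] for i in A)
    best = min(cost(m) for m in candidates)
    return next(m for m in candidates if cost(m) == best)


functionals = {
    "Av": av,
    "min": lambda A: min(h[i] for i in A),
    "max": lambda A: max(h[i] for i in A),
    "median": wmedian,
    "bounded": lambda A: min(max(av(A), max(h1[i] for i in A)), min(h0[i] for i in A)),
}


def has_averaging_property(M, trials=5000, tol=1e-12):
    for _ in range(trials):
        idx = list(range(n))
        rng.shuffle(idx)
        k = rng.randint(1, n - 1)
        l = rng.randint(k + 1, n)
        A, B = idx[:k], idx[k:l]  # two non-empty disjoint sets
        mA, mB, mAB = M(A), M(B), M(A + B)
        if not (min(mA, mB) - tol <= mAB <= max(mA, mB) + tol):
            return False
    return True


for name, M in functionals.items():
    print(name, has_averaging_property(M))  # True for every functional listed
```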

Theorem 3.2. The final configuration obtained by the PAVA has the following two properties.

1. The functional $M$ is increasing on the sets of the subdivision.

2. If one of the sets $S_j = C \cup D$ is the disjoint union of two subsets $C = \{i_{j-1} + 1, \dots, k\}$ and $D = \{k + 1, \dots, i_j\}$, then $M(C) > M(D)$; i.e., a finer subdivision would necessarily cause a violation.

Proof. The fact that $M$ is increasing on the final configuration is an easy consequence of the absence of violations (otherwise the algorithm would not have stopped).


As for the second part of the property, note that this is satisfied by the initial configuration (since no set is the disjoint union of two non-trivial subsets), as well as by any configuration that one could obtain after the first merging (since a merging occurs only because of a violation). Now we will use an inductive reasoning.

To this end, we have to check two situations. Suppose we merge two subsequent sets $A$ and $B$ and want to check whether there is a violation on $C$ and $D$, with $A \cup B = C \cup D$. We are in one of the two following cases: either $A = A_1 \cup A_2$, $C = A_1$ and $D = A_2 \cup B$, or $B = B_1 \cup B_2$, $C = A \cup B_1$ and $D = B_2$ (the case $C = A$ and $D = B$ is trivial).

In the first case, if we suppose $M(D) \ge M(C)$, we get

$$M(A_2 \cup B) \ge M(A_1), \quad M(A_2) < M(A_1), \quad M(B) < M(A) = M(A_1 \cup A_2)$$

(the first inequality follows by assumption, the second by induction, and the third is true since $A$ and $B$ have been merged), and this is impossible since, by the Averaging Property, one would conclude that

$$\max\{M(A_2), M(B)\} \ge M(A_1) > M(A_2),$$

and hence $M(A) > M(B) \ge M(A_1) > M(A_2)$, which implies $M(A) > \max\{M(A_1), M(A_2)\}$, contradicting the Averaging Property.

In the second case we would have

$$M(A \cup B_1) \le M(B_2), \quad M(B_2) < M(B_1), \quad M(A) > M(B) = M(B_1 \cup B_2),$$

which implies

$$\min\{M(A), M(B_1)\} \le M(B_2) < M(B_1),$$

and then $\min\{M(A), M(B_1)\} = M(A)$ and $M(A) \le M(B_2) < M(B_1)$, which contradicts either $M(A) > M(B)$ or the Averaging Property. □

Theorem 3.3. If $(S_j)_j$ is the partition obtained at the end of the PAVA described above, then, for every $i$, the value $m_i = M(S_{j_i})$, where $j_i$ is such that $i \in S_{j_i}$, coincides with the value given by the min-max formula for the index $i$.

Proof. See Appendix A.

3.2 Preparing for a projected subgradient algorithm

The following proposition is crucial for computing the ordered isotonic estimators via a projected subgradient algorithm.

Proposition 3.4. Let $\Psi$ be the criterion

$$\Psi(b_1, \dots, b_{n-1}) = \sum_{i=1}^n \Big(\max_{s \le i} (G_{s,i} \wedge b_s) - y_i\Big)^2 w^1_i + \sum_{i=1}^{n-1} (b_i - z_i)^2 w^2_i, \tag{14}$$

which is to be minimized on the convex set

$$\mathcal{I}_{n-1}^{b^*_n} = \{(b_1, \dots, b_{n-1}) \in \mathbb{R}^{n-1} : b_1 \le b_2 \le \dots \le b_{n-1} \le b^*_n\},$$

where

$$G_{s,i} = \min_{t \ge i} \mathrm{Av}_1(\{s, \dots, t\}),$$

and $b_n = b^*_n$ in the term $G_{n,n} \wedge b_n$ of (14). The criterion $\Psi$ is convex. Furthermore, its unique minimizer $(b^{**}_1, \dots, b^{**}_{n-1})$ equals $(b^*_1, \dots, b^*_{n-1})$.

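Proof. Let us write

$$\mathcal{I} = \mathcal{I}_n^{\infty} = \{a = (a_1, \dots, a_n) \in \mathbb{R}^n : a_1 \le \dots \le a_n\},$$

$$\mathcal{I}_n^* = \big\{b = (b_1, \dots, b_n) : (b_1, \dots, b_{n-1}) \in \mathcal{I}_{n-1}^{b^*_n} \text{ and } b_n = b^*_n\big\},$$

and consider

$$\mathcal{I}_n^b = \{a : a \in \mathcal{I} \text{ and } a \le b\} \quad \text{for } b \in \mathcal{I}_n^*.$$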

Now note that the min-max formula in (9) allows us to write

$$\sum_{j=1}^n \Big(\max_{s \le j} (G_{s,j} \wedge b_s) - y_j\Big)^2 w^1_j + \sum_{j=1}^{n-1} (b_j - z_j)^2 w^2_j = \min_{a \in \mathcal{I}_n^b} \sum_{j=1}^n (a_j - y_j)^2 w^1_j + \sum_{j=1}^{n-1} (b_j - z_j)^2 w^2_j.$$

Hence, we have for $b \in \mathcal{I}_n^*$

$$\Psi(b_1, \dots, b_{n-1}) = \min_{a \in \mathcal{I}_n^b} \sum_{j=1}^n (a_j - y_j)^2 w^1_j + \sum_{j=1}^{n-1} (b_j - z_j)^2 w^2_j = \sum_{j=1}^n (\tilde{a}_j(b) - y_j)^2 w^1_j + \sum_{j=1}^{n-1} (b_j - z_j)^2 w^2_j,$$

where $\tilde{a}_j(b) = \max_{s \le j} (G_{s,j} \wedge b_s)$ is the $j$-th component of the minimizer of the function $\sum_{j=1}^n (a_j - y_j)^2 w^1_j$ over $\mathcal{I}_n^b$. Let $\lambda \in [0, 1]$, and let $b$ and $b'$ be in $\mathcal{I}_n^*$. By definition of $\mathcal{I}_n^b$ and $\mathcal{I}_n^{b'}$, we have that

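$$\lambda \tilde{a}(b) + (1 - \lambda) \tilde{a}(b') \le \lambda b + (1 - \lambda) b',$$

and, since a convex combination of isotonic vectors is isotonic, $\lambda \tilde{a}(b) + (1 - \lambda) \tilde{a}(b')$ belongs to $\mathcal{I}_n^{\lambda b + (1 - \lambda) b'}$. As $\tilde{a}(\lambda b + (1 - \lambda) b')$ minimizes $\sum_{j=1}^n (a_j - y_j)^2 w^1_j$ over this set, the convexity of the square gives

$$\sum_{j=1}^n \big(\tilde{a}_j(\lambda b + (1 - \lambda) b') - y_j\big)^2 w^1_j \le \sum_{j=1}^n \big(\lambda \tilde{a}_j(b) + (1 - \lambda) \tilde{a}_j(b') - y_j\big)^2 w^1_j \le \lambda \sum_{j=1}^n (\tilde{a}_j(b) - y_j)^2 w^1_j + (1 - \lambda) \sum_{j=1}^n (\tilde{a}_j(b') - y_j)^2 w^1_j.$$

This shows convexity of the first term of $\Psi$. Convexity of $\Psi$ now follows from convexity of the function $\sum_{j=1}^{n-1} (b_j - z_j)^2 w^2_j$ and the fact that the sum of two convex functions defined on the same domain is also convex. □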
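For small $n$, $\Psi$ in (14) can be evaluated directly. The Python sketch below (hypothetical names, 0-based indices, roughly $O(n^3)$ work per evaluation) computes $\tilde{a}(b)$ and $\Psi$. On the example $y = (2, 3)$, $z = (0, 5)$ with unit weights, for which $b^*_2 = 5$, a grid search over $b_1 \in [0, 5]$ recovers $b^*_1 = 1$ with $\Psi(b^*_1) = 2$, and $\tilde{a}(b^*) = (1, 3) = a^*$. The grid search is only a check of Proposition 3.4 on a toy case, not the algorithm of Section 3.3.

```python
def av(v, w, s, t):
    return sum(w[j] * v[j] for j in range(s, t + 1)) / sum(w[j] for j in range(s, t + 1))


def a_tilde(b, y, w1):
    """a~_i(b) = max_{s <= i} (G_{s,i} ∧ b_s), with G_{s,i} = min_{t >= i} Av1({s..t})."""
    n = len(y)
    G = lambda s, i: min(av(y, w1, s, t) for t in range(i, n))
    return [max(min(G(s, i), b[s]) for s in range(i + 1)) for i in range(n)]


def psi(b_head, bn_star, y, z, w1, w2):
    """Criterion (14); b_head = (b_1, ..., b_{n-1}) and b_n is fixed to b*_n."""
    b = list(b_head) + [bn_star]
    a = a_tilde(b, y, w1)
    return (sum(w1[i] * (a[i] - y[i]) ** 2 for i in range(len(y)))
            + sum(w2[i] * (b[i] - z[i]) ** 2 for i in range(len(b_head))))


y, z, ones = [2.0, 3.0], [0.0, 5.0], [1.0, 1.0]
grid = [k / 100 for k in range(501)]  # candidate values of b_1 in [0, 5], all <= b*_2 = 5
best = min(grid, key=lambda b1: psi([b1], 5.0, y, z, ones, ones))
print(best, psi([best], 5.0, y, z, ones, ones))  # 1.0 2.0
print(a_tilde([best, 5.0], y, ones))             # [1.0, 3.0]
```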

The idea behind considering the convex functional $\Psi$ is to reduce the dimensionality of the problem as well as the number of constraints (from $3n - 2$ to $n - 1$ constraints). Once $\Psi$ is minimized, i.e., once the isotonic estimate $b^*$ is computed, $a^*$ can be obtained using the min-max formula given in (9). However, the convex functional $\Psi$ is not continuously differentiable, hence the need for an optimization algorithm that uses subgradients instead of gradients, since the latter are not defined everywhere.

3.3 A projected subgradient algorithm to compute $b^*_1, \dots, b^*_{n-1}$

To minimize the non-smooth convex function $\Psi$ we use a projected subgradient algorithm.

Since the gradient does not exist on the entire domain of the function, one has to resort to the computation of a subgradient, the analogue of the gradient at points where the latter does not exist. As opposed to classical methods developed for minimizing smooth functions, the procedure for choosing the direction of descent and the steplengths is entirely different. The classical reference for subgradient algorithms is Shor (1985). Boyd et al. (2003) provide a nice summary of the topic, including the projected variant. Note that subgradient algorithms have recently been used in statistics to compute the log-concave density estimator in higher dimensions; see Cule et al. (2008).

The main steps of the algorithm. Now recall that the functional $\Psi$ should be minimized over the $(n - 1)$-dimensional convex set $\mathcal{I}_{n-1}^{b^*_n}$ given in Proposition 3.4. Of course, this is the same as minimizing $\Psi$ over the $n$-dimensional convex set $\{(b_1, \dots, b_n) \mid b_1 \le \dots \le b_{n-1}\}$, starting with an initial vector $(b^{(0)}_1, \dots, b^{(0)}_n)$ such that $b^{(0)}_n = b^*_n$ and constraining the $n$-th component of the subgradient of $\Psi$ to be equal to 0.

Given a steplength $\tau_k$, the new iterate $b^{k+1} = (b^{k+1}_1, \dots, b^{k+1}_n)$ at the $k$-th iteration of a subgradient algorithm is obtained from

$$v^{k+1} = b^k - \tau_k D_k,$$

where $D_k$ is the subgradient calculated at the previous iterate; i.e., $D_k = \tilde{\nabla} \Psi(b^k)$ (see Appendix B). However, it may happen that $v^{k+1}$ is not admissible; i.e., $(v^{k+1}_1, \dots, v^{k+1}_{n-1})$ does not belong to $\mathcal{I}_{n-1}^{b^*_n}$. When this occurs, an $L_2$ projection of this iterate onto $\mathcal{I}_{n-1}^{b^*_n}$ is performed to obtain $b^{k+1}$. This is equivalent to finding the minimizer of

$$\sum_{i=1}^n (a_i - v^{k+1}_i)^2$$

over the set $\mathcal{I}_n^{b^*_n}$. The latter problem can be solved using the generalized PAVA for bounded isotonic regression as described above.
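A minimal sketch of this projection step, assuming unit weights as in the display above: the generalized PAVA is run with the functional $M(A) = \mathrm{Av}(A) \wedge c$, where $c = b^*_n$ is a constant upper bound. Instead of always merging the first violation, it uses the common stack-based variant that merges adjacent violating blocks while scanning from left to right; like the version in Section 3.1, it only merges adjacent blocks that violate the ordering and stops when no violation is left. The function name is our own.

```python
def project_capped(v, cap):
    """L2 projection of v onto {x : x_1 <= ... <= x_m <= cap} (unit weights).

    PAVA with M(A) = Av(A) ∧ cap; each block is stored as [start, end, sum, count].
    """
    stack = []
    value = lambda blk: min(blk[2] / blk[3], cap)
    for i, x in enumerate(v):
        stack.append([i, i, float(x), 1])
        # merge adjacent blocks as long as the last two violate the ordering
        while len(stack) > 1 and value(stack[-2]) > value(stack[-1]):
            top = stack.pop()
            prev = stack[-1]
            prev[1], prev[2], prev[3] = top[1], prev[2] + top[2], prev[3] + top[3]
    out = []
    for blk in stack:
        out.extend([value(blk)] * (blk[1] - blk[0] + 1))
    return out


print(project_capped([3.0, 1.0, 2.0, 10.0], cap=5.0))  # [2.0, 2.0, 2.0, 5.0]
print(project_capped([10.0, 0.0], cap=3.0))            # [3.0, 3.0]
```

The stack keeps the scan linear in the number of merges, which matters because the projection is repeated at every iteration of the subgradient algorithm.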

The computation of the subgradient $D_k$ is described in detail in Appendix B. As for the steplength $\tau_k$, we start the algorithm with a constant steplength. Once a pre-specified number of iterations has been reached, we switch to

$$\tau_{k+1} = \big(h_k^{0.1} \|D_k\|_2\big)^{-1},$$

where $\gamma_k := h_k^{-0.1}$ is such that $0 \le \gamma_k \to 0$ as $k \to \infty$ and $\sum_{k=1}^\infty \gamma_k = \infty$. Here, $\|\cdot\|_2$ denotes the $L_2$-norm of a vector in $\mathbb{R}^n$. In our implementation, this combination of constant and non-summable diminishing steplengths performed better than other classical choices of $(\gamma_k)_k$. Furthermore, convergence is ensured by the following theorem.

Theorem 3.5 (Boyd et al., 2003). A subgradient algorithm complemented with least-squares projection and using non-summable diminishing steplengths yields, for any $\eta > 0$ and after $k = k(\eta)$ iterations, a vector $b^k := (b^k_1, \dots, b^k_n)$ such that

$$\min_{i = 1, \dots, k} \Psi(b^i) - \Psi(b^*) \le \eta,$$

where $b^* = (b^*_1, \dots, b^*_n)$ is the vector given in Proposition 3.4.

The proof can be found in Boyd et al. (2003) by combining their arguments in Sections 2 and 3. Note that in our implementation we do not keep track of the iterate that yielded the minimal value of $\Psi$, since we apply a problem-motivated stopping criterion that guarantees that we have reached an iterate sufficiently close to $b^* = (b^*_1, \dots, b^*_n)$.

Choice of stopping rule. Since in subgradient algorithms the convex target functional does not necessarily decrease monotonically with the number of iterations, the choice of a suitable stopping criterion is delicate. However, in our specific setting we use the fact that $(a^*, b^*)$ is a fixed point of the operator $\mathcal{P}$ defined in (11), where $a^* = \mathcal{P}_1(b^*)$ is the solution of (1) with upper bound $b^*$. This motivates iterating the algorithm until the entries of the two vectors $b^k$ and $b^k_\# := \mathcal{P}_2 \circ \mathcal{P}_1(b^k)$ differ by less than a pre-specified positive constant $\delta$.

The implementation. The projected subgradient algorithm for the two-curve problem, as well as the generalized PAVA computing the solution for one curve under the constraints (3), were implemented in R (R Development Core Team, 2008). The corresponding package OrdMonReg (Balabdaoui et al., 2009) is available on CRAN. Note that the data analyzed in Section 3.4 are made available as a dataset in OrdMonReg.

To conclude this section on the algorithmic aspects of our work, we would like to mention the work by Beran and Dümbgen (2009), who propose an active set algorithm which can be tailored to solve the problem given in (5) for an arbitrary number of ordered monotone curves. However, Beran and Dümbgen (2009) do not provide an analysis of the structure of the estimated curves, such as characterizations, and rather put their emphasis on the algorithmic aspects of the problem.

3.4 Real data example from mechanical engineering

We would like to estimate the stress-strain curves based on the available experimental data for two different velocity levels (see Figure 1). The expected curves have to be isotonic and ordered. The data consist of 1495 pairs $(x_i, y_i)$ and $(x_i, z_i)$. The values of the measured strain of the material (on the $x$-axis) are defined as minus the logarithm of the ratio of the current over the initial specimen length. The values are positive and take the maximal value 1, which corresponds to a maximum shortening of 63% (since $1 - e^{-1} \approx 0.63$).

Furthermore, since the stress measurements for different velocities are not performed at exactly the same strain, the values of the stress have been interpolated at equally spaced values of the strain. As pointed out by a referee, this induces correlation between the stress data. Even if the stress measurements were not interpolated, having correlated stress measurements is rather inevitable in this particular application because of the data processing procedures associated with the measurement technique (see Shim and Mohr, 2009). The estimation method is, however, still applicable. When studying statistical properties of the isotonic estimators such as consistency and convergence, the correlation between the data should of course be taken into account.

In such problems, practitioners usually fit parametric models using a trial-and-error approach in an attempt to capture the monotonicity of the stress-strain curves as well as their ordering. The methods used are rather arbitrary and can also be time consuming, hence the need for an alternative estimation approach. Our main goal is to provide those practitioners with a rigorous way of estimating the ordered stress-strain curves.

In Figure 2 (upper plot) we provide the original data (black and gray dots) and the proposed ordered isotonic estimates $a^*$ and $b^*$ as described above. Being step functions, the estimated isotonic curves are non-smooth, a well-known drawback of isotonic regression; see, among others, Wright (1978) and Mukerjee (1988). The latter author pioneered the combination of isotonization followed by kernel smoothing. A thorough asymptotic analysis of the smoothed isotonized and the isotonized smoothed estimators was given by Mammen (1991).

Mukerjee (1988, p. 743) shows that monotonicity of the regression function is preserved by the smoothing operation if the kernel used is log-concave. Thus, we define our smoothed ordered monotone estimators by

$$\tilde{a}^*_h(x) = \frac{\sum_{i=1}^n K_h(x - x_i)\, a^*_i}{\sum_{i=1}^n K_h(x - x_i)}, \qquad \tilde{b}^*_h(x) = \frac{\sum_{i=1}^n K_h(x - x_i)\, b^*_i}{\sum_{i=1}^n K_h(x - x_i)}$$

for $0 \le x \le 1$. Since both estimators use the same nonnegative weights and $a^* \le b^*$, the ordering $\tilde{a}^*_h \le \tilde{b}^*_h$ is preserved as well. For simplicity, we used the kernel $K_h(x) = \phi(x/h)$, where $\phi$ is the density function of a standard normal distribution, which is clearly log-concave. Figure 2 (lower plot) depicts the smoothed isotonic estimates. We set the bandwidth to $h = 0.1\, n^{-1/5} \approx 0.023$.
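A minimal Python sketch of these smoothed estimators with the Gaussian kernel (the normalizing constant of $\phi$ cancels in the ratio). The step functions below are made-up stand-ins for $a^*$ and $b^*$ on an equally spaced design, not the estimates from the stress-strain data, and the function name is ours. The exponents are shifted by their maximum before exponentiating, a standard guard against underflow for small $h$.

```python
import math


def smooth(x_design, values, x_eval, h):
    """Nadaraya-Watson smoother with K_h(u) = phi(u / h), evaluated at each point of x_eval."""
    out = []
    for x in x_eval:
        log_k = [-0.5 * ((x - xi) / h) ** 2 for xi in x_design]
        top = max(log_k)
        k = [math.exp(l - top) for l in log_k]  # proportional to K_h(x - x_i)
        out.append(sum(ki * vi for ki, vi in zip(k, values)) / sum(k))
    return out


n = 50
x_design = [i / (n - 1) for i in range(n)]
a_star = [0.0 if i < 20 else (1.0 if i < 35 else 3.0) for i in range(n)]  # illustrative isotonic steps
b_star = [max(a, 0.5 + 0.05 * i) for i, a in enumerate(a_star)]          # isotonic, b_star >= a_star
h = 0.1 * n ** (-1 / 5)
grid = [j / 200 for j in range(201)]
a_s, b_s = smooth(x_design, a_star, grid, h), smooth(x_design, b_star, grid, h)

print(all(q - p >= -1e-12 for p, q in zip(a_s, a_s[1:])))  # True: smoothed a* is nondecreasing
print(all(p <= q + 1e-12 for p, q in zip(a_s, b_s)))         # True: the ordering is preserved
print(round(0.1 * 1495 ** (-1 / 5), 3))                      # 0.023, the bandwidth for n = 1495
```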

Motivated by estimation of stress-strain curves, an application from mechanical engineering, we consider in this paper weighted Least Squares estimators in the problem of estimating two ordered isotonic regression curves. We provide characterizations of the solution and describe a projected subgradient algorithm which can be used to compute this solution. As a by-product, we show how an adaptation of the well-known PAVA can be used to compute min-max estimators for any set functional satisfying the Averaging Property.

Acknowledgements. The first author would like to thank Cécile Durot for some interesting discussions around the subject. We also thank JongMin Shim for having made the data available to us.

A Proofs

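Proof of Theorem 2.2. Suppose that $(a^*, b^*)$ is the solution. For $\epsilon \in (0, 1)$ and $(a, b) \in \mathcal{I}_n$, consider the pair $(a^\epsilon, b^\epsilon) \in \mathbb{R}^n \times \mathbb{R}^n$ defined as

$$a^\epsilon = a^* + \epsilon (a - a^*), \qquad b^\epsilon = b^* + \epsilon (b - b^*).$$

For $i \le j \in \{1, \dots, n\}$, we have

$$a^\epsilon_j - a^\epsilon_i = (1 - \epsilon)(a^*_j - a^*_i) + \epsilon (a_j - a_i) \ge 0, \qquad b^\epsilon_j - b^\epsilon_i = (1 - \epsilon)(b^*_j - b^*_i) + \epsilon (b_j - b_i) \ge 0.$$

Also, for $i \in \{1, \dots, n\}$ we have

$$a^\epsilon_i - b^\epsilon_i = (1 - \epsilon)(a^*_i - b^*_i) + \epsilon (a_i - b_i) \le 0.$$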

Figure 2: Original observations, isotonic and isotonic smoothed estimates. (Upper panel: stress against measured strain $x$, with the upper isotonic estimate $b^*$ and the lower isotonic estimate $a^*$. Lower panel: the upper and lower isotonic smoothed estimates $\tilde b^*$ and $\tilde a^*$.)

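Hence, $(a^\epsilon, b^\epsilon) \in \mathcal{I}_n$, and since $(a^*, b^*)$ minimizes $L_2$ over $\mathcal{I}_n$,

$$0 \le \lim_{\epsilon \searrow 0} \frac{1}{\epsilon}\left(L_2(a^\epsilon, b^\epsilon) - L_2(a^*, b^*)\right) = \sum_{i=1}^n (a^*_i - y_i)(a_i - a^*_i)\, w^1_i + \sum_{i=1}^n (b^*_i - z_i)(b_i - b^*_i)\, w^2_i,$$

yielding the inequality in (6).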
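The limit above is the directional derivative of $L_2$ at $(a^*, b^*)$ towards $(a, b)$. The following Python sketch checks it numerically, assuming (consistently with the expansion in the converse part of this proof) that $L_2$ carries a factor $1/2$. The vectors are arbitrary illustrative values and the function names are hypothetical; the identity is purely algebraic and does not require $(a^*, b^*)$ to be optimal.

```python
def loss(a, b, y, z, w1, w2):
    """L_2(a, b), assumed here to carry the factor 1/2."""
    return 0.5 * sum(w * (ai - yi) ** 2 for ai, yi, w in zip(a, y, w1)) + 0.5 * sum(
        w * (bi - zi) ** 2 for bi, zi, w in zip(b, z, w2)
    )


def linear_term(a_star, b_star, a, b, y, z, w1, w2):
    """Right-hand side of the limit: the directional derivative of L_2 at (a*, b*)."""
    return sum(w * (s - yi) * (ai - s) for s, ai, yi, w in zip(a_star, a, y, w1)) + sum(
        w * (s - zi) * (bi - s) for s, bi, zi, w in zip(b_star, b, z, w2)
    )


def difference_quotient(a_star, b_star, a, b, y, z, w1, w2, eps):
    """(L_2(a^eps, b^eps) - L_2(a*, b*)) / eps along the segment towards (a, b)."""
    a_eps = [s + eps * (ai - s) for s, ai in zip(a_star, a)]
    b_eps = [s + eps * (bi - s) for s, bi in zip(b_star, b)]
    return (loss(a_eps, b_eps, y, z, w1, w2) - loss(a_star, b_star, y, z, w1, w2)) / eps


# Arbitrary illustrative data and points.
y, z = [1.0, 3.0, 2.0], [2.0, 2.5, 4.0]
w1, w2 = [1.0, 1.0, 2.0], [1.0, 2.0, 1.0]
a_star, b_star = [1.5, 2.0, 2.5], [2.0, 3.0, 3.5]
a, b = [0.0, 1.0, 3.0], [1.0, 3.0, 4.0]

lin = linear_term(a_star, b_star, a, b, y, z, w1, w2)
# Quadratic remainder: the two weighted squared-distance sums with factor 1/2.
quad = 0.5 * sum(w * (ai - s) ** 2 for ai, s, w in zip(a, a_star, w1)) + 0.5 * sum(
    w * (bi - s) ** 2 for bi, s, w in zip(b, b_star, w2)
)
```

For these vectors the linear term equals 0.5 and the quadratic term 2.5; the difference quotient equals the linear term plus $\epsilon$ times the quadratic term, so it converges to the former as $\epsilon \searrow 0$.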

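Now consider the vectors $a^\epsilon$ and $b^\epsilon$ such that, for $l = 1, \ldots, n$,

$$a^\epsilon_l = a^*_l + \epsilon\, a^*_l\, 1_{\{l \in B^1_{ij}\}}, \qquad b^\epsilon_l = b^*_l.$$

Let $r \le s \in \{1, \ldots, n\}$. If $r \notin B^1_{ij}$ and $s \notin B^1_{ij}$, then $a^\epsilon_s - a^\epsilon_r = a^*_s - a^*_r \ge 0$. If $r \in B^1_{ij}$ and $s \notin B^1_{ij}$, then $a^*_s > a^*_r$ and $a^\epsilon_s - a^\epsilon_r = a^*_s - a^*_r - \epsilon\, a^*_r > 0$ for $|\epsilon|$ small enough. The same reasoning applies if $r \notin B^1_{ij}$ and $s \in B^1_{ij}$. Finally, if $r, s \in B^1_{ij}$, then $a^\epsilon_s - a^\epsilon_r = 0$.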

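Now, for $r \in \{1, \ldots, n\}$, we have $a^\epsilon_r = a^*_r \le b^*_r$ if $r \notin B^1_{ij}$. Otherwise, $a^\epsilon_r = a^*_r(1+\epsilon) < b^*_r$ if $|\epsilon|$ is small enough. Hence, $(a^\epsilon, b^\epsilon) \in \mathcal{I}_n$ for $\epsilon$ of either sign with $|\epsilon|$ small enough, so the derivative at $\epsilon = 0$ must vanish:

$$0 = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\left(L_2(a^\epsilon, b^\epsilon) - L_2(a^*, b^*)\right) = \sum_{r=1}^n (a^*_r - y_r)\, 1_{\{r \in B^1_{ij}\}}\, a^*_r\, w^1_r.$$

Summing over all the sets $B^1_{ij}$ yields the identity in (7). The identity in (8) can be proved very similarly.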

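Conversely, suppose that $(a^*, b^*) \in \mathcal{I}_n$ satisfies the inequality in (6). For any $(a, b) \in \mathcal{I}_n$, we have

$$L_2(a,b) - L_2(a^*,b^*) = \frac{1}{2}\sum_{i=1}^n (a_i - a^*_i)^2 w^1_i + \frac{1}{2}\sum_{i=1}^n (b_i - b^*_i)^2 w^2_i + \sum_{i=1}^n (a^*_i - y_i)(a_i - a^*_i)\, w^1_i + \sum_{i=1}^n (b^*_i - z_i)(b_i - b^*_i)\, w^2_i \ge 0,$$

since the first two sums are non-negative and, by (6), so is the sum of the last two.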

We conclude that $(a^*, b^*)$ is the solution of the minimization problem. □

Proof of Proposition 2.3. Let $\epsilon > 0$ and, for $t \in \{1, \ldots, n\}$, consider $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that

$$a_i = a^*_i - \epsilon\, 1_{\{i \in \{1, \ldots, t\}\}}, \qquad b_i = b^*_i$$

for $i = 1, \ldots, n$. For small $\epsilon$, $(a, b) \in \mathcal{I}_n$. Using the characterization in Theorem 2.2, it follows that

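$$\sum_{j=1}^t (a^*_j - y_j)\, w^1_j \le 0,$$

implying, since $a^*_1 \le a^*_j$ for all $j$, that

$$\sum_{j=1}^t (a^*_1 - y_j)\, w^1_j \le 0 \quad \text{for } t \in \{1, \ldots, n\},$$

or equivalently

$$a^*_1 \le \min_{t \ge 1} \mathrm{Av}_1(\{1, \ldots, t\}).$$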

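Now, for $t \in \{1, \ldots, n\}$ and $1 \le t' \le t$, consider $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that

$$a_j = a^*_j - \epsilon\, 1_{\{j \in \{1, \ldots, t\}\}}, \qquad b_j = b^*_j - \epsilon\, 1_{\{j \in \{1, \ldots, t'\}\}}$$

for $j = 1, \ldots, n$, with $\epsilon > 0$. For small $\epsilon$, we have $(a, b) \in \mathcal{I}_n$, and hence

$$\sum_{j=1}^t (a^*_j - y_j)\, w^1_j + \sum_{j=1}^{t'} (b^*_j - z_j)\, w^2_j \le 0.$$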

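Since $a^*_1 \le a^*_j$ and $a^*_1 \le b^*_1 \le b^*_j$ for all $j$, it follows that

$$\sum_{j=1}^t (a^*_1 - y_j)\, w^1_j + \sum_{j=1}^{t'} (a^*_1 - z_j)\, w^2_j \le 0,$$

that is,

$$a^*_1 \le \min_{1 \le t' \le t \le n} \tilde M(\{1, \ldots, t\}, \{1, \ldots, t'\}).$$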

We conclude that

$$a^*_1 \le \min_{t \ge 1} \mathrm{Av}_1(\{1, \ldots, t\}) \wedge \min_{t \ge t' \ge 1} \tilde M(\{1, \ldots, t\}, \{1, \ldots, t'\}).$$

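Now, if $a^*_1 < b^*_1$, let $i_1 \in \{1, \ldots, n\}$ be the largest index such that $a^*_1 = \ldots = a^*_{i_1}$. Then $(a, b)$ such that

$$a_j = a^*_j + \epsilon\, 1_{\{j \in \{1, \ldots, i_1\}\}}, \qquad b_j = b^*_j$$

for $j = 1, \ldots, n$ is in $\mathcal{I}_n$ when $|\epsilon|$ is small enough. It follows that $\mathrm{Av}_1(\{1, \ldots, i_1\}) = a^*_1$.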

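If $a^*_1 = b^*_1$, and $i'_1$ and $i''_1$ are the largest indices such that $a^*_1 = \ldots = a^*_{i'_1}$ and $b^*_1 = \ldots = b^*_{i''_1}$, then $(a, b)$ such that

$$a_j = a^*_j + \epsilon\, 1_{\{j \in \{1, \ldots, i'_1\}\}}, \qquad b_j = b^*_j + \epsilon\, 1_{\{j \in \{1, \ldots, i''_1\}\}}$$

for $j = 1, \ldots, n$ is in $\mathcal{I}_n$ for $|\epsilon|$ small enough. Hence,

$$a^*_1 = \tilde M(\{1, \ldots, i'_1\}, \{1, \ldots, i''_1\})$$

(note that $i''_1 \le i'_1$). Therefore,

$$a^*_1 = \min_{t \ge 1} \mathrm{Av}_1(\{1, \ldots, t\}) \wedge \min_{t \ge t' \ge 1} \tilde M(\{1, \ldots, t\}, \{1, \ldots, t'\}).$$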

The expression of $b^*_1$ follows easily by replacing $y_i$ and $z_i$ by $-z_{n-i+1}$ and $-y_{n-i+1}$, respectively, for $i = 1, \ldots, n$. □
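The formula for $a^*_1$ can be evaluated directly. The Python sketch below is a brute-force illustration, not the authors' implementation; it assumes that $\tilde M(E, F)$ is the pooled weighted mean of the $y_j$ ($j \in E$, weights $w^1_j$) and the $z_j$ ($j \in F$, weights $w^2_j$), which is the form implied by the step "that is" in the proof above. The function names are hypothetical.

```python
def weighted_sum(values, weights):
    return sum(v * w for v, w in zip(values, weights))


def a1_formula(y, z, w1, w2):
    """Value of a*_1 from Proposition 2.3 (brute force over prefixes).

    Assumes M~(E, F) is the pooled weighted mean of y on E (weights w1)
    and z on F (weights w2).
    """
    n = len(y)
    # min over t of the prefix averages Av_1({1, ..., t})
    min_av1 = min(weighted_sum(y[:t], w1[:t]) / sum(w1[:t]) for t in range(1, n + 1))
    # min over 1 <= t' <= t <= n of M~({1, ..., t}, {1, ..., t'})
    min_m_tilde = min(
        (weighted_sum(y[:t], w1[:t]) + weighted_sum(z[:tp], w2[:tp]))
        / (sum(w1[:t]) + sum(w2[:tp]))
        for t in range(1, n + 1)
        for tp in range(1, t + 1)
    )
    return min(min_av1, min_m_tilde)


# Ordering constraint binding: the unconstrained fits would be a = (2, 2), b = (0, 0).
print(a1_formula([2.0, 2.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]))  # 1.0
```

In this binding example one can check by hand that $a^* = b^* = (1, 1)$. When the $z_j$ lie far above the $y_j$ (for instance $y = (3, 1, 2)$, $z = (100, 100, 100)$ with unit weights), the ordering constraint is inactive and the formula reduces to $\min_t \mathrm{Av}_1(\{1, \ldots, t\}) = 2$, the first value of the ordinary isotonic regression of $y$.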

Proof of Theorem 3.3. Consider $a \in \mathbb{R}^n$ given by

$$a_i = \max_{s \le i} \min_{t \ge i} M(\{s, \ldots, t\}),$$

and also the subdivision into subsets $S_j = \{i_{j-1}+1, \ldots, i_j\}$ obtained by the PAVA. Let us denote by $G^-$ (resp. $G^+$) the set of indices corresponding to the beginning (resp. end) of those subsets, i.e. of the form $i_j + 1$ (resp. $i_j$).
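To make these objects concrete, the Python sketch below specializes $M$ to the weighted mean $M(S) = \sum_{k \in S} w_k y_k / \sum_{k \in S} w_k$, which satisfies the Averaging Property. This choice, the data and the function names are illustrative assumptions, and the code is a standard weighted PAVA for the mean, not necessarily the adaptation for general $M$ described in the paper. It reads off the block boundaries (the sets $G^-$ and $G^+$, in 0-based indexing) and compares the PAVA fit with a brute-force evaluation of the max-min formula.

```python
def pava(y, w):
    """Weighted Pool Adjacent Violators Algorithm for the weighted mean.

    Returns the fitted values and the blocks (start, end), 0-based and inclusive.
    """
    blocks = []  # each block: [start, end, weighted sum, total weight]
    for i, (yi, wi) in enumerate(zip(y, w)):
        blocks.append([i, i, yi * wi, wi])
        # Pool while the previous block mean exceeds the last one (cross-multiplied).
        while len(blocks) > 1 and blocks[-2][2] * blocks[-1][3] > blocks[-1][2] * blocks[-2][3]:
            last = blocks.pop()
            blocks[-1][1] = last[1]
            blocks[-1][2] += last[2]
            blocks[-1][3] += last[3]
    fitted = [0.0] * len(y)
    for start, end, total, weight in blocks:
        for i in range(start, end + 1):
            fitted[i] = total / weight
    return fitted, [(start, end) for start, end, _, _ in blocks]


def max_min(y, w):
    """a_i = max_{s <= i} min_{t >= i} M({s, ..., t}), M the weighted mean (brute force)."""
    n = len(y)

    def mean(s, t):
        return sum(y[k] * w[k] for k in range(s, t + 1)) / sum(w[k] for k in range(s, t + 1))

    return [max(min(mean(s, t) for t in range(i, n)) for s in range(i + 1)) for i in range(n)]


y = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
w = [1.0, 2.0, 1.0, 1.0, 3.0, 1.0]
fitted, blocks = pava(y, w)
g_minus = [start for start, _ in blocks]  # first index of each block (0-based)
g_plus = [end for _, end in blocks]  # last index of each block (0-based)
```

Here the PAVA pools the data into the blocks $\{0\}, \{1, 2\}, \{3, 4\}, \{5\}$, and the fitted values coincide with the max-min values; the brute-force function costs $O(n^3)$ mean evaluations, so it only serves as a check.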

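Since restricting the inner minimum to a subset of indices can only increase it, we obviously have

$$a_i \le \max_{s \le i} \min_{t \ge i,\, t \in G^+} M(\{s, \ldots, t\}).$$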

Next, consider $s \notin G^-$. Then, for $t \in G^+$, the set $\{s, \ldots, t\}$ is of the form $B \cup C$, where $C$ is a union of subsets of the subdivision and $B$ is a right subset of a set of the partition of the form $A \cup B$. We want to prove that $M(\{s, \ldots, t\}) = M(B \cup C)$ is at most $M(C)$ or at most $M(A \cup B \cup C)$. Suppose this is not the case. Then we would have

$$M(B \cup C) > M(C), \qquad M(B \cup C) > M(A \cup B \cup C), \qquad M(A) > M(B),$$

where the last inequality is implied by the second property in Theorem 3.2. Yet the second inequality, together with the Averaging Property, implies that $M(A) < M(B \cup C)$. In the end we get

$$M(B \cup C) > M(C), \qquad M(B \cup C) > M(A) > M(B),$$

which contradicts the Averaging Property, since $M(B \cup C)$ would exceed both $M(B)$ and $M(C)$.

We conclude that $M(\{s, \ldots, t\})$ is at most the value of $M$ at a set which is a union of sets of the subdivision, i.e. either $A \cup B \cup C$ or $C$ itself. But on sets of this kind it is obvious, by the Averaging Property, that $M$ is at most the value $m_t$, since this is the