HAL Id: hal-00417281
https://hal.archives-ouvertes.fr/hal-00417281
Preprint submitted on 15 Sep 2009
Least Squares estimation of two ordered monotone regression curves
Fadoua Balabdaoui, Kaspar Rufibach, Filippo Santambrogio
To cite this version:
Fadoua Balabdaoui, Kaspar Rufibach, Filippo Santambrogio. Least Squares estimation of two ordered monotone regression curves. 2009. ⟨hal-00417281⟩
running headline: ordered monotone regression
Fadoua Balabdaoui (1,2), Kaspar Rufibach (3) and Filippo Santambrogio (1)
(1) CEREMADE, Université de Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris CEDEX 16, France
(2) Universität Göttingen, Institut für Mathematische Stochastik, Goldschmidtstrasse 7, 37077 Göttingen
(3) Universität Zürich, Institut für Sozial- und Präventivmedizin, Abteilung Biostatistik, Hirschengraben 84, 8001 Zürich
fadoua@ceremade.dauphine.fr
kaspar.rufibach@ifspm.uzh.ch (corresponding author)
filippo@ceremade.dauphine.fr
Abstract
In this paper, we consider the problem of finding the Least Squares estimators of two isotonic regression curves $g^\circ_1$ and $g^\circ_2$ under the additional constraint that they are ordered, i.e., $g^\circ_1 \le g^\circ_2$. Given two sets of $n$ data points $y_1, \dots, y_n$ and $z_1, \dots, z_n$ observed at (the same) design points, the estimates of the true curves are obtained by minimizing the weighted Least Squares criterion $L_2(a, b) = \sum_{j=1}^n (y_j - a_j)^2 w_{1j} + \sum_{j=1}^n (z_j - b_j)^2 w_{2j}$ over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a_1 \le a_2 \le \dots \le a_n$, $b_1 \le b_2 \le \dots \le b_n$, and $a_i \le b_i$, $i = 1, \dots, n$. We establish the characterization of the estimators. To compute them, we use an iterative projected subgradient algorithm, where the projection is performed with a "generalized" pool-adjacent-violators algorithm (PAVA), a byproduct of this work. We then apply the estimation method to real data from mechanical engineering.
Keywords: least squares, monotone regression, pool-adjacent-violators algorithm, shape constraint estimation, subgradient algorithm
1 Introduction and motivation
Estimating a monotone regression curve is one of the most classical estimation problems under shape restrictions; see, e.g., Brunk (1958). A regression curve is said to be isotonic if it is monotone nondecreasing. In this paper, we work with the class of isotonic regression functions; the simple transformation $g \mapsto -g$ suffices for the results to carry over to the antitonic class.
Given $n$ fixed points $x_1, \dots, x_n$, assume that we observe $y_i$ at $x_i$ for $i = 1, \dots, n$. When the points $(x_i, y_i)$ are joined, the shape of the resulting graph can hint at the increasing monotonicity of the true regression curve, $g^\circ$ say, assuming the model $y_i = g^\circ(x_i) + \varepsilon_i$, with $\varepsilon_i$ the unobserved errors. This shape restriction can also be a feature of the scientific problem at hand, hence the need for estimating the true curve in the class of isotonic functions.
We refer to Barlow et al. (1972) and Robertson et al. (1988) for examples. The weighted Least Squares estimate of $g^\circ$ in the class of isotonic functions, based on the observations $y_i$ at $x_i$, is the unique minimizer of the criterion

$$L(a) = \sum_{i=1}^n w_i (y_i - a_i)^2 \tag{1}$$

over the class of vectors $a \in \mathbb{R}^n$ such that $a_1 \le a_2 \le \dots \le a_n$, where $w_1 > 0, w_2 > 0, \dots, w_n > 0$ are given positive weights. In what follows, we say that a vector $v \in \mathbb{R}^n$ is increasing or isotonic if $v_1 \le \dots \le v_n$, and we write $v \le w$ for $v, w \in \mathbb{R}^n$ if the inequality holds componentwise.

It is well known that the solution $a^*$ of the Least Squares problem in (1) is given by the so-called min-max formula, i.e.,

$$a^*_i = \max_{s \le i} \min_{t \ge i} \mathrm{Av}(\{s, \dots, t\}) \tag{2}$$

where $\mathrm{Av}(\{s, \dots, t\}) = \sum_{j=s}^t y_j w_j \big/ \sum_{j=s}^t w_j$ (see, e.g., Barlow et al., 1972).
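As a quick sanity check of (2), the Python sketch below (our own illustration, not the authors' implementation; all function names are hypothetical) evaluates the min-max formula by brute force, in $O(n^4)$ operations as written, and compares it with the classical PAVA, which pools adjacent blocks whose weighted means violate monotonicity.

```python
from typing import List


def weighted_av(y: List[float], w: List[float], s: int, t: int) -> float:
    """Av({s, ..., t}) from (2), with 0-based inclusive indices."""
    num = sum(y[i] * w[i] for i in range(s, t + 1))
    den = sum(w[i] for i in range(s, t + 1))
    return num / den


def isotonic_minmax(y: List[float], w: List[float]) -> List[float]:
    """Brute-force evaluation of a*_i = max_{s<=i} min_{t>=i} Av({s..t})."""
    n = len(y)
    return [max(min(weighted_av(y, w, s, t) for t in range(i, n))
                for s in range(i + 1))
            for i in range(n)]


def isotonic_pava(y: List[float], w: List[float]) -> List[float]:
    """Classical PAVA for the weighted isotonic least squares problem (1)."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # pool the last two blocks as long as they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit: List[float] = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


y = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
w = [1.0, 2.0, 1.0, 1.0, 3.0, 1.0]
print(isotonic_minmax(y, w))
print(isotonic_pava(y, w))
```

Both calls return $(1, 8/3, 8/3, 3.625, 3.625, 5)$ up to rounding: the violating pairs $(3, 2)$ and $(4, 3.5)$ are pooled into their weighted means.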
van Eeden (1957a,b) generalized this problem to incorporate known bounds on the regression function to be estimated, i.e., she considered minimization of $L$ under the constraint

$$a_L \le a \le a_U \tag{3}$$

for two increasing vectors $a_L$ and $a_U$. As in the classical setting, the solution of this problem also admits a min-max representation. The PAVA can be generalized to compute this solution efficiently, and this generalization has been implemented in the R package OrdMonReg (Balabdaoui et al., 2009). The computation relies on a suitable functional $M$ defined on the sets $A \subseteq \{1, \dots, n\}$, which generalizes the function $\mathrm{Av}$ in (2). For the bounded monotone regression problem in (3), this functional is given by

$$M(A) = \Big(\mathrm{Av}(A) \vee \max_A a_L\Big) \wedge \min_A a_U,$$

where $\min_A v = \min_{i \in A} v_i$ and $\max_A v = \max_{i \in A} v_i$. Compare Barlow et al. (1972, p. 57), where a functional notation is used. However, that reference gives no formal justification for the form of the functional $M$, nor for the validity of (the modified version of) the PAVA; see the discussion after Theorem 2.1.
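To make the functional $M$ concrete, here is a small Python sketch (our own illustration with hypothetical names, not code from OrdMonReg) that evaluates $M$ on index intervals and plugs it into the min-max formula in place of $\mathrm{Av}$. The brute-force evaluation is meant only for small examples.

```python
from typing import List


def bounded_M(y, w, a_low, a_up, s: int, t: int) -> float:
    """M({s..t}) = (Av({s..t}) v max a_L) ^ min a_U, with 0-based inclusive indices."""
    idx = range(s, t + 1)
    av = sum(y[i] * w[i] for i in idx) / sum(w[i] for i in idx)
    return min(max(av, max(a_low[i] for i in idx)), min(a_up[i] for i in idx))


def bounded_isotonic_minmax(y, w, a_low, a_up) -> List[float]:
    """a*_i = max_{s<=i} min_{t>=i} M({s..t}) under the constraint a_L <= a <= a_U."""
    n = len(y)
    return [max(min(bounded_M(y, w, a_low, a_up, s, t) for t in range(i, n))
                for s in range(i + 1))
            for i in range(n)]


y = [0.0, 5.0, 1.0, 6.0]
w = [1.0, 1.0, 1.0, 1.0]
a_low = [0.5, 0.5, 2.0, 2.0]  # increasing lower bound
a_up = [3.0, 3.0, 4.0, 4.0]   # increasing upper bound
print(bounded_isotonic_minmax(y, w, a_low, a_up))
```

In this example, the first value is lifted to its lower bound 0.5, the violating pair $(5, 1)$ is pooled at 3, and the last value is capped at its upper bound 4, so the call returns $(0.5, 3, 3, 4)$.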
Chakravarti (1989) discusses the bounded isotonic regression problem for the absolute value criterion function, yielding the bounded isotonic median regressor. He also proposes a PAVA-like algorithm and establishes some connections to linear programming theory. Unbounded isotonic median regression was first considered by Robertson and Waltman (1968), who provided a min-max formula for the estimator and a PAVA-like algorithm to compute it. They also studied its consistency.
Now suppose that instead of having only one set of observations $y_1, \dots, y_n$ at the design points $x_1, \dots, x_n$, we are interested in analyzing two sets of data $y_1, \dots, y_n$ and $z_1, \dots, z_n$ observed at the same design points. Furthermore, if we know that the underlying true regression curves are increasing and ordered, it is natural to construct estimators that fulfill the same constraints.
The current paper presents a solution to this problem of estimating two isotonic regression curves under the additional constraint that they are ordered. This solution is the unique minimizer $(a^*, b^*)$ of the criterion

$$L_2(a, b) = \sum_{i=1}^n w_{1i} (y_i - a_i)^2 + \sum_{i=1}^n w_{2i} (z_i - b_i)^2 \tag{4}$$

over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a$ and $b$ are increasing and $a \le b$, with $w_1$ and $w_2$ given vectors of positive weights in $\mathbb{R}^n$.
The problem was motivated by an application from mechanical engineering. We will make use of experimental data obtained from dynamic material tests (see Shim and Mohr, 2009) to illustrate our estimation method. In engineering mechanics, it is common practice to determine the deformation resistance and strength of materials from uniaxial compression tests at different loading velocities. The experimental results are the so-called stress-strain curves (see Figure 1), and these may be used to determine the deformation resistance as a function of the applied deformation. The recorded signals contain substantial noise which is mostly due to variations in the loading velocity and electrical noise in the data acquisition system.
The data in this example consist of 1495 distinct pairs $(x_i, y_i)$ and $(x_i, z_i)$, where $x_i$ is the measured strain, while $y_i$ (gray curve) and $z_i$ (black curve) correspond to the experimental stress results for two different loading velocities. The true regression curves are expected to be (a) monotone increasing, as the stress is known to be an increasing function of the strain (for a given constant loading velocity), and (b) ordered, as the deformation resistance typically increases as the loading velocity increases. In Section 3, we show the resulting estimates as well as a smoothed version thereof.
We will show that minimizing $L_2$ is equivalent to minimizing another convex functional over a class of isotonic vectors. By doing so, we reduce a two-curve problem under the constraints of monotonicity and ordering to a one-curve problem under the constraints of monotonicity and boundedness. In fact, we can even perform the minimization over the class of isotonic vectors $(b_1, \dots, b_{n-1})$ of dimension $n-1$ satisfying the constraint $b_1 \le \dots \le b_{n-1} \le b^*_n$, since $b^*_n$ can be determined explicitly by a generalized min-max formula (see Proposition 2.3). The solution of this equivalent minimization problem, which gives the solution $b^*$ (and also $a^*$, because it is a function of $b^*$), is computed using a projected subgradient algorithm in which the projection step is performed using a suitable generalization of the PAVA.

Figure 1: Original observations (stress versus measured strain $x$).
We note that Brunk et al. (1966) considered a related problem, that of nonparametric Maximum Likelihood estimation of two ordered cumulative distribution functions. In the same class of problems, Dykstra (1982) treated estimation of the survival functions of two stochastically ordered random variables in the presence of censoring, which was extended by Feltz and Dykstra (1985) to $N \ge 2$ stochastically ordered random variables. The theoretical solution can be related to the well-known Kaplan-Meier estimator and can be computed using an iterative algorithmic procedure for $N \ge 3$ (see Feltz and Dykstra, 1985, p. 1016). The $\sqrt{n}$-asymptotics of the estimators for $N = 2$, with or without censoring, were established by Præstgaard and Huang (1996).
The paper is organized as follows. In Section 2, we give the characterization of the ordered isotonic estimates. We also provide the explicit form of the solution of the related bounded isotonic regression problem, where the upper of the two isotonic curves is assumed to be fully known.
In Section 3, we describe the projected subgradient algorithm that we use to compute the Least Squares estimators of the ordered isotonic regression curves, and we apply the method to real data from mechanical engineering. The technical proofs are deferred to Appendices A and B.
2 Estimation of two ordered isotonic regression curves
If the larger of the two isotonic curves were known, there would of course be no need to estimate it. If we put $a_U = a_0$, the weighted Least Squares estimate $a^*$ of the smaller isotonic curve is the minimizer of

$$L(a) = \sum_{i=1}^n w_i (y_i - a_i)^2,$$

where $w \in \mathbb{R}^n$ is a vector of given positive weights and $a \in \mathcal{I}_n^{a_0}$, the class of isotonic vectors $a \in \mathbb{R}^n$ such that $a \le a_0$, for a given isotonic vector $a_0 \in \mathbb{R}^n$. When the components of $a_0$ are all equal, the vector $a_0$ will be identified with the common value of its components, as done in Proposition 3.4 below.
The notation $\mathcal{I}_n^{w}$ will be used again hereafter to denote the class of isotonic vectors $v \in \mathbb{R}^n$ such that $v \le w$.
The statement of Barlow et al. (1972, p. 57) implies that if we define $M(A) = \mathrm{Av}(A) \wedge \min_A a_0$ for a subset $A \subseteq \{1, \dots, n\}$, then the solution $a^*$ can be computed using an appropriately modified version of the PAVA.
Theorem 2.1. For $i = 1, \dots, n$, we have

$$a^*_i = \max_{s \le i} \min_{t \ge i} M(\{s, \dots, t\}) = \max_{s \le i} \min_{t \ge i} \Big(\mathrm{Av}(\{s, \dots, t\}) \wedge a_{0,s}\Big).$$
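Theorem 2.1 can be checked numerically. The Python sketch below (an illustration under our own naming, not the authors' code) evaluates the formula of Theorem 2.1 for an isotonic upper bound $a_0$ and then tests the first-order condition of this convex problem: $a^*$ is optimal if and only if $\sum_i (a^*_i - y_i)(a_i - a^*_i) w_i \ge 0$ for every isotonic $a \le a_0$. Testing against randomly drawn feasible vectors $a$ gives evidence, not a proof.

```python
import random
from typing import List


def upper_bounded_isotonic(y: List[float], w: List[float], a0: List[float]) -> List[float]:
    """Theorem 2.1: a*_i = max_{s<=i} min_{t>=i} (Av({s..t}) ^ a0_s), a0 isotonic."""
    n = len(y)

    def av(s: int, t: int) -> float:
        return sum(y[i] * w[i] for i in range(s, t + 1)) / sum(w[i] for i in range(s, t + 1))

    return [max(min(min(av(s, t), a0[s]) for t in range(i, n)) for s in range(i + 1))
            for i in range(n)]


def worst_first_order_value(a_star, y, w, a0, trials=2000, seed=0) -> float:
    """Smallest value of sum_i (a*_i - y_i)(a_i - a*_i) w_i over random feasible a."""
    rng = random.Random(seed)
    n = len(y)
    lo, hi = min(y + a0) - 1.0, max(y + a0) + 1.0
    worst = float("inf")
    for _ in range(trials):
        a = sorted(rng.uniform(lo, hi) for _ in range(n))  # random isotonic vector
        a = [min(ai, bi) for ai, bi in zip(a, a0)]          # still isotonic, now <= a0
        value = sum((s - yi) * (ai - s) * wi for s, yi, ai, wi in zip(a_star, y, a, w))
        worst = min(worst, value)
    return worst


y = [2.0, 0.0, 4.0, 3.0]
w = [1.0, 1.0, 1.0, 1.0]
a0 = [0.5, 1.0, 2.0, 5.0]
a_star = upper_bounded_isotonic(y, w, a0)
print(a_star)                                     # [0.5, 0.5, 2.0, 3.0]
print(worst_first_order_value(a_star, y, w, a0))  # expected >= 0 (up to rounding)
```

A clearly negative value for some candidate would show that it is not the minimizer; for instance, the feasible but wrong candidate $(0, 0, 0, 0)$ fails the check.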
To keep this paper at a reasonable length, we omit the proof of Theorem 2.1. A short note containing a more thorough discussion of the one-curve problem and a proof of Theorem 2.1 can be obtained from the authors upon request. A general description of the modified PAVA, and a proof that it works whenever the functional $M$ satisfies the so-called Averaging Property, can be found in Section 3.
We now return to the main subject of this paper. Theorem 2.1 is crucial for finding the Least Squares estimates of two ordered isotonic regression curves. In particular, the result will be used to develop an appropriate algorithm to compute the solution.
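Let $y_1, \dots, y_n$ and $z_1, \dots, z_n$ be the observed data from two unknown isotonic curves $g^\circ_1$ and $g^\circ_2$ such that $g^\circ_1 \le g^\circ_2$. Given two vectors $w_1, w_2 \in \mathbb{R}^n$ of positive weights, we would like to minimize

$$L_2(a, b) = \sum_{i=1}^n (y_i - a_i)^2 w_{1i} + \sum_{i=1}^n (z_i - b_i)^2 w_{2i} \tag{5}$$

over the class of pairs of vectors $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that $a$ and $b$ are isotonic and $a \le b$. Call this class $\mathcal{I}_n$.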
Existence and uniqueness of the solution. They follow from the convexity and closedness of $\mathcal{I}_n$ and the strict convexity of $L_2$.
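Characterization of the solution. For completeness, we give the characterization of the solution of minimizing (5) over $\mathcal{I}_n$, i.e., a necessary and sufficient condition for $(a, b) \in \mathcal{I}_n$ to be equal to this solution. Let $1 = i_1 < i_2 < \dots < i_k \le n$ be the first indices of the maximal blocks on which $a^*$ is constant, so that, with the convention $i_{k+1} = n + 1$,

$$a^*_{i_1} = \dots = a^*_{i_2 - 1} < a^*_{i_2} = \dots = a^*_{i_3 - 1} < \dots < a^*_{i_k} = \dots = a^*_n.$$

We call $B^0_{i_j}$ (resp. $B^1_{i_j}$) a set of indices $\{i_j, \dots, i_{j+1} - 1\}$, $j = 1, \dots, k$, such that $a^*_{i_j} = b^*_{i_j}$ (resp. $a^*_{i_j} < b^*_{i_j}$); since $a^*$ is constant and $b^*$ is increasing on such a block, $a^* < b^*$ then holds on the whole of $B^1_{i_j}$. Similarly, let $1 = l_1 < l_2 < \dots < l_r \le n$ be the first indices of the maximal blocks on which $b^*$ is constant, so that, with $l_{r+1} = n + 1$,

$$b^*_{l_1} = \dots = b^*_{l_2 - 1} < b^*_{l_2} = \dots = b^*_{l_3 - 1} < \dots < b^*_{l_r} = \dots = b^*_n,$$

and call $C^0_{l_j}$ (resp. $C^1_{l_j}$) a set of indices $\{l_j, \dots, l_{j+1} - 1\}$, $j = 1, \dots, r$, such that $b^*_{l_{j+1}-1} = a^*_{l_{j+1}-1}$ (resp. $b^*_{l_{j+1}-1} > a^*_{l_{j+1}-1}$); since $b^*$ is constant and $a^*$ is increasing on such a block, $b^* > a^*$ then holds on the whole of $C^1_{l_j}$.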
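Theorem 2.2. The pair $(a^*, b^*) \in \mathcal{I}_n$ is the minimizer of (5) if and only if

$$\sum_{i=1}^n (a^*_i - y_i)(a_i - a^*_i) w_{1i} + \sum_{i=1}^n (b^*_i - z_i)(b_i - b^*_i) w_{2i} \ge 0 \quad \text{for all } (a, b) \in \mathcal{I}_n, \tag{6}$$

$$\sum_{s \in \cup_j B^1_{i_j}} (a^*_s - y_s)\, a^*_s\, w_{1s} = 0, \quad \text{and} \tag{7}$$

$$\sum_{s \in \cup_j C^1_{l_j}} (b^*_s - z_s)\, b^*_s\, w_{2s} = 0. \tag{8}$$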
Proof. See Appendix A.
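The first-order condition of Theorem 2.2, written as in the proof in Appendix A, namely $\sum_i (a^*_i - y_i)(a_i - a^*_i) w_{1i} + \sum_i (b^*_i - z_i)(b_i - b^*_i) w_{2i} \ge 0$ for all $(a, b) \in \mathcal{I}_n$, can be probed numerically. The Python sketch below (hypothetical helper names; an illustration, not part of the paper's method) draws random pairs in $\mathcal{I}_n$, using the fact that the componentwise maximum of two isotonic vectors is isotonic, and reports the smallest value of this expression for a candidate pair. For $y = (5, 5)$ and $z = (1, 1)$ with unit weights, the ordering constraint forces both curves to the pooled mean, so $a^* = b^* = (3, 3)$, and the expression reduces to $2 \sum_i (b_i - a_i) \ge 0$.

```python
import random
from typing import List, Tuple


def random_feasible_pair(n: int, lo: float, hi: float,
                         rng: random.Random) -> Tuple[List[float], List[float]]:
    """Random (a, b) with a and b isotonic and a <= b."""
    a = sorted(rng.uniform(lo, hi) for _ in range(n))
    b = sorted(rng.uniform(lo, hi) for _ in range(n))
    b = [max(ai, bi) for ai, bi in zip(a, b)]  # still isotonic, and b >= a componentwise
    return a, b


def worst_first_order_value(a_star, b_star, y, z, w1, w2, trials=2000, seed=0) -> float:
    """Smallest value of the first-order expression over random feasible pairs (a, b)."""
    rng = random.Random(seed)
    lo, hi = min(y + z) - 1.0, max(y + z) + 1.0
    worst = float("inf")
    for _ in range(trials):
        a, b = random_feasible_pair(len(y), lo, hi, rng)
        value = sum((s - yi) * (ai - s) * wi for s, yi, ai, wi in zip(a_star, y, a, w1))
        value += sum((s - zi) * (bi - s) * wi for s, zi, bi, wi in zip(b_star, z, b, w2))
        worst = min(worst, value)
    return worst


y, z = [5.0, 5.0], [1.0, 1.0]
w1 = w2 = [1.0, 1.0]
print(worst_first_order_value([3.0, 3.0], [3.0, 3.0], y, z, w1, w2))  # expected >= 0 (up to rounding)
print(worst_first_order_value([4.0, 4.0], [4.0, 4.0], y, z, w1, w2))  # expected < 0: not optimal
```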
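An explicit formula for $(a^*, b^*)$ in the sense of a min-max representation similar to (2) turned out to be too hard to find. However, since $a^*$ (resp. $b^*$) is also the minimizer of

$$\sum_{i=1}^n (a_i - y_i)^2 w_{1i} \qquad \Big(\text{resp. } \sum_{i=1}^n (b_i - z_i)^2 w_{2i}\Big)$$

over the class $\mathcal{I}_n^{b^*}$ (resp. the class of isotonic vectors $b \in \mathbb{R}^n$ such that $b \ge a^*$), Theorem 2.1 implies that

$$a^*_i = \max_{s \le i} \min_{t \ge i} \Big(\mathrm{Av}_1(\{s, \dots, t\}) \wedge b^*_s\Big) \tag{9}$$

$$b^*_i = \max_{s \le i} \min_{t \ge i} \Big(\mathrm{Av}_2(\{s, \dots, t\}) \vee a^*_t\Big) \tag{10}$$

for $i = 1, \dots, n$, where

$$\mathrm{Av}_1(A) = \frac{\sum_{i \in A} y_i w_{1i}}{\sum_{i \in A} w_{1i}} \quad \text{and} \quad \mathrm{Av}_2(A) = \frac{\sum_{i \in A} z_i w_{2i}}{\sum_{i \in A} w_{2i}}$$

for $A \subseteq \{1, \dots, n\}$.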
Thus, the solution $(a^*, b^*)$ is a fixed point of the operator $\mathcal{P}: \mathcal{I}_n \to \mathcal{I}_n$ defined as

$$\mathcal{P}((a, b)) = \big(\mathcal{P}_1(b), \mathcal{P}_2(a)\big) = \Big(\max_{s \le i} \min_{t \ge i} \big(\mathrm{Av}_1(\{s, \dots, t\}) \wedge b_s\big),\ \max_{s \le i} \min_{t \ge i} \big(\mathrm{Av}_2(\{s, \dots, t\}) \vee a_t\big)\Big)_{i = 1, \dots, n}. \tag{11}$$

However, this fixed point problem does not admit a unique solution. Therefore, there is no guarantee that an algorithm based on the above min-max formulas yields the solution, except in the unrealistic and uninteresting case where the starting point of the algorithm is the solution itself. To see that $\mathcal{P}$ does not admit a unique fixed point, note that the minimizer of the criterion
$$\sum_{i=1}^n (a_i - y_i)^2 w_{1i} + B \sum_{i=1}^n (b_i - z_i)^2 w_{2i}$$

over $\mathcal{I}_n$ is a fixed point of $\mathcal{P}$ for any $B > 0$. Therefore, a computational method that starts from an initial candidate and then alternates between (9) and (10) is not guaranteed to succeed. In parallel, we have invested a substantial effort in trying to get a closed form for the estimators.
Although we did not succeed, we were able to obtain a closed form for $a^*_1$ (and, by symmetry, for $b^*_n$).
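Proposition 2.3. We have that

$$a^*_1 = \min_{t \ge 1} \mathrm{Av}_1(\{1, \dots, t\}) \wedge \min_{t \ge t' \ge 1} \tilde{M}(\{1, \dots, t\}, \{1, \dots, t'\}) \tag{12}$$

where

$$\tilde{M}(A, B) = \frac{\mathrm{Av}_1(A) \sum_{i \in A} w_{1i} + \mathrm{Av}_2(B) \sum_{j \in B} w_{2j}}{\sum_{i \in A} w_{1i} + \sum_{j \in B} w_{2j}}.$$

By symmetry, we also have that

$$b^*_n = \max_{t \le n} \mathrm{Av}_2(\{t, \dots, n\}) \vee \max_{t \le t' \le n} \tilde{M}(\{t', \dots, n\}, \{t, \dots, n\}). \tag{13}$$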
Some remarks are in order. On the one hand, the expressions obtained above indicate that the Least Squares estimator depends, as expected, on the relative ratio of the weights $w_1$ and $w_2$. In particular, if $w_2 = 0$ (resp. $w_1 = 0$), the expression of $a^*_1$ (resp. $b^*_n$) specializes to the well-known min-max formula in the classical Least Squares estimation of an (unbounded) isotonic curve. On the other hand, the expression of $b^*_n$ is essential for our subgradient algorithm below.
Proof of Proposition 2.3. See Appendix A.
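Formulas (12) and (13) are straightforward to evaluate. The following Python sketch (our own illustration; names are hypothetical) computes them by enumerating all admissible index pairs, which costs $O(n^3)$ operations without further bookkeeping. In the toy case $y = (5, 5)$, $z = (1, 1)$ with unit weights, where the ordering constraint forces $a^* = b^* = (3, 3)$, both formulas return 3.

```python
from typing import Iterable, List


def av(v: List[float], w: List[float], idx: Iterable[int]) -> float:
    idx = list(idx)
    return sum(v[i] * w[i] for i in idx) / sum(w[i] for i in idx)


def m_tilde(y, z, w1, w2, A, B) -> float:
    """M~(A, B): weighted mean of the y-values on A pooled with the z-values on B."""
    num = sum(y[i] * w1[i] for i in A) + sum(z[j] * w2[j] for j in B)
    den = sum(w1[i] for i in A) + sum(w2[j] for j in B)
    return num / den


def a_star_first(y, z, w1, w2) -> float:
    """Formula (12) for a*_1 (index 0 in 0-based notation)."""
    n = len(y)
    first = min(av(y, w1, range(t + 1)) for t in range(n))
    second = min(m_tilde(y, z, w1, w2, range(t + 1), range(tp + 1))
                 for t in range(n) for tp in range(t + 1))   # 1 <= t' <= t
    return min(first, second)


def b_star_last(y, z, w1, w2) -> float:
    """Formula (13) for b*_n (index n-1 in 0-based notation)."""
    n = len(y)
    first = max(av(z, w2, range(t, n)) for t in range(n))
    second = max(m_tilde(y, z, w1, w2, range(tp, n), range(t, n))
                 for t in range(n) for tp in range(t, n))    # t <= t' <= n
    return max(first, second)


y, z, w = [5.0, 5.0], [1.0, 1.0], [1.0, 1.0]
print(a_star_first(y, z, w, w), b_star_last(y, z, w, w))  # 3.0 3.0
```

When the two data sets are far apart, e.g. $y = (1, 0, 2)$ and $z = (10, 11, 12)$, the pooled terms $\tilde{M}$ never become active and the formulas reduce to the unconstrained values $0.5$ and $12$.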
In the next section, we describe how the min-max formula in (9) can be used to compute the estimators via a projected subgradient algorithm. As mentioned above, this algorithm uses the identity (13) given in the previous proposition.
3 Algorithms and Application to real data
In this section, we show that the bounded isotonic estimator can be computed using the well-known PAVA, or, more precisely, a modified version of it. Recall that the bounded isotonic estimator in the one-curve problem is given by

$$a^*_i = \max_{s \le i} \min_{t \ge i} M(\{s, \dots, t\})$$

where $M(A) = \mathrm{Av}(A) \wedge \min_A a_0$, $A \subseteq \{1, \dots, n\}$. That $a^*$ can be computed using a PAVA is a consequence of a more general result: this holds whenever the functional $M$ of sets $A \subseteq \{1, \dots, n\}$ satisfies what is referred to as the Averaging Property (see Chakravarti, 1989, p. 138), also called the Cauchy Mean Value Property by Leurgans (1981, Section 1); see also Robertson et al. (1988, p. 390). Note that in the classical unconstrained monotone regression problem, the min-max expression of the Least Squares estimator follows from Theorem 2.8 in Barlow et al. (1972, p. 80).
3.1 Getting the min-max solution by the PAVA
First, let us describe how the PAVA works for a given set functional $M$.
• At every step, the current configuration is given by a subdivision of $\{1, \dots, n\}$ into $k$ subsets $S_1 = \{1, \dots, i_1\}$, $S_2 = \{i_1 + 1, \dots, i_2\}$, ..., $S_k = \{i_{k-1} + 1, \dots, n\}$ for some indices $0 = i_0 < i_1 < i_2 < \dots < i_{k-1} < i_k = n$.
• The initial configuration is given by the finest subdivision, i.e., $S_j = \{j\}$, $j = 1, \dots, n$.
• At every step, we look at the values of $M$ on the sets of the subdivision. A violation is noted each time there exists a value $j$ such that $M(S_j) > M(S_{j+1})$. We consider the first violation (the one corresponding to the smallest $j$) and merge the subsets $S_j$ and $S_{j+1}$ into one interval.
• Given the new subdivision (which has one subset less than the previous one), we look for possible violations.
• The algorithm stops when there are no violations left.
Since for any violation a merging is performed (thus reducing the number of subsets), it is clear that the algorithm stops after a finite number of iterations.
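The steps above translate almost line by line into code. The following Python sketch (an illustration with hypothetical names, deliberately naive: it rescans for the first violation after every merge, so it may need a quadratic number of evaluations of $M$) takes the set functional as a function of the interval endpoints and returns, for every index, the value of $M$ on its final block. By Theorem 3.3 below, these values coincide with the min-max formula whenever $M$ satisfies the Averaging Property; the example compares both for the bounded functional of van Eeden's problem.

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based and inclusive


def generalized_pava(n: int, M: Callable[[int, int], float]) -> List[float]:
    """PAVA of Section 3.1 for a set functional M evaluated on {s, ..., t}."""
    blocks: List[Interval] = [(j, j) for j in range(n)]  # finest subdivision
    while True:
        # locate the first violation M(S_j) > M(S_{j+1}), if any
        j = next((k for k in range(len(blocks) - 1)
                  if M(*blocks[k]) > M(*blocks[k + 1])), None)
        if j is None:
            break
        blocks[j:j + 2] = [(blocks[j][0], blocks[j + 1][1])]  # merge S_j and S_{j+1}
    fit = [0.0] * n
    for s, t in blocks:
        value = M(s, t)
        for i in range(s, t + 1):
            fit[i] = value
    return fit


def min_max(n: int, M: Callable[[int, int], float]) -> List[float]:
    """Brute-force max_{s<=i} min_{t>=i} M({s..t}), for comparison."""
    return [max(min(M(s, t) for t in range(i, n)) for s in range(i + 1)) for i in range(n)]


y = [0.0, 5.0, 1.0, 6.0]
w = [1.0, 1.0, 1.0, 1.0]
a_low, a_up = [0.5, 0.5, 2.0, 2.0], [3.0, 3.0, 4.0, 4.0]


def M_bounded(s: int, t: int) -> float:
    idx = range(s, t + 1)
    av = sum(y[i] * w[i] for i in idx) / sum(w[i] for i in idx)
    return min(max(av, max(a_low[i] for i in idx)), min(a_up[i] for i in idx))


print(generalized_pava(4, M_bounded))  # [0.5, 3.0, 3.0, 4.0]
print(min_max(4, M_bounded))           # [0.5, 3.0, 3.0, 4.0]
```

Passing $M$ as a callable keeps the merging logic independent of the functional, which mirrors the way the description above is stated for an arbitrary set functional.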
We now require the set functional $M$ to satisfy the following property; see Leurgans (1981, Section 1), Robertson et al. (1988, p. 390) and Chakravarti (1989, p. 138).
Definition 3.1. We say that the functional $M$ satisfies the Averaging Property if for any sets $A$ and $B$ such that $A \cap B = \emptyset$ we have that

$$\min\{M(A), M(B)\} \le M(A \cup B) \le \max\{M(A), M(B)\}.$$
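If $h$ and $w > 0$ are given vectors in $\mathbb{R}^n$, then besides

$$A \mapsto \mathrm{Av}(A) = \sum_{i \in A} w_i h_i \Big/ \sum_{i \in A} w_i,$$

the following functionals also satisfy the Averaging Property:

$$A \mapsto \Big(\mathrm{Av}(A) \vee \max_A h^1\Big) \wedge \min_A h^0, \quad \text{with } h^0, h^1 \text{ two vectors in } \mathbb{R}^n,$$

$$A \mapsto \min_A h = \min_{i \in A} h_i, \qquad A \mapsto \mathrm{med}_A h = \arg\min_{m \in \mathbb{R}} \sum_{i \in A} |h_i - m|\, w_i,$$

where the arg min is taken to be the smallest $m$ in case of non-uniqueness, and

$$A \mapsto \max_A h = \max_{i \in A} h_i.$$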
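Note that taking the maximum of a functional satisfying the Averaging Property with $A \mapsto \max_A h$, or its minimum with $A \mapsto \min_A h$, again yields a functional satisfying this property; this is how the first example above is obtained from $\mathrm{Av}$. The maximum or the sum of two arbitrary functionals with the Averaging Property, however, need not have it: if $M_1(A) = \min_A h$ and $M_2(A) = \min_A g$, with $h = 0$, $g = 1$ on $A$ and $h = 1$, $g = 0$ on $B$, then $M_1 + M_2$ and $M_1 \vee M_2$ equal 1 on both $A$ and $B$ but 0 on $A \cup B$ (and symmetrically for the minimum).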
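Before relying on the PAVA for a new functional, the Averaging Property can be probed numerically. The Python sketch below is our own randomized check, with hypothetical names and arbitrary toy vectors $h$, $w$, $h^0$, $h^1$. It tests the two inequalities of Definition 3.1 on random pairs of disjoint, nonempty index sets for three of the functionals listed above, and also for the unnormalized sum $A \mapsto \sum_{i \in A} h_i$, which is not an average and fails the property when $h > 0$. A passing check is evidence, not a proof, while a failing check exhibits a counterexample.

```python
import random
from typing import Callable, FrozenSet

h = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
w = [1.0, 2.0, 0.5, 1.0, 3.0, 1.0, 2.0, 1.0]
h1 = [0.0, 0.5, 1.0, 2.0, 2.5, 3.0, 3.5, 4.0]  # lower-bound vector (toy values)
h0 = [4.0, 4.5, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # upper-bound vector (toy values)


def av(A: FrozenSet[int]) -> float:
    return sum(h[i] * w[i] for i in A) / sum(w[i] for i in A)


def bounded(A: FrozenSet[int]) -> float:
    return min(max(av(A), max(h1[i] for i in A)), min(h0[i] for i in A))


def lower_median(A: FrozenSet[int]) -> float:
    """Smallest minimizer of sum_{i in A} |h_i - m| w_i."""
    pts = sorted((h[i], w[i]) for i in A)
    half = sum(wi for _, wi in pts) / 2.0
    acc = 0.0
    for value, wi in pts:
        acc += wi
        if acc >= half:
            return value
    return pts[-1][0]


def total(A: FrozenSet[int]) -> float:
    return sum(h[i] for i in A)  # not an average


def has_averaging_property(M: Callable[[FrozenSet[int]], float], n: int,
                           trials: int = 500, seed: int = 1) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        labels = [rng.randrange(3) for _ in range(n)]  # 0: in A, 1: in B, 2: in neither
        A = frozenset(i for i in range(n) if labels[i] == 0)
        B = frozenset(i for i in range(n) if labels[i] == 1)
        if not A or not B:
            continue
        mA, mB, mAB = M(A), M(B), M(A | B)
        if not min(mA, mB) - 1e-12 <= mAB <= max(mA, mB) + 1e-12:
            return False
    return True


for name, M in [("Av", av), ("bounded", bounded), ("median", lower_median), ("sum", total)]:
    print(name, has_averaging_property(M, len(h)))
```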
Theorem 3.2. The final configuration obtained by the PAVA satisfies the following two properties.
1. The functional $M$ is increasing on the sets of the subdivision.
2. If one of the sets $S_j = C \cup D$ is the disjoint union of two subsets $C = \{i_{j-1} + 1, \dots, k\}$ and $D = \{k + 1, \dots, i_j\}$, then $M(C) > M(D)$; i.e., a finer subdivision would necessarily cause a violation.
Proof. The fact that $M$ is increasing on the final configuration is an easy consequence of the absence of violations (otherwise the algorithm would not have stopped).
As for the second property, note that it is satisfied by the initial configuration (since no set is the disjoint union of two nonempty subsets), as well as by any configuration that could be obtained after the first merging (since a merging occurs only because of a violation). We now proceed by induction.
To this end, we have to check two situations. Suppose we merge two subsequent sets $A$ and $B$ and want to check whether there is a violation on $C$ and $D$, with $A \cup B = C \cup D$. We are in one of the following two cases: either $A = A_1 \cup A_2$, $C = A_1$ and $D = A_2 \cup B$, or $B = B_1 \cup B_2$, $C = A \cup B_1$ and $D = B_2$ (the case $C = A$ and $D = B$ is trivial).
In the first case, if we suppose $M(D) \ge M(C)$, we get

$$M(A_2 \cup B) \ge M(A_1), \quad M(A_2) < M(A_1), \quad M(B) < M(A) = M(A_1 \cup A_2)$$

(the first inequality follows by assumption, the second by induction, and the third holds since $A$ and $B$ have been merged). This is impossible, since the Averaging Property would give

$$\max\{M(A_2), M(B)\} \ge M(A_2 \cup B) \ge M(A_1) > M(A_2),$$

and hence $M(A) > M(B) \ge M(A_1) > M(A_2)$, which implies $M(A) > \max\{M(A_1), M(A_2)\}$, contradicting the Averaging Property.
In the second case we would have

$$M(A \cup B_1) \le M(B_2), \quad M(B_2) < M(B_1), \quad M(A) > M(B) = M(B_1 \cup B_2),$$

which, by the Averaging Property, implies

$$\min\{M(A), M(B_1)\} \le M(A \cup B_1) \le M(B_2) < M(B_1),$$

and then $\min\{M(A), M(B_1)\} = M(A)$ and $M(A) \le M(B_2) < M(B_1)$. By the Averaging Property again, $M(B) = M(B_1 \cup B_2) \ge \min\{M(B_1), M(B_2)\} = M(B_2) \ge M(A)$, which contradicts $M(A) > M(B)$. □
Theorem 3.3. Let $(S_j)_j$ be the partition obtained at the end of the PAVA described above, and for each index $i$ let $S_{j_i}$ denote the set containing $i$. Then $m_i = M(S_{j_i})$ coincides with the value given by the min-max formula for the index $i$.
Proof. See Appendix A.
3.2 Preparing for a projected subgradient algorithm
The following proposition is crucial for computing the ordered isotonic estimators via a projected subgradient algorithm.
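Proposition 3.4. Let $\Psi$ be the criterion

$$\Psi(b_1, \dots, b_{n-1}) = \sum_{i=1}^n \Big(\max_{s \le i} (G_{s,i} \wedge b_s) - y_i\Big)^2 w_{1i} + \sum_{i=1}^{n-1} (b_i - z_i)^2 w_{2i}, \tag{14}$$

which is to be minimized on the convex set

$$\mathcal{I}_{n-1}^{b^*_n} = \{(b_1, \dots, b_{n-1}) \in \mathbb{R}^{n-1} : b_1 \le b_2 \le \dots \le b_{n-1} \le b^*_n\},$$

where $G_{s,i} = \min_{t \ge i} \mathrm{Av}_1(\{s, \dots, t\})$, and $b_n = b^*_n$ in the term $G_{n,n} \wedge b_n$ of (14).
The criterion $\Psi$ is convex. Furthermore, its unique minimizer $(b^{**}_1, \dots, b^{**}_{n-1})$ equals $(b^*_1, \dots, b^*_{n-1})$.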
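For small examples, $\Psi$ can be evaluated directly from (14). The sketch below (our own Python illustration with hypothetical names; the brute-force evaluation of $G_{s,i}$ costs $O(n^4)$ operations as written) returns $\Psi(b_1, \dots, b_{n-1})$ for a given value of $b^*_n$, together with the vector with components $\max_{s \le i}(G_{s,i} \wedge b_s)$. In the toy case $y = (5, 5)$, $z = (1, 1)$ with unit weights, one has $b^*_2 = 3$, and for $b_1 \le 3$ the criterion reduces to $(5 - b_1)^2 + 4 + (b_1 - 1)^2$, which is minimized at $b_1 = 3 = b^*_1$ with $\Psi = 12$.

```python
from typing import List, Tuple


def a_tilde(b: List[float], y: List[float], w1: List[float]) -> List[float]:
    """Components max_{s<=i} (G_{s,i} ^ b_s), with G_{s,i} = min_{t>=i} Av1({s..t})."""
    n = len(y)

    def av1(s: int, t: int) -> float:
        return sum(y[j] * w1[j] for j in range(s, t + 1)) / sum(w1[j] for j in range(s, t + 1))

    def G(s: int, i: int) -> float:
        return min(av1(s, t) for t in range(i, n))

    return [max(min(G(s, i), b[s]) for s in range(i + 1)) for i in range(n)]


def psi(b_head: List[float], b_n_star: float, y, z, w1, w2) -> Tuple[float, List[float]]:
    """Criterion (14): b_head = (b_1, ..., b_{n-1}); the last entry is fixed to b*_n."""
    b = list(b_head) + [b_n_star]
    a = a_tilde(b, y, w1)
    fit_y = sum((ai - yi) ** 2 * wi for ai, yi, wi in zip(a, y, w1))
    fit_z = sum((bi - zi) ** 2 * wi for bi, zi, wi in zip(b_head, z, w2))  # i <= n-1 only
    return fit_y + fit_z, a


y, z, w = [5.0, 5.0], [1.0, 1.0], [1.0, 1.0]
for b1 in (1.0, 2.0, 3.0):
    value, a = psi([b1], 3.0, y, z, w, w)
    print(b1, value, a)
```

The loop prints the values 20, 14 and 12 of $\Psi$, decreasing towards the minimizer $b_1 = 3$.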
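Proof. Let us write

$$\mathcal{I} = \mathcal{I}_n^{\infty} = \{a = (a_1, \dots, a_n) \in \mathbb{R}^n : a_1 \le \dots \le a_n\},$$

$$\mathcal{I}_n^* = \Big\{b = (b_1, \dots, b_n) : (b_1, \dots, b_{n-1}) \in \mathcal{I}_{n-1}^{b^*_n} \text{ and } b_n = b^*_n\Big\},$$

and consider

$$\mathcal{I}_n^b = \{a : a \in \mathcal{I} \text{ and } a \le b\}$$

for $b \in \mathcal{I}_n^*$.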
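Now note that the min-max formula in (9) allows us to write

$$\sum_{j=1}^n \Big(\max_{s \le j} (G_{s,j} \wedge b_s) - y_j\Big)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j} = \min_{a \in \mathcal{I}_n^b} \sum_{j=1}^n (a_j - y_j)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j}.$$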
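Hence, for $b \in \mathcal{I}_n^*$ we have

$$\Psi(b_1, \dots, b_{n-1}) = \min_{a \in \mathcal{I}_n^b} \sum_{j=1}^n (a_j - y_j)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j} = \sum_{j=1}^n \big(\tilde{a}_j(b) - y_j\big)^2 w_{1j} + \sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j},$$

where $\tilde{a}_j(b) = \max_{s \le j} (G_{s,j} \wedge b_s)$ is the $j$-th component of the minimizer of the function $\sum_{j=1}^n (a_j - y_j)^2 w_{1j}$ in $\mathcal{I}_n^b$. Let $\lambda \in [0, 1]$, and let $b$ and $b'$ be in $\mathcal{I}_n^*$. By definition of $\mathcal{I}_n^b$ and $\mathcal{I}_n^{b'}$, we have that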
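$$\lambda \tilde{a}(b) + (1 - \lambda)\, \tilde{a}(b') \le \lambda b + (1 - \lambda) b'.$$

Since $\lambda \tilde{a}(b) + (1 - \lambda) \tilde{a}(b')$ is also isotonic, it belongs to $\mathcal{I}_n^{\lambda b + (1 - \lambda) b'}$, so the minimizing property of $\tilde{a}(\lambda b + (1 - \lambda) b')$ together with the convexity of $u \mapsto (u - y_j)^2$ yields

$$\sum_{j=1}^n \big(\tilde{a}_j(\lambda b + (1 - \lambda) b') - y_j\big)^2 w_{1j} \le \sum_{j=1}^n \big(\lambda \tilde{a}_j(b) + (1 - \lambda) \tilde{a}_j(b') - y_j\big)^2 w_{1j} \le \lambda \sum_{j=1}^n \big(\tilde{a}_j(b) - y_j\big)^2 w_{1j} + (1 - \lambda) \sum_{j=1}^n \big(\tilde{a}_j(b') - y_j\big)^2 w_{1j}.$$

This shows convexity of the first term of $\Psi$. Convexity of $\Psi$ now follows from convexity of the function $\sum_{j=1}^{n-1} (b_j - z_j)^2 w_{2j}$ and the fact that the sum of two convex functions defined on the same domain is also convex. □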
The idea behind considering the convex functional $\Psi$ is to reduce the dimensionality of the problem as well as the number of constraints (from $3n - 2$, namely $n - 1$ for each monotonicity constraint and $n$ for the ordering, to $n - 1$). Once $\Psi$ is minimized, i.e., once the isotonic estimate $b^*$ is computed, $a^*$ can be obtained using the min-max formula given in (9). However, the convex functional $\Psi$ is not continuously differentiable; hence the need for an optimization algorithm that uses a subgradient instead of the gradient, as the latter is not defined everywhere.
3.3 A projected subgradient algorithm to compute $b^*_1, \dots, b^*_{n-1}$
To minimize the non-smooth convex function $\Psi$, we use a projected subgradient algorithm.
Since the gradient does not exist on the entire domain of the function, one has to resort to computing a subgradient, the analogue of the gradient at points where the latter does not exist. As opposed to classical methods developed for minimizing smooth functions, the procedure for choosing the descent direction and the step lengths is entirely different. The classical reference for subgradient algorithms is Shor (1985). Boyd et al. (2003) provide a nice summary of the topic, including the projected variant. Note that subgradient algorithms have recently been applied in statistics to compute the log-concave density estimator in high dimensions; see Cule et al. (2008).
The main steps of the algorithm. Recall that the functional $\Psi$ should be minimized over the $(n-1)$-dimensional convex set $\mathcal{I}_{n-1}^{b^*_n}$ given in Proposition 3.4. Of course, this is the same as minimizing $\Psi$ over the $n$-dimensional convex set $\{(b_1, \dots, b_n) : b_1 \le \dots \le b_{n-1} \le b_n\}$, starting with an initial vector $(b^{(0)}_1, \dots, b^{(0)}_n)$ such that $b^{(0)}_n = b^*_n$ and constraining the $n$-th component of the subgradient of $\Psi$ to be equal to 0.
Given a step length $\tau_k$, the $k$-th iteration of the subgradient algorithm computes

$$v^{k+1} = b^k - \tau_k D_k,$$

where $D_k$ is the subgradient calculated at the current iterate, i.e., $D_k = \tilde{\nabla} \Psi(b^k)$ (see Appendix B). However, it may happen that $v^{k+1}$ is not admissible, i.e., $(v^{k+1}_1, \dots, v^{k+1}_{n-1})$ does not belong to $\mathcal{I}_{n-1}^{b^*_n}$. When this occurs, the new iterate $b^{k+1}$ is obtained as the $L_2$ projection of $v^{k+1}$ onto $\mathcal{I}_{n-1}^{b^*_n}$, which is equivalent to finding the minimizer of

$$\sum_{i=1}^{n-1} \big(a_i - v^{k+1}_i\big)^2$$

over the set $\mathcal{I}_{n-1}^{b^*_n}$; otherwise, $b^{k+1} = v^{k+1}$. The latter problem can be solved using the generalized PAVA for bounded isotonic regression as described above.
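When the upper bound is a constant $c$ (here $c = b^*_n$), the projection has a particularly simple form. With unit weights and $M(A) = \mathrm{Av}(A) \wedge c$, the map $u \mapsto u \wedge c$ is nondecreasing and therefore commutes with the maxima and minima in the min-max formula, so the projection onto $\{v : v_1 \le \dots \le v_{n-1} \le c\}$ is the unconstrained isotonic fit truncated at $c$. The Python sketch below (our own illustration with hypothetical names; the OrdMonReg package may organize the computation differently) implements this shortcut with the classical PAVA.

```python
from typing import List


def isotonic_pava(v: List[float]) -> List[float]:
    """Unweighted isotonic least squares fit (classical PAVA)."""
    blocks = []  # each block: [mean, size]
    for x in v:
        blocks.append([x, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return [m for m, size in blocks for _ in range(size)]


def project_onto_bounded_isotonic(v: List[float], c: float) -> List[float]:
    """L2 projection of v onto {x : x_1 <= ... <= x_m <= c} for a constant bound c."""
    return [min(x, c) for x in isotonic_pava(v)]


print(project_onto_bounded_isotonic([3.0, 1.0, 2.0, 6.0, 5.0], 4.0))  # [2.0, 2.0, 2.0, 4.0, 4.0]
```

For a non-constant upper bound, this shortcut no longer applies and the generalized PAVA with the bounded functional is needed.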
The computation of the subgradient $D_k$ is described in detail in Appendix B. As for the step length $\tau_k$, we start the algorithm with a constant step length. Once a pre-specified number of iterations has been reached, we switch to

$$\tau_k = \big(k^{0.1}\, \|D_k\|_2\big)^{-1} = \gamma_k / \|D_k\|_2,$$

where $\gamma_k := k^{-0.1}$ is such that $0 \le \gamma_k \to 0$ as $k \to \infty$ and $\sum_{k=1}^\infty \gamma_k = \infty$. Here, $\|\cdot\|_2$ denotes the Euclidean ($L_2$) norm of a vector in $\mathbb{R}^n$. In our implementation, this combination of a constant and a non-summable diminishing step length performed better than other classical choices of $(\gamma_k)_k$. Furthermore, convergence is ensured by the following theorem.
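The step-length schedule can be illustrated on a toy problem. Since the subgradient of $\Psi$ is only described in Appendix B, the Python sketch below (our own illustration; the objective, the names, and the constants $h$ and $K_0$ are hypothetical choices, not the authors') minimizes the non-smooth function $f(b) = \sum_i |b_i - z_i|$ over $\{b : b_1 \le \dots \le b_m \le c\}$. It uses the constant step length $\tau_k = h / \|D_k\|_2$ for the first $K_0$ iterations and $\tau_k = k^{-0.1} / \|D_k\|_2$ afterwards, projects by truncating an isotonic fit at the constant bound $c$, and, unlike the authors' implementation, keeps track of the best iterate, as in Theorem 3.5.

```python
import math
from typing import Callable, List, Tuple


def isotonic_pava(v: List[float]) -> List[float]:
    blocks = []  # each block: [mean, size]
    for x in v:
        blocks.append([x, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return [m for m, size in blocks for _ in range(size)]


def project(v: List[float], c: float) -> List[float]:
    """Projection onto {x : x_1 <= ... <= x_m <= c} for a constant bound c."""
    return [min(x, c) for x in isotonic_pava(v)]


def projected_subgradient(f: Callable[[List[float]], float],
                          subgrad: Callable[[List[float]], List[float]],
                          x0: List[float], c: float, h: float = 0.2,
                          k0: int = 100, n_iter: int = 2000) -> Tuple[List[float], float]:
    x = project(x0, c)
    best_x, best_f = x, f(x)
    for k in range(1, n_iter + 1):
        d = subgrad(x)
        norm = math.sqrt(sum(di * di for di in d))
        if norm == 0.0:  # 0 is a subgradient, so x is a minimizer
            return x, f(x)
        gamma = h if k <= k0 else k ** -0.1  # constant, then non-summable diminishing
        x = project([xi - gamma / norm * di for xi, di in zip(x, d)], c)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


z = [1.0, 2.0, 3.0, 4.0]


def f(b: List[float]) -> float:
    return sum(abs(bi - zi) for bi, zi in zip(b, z))


def subgrad(b: List[float]) -> List[float]:
    # sign(b_i - z_i); 0 is a valid subgradient of |.| at 0
    return [float((bi > zi) - (bi < zi)) for bi, zi in zip(b, z)]


b_best, f_best = projected_subgradient(f, subgrad, [0.0] * 4, c=10.0)
print(b_best, f_best)
```

As noted in the text, the objective is not monotone along the iterations, which is why the sketch records the best value seen so far.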
Theorem 3.5 (Boyd et al., 2003). A subgradient algorithm complemented with least-squares projection and using non-summable diminishing step lengths yields, for any $\eta > 0$ and after $k = k(\eta)$ iterations, a vector $b^k := (b^k_1, \dots, b^k_n)$ such that

$$\min_{i = 1, \dots, k} \Psi(b^i) - \Psi(b^*) \le \eta,$$

where $b^* = (b^*_1, \dots, b^*_n)$ is the vector given in Proposition 3.4.
The proof can be found in Boyd et al. (2003) by combining their arguments in Sections 2 and 3. Note that in our implementation we do not keep track of the iterate that yielded the minimal value of $\Psi$, since we apply a problem-motivated stopping criterion that guarantees that we have reached an iterate sufficiently close to $b^* = (b^*_1, \dots, b^*_n)$.
Choice of stopping rule. Since in subgradient algorithms the convex target functional does not necessarily decrease monotonically with the number of iterations, the choice of a suitable stopping criterion is delicate. However, in our specific setting we use the fact that $(a^*, b^*)$ is a fixed point of the operator $\mathcal{P}$ defined in (11), where $a^* = \mathcal{P}_1(b^*)$ is the solution of the one-curve problem (1) with upper bound $b^*$. This motivates iterating the algorithm until the entries of the two vectors $b^k$ and

$$b^{k\#} = \mathcal{P}_2 \circ \mathcal{P}_1(b^k)$$

differ by less than a pre-specified positive constant $\delta$.
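The stopping rule only requires the two min-max maps (9) and (10). The Python sketch below is our own illustration, and two things in it are assumptions: the names, and the use of the largest absolute entrywise difference as the measure of closeness. It evaluates $b^{k\#} = \mathcal{P}_2 \circ \mathcal{P}_1(b^k)$ by brute force and compares it with $b^k$. Because $\mathcal{P}$ has more than one fixed point (see the discussion after (11)), passing this test is a necessary condition for optimality rather than a certificate on its own; it is meant to be applied to the iterates of the subgradient algorithm.

```python
from typing import List


def av(v: List[float], w: List[float], s: int, t: int) -> float:
    return sum(v[i] * w[i] for i in range(s, t + 1)) / sum(w[i] for i in range(s, t + 1))


def P1(b: List[float], y: List[float], w1: List[float]) -> List[float]:
    """(9): a_i = max_{s<=i} min_{t>=i} (Av1({s..t}) ^ b_s)."""
    n = len(y)
    return [max(min(min(av(y, w1, s, t), b[s]) for t in range(i, n)) for s in range(i + 1))
            for i in range(n)]


def P2(a: List[float], z: List[float], w2: List[float]) -> List[float]:
    """(10): b_i = max_{s<=i} min_{t>=i} (Av2({s..t}) v a_t)."""
    n = len(z)
    return [max(min(max(av(z, w2, s, t), a[t]) for t in range(i, n)) for s in range(i + 1))
            for i in range(n)]


def should_stop(b, y, z, w1, w2, delta: float) -> bool:
    b_sharp = P2(P1(b, y, w1), z, w2)
    return max(abs(u - v) for u, v in zip(b, b_sharp)) < delta


y, z, w = [5.0, 5.0], [1.0, 1.0], [1.0, 1.0]
print(should_stop([3.0, 3.0], y, z, w, w, delta=1e-8))  # True: (3, 3) is the solution
print(should_stop([0.0, 0.0], y, z, w, w, delta=1e-8))  # False: P2(P1(b)) = (1, 1)
```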
The implementation. The projected subgradient algorithm for the two-curve problem, as well as the generalized PAVA computing the solution for one curve under the constraints (3), were implemented in R (R Development Core Team, 2008). The corresponding package OrdMonReg (Balabdaoui et al., 2009) is available on CRAN. Note that the data analyzed in Section 3.4 are made available as a dataset in OrdMonReg.
To conclude this section on the algorithmic aspects of our work, we would like to mention the work by Beran and Dümbgen (2009), who propose an active set algorithm that can be tailored to solve the problem given in (5) for an arbitrary number of ordered monotone curves. However, Beran and Dümbgen (2009) do not provide an analysis of the structure of the estimated curves, such as characterizations, and rather put their emphasis on the algorithmic developments of the problem.
3.4 Real data example from mechanical engineering
We would like to estimate the stress-strain curves based on the available experimental data for two different velocity levels (see Figure 1). The expected curves have to be isotonic and ordered. The data consist of 1495 pairs $(x_i, y_i)$ and $(x_i, z_i)$. The values of the measured strain of the material (on the $x$-axis) are defined as minus the logarithm of the ratio of the current to the initial specimen length. The values are positive and take the maximal value 1, which corresponds to a maximum shortening of about 63% (a length ratio of $e^{-1} \approx 0.37$).
Furthermore, since the stress measurements for different velocities are not performed at exactly the same strain, the values of the stress have been interpolated at equally spaced values of the strain. As pointed out by a referee, this induces correlation between the stress data. Even without interpolation, correlated stress measurements are rather inevitable in this particular application because of the data processing procedures associated with the measurement technique (see Shim and Mohr, 2009). The estimation method is nevertheless still applicable. When studying statistical properties of the isotonic estimators, such as consistency and convergence, the correlation between the data should of course be taken into account.
In such problems, practitioners usually fit parametric models using a trial-and-error approach in an attempt to capture the monotonicity of the stress-strain curves as well as their ordering. The methods used are rather arbitrary and can also be time consuming, hence the need for an alternative estimation approach. Our main goal is to provide these practitioners with a rigorous way of estimating the ordered stress-strain curves.
In Figure 2 (upper plot), we show the original data (black and gray dots) and the proposed ordered isotonic estimates $a^*$ and $b^*$ as described above. Being step functions, the estimated isotonic curves are non-smooth, a well-known drawback of isotonic regression; see, among others, Wright (1978) and Mukerjee (1988). The latter author pioneered the combination of isotonization followed by kernel smoothing. A thorough asymptotic analysis of the smoothed isotonized and the isotonic smooth estimators was given by Mammen (1991).
Mukerjee (1988, p. 743) shows that monotonicity of the regression function is preserved by the smoothing operation if the kernel used is log-concave. Thus, we define our smoothed ordered monotone estimators by

$$\tilde{a}^*_h(x) = \frac{\sum_{i=1}^n K_h(x - x_i)\, a^*_i}{\sum_{i=1}^n K_h(x - x_i)}, \qquad \tilde{b}^*_h(x) = \frac{\sum_{i=1}^n K_h(x - x_i)\, b^*_i}{\sum_{i=1}^n K_h(x - x_i)}$$

for $0 \le x \le 1$. For simplicity, we used the kernel $K_h(x) = \phi(x/h)$, where $\phi$ is the density function of a standard normal distribution, which is clearly log-concave. Figure 2 (lower plot) depicts the smoothed isotonic estimates. We set the bandwidth to $h = 0.1\, n^{-1/5} \approx 0.023$ (with $n = 1495$).
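The smoothing step is a kernel-weighted average of the fitted step values. The Python sketch below (our own illustration on synthetic step values; names are hypothetical) evaluates the smoothed estimator on a grid with the Gaussian kernel, dropping the normalizing constant of $\phi$ because it cancels in the ratio, and uses the bandwidth rule $h = 0.1\, n^{-1/5}$. Evaluating far outside the design points would need care, since all kernel weights can underflow to zero there.

```python
import math
from typing import List


def smooth_step_fit(x: List[float], fit: List[float], h: float,
                    grid: List[float]) -> List[float]:
    """sum_i K_h(u - x_i) fit_i / sum_i K_h(u - x_i) with a Gaussian kernel K_h."""
    out = []
    for u in grid:
        k = [math.exp(-0.5 * ((u - xi) / h) ** 2) for xi in x]  # phi((u - x_i)/h), unnormalized
        out.append(sum(ki * fi for ki, fi in zip(k, fit)) / sum(k))
    return out


n = 50
x = [i / (n - 1) for i in range(n)]             # design points in [0, 1]
fit = [float(min(4, int(5 * xi))) for xi in x]  # synthetic isotonic step values
h = 0.1 * n ** (-1 / 5)
grid = [j / 200 for j in range(201)]
smoothed = smooth_step_fit(x, fit, h, grid)
print(round(0.1 * 1495 ** (-1 / 5), 4))         # 0.0232, the bandwidth for n = 1495
```

Since the Gaussian kernel is log-concave, the smoothed curve computed from the increasing step values is itself nondecreasing on the grid, in line with Mukerjee's result.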
Motivated by the estimation of stress-strain curves, an application from mechanical engineering, we consider in this paper weighted Least Squares estimators in the problem of estimating two ordered isotonic regression curves. We provide characterizations of the solution and describe a projected subgradient algorithm that can be used to compute this solution. As a by-product, we show how an adaptation of the well-known PAVA can be used to compute min-max estimators for any set functional satisfying the Averaging Property.
Acknowledgements. The first author would like to thank Cécile Durot for some interesting discussions around the subject. We also thank JongMin Shim for having made the data available to us.
A Proofs
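Proof of Theorem 2.2. Suppose that $(a^*, b^*)$ is the solution. For $\epsilon \in (0, 1)$ and $(a, b) \in \mathcal{I}_n$, consider the pair $(a^\epsilon, b^\epsilon) \in \mathbb{R}^n \times \mathbb{R}^n$ defined as

$$a^\epsilon = a^* + \epsilon (a - a^*), \qquad b^\epsilon = b^* + \epsilon (b - b^*).$$

For $i \le j \in \{1, \dots, n\}$, we have

$$a^\epsilon_j - a^\epsilon_i = (1 - \epsilon)(a^*_j - a^*_i) + \epsilon (a_j - a_i) \ge 0, \qquad b^\epsilon_j - b^\epsilon_i = (1 - \epsilon)(b^*_j - b^*_i) + \epsilon (b_j - b_i) \ge 0.$$

Also, for $i \in \{1, \dots, n\}$ we have

$$a^\epsilon_i - b^\epsilon_i = (1 - \epsilon)(a^*_i - b^*_i) + \epsilon (a_i - b_i) \le 0.$$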
Figure 2: Original observations, isotonic and isotonic smoothed estimates. Upper panel: lower isotonic estimate $a^*$ and upper isotonic estimate $b^*$; lower panel: lower and upper isotonic smoothed estimates $\tilde{a}^*$ and $\tilde{b}^*$ (stress versus measured strain $x$).
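Hence, $(a^\epsilon, b^\epsilon) \in \mathcal{I}_n$, and

$$0 \le \lim_{\epsilon \searrow 0} \frac{1}{\epsilon}\Big(L_2(a^\epsilon, b^\epsilon) - L_2(a^*, b^*)\Big) = 2 \sum_{i=1}^n (a^*_i - y_i)(a_i - a^*_i) w_{1i} + 2 \sum_{i=1}^n (b^*_i - z_i)(b_i - b^*_i) w_{2i},$$

yielding the inequality in (6).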
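Now consider the vectors $a^\epsilon$ and $b^\epsilon$ such that, for $l = 1, \dots, n$,

$$a^\epsilon_l = a^*_l + \epsilon\, a^*_l\, 1_{l \in B^1_{i_j}}, \qquad b^\epsilon_l = b^*_l.$$

Let $r \le s \in \{1, \dots, n\}$. If $r \notin B^1_{i_j}$ and $s \notin B^1_{i_j}$, then $a^\epsilon_s - a^\epsilon_r = a^*_s - a^*_r \ge 0$. If $r \in B^1_{i_j}$ and $s \notin B^1_{i_j}$, then $a^*_s > a^*_r$ and $a^\epsilon_s - a^\epsilon_r = a^*_s - a^*_r - \epsilon a^*_r > 0$ for $|\epsilon|$ small enough. The same reasoning applies if $r \notin B^1_{i_j}$ and $s \in B^1_{i_j}$. Finally, if $r, s \in B^1_{i_j}$, then $a^\epsilon_s - a^\epsilon_r = 0$.
Now, for $r \in \{1, \dots, n\}$, we have $a^\epsilon_r = a^*_r \le b^*_r$ if $r \notin B^1_{i_j}$. Otherwise, $a^\epsilon_r = a^*_r (1 + \epsilon) < b^*_r$ if $|\epsilon|$ is small enough. Hence, $(a^\epsilon, b^\epsilon) \in \mathcal{I}_n$, and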
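$$0 = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\Big(L_2(a^\epsilon, b^\epsilon) - L_2(a^*, b^*)\Big) = 2 \sum_{r=1}^n (a^*_r - y_r)\, 1_{r \in B^1_{i_j}}\, a^*_r\, w_{1r},$$

where the limit vanishes because $\epsilon$ may take both signs. Summing over all the sets $B^1_{i_j}$ yields the identity in (7). The identity in (8) can be proved very similarly.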
Conversely, suppose that $(a^*, b^*) \in \mathcal{I}_n$ satisfies the inequality in (6). For any $(a, b) \in \mathcal{I}_n$, we have

$$L_2(a, b) - L_2(a^*, b^*) = \sum_{i=1}^n (a_i - a^*_i)^2 w_{1i} + \sum_{i=1}^n (b_i - b^*_i)^2 w_{2i} + 2 \sum_{i=1}^n (a^*_i - y_i)(a_i - a^*_i) w_{1i} + 2 \sum_{i=1}^n (b^*_i - z_i)(b_i - b^*_i) w_{2i} \ge 0.$$

We conclude that $(a^*, b^*)$ is the solution of the minimization problem. □

Proof of Proposition 2.3. Let $\epsilon > 0$ and consider $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that
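$$a_i = a^*_i - \epsilon\, 1_{i \in \{1, \dots, t\}}, \quad t \in \{1, \dots, n\}, \qquad b_i = b^*_i$$

for $i = 1, \dots, n$. Then $(a, b) \in \mathcal{I}_n$, and using the characterization in Theorem 2.2, it follows that

$$\sum_{j=1}^t (a^*_j - y_j) w_{1j} \le 0,$$

implying, since $a^*_1 \le a^*_j$, that

$$\sum_{j=1}^t (a^*_1 - y_j) w_{1j} \le 0 \quad \text{for } t \in \{1, \dots, n\},$$

or equivalently

$$a^*_1 \le \min_{t \ge 1} \mathrm{Av}_1(\{1, \dots, t\}).$$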
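Now, consider $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$ such that

$$a_j = a^*_j - \epsilon\, 1_{j \in \{1, \dots, t\}}, \quad t \in \{1, \dots, n\}, \qquad b_j = b^*_j - \epsilon\, 1_{j \in \{1, \dots, t'\}}, \quad 1 \le t' \le t,$$

for $j = 1, \dots, n$, with $\epsilon > 0$. Then $(a, b) \in \mathcal{I}_n$, and hence

$$\sum_{j=1}^t (a^*_j - y_j) w_{1j} + \sum_{j=1}^{t'} (b^*_j - z_j) w_{2j} \le 0.$$

Since $a^*_1 \le a^*_j$ and $a^*_1 \le b^*_1 \le b^*_j$, it follows that

$$\sum_{j=1}^t (a^*_1 - y_j) w_{1j} + \sum_{j=1}^{t'} (a^*_1 - z_j) w_{2j} \le 0,$$

that is,

$$a^*_1 \le \min_{1 \le t' \le t \le n} \tilde{M}(\{1, \dots, t\}, \{1, \dots, t'\}).$$

We conclude that

$$a^*_1 \le \min_{t \ge 1} \mathrm{Av}_1(\{1, \dots, t\}) \wedge \min_{t \ge t' \ge 1} \tilde{M}(\{1, \dots, t\}, \{1, \dots, t'\}).$$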
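Now if $a^*_1 < b^*_1$, let $i_1 \in \{1, \dots, n\}$ be the largest index such that $a^*_1 = \dots = a^*_{i_1}$. Then $(a, b)$ with

$$a_j = a^*_j + \epsilon\, 1_{j \in \{1, \dots, i_1\}}, \qquad b_j = b^*_j$$

for $j = 1, \dots, n$ is in $\mathcal{I}_n$ when $|\epsilon|$ is small enough. It follows that $\mathrm{Av}_1(\{1, \dots, i_1\}) = a^*_1$.
If $a^*_1 = b^*_1$, and $i'_1$ and $i''_1$ are the largest indices such that $a^*_1 = \dots = a^*_{i'_1}$ and $b^*_1 = \dots = b^*_{i''_1}$, then $(a, b)$ with

$$a_j = a^*_j + \epsilon\, 1_{j \in \{1, \dots, i'_1\}}, \qquad b_j = b^*_j + \epsilon\, 1_{j \in \{1, \dots, i''_1\}}$$

for $j = 1, \dots, n$ is in $\mathcal{I}_n$ for $|\epsilon|$ small enough (note that $i''_1 \le i'_1$). Hence,

$$a^*_1 = \tilde{M}(\{1, \dots, i'_1\}, \{1, \dots, i''_1\}).$$

In both cases, $a^*_1$ is bounded above by both minima and coincides with one of the values over which they are taken. Therefore,

$$a^*_1 = \min_{t \ge 1} \mathrm{Av}_1(\{1, \dots, t\}) \wedge \min_{t \ge t' \ge 1} \tilde{M}(\{1, \dots, t\}, \{1, \dots, t'\}).$$

The expression of $b^*_n$ follows easily by replacing $y_i$ and $z_i$ by $-z_{n-i+1}$ and $-y_{n-i+1}$, respectively, for $i = 1, \dots, n$. □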
Proof of Theorem 3.3. Consider $a \in \mathbb{R}^n$ given by

$$a_i = \max_{s \le i} \min_{t \ge i} M(\{s, \dots, t\})$$

and also the subdivision into subsets $S_j = \{i_{j-1} + 1, \dots, i_j\}$ obtained by the PAVA. Let us denote by $G^-$ (resp. $G^+$) the grid set of indices which correspond to points at the beginning (resp. end) of those subsets, i.e., of the form $i_j + 1$ (resp. $i_j$).
We obviously have

$$a_i \le \max_{s \le i} \min_{t \ge i,\ t \in G^+} M(\{s, \dots, t\}).$$

Then, consider $s \notin G^-$. This means that we have a set $\{s, \dots, t\}$ of the form $B \cup C$, $C$ being a union of subsets in the subdivision and $B$ a right subset of a set of the partition of the form $A \cup B$. We want to prove that $M(\{s, \dots, t\}) = M(B \cup C)$ is smaller than either $M(C)$ or $M(A \cup B \cup C)$. Suppose this is not the case. Then we would have

$$M(B \cup C) > M(C), \quad M(B \cup C) > M(A \cup B \cup C), \quad M(A) > M(B),$$

where the last inequality is implied by the second property in Theorem 3.2. Yet the second inequality, together with the Averaging Property, implies that $M(A) < M(B \cup C)$. In the end we get

$$M(B \cup C) > M(C), \quad M(B \cup C) > M(A) > M(B),$$

which contradicts the Averaging Property.
We conclude that $M(\{s, \dots, t\})$ is smaller than the value of $M$ at a set which is a union of sets of the subdivision, i.e., either $A \cup B \cup C$ or $C$ itself. But on sets of this kind it is obvious, by the Averaging Property, that $M$ is smaller than the value $m_t$, since this is the