HAL Id: hal-00984010
https://hal.archives-ouvertes.fr/hal-00984010
Submitted on 26 Apr 2014
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Optimal extrapolation design for the Chebyshev
regression
Michel Broniatowski, Giorgio Celant
To cite this version:
Michel Broniatowski, Giorgio Celant. Optimal extrapolation design for the Chebyshev regression. Annales de l'ISUP, Publications de l'Institut de Statistique de l'Université de Paris, 2015, 59 (3), pp. 3-22. hal-00984010
Optimal extrapolation design for
the Chebyshev regression
Michel Broniatowski (1), Giorgio Celant (2,*)
(1) LSTA, Université Pierre et Marie Curie, Paris, France
(2) Dipartimento di Scienze Statistiche, Università degli Studi di Padova, Italy
(*) Corresponding author: giorgio.celant@stat.unipd.it
Abstract
This paper introduces optimal designs in the context of a regression model when the regression function is assumed to be generated by a Chebyshev system of functions. The criterion for optimality is the variance of a Gauss-Markov estimator for an extrapolated value.
Key words: Chebyshev system; optimal design; extrapolation design; Borel-Chebyshev Theorem
1 Introduction
This paper deals with a natural extension of the Hoel-Levine optimal extrapolation design, as described in [Hoel, 1966]. We recall that this classical design results from the following fact.
A design is defined as a discrete probability measure $\xi$ on a set of measurement points $x_0, \ldots, x_{g-1}$, which for notational convenience belong to the observation set $[-1,1]$. Denoting $n_i/n := \xi(x_i)$ the frequency of replications of the experiment to be performed at point $x_i$, $0 \le i \le g-1$, the $n_i$'s satisfy $n_0 + \cdots + n_{g-1} = n$. The points $x_i$ are the nodes of the design, and $\xi(x_i)$ is the so-called frequency of the design at node $x_i$. Recall that the model writes
$$Y(x) = f(x) + \varepsilon(x)$$
for $x$ in $[-1,1]$; the real-valued function $f$ is unknown but belongs to a specified class of functions, and the random variable $\varepsilon(x)$ is centered with finite variance. Observations are performed under the design, with the constraint $n_0 + \cdots + n_{g-1} = n$ on the global budget of the design. The $n_i$ replications $Y_j(x_i)$, $1 \le j \le n_i$, are independent. Independence also holds from node to node, which amounts to assuming that all measurement errors due to the r.v.'s $\varepsilon(x)$ are independent. The model is supposed to be homoscedastic; hence the variance $\sigma^2$ of $\varepsilon(x)$ does not depend on $x$.
For a given $c$ not in $[-1,1]$, consider an estimate of $f(c)$ with smallest variance among all unbiased estimators of $f(c)$ which are linear functions of the observations $Y_j(x_i)$, $1 \le j \le n_i$, $0 \le i \le g-1$, hence under a given design $\xi$. An optimal design achieves the minimal variance among all such designs. This minimum is achieved by the Hoel-Levine design when the function $f$ is assumed to belong to the class of all polynomials defined on $\mathbb{R}$ with degree less than or equal to $g-1$, hence to the span of the class of monomials $\{1, x, \ldots, x^{g-1}\}$.
The main mathematical argument behind the Hoel-Levine design lies in the solution of the following basic question: find a polynomial which equioscillates in $g+1$ points of $[-1,1]$, assuming maximal absolute values all equal to $1$ at those points. Up to a multiplicative constant, such a polynomial results as the best polynomial approximation of the null function on $[-1,1]$ by polynomials with degree $g-1$. Existence and uniqueness of this polynomial follow from the Borel-Chebyshev Theorem. We refer to [Dzyadyk and Shevchuk, 2008] for details and derivation of these results.
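In the polynomial case the resulting Hoel-Levine design can be sketched numerically: the nodes are the extrema of the Chebyshev polynomial $T_{g-1}$, and the weights are proportional to $|\ell_i(c)|$, where $\ell_i$ is the Lagrange basis polynomial at node $i$. The following Python sketch illustrates this (the function name and the values $g = 4$, $c = 1.5$ are our own illustrative choices):

```python
import numpy as np

def hoel_levine_design(g, c):
    """Nodes and weights of the Hoel-Levine extrapolation design for
    polynomial regression of degree g - 1: nodes are the extrema of the
    Chebyshev polynomial T_{g-1}, weights are proportional to |l_i(c)|,
    l_i being the Lagrange basis polynomial attached to node i."""
    # extrema of T_{g-1} on [-1, 1], listed in increasing order
    nodes = np.cos(np.pi * np.arange(g - 1, -1, -1) / (g - 1))
    # Lagrange basis polynomials evaluated at the extrapolation point c
    ell = np.array([
        np.prod([(c - nodes[j]) / (nodes[i] - nodes[j])
                 for j in range(g) if j != i])
        for i in range(g)
    ])
    weights = np.abs(ell) / np.abs(ell).sum()
    return nodes, weights

nodes, weights = hoel_levine_design(4, 1.5)
```

Note that the endpoints $\pm 1$ always belong to the support, and all weights are strictly positive since $c$ lies outside $[-1,1]$.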
The aim is now to provide a larger context for similar questions, assuming that the function $f$ may belong to some other functional class, still a finitely generated set of functions.
Definition 1 The system of functions $(\varphi_0, \ldots, \varphi_{g-1})$ in $C(\mathbb{R})$ is a Chebyshev (or Haar) system on $[-1,1]$ when
1) $(\varphi_0, \ldots, \varphi_{g-1})$ are linearly independent;
2) any equation
$$a_0 \varphi_0(x) + \cdots + a_{g-1} \varphi_{g-1}(x) = 0$$
with $(a_0, \ldots, a_{g-1}) \neq (0, \ldots, 0)$ has at most $g-1$ roots in $[-1,1]$.
Denote by
$$V := \operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\} \subset C([-1,1])$$
the linear space generated by the Chebyshev system $(\varphi_0, \ldots, \varphi_{g-1})$.
The Haar Theorem (see [Dzyadyk and Shevchuk, 2008]) states that the two following assertions are equivalent:
a) $(\varphi_0, \ldots, \varphi_{g-1})$ is a Chebyshev system on $[-1,1]$;
b) for any $f$ in $C([-1,1])$ there exists a unique best uniform approximation of $f$ in $V$.
In the sequel we assume that the system $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system in $C([-1,1])$ and in $C([-1,c])$ with $c > 1$. This implies that no non-null linear combination of the $\varphi_i$'s may have roots in $(1, c]$.
We also make use of the following result.

Proposition 2 The following properties are equivalent:
1) $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system;
2) for any set of $g$ points $(x_0, \ldots, x_{g-1})$ in $[-1,1]$ such that $x_i \neq x_j$, and for any $(y_0, \ldots, y_{g-1})$ in $\mathbb{R}^g$, there exists a unique function $\psi$ in $V$ such that $\psi(x_k) = y_k$ for all $k$;
3) for any $g$ points $(x_0, \ldots, x_{g-1})$ in $[-1,1]$ such that $x_i \neq x_j$, the determinant
$$\Delta := \det G, \qquad G := \begin{pmatrix} \varphi_0(x_0) & \cdots & \varphi_0(x_j) & \cdots & \varphi_0(x_{g-1}) \\ \vdots & & \vdots & & \vdots \\ \varphi_i(x_0) & \cdots & \varphi_i(x_j) & \cdots & \varphi_i(x_{g-1}) \\ \vdots & & \vdots & & \vdots \\ \varphi_{g-1}(x_0) & \cdots & \varphi_{g-1}(x_j) & \cdots & \varphi_{g-1}(x_{g-1}) \end{pmatrix},$$
does not equal $0$.
Proof. Assume 3) holds. For a set of $g$ points $(x_0, \ldots, x_{g-1})$ in $[-1,1]$ such that $x_i \neq x_j$, $\Delta = 0$ iff the matrix $G$ is not invertible, which is to say that the system of equations defined through $0 = \sum_{i=0}^{g-1} a_i \varphi_i(x_j)$, $j = 0, \ldots, g-1$, admits a solution $a := (a_0, \ldots, a_{g-1})$ different from $(0, \ldots, 0)$ in $\mathbb{R}^g$. Define $\psi := \sum_{i=0}^{g-1} a_i \varphi_i$, an element of $V$ which is not the function $x \mapsto 0$. Since $\sum_{i=0}^{g-1} a_i \varphi_i(x) = 0$ for $x$ in $\{x_0, \ldots, x_{g-1}\}$, it follows that $\psi$ has $g$ distinct roots in $[-1,1]$. Hence whenever $\Delta = 0$, $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is not a Chebyshev system; 3) is therefore equivalent to 1). Now 2) is equivalent to 3): indeed, when $G$ is invertible then for any $(y_0, \ldots, y_{g-1})$ in $\mathbb{R}^g$ the system $\sum_{i=0}^{g-1} a_i \varphi_i(x_j) = y_j$, $j = 0, \ldots, g-1$, has a unique solution, which means that there is a unique $\psi$ in $V$ with $\psi(x_j) = y_j$ for all $j$.
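The determinant criterion 3) is easy to test numerically. The following sketch (our own illustration, with arbitrarily chosen points) contrasts the system $\{1, x\}$, which is Chebyshev on $[-1,1]$, with $\{1, x^2\}$, which is not, since symmetric points produce a singular matrix:

```python
import numpy as np

def chebyshev_determinant(funcs, points):
    """Determinant of the matrix G with entries G[i, j] = phi_i(x_j),
    as in Proposition 2: it vanishes for some choice of distinct
    points iff the family is not a Chebyshev system there."""
    return np.linalg.det(np.array([[f(x) for x in points] for f in funcs]))

# {1, x} is a Chebyshev system on [-1, 1]: the determinant is nonzero
d1 = chebyshev_determinant([lambda x: 1.0, lambda x: x], [-0.5, 0.5])

# {1, x^2} is not: the symmetric pair -0.5, 0.5 gives a zero determinant
d2 = chebyshev_determinant([lambda x: 1.0, lambda x: x**2], [-0.5, 0.5])
```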
We therefore introduce the basic definition.

Definition 3 A regression model
$$Y(x) = f(x) + \varepsilon(x)$$
is a Chebyshev regression model iff $f$ belongs to $V := \operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\}$, where $(\varphi_0, \ldots, \varphi_{g-1})$ is a Chebyshev (or Haar) system of functions.
The following result stands as a generalization of the Borel-Chebyshev Theorem and improves on the Haar Theorem.
Theorem 4 (Generalization of the Borel-Chebyshev Theorem) Let $\{\varphi_0, \ldots, \varphi_{g-1}\}$ be a Chebyshev system on $[-1,1]$, and let $u$ be any function in $C([-1,1])$. Then there exists a unique function $h$ in $V := \operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\}$ defined on $[-1,1]$ which achieves
$$\sup_{x \in [-1,1]} |u(x) - h(x)| = \inf_{f \in V} \sup_{x \in [-1,1]} |u(x) - f(x)|.$$
Furthermore, $h$ is the only function in $V$ such that $p := u - h$ attains its maximal absolute value in at least $g+1$ points in $[-1,1]$, the sign of $p$ alternating on those points.
Proof. See [Achieser, 1992].
Remark 5 The above function $h$ plays a role similar to that of the function $T_{g-1}$ (the Chebyshev polynomial of the first kind) in the polynomial regression case; see [Broniatowski and Celant, 2014].
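Theorem 4 can be illustrated in the simplest polynomial setting: for the Chebyshev system $\{1, x\}$ ($g = 2$), the best uniform approximation of $u(x) = x^2$ on $[-1,1]$ is the constant $h(x) = 1/2$, and the error $p = u - h = \tfrac{1}{2} T_2$ equioscillates at the $g+1 = 3$ points $-1, 0, 1$. A short numerical check (this example is ours, not from the paper):

```python
import numpy as np

# Error of the best uniform approximation of u(x) = x^2 from
# V = span{1, x}: h(x) = 1/2 and p(x) = u(x) - h(x) = (1/2) T_2(x).
x = np.linspace(-1.0, 1.0, 2001)
p = x**2 - 0.5
E = np.max(np.abs(p))                  # uniform norm of the error: 1/2

# p attains |p| = E at the g + 1 = 3 points -1, 0, 1, with alternating signs
extrema = np.array([-1.0, 0.0, 1.0])
p_ext = extrema**2 - 0.5               # values 1/2, -1/2, 1/2
```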
The notation $M_d([-1,1])$ designates the class of all discrete probability measures with support in $[-1,1]$.
The aim of this paper is to present the contribution of Hoel [Hoel, 1966] to the construction of optimal designs for the extrapolated value of the regression function, as treated by Kiefer and Wolfowitz [Kiefer and Wolfowitz, 1965]. The model and the Gauss-Markov estimator are defined in the next Section. An orthogonalization procedure allows us to express the extrapolated value as a parameter in an adequate regression model. Finally, the support of the optimal design is obtained through geometrical arguments; the number of replications of the experiments at the nodes is then deduced.
2 The model and the Gauss-Markov estimator
We consider a Chebyshev system $\{\varphi_0, \ldots, \varphi_{g-1}\}$ on $[-1,1]$. For any $x \in [-1,1]$ we assume that we may observe a r.v. $Y(x)$ such that, denoting $\theta := (\theta_0, \ldots, \theta_{g-1})'$,
$$f(x) := E(Y(x)) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x) = (X(x))' \theta. \quad (1)$$
We notice that the function
$$f : \mathbb{R} \to \mathbb{R}, \quad x \mapsto f(x)$$
is continuous on $\mathbb{R}$. Indeed, since the system of $g$ equations in $\theta$
$$\begin{cases} f(x_0) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_0) \\ \quad \cdots \\ f(x_{g-1}) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_{g-1}) \end{cases}$$
has a unique solution whenever $(f(x_0), \ldots, f(x_{g-1}))'$ is known, for any $(x_0, \ldots, x_{g-1})' \in [-1,1]^g$ with $-1 \le x_0 < \cdots < x_{g-1} \le 1$, the function $f$ can be extended to $\mathbb{R}$; this extension is continuous since so are the $\varphi_i$'s.
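Numerically, the determination of $f$ by $g$ of its values amounts to solving the interpolation system with the nonsingular matrix of Proposition 2. A quick sketch for the monomial system $\{1, x, x^2\}$ (the coefficients and points below are arbitrary illustrative choices of ours):

```python
import numpy as np

# A function f in span{1, x, x^2} is determined by its values at g = 3
# distinct points: the interpolation matrix is the matrix G of
# Proposition 2, which is nonsingular for a Chebyshev system.
theta = np.array([2.0, -1.0, 3.0])          # arbitrary coefficients
pts = np.array([-0.8, 0.1, 0.9])            # arbitrary distinct points
G = np.vander(pts, 3, increasing=True)      # rows (1, x_j, x_j^2)
f_vals = G @ theta                          # observed values f(x_j)
theta_rec = np.linalg.solve(G, f_vals)      # coefficients recovered exactly
```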
Recall that the measurements can be performed only on $[-1,1]$, and not for $|x| > 1$.
2.1 Examples of Chebyshev systems
Here is a short list of classical Chebyshev systems. We refer to the classical treatise of [Karlin and Studden, 1966] for an extensive study of these systems and their applications in analysis and in statistics.
a) $\{\varphi_0(x) = 1, \varphi_1(x) = x^3\}$ is a Chebyshev system on the whole of $\mathbb{R}$;
b) $\{\varphi_0(x) = 1, \varphi_1(x) = x^{1/3}\}$ is a Chebyshev system on $(0, +\infty)$;
c) $\{1, \cos x, \cos 2x, \ldots, \cos nx\}$ is a Chebyshev system on $[0, \pi)$;
d) $\{1, \sin x, \cos x, \sin 2x, \cos 2x, \ldots, \sin nx, \cos nx\}$ is a Chebyshev system on $[0, 2\pi)$;
e) $\{\sin x, \sin 2x, \ldots, \sin nx\}$ is a Chebyshev system on $(0, \pi)$;
f) $\{\varphi_0(x) = x^2 - x, \varphi_1(x) = x^2 + x, \varphi_2(x) = x^2 + 1\}$ is a Chebyshev system on $\mathbb{R}$;
g) $\{x^{a_0}, \ldots, x^{a_n}\}$, where $0 = a_0 < \cdots < a_n$, is a Chebyshev system on $[0, +\infty)$;
h) $\{e^{a_0 x}, \ldots, e^{a_n x}\}$, where $0 = a_0 < \cdots < a_n$, is a Chebyshev system on $\mathbb{R}$;
i) $\{1, \sinh x, \cosh x, \ldots, \sinh nx, \cosh nx\}$ is a Chebyshev system on $\mathbb{R}$;
j) $\{(x + a_0)^{-1}, \ldots, (x + a_n)^{-1}\}$, where $0 < a_0 < \cdots < a_n$, is a Chebyshev system on $[0, +\infty)$;
k) $\{1, \log x, x, x \log x, x^2, x^2 \log x, \ldots, x^n, x^n \log x\}$ is a Chebyshev system on $(0, +\infty)$.
Finally, note that being a Chebyshev system is a linear property; indeed, if $(\varphi_0, \ldots, \varphi_{g-1})$ is a Chebyshev system then any other basis of $\operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system.
2.2 Description of the dataset coming from the experiment
Given the set of nodes $-1 \le x_0 < \cdots < x_{g-1} \le 1$, the experiment is described through the following measurements:
$$\begin{cases} Y_1(x_0) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_0) + \varepsilon_1(x_0) \\ \quad \cdots \\ Y_{n_0}(x_0) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_0) + \varepsilon_{n_0}(x_0) \\ \quad \cdots \\ Y_1(x_i) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_i) + \varepsilon_1(x_i) \\ \quad \cdots \\ Y_{n_i}(x_i) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_i) + \varepsilon_{n_i}(x_i) \\ \quad \cdots \\ Y_1(x_{g-1}) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_{g-1}) + \varepsilon_1(x_{g-1}) \\ \quad \cdots \\ Y_{n_{g-1}}(x_{g-1}) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_{g-1}) + \varepsilon_{n_{g-1}}(x_{g-1}) \end{cases}$$
or through the more synthetic form: with
$$Y(x_i) := (Y_1(x_i), \ldots, Y_{n_i}(x_i))', \quad X(x_i) := (\varphi_0(x_i), \ldots, \varphi_{g-1}(x_i))', \quad \varepsilon(x_i) := (\varepsilon_1(x_i), \ldots, \varepsilon_{n_i}(x_i))',$$
$$Y(x_i) = \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \\ \vdots & & \vdots \\ \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix} \begin{pmatrix} \theta_0 \\ \vdots \\ \theta_{g-1} \end{pmatrix} + \varepsilon(x_i), \quad i = 0, \ldots, g-1.$$
Denote
$$X_i := \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \\ \vdots & & \vdots \\ \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix}.$$
The matrix $X_i$ has $n_i$ rows and $g$ columns; all rows of $X_i$ equal $(X(x_i))'$.
Denote
$$H := \operatorname{Im} X := \{X(x) \in \mathbb{R}^g : x \in [-1,1]\}. \quad (2)$$
The set $H$ is called the regression range.
It may at times be convenient to attribute distinct indices to the same $x_j$ when it is repeated $n_j$ times.
The discrete measure defined through the points
$$\underbrace{x_0, \ldots, x_0}_{n_0 \text{ times}}, \ \ldots, \ \underbrace{x_j, \ldots, x_j}_{n_j \text{ times}}, \ \ldots, \ \underbrace{x_{g-1}, \ldots, x_{g-1}}_{n_{g-1} \text{ times}}$$
with
$$n_0 + \cdots + n_{g-1} = n$$
will hence be written as
$$t_1, \ldots, t_n \quad (3)$$
with $t_1 = t_2 = \cdots = t_{n_0} = x_0, \ \ldots, \ t_{n_0 + \cdots + n_{g-2} + 1} = t_{n_0 + \cdots + n_{g-2} + 2} = \cdots = t_{n_0 + \cdots + n_{g-2} + n_{g-1}} = x_{g-1}$; hence $t_1, \ldots, t_{n_0}$ indicate the same point $x_0$ repeated $n_0$ times, etc.
The system which describes the $n$ observations therefore writes as $Y = C\theta + \varepsilon$, where
$$Y := \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad C := \begin{pmatrix} \varphi_0(t_1) & \cdots & \varphi_{g-1}(t_1) \\ \vdots & & \vdots \\ \varphi_0(t_i) & \cdots & \varphi_{g-1}(t_i) \\ \vdots & & \vdots \\ \varphi_0(t_n) & \cdots & \varphi_{g-1}(t_n) \end{pmatrix}, \quad \theta := \begin{pmatrix} \theta_0 \\ \vdots \\ \theta_{g-1} \end{pmatrix}, \quad \varepsilon := \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$
$$E(Y) = C\theta, \qquad \operatorname{var}(\varepsilon) = \sigma^2 I_n,$$
where $I_n$ is the identity matrix of order $n$.
The Gauss-Markov estimator of $f(x) = E(Y(x))$ is obtained from the solution $\widehat{\theta}$ of the linear system
$$X_i' X_i \widehat{\theta} = X_i' Y(x_i), \quad i = 0, \ldots, g-1.$$
It holds
$$X_i' X_i = \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_0(x_i) \\ \vdots & & \vdots \\ \varphi_{g-1}(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix} \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \\ \vdots & & \vdots \\ \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix} = n_i M_i,$$
where
$$M_i := \begin{pmatrix} (\varphi_0(x_i))^2 & \cdots & \varphi_0(x_i)\varphi_k(x_i) & \cdots & \varphi_0(x_i)\varphi_{g-1}(x_i) \\ \vdots & & \vdots & & \vdots \\ \varphi_h(x_i)\varphi_0(x_i) & \cdots & \varphi_h(x_i)\varphi_k(x_i) & \cdots & \varphi_h(x_i)\varphi_{g-1}(x_i) \\ \vdots & & \vdots & & \vdots \\ \varphi_{g-1}(x_i)\varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i)\varphi_k(x_i) & \cdots & (\varphi_{g-1}(x_i))^2 \end{pmatrix}.$$
We have $M_i = X(x_i) X'(x_i)$. In
$$n_i X(x_i) X'(x_i)\, \widehat{\theta} = X_i' Y(x_i), \quad i = 0, \ldots, g-1,$$
sum both sides with respect to $i$ to obtain
$$\sum_{i=0}^{g-1} X_i' X_i\, \widehat{\theta} = \sum_{i=0}^{g-1} X_i' Y(x_i).$$
Therefore
$$n \left( \sum_{i=0}^{g-1} \frac{n_i}{n} M_i \right) \widehat{\theta} = \sum_{i=0}^{g-1} X_i' Y(x_i).$$
Denote
$$\xi_i := \xi(x) := \begin{cases} n_i/n & \text{if } x = x_i, \\ 0 & \text{if } x \notin \{x_0, \ldots, x_{g-1}\}. \end{cases}$$
The matrix
$$M(\xi) := \sum_{i=0}^{g-1} \frac{n_i}{n} M_i = \sum_{i=0}^{g-1} \xi_i M_i \quad (4)$$
is the moment matrix of the measure $\xi$. By definition $\operatorname{supp}(\xi) = \{x_0, \ldots, x_{g-1}\}$. Since $M_i = X(x_i) X'(x_i)$ we may write
$$M(\xi) = \sum_{i=0}^{g-1} \xi_i M_i = \sum_{i=0}^{g-1} \xi_i\, X(x_i) X'(x_i) = \int_{[-1,1]} X(x) X'(x)\, \xi(dx).$$
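The moment matrix (4) can be computed directly from its definition. The sketch below uses the monomial system $\{1, x, x^2\}$ and a uniform design on three nodes (both choices are ours, for illustration):

```python
import numpy as np

# Moment matrix M(xi) = sum_i xi_i X(x_i) X(x_i)' of formula (4),
# for the monomial system {1, x, x^2} under a uniform three-point design.
nodes = np.array([-1.0, 0.0, 1.0])
xi = np.array([1/3, 1/3, 1/3])

def X(x):
    """Regression vector X(x) = (phi_0(x), ..., phi_{g-1}(x))'."""
    return np.array([1.0, x, x**2])

M = sum(w * np.outer(X(x), X(x)) for w, x in zip(xi, nodes))
```

The resulting matrix is symmetric and, since the design has $g$ distinct support points and the system is Chebyshev, nonsingular.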
A specific study of this matrix is needed for the estimation of linear forms of the coefficients $\theta_i$. This area has been developed by Elfving (see e.g. [Pukelsheim, 1993]) and is out of the scope of the present paper.
3 An expression of the extrapolated value through an orthogonalization procedure
We will consider an alternative way, developed by Kiefer and Wolfowitz [Kiefer and Wolfowitz, 1965], as follows. It has the main advantage that, up to a coefficient $\gamma_{g-1}$ which depends on the values of $f$ on the $x_j$'s, the estimate of $f(c)$ is proportional to $\varphi_{g-1}(c)$. It follows that only the coefficient $\gamma_{g-1}$ has to be estimated, a clear advantage. Recall that $c$ does not belong to $[-1,1]$.
It is more convenient, at this stage, to introduce the following notation. It will be assumed that $n$ measurements of $Y$ are performed, namely
$$Y(t_1), \ldots, Y(t_n),$$
where the $t_i$'s belong to $[-1,1]$. The points of measurement $t_1, \ldots, t_n$ might be distinct or not, as defined in (3). Obviously, when defining the optimal design with nodes $x_0, \ldots, x_{g-1}$, $n_j$ values of the $t_i$'s coincide with $x_j$ for $0 \le j \le g-1$. In order to define the estimator, and not the design, it is however more convenient to differentiate between all the measurements $Y(t_i)$, $1 \le i \le n$. This allows us to inherit the classical geometric least squares formalism.
We consider the basis of $V$ defined as follows. Set, for all $j$ between $0$ and $g-2$,
$$h_j(x) := \varphi_j(x) - \frac{\varphi_j(c)}{\varphi_{g-1}(c)}\, \varphi_{g-1}(x)$$
and
$$h_{g-1}(x) := \varphi_{g-1}(x).$$
Clearly $(h_0, \ldots, h_{g-1})$ generates $V$. Also, $(h_0, \ldots, h_{g-1})$ is a Chebyshev system on $[-1, c]$.
Denote by $(\gamma_0, \ldots, \gamma_{g-1})$ the coordinates of $f$ on $(h_0, \ldots, h_{g-1})$, namely
$$f(x) = \sum_{j=0}^{g-1} \gamma_j h_j(x).$$
We evaluate the coefficients $\gamma_j$ with respect to the $\theta_k$'s defined in (1). It holds $\gamma_j = \theta_j$ for $j = 0, \ldots, g-2$, and
$$\gamma_{g-1} = \frac{\sum_{j=0}^{g-1} \theta_j \varphi_j(c)}{\varphi_{g-1}(c)},$$
assuming $\varphi_{g-1}(c) \neq 0$; and obviously we have
$$f(x) = \sum_{j=0}^{g-1} \gamma_j h_j(x) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x).$$
At $x = c$ we get
$$f(c) = \sum_{j=0}^{g-1} \gamma_j h_j(c) = \sum_{j=0}^{g-1} \theta_j \varphi_j(c).$$
By the definition of $\gamma_{g-1}$ we have
$$\gamma_{g-1} = \frac{\sum_{j=0}^{g-1} \theta_j \varphi_j(c)}{\varphi_{g-1}(c)},$$
and therefore we have proved

Lemma 6
$$f(c) = \gamma_{g-1}\, \varphi_{g-1}(c). \quad (6)$$
4 The Gauss-Markov estimator of the extrapolated value
It holds
$$f(c) = \sum_{i=0}^{g-1} \theta_i \varphi_i(c),$$
where the $\theta_i$'s are defined through the $g$ equations of the form
$$f(x_j) = \sum_{i=0}^{g-1} \theta_i \varphi_i(x_j)$$
with $-1 \le x_j \le 1$ for all $0 \le j \le g-1$. Replace $f(x_j)$ by its estimate
$$\widehat{f(x_j)} := \frac{1}{n_j} \sum_{i=1}^{n_j} Y_i(x_j).$$
Under the present model, $\widehat{f(x_j)}$ is an unbiased estimate of $f(x_j)$. Determine the $\widehat{\theta}_i$'s through the system defined by
$$\widehat{f(x_j)} = \sum_{i=0}^{g-1} \widehat{\theta}_i \varphi_i(x_j).$$
The resulting $\widehat{\theta}_i$'s are unbiased and so is
$$\widehat{f(c)} = \sum_{i=0}^{g-1} \widehat{\theta}_i \varphi_i(c).$$
The natural optimality criterion associated with this procedure is the variance of the estimate $\widehat{f(c)}$, which depends on the location of the nodes and on the frequencies $n_j$.
We now write the above Gauss-Markov estimator of $f(c)$ in the new basis $(h_0, \ldots, h_{g-1})$. Substituting the function $f$ by its expansion on the basis $(h_0, \ldots, h_{g-1})$, the model writes as
$$\begin{cases} Y(t_1) = \gamma_0 h_0(t_1) + \cdots + \gamma_{g-2} h_{g-2}(t_1) + \gamma_{g-1} \varphi_{g-1}(t_1) + \varepsilon_1 \\ \quad \cdots \\ Y(t_i) = \gamma_0 h_0(t_i) + \cdots + \gamma_{g-2} h_{g-2}(t_i) + \gamma_{g-1} \varphi_{g-1}(t_i) + \varepsilon_i \\ \quad \cdots \\ Y(t_n) = \gamma_0 h_0(t_n) + \cdots + \gamma_{g-2} h_{g-2}(t_n) + \gamma_{g-1} \varphi_{g-1}(t_n) + \varepsilon_n \end{cases}$$
because of (5); i.e. $Y(t) = T\gamma + \varepsilon$ where $t := (t_1, \ldots, t_n)'$,
$$T := \begin{pmatrix} h_0(t_1) & \cdots & h_{g-2}(t_1) & \varphi_{g-1}(t_1) \\ \vdots & & \vdots & \vdots \\ h_0(t_i) & \cdots & h_{g-2}(t_i) & \varphi_{g-1}(t_i) \\ \vdots & & \vdots & \vdots \\ h_0(t_n) & \cdots & h_{g-2}(t_n) & \varphi_{g-1}(t_n) \end{pmatrix}, \quad \gamma := \begin{pmatrix} \gamma_0 \\ \vdots \\ \gamma_{g-2} \\ \gamma_{g-1} \end{pmatrix}, \quad \varepsilon := \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$
Recall that we intend to estimate $\gamma_{g-1}$. We make a further change of the basis of $V$. We introduce a vector $G_{g-1}$ which, together with $h_0, \ldots, h_{g-2}$, will produce a basis $(h_0, \ldots, h_{g-2}, G_{g-1})$ for which the vector $G_{g-1}$ is orthogonal to each of the $h_j$, $0 \le j \le g-2$. The aim of this construction is to express $f(c)$ as a linear combination of the components of $G_{g-1}$. Since $G_{g-1}$ belongs to $V = \operatorname{span}(h_0, \ldots, h_{g-1})$ we write
$$G_{g-1}(t_i) := h_{g-1}(t_i) - \sum_{j=0}^{g-2} \lambda_j h_j(t_i)$$
for some vector $\lambda := (\lambda_0, \ldots, \lambda_{g-2})'$. We impose the following condition:
$$\left\langle \begin{pmatrix} G_{g-1}(t_1) \\ \vdots \\ G_{g-1}(t_n) \end{pmatrix}, \begin{pmatrix} h_j(t_1) \\ \vdots \\ h_j(t_n) \end{pmatrix} \right\rangle = 0, \quad \text{for all } j = 0, \ldots, g-2,$$
where the above symbol $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^n$. The $\lambda_j$'s in $\mathbb{R}$ are to be chosen now. The linear system
$$\sum_{i=1}^{n} G_{g-1}(t_i)\, h_j(t_i) = 0, \quad \text{for } j = 0, \ldots, g-2,$$
with $g-1$ equations has $g-1$ unknown variables $\lambda_j$.
Once the solution $(\lambda_j^*,\ j = 0, \ldots, g-2)$ is obtained, and since
$$h_{g-1}(t) = G_{g-1}(t) + \sum_{j=0}^{g-2} \lambda_j^* h_j(t),$$
we may write, for any $t$,
$$f(t) = \sum_{j=0}^{g-1} \gamma_j h_j(t) = \gamma_0 h_0(t) + \cdots + \gamma_{g-2} h_{g-2}(t) + \gamma_{g-1} G_{g-1}(t) + \gamma_{g-1} \lambda_0^* h_0(t) + \cdots + \gamma_{g-1} \lambda_{g-2}^* h_{g-2}(t)$$
$$= (\gamma_0 + \gamma_{g-1} \lambda_0^*)\, h_0(t) + \cdots + (\gamma_{g-2} + \gamma_{g-1} \lambda_{g-2}^*)\, h_{g-2}(t) + \gamma_{g-1} G_{g-1}(t)$$
$$= \delta_0 h_0(t) + \cdots + \delta_{g-2} h_{g-2}(t) + \delta_{g-1} G_{g-1}(t),$$
where the $\delta_j$'s are defined by
$$\delta_j := \begin{cases} \gamma_j + \gamma_{g-1} \lambda_j^* & \text{for } j = 0, \ldots, g-2, \\ \gamma_{g-1} & \text{for } j = g-1. \end{cases}$$
The point is that $\gamma_{g-1}$ appears as the coefficient of $G_{g-1}$, namely the last term in the regression of $f(t)$ on the regressors $(h_0, \ldots, h_{g-2}, G_{g-1})$. Furthermore, $G_{g-1}$ is orthogonal to the other regressors. The system which describes the data is now written as $Y(t) = \widetilde{T}\widetilde{\delta} + \varepsilon$, where
$$\widetilde{T} := \begin{pmatrix} h_0(t_1) & \cdots & h_{g-2}(t_1) & G_{g-1}(t_1) \\ \vdots & & \vdots & \vdots \\ h_0(t_i) & \cdots & h_{g-2}(t_i) & G_{g-1}(t_i) \\ \vdots & & \vdots & \vdots \\ h_0(t_n) & \cdots & h_{g-2}(t_n) & G_{g-1}(t_n) \end{pmatrix}, \quad \widetilde{\delta} := \begin{pmatrix} \delta_0 \\ \vdots \\ \delta_{g-2} \\ \delta_{g-1} \end{pmatrix}.$$
The least squares estimator of $\gamma_{g-1}$ is obtained through the normal equations, imposing
$$Y(t) - \widetilde{T}\,\widehat{\widetilde{\delta}} \in V^{\perp},$$
where $\widehat{\widetilde{\delta}}$ designates the least squares estimator of the vector of coefficients $\widetilde{\delta}$, and where $V^{\perp}$ is the orthogonal of the linear space $V$. Denoting $\widehat{\gamma}_{g-1}$ the least squares estimator of $\gamma_{g-1}$, and noting that $V = \operatorname{span}\{h_0, \ldots, h_{g-2}, G_{g-1}\}$, we have
$$\left\langle \begin{pmatrix} Y(t_1) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_1) - \widehat{\gamma}_{g-1} G_{g-1}(t_1) \\ \vdots \\ Y(t_n) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_n) - \widehat{\gamma}_{g-1} G_{g-1}(t_n) \end{pmatrix}, \begin{pmatrix} h_j(t_1) \\ \vdots \\ h_j(t_n) \end{pmatrix} \right\rangle = 0, \quad \text{for } j = 0, \ldots, g-2,$$
and
$$\left\langle \begin{pmatrix} Y(t_1) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_1) - \widehat{\gamma}_{g-1} G_{g-1}(t_1) \\ \vdots \\ Y(t_n) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_n) - \widehat{\gamma}_{g-1} G_{g-1}(t_n) \end{pmatrix}, \begin{pmatrix} G_{g-1}(t_1) \\ \vdots \\ G_{g-1}(t_n) \end{pmatrix} \right\rangle = 0.$$
Hence
$$\sum_{i=1}^{n} \left( Y(t_i) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_i) - \widehat{\gamma}_{g-1} G_{g-1}(t_i) \right) G_{g-1}(t_i) = 0. \quad (7)$$
Inserting the orthogonality condition
$$\sum_{i=1}^{n} G_{g-1}(t_i)\, h_j(t_i) = 0, \quad \text{for } j = 0, \ldots, g-2,$$
in (7), we obtain
$$\sum_{j=1}^{n} Y(t_j)\, G_{g-1}(t_j) - \widehat{\gamma}_{g-1} \sum_{j=1}^{n} G_{g-1}^2(t_j) = 0,$$
and
$$\widehat{\gamma}_{g-1} = \frac{\sum_{j=1}^{n} Y(t_j)\, G_{g-1}(t_j)}{\sum_{j=1}^{n} G_{g-1}^2(t_j)}.$$
Finally we obtain the explicit form of the estimator of $f(c)$. It holds:

Proposition 7 The least squares (Gauss-Markov) estimator of the extrapolated value $f(c)$ is
$$\widehat{f(c)} = \varphi_{g-1}(c)\, \widehat{\gamma}_{g-1} = \varphi_{g-1}(c)\, \frac{\sum_{j=1}^{n} Y(t_j)\, G_{g-1}(t_j)}{\sum_{j=1}^{n} G_{g-1}^2(t_j)}.$$
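Proposition 7 can be checked numerically: the orthogonalization estimator coincides with the value at $c$ of the ordinary least squares fit, since the fitted element of $V$ is the same in both bases and $h_j(c) = 0$ for $j \le g-2$. A Python sketch (the system $\{1, x, x^2\}$, the design, the coefficients and the noise level are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 1.5
phis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
theta = np.array([1.0, -2.0, 0.5])            # true coefficients

# n = 60 observations: 20 replications at each of three nodes in [-1, 1]
t = np.repeat([-1.0, 0.0, 1.0], 20)
A = np.column_stack([f(t) for f in phis])
Y = A @ theta + 0.1 * rng.standard_normal(t.size)

# basis h_j = phi_j - (phi_j(c)/phi_{g-1}(c)) phi_{g-1}, for j <= g-2
phi_c = np.array([float(f(np.array(c))) for f in phis])
H = A[:, :-1] - np.outer(A[:, -1], phi_c[:-1] / phi_c[-1])
h_last = A[:, -1]

# G_{g-1}: residual of h_{g-1} after projection on span{h_0,...,h_{g-2}}
lam, *_ = np.linalg.lstsq(H, h_last, rcond=None)
G = h_last - H @ lam

# Gauss-Markov estimator of f(c) (Proposition 7)
f_c_hat = phi_c[-1] * (Y @ G) / (G @ G)

# agrees with the direct least squares extrapolation X(c)' theta_hat
theta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
f_c_direct = phi_c @ theta_hat
```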
5 The optimal extrapolation design for the Chebyshev regression

5.1 The support of the optimal design
We determine the support of the optimal design for the extrapolation of $f$ at point $c$. Recall that a design is optimal if and only if it produces a Gauss-Markov estimator of $f(c)$ with minimal variance among all such estimators built upon other designs. We have
$$\operatorname{var}\left(\widehat{f(c)}\right) = (\varphi_{g-1}(c))^2\, \frac{\sum_{j=1}^{n} \operatorname{var}(Y(t_j))\, G_{g-1}^2(t_j)}{\left( \sum_{j=1}^{n} G_{g-1}^2(t_j) \right)^2} = \frac{(\sigma\, \varphi_{g-1}(c))^2}{\sum_{j=1}^{n} G_{g-1}^2(t_j)}.$$
The design is defined through a discrete probability measure $\xi \in M_d([-1,1])$ with support $(x_0, \ldots, x_{g-1})$ and $\xi(x_j) := n_j/n$, where $n_j$ equals the number of the $t_i$'s which equal $x_j$, for $0 \le j \le g-1$.
We now determine the support of the optimal design, denoted $\xi^*$:
$$\xi^* := \arg\min_{\xi \in M_d([-1,1])} \frac{1}{\sum_{j=0}^{g-1} n_j G_{g-1}^2(x_j)} = \arg\max_{\xi \in M_d([-1,1])} \sum_{i=0}^{g-1} n_i G_{g-1}^2(x_i) = \arg\max_{\xi \in M_d([-1,1])} \sum_{i=0}^{g-1} n_i \left( h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j h_j(x_i) \right)^2.$$
The solution can be obtained in a simple way through some analysis of the objective function. In order to use simple geometric arguments and to simplify the resulting expressions, it is more convenient to write the derivation of the optimal design in terms of the $t_i$'s. The function
$$\sum_{i=1}^{n} \left( h_{g-1}(t_i) - \sum_{j=0}^{g-2} \lambda_j h_j(t_i) \right)^2 = \left\| \begin{pmatrix} h_{g-1}(t_1) - \sum_{j=0}^{g-2} \lambda_j h_j(t_1) \\ \vdots \\ h_{g-1}(t_n) - \sum_{j=0}^{g-2} \lambda_j h_j(t_n) \end{pmatrix} \right\|^2,$$
minimized over $\lambda$, is the squared distance of the vector
$$h := (h_{g-1}(t_1), \ldots, h_{g-1}(t_n))'$$
from its orthogonal projection on the linear space $W$ generated by the family $\{h_0, \ldots, h_{g-2}\}$. Therefore, by the minimal projection property,
$$\min_{\lambda} \sum_{i=1}^{n} \left( h_{g-1}(t_i) - \sum_{j=0}^{g-2} \lambda_j h_j(t_i) \right)^2 = \min_{\ell \in W} \operatorname{dist}^2(h, \ell).$$
Let $\lambda := (\lambda_0, \ldots, \lambda_{g-2})'$.
The optimal design is obtained through a two-step procedure. Fix the frequencies $n_0, \ldots, n_{g-1}$ with sum $n$, and determine the discrete measure $\xi$ on $[-1,1]$ which minimizes $\operatorname{var}(\widehat{f(c)})$ among all $\xi$'s with support $x := (x_0, \ldots, x_{g-1})$ and masses $\xi(x_j) = n_j/n$, $0 \le j \le g-1$. The optimization is performed upon the $x_j$'s.
= arg max 2Md([ 1;1])min 2V dist (h; ) = arg max x2[ 1;1]g 2Rming 1 g 1 X i=0 ni hg 1(xi) g 2 X j=0 jhj(xi) !2 = arg max 2Md([ 1;1]) 2Rming 1 Z [ 1;1] hg 1(x) g 2 X j=0 jhj(x) !2 (dx) : The integrand hg 1(x) Pgj=02 jhj(x) 2
is always non negative. Henceforth it is enough to minimize its square root w.r.t. x. This optimization turns therefore to be independent of the n0
js:
Denote j; j = 0; :::; g 2; the values which minimize dist (h; ) w.r.t.
j. The optimality condition writes
max x2[ 1;1] hg 1(x) g 2 X j=0 jhj(x) = min 2Rg 1x2[ 1;1]max hg 1(x) g 2 X j=0 jhj(x) (8) = min p2Wx2[ 1;1]max jhg 1(x) p (x)j where W := span fh0; ::; hg 2g : (9)
If we prove that $\{h_0, \ldots, h_{g-2}\}$ is a Chebyshev system on $[-1,1]$, then clearly the support of the optimal measure consists of the points of maximal value in $[-1,1]$ of the function
$$|h_{g-1}(x) - p^*(x)|,$$
where $p^*$ is the best uniform approximation of $h_{g-1}$ in $W$. Indeed, the support of $\xi^*$ consists of the set of points where
$$\left| h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right|$$
in (8) attains its maximal value, $p^* = \sum_{j=0}^{g-2} \lambda_j^* h_j$ being the best uniform approximation of $h_{g-1}$ in $W$. This is the major argument of the present derivation, which justifies all of the uniform approximation theory in this context.
Definition 8 The vector $\lambda^*$ in $\mathbb{R}^{g-1}$ is a Chebyshev vector iff it designates the vector of the coefficients of $p^*$, where $p^*$ is the best uniform approximation of $h_{g-1}$ in $W$ defined in (9). It is defined through (8).
Now, writing
$$\lambda^* := (\lambda_0^*, \ldots, \lambda_{g-2}^*)',$$
we define the set of all points $\widetilde{x}$ in $[-1,1]$ where the distance between $h_{g-1}$ and its best approximation on the $h_k$, $0 \le k \le g-2$, is maximal. These points are precisely the support of the optimal design $\xi^*$. Formally, we define
$$E := \min_{\lambda \in \mathbb{R}^{g-1}} \max_{x \in [-1,1]} \left| h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right| \quad (10)$$
and
$$B(\lambda^*) := \left\{ \widetilde{x} \in [-1,1] : \left| h_{g-1}(\widetilde{x}) - \sum_{j=0}^{g-2} \lambda_j^* h_j(\widetilde{x}) \right| = E \right\}. \quad (11)$$
It holds (see Proposition 10 below) $\xi^*(B(\lambda^*)) = 1$.
We now prove that $\{h_0, \ldots, h_{g-2}\}$ is a Chebyshev system on $[-1,1]$.

Proposition 9 (Hoel) The functions $h_0, \ldots, h_{g-2}$ form a Chebyshev system on $[-1,1]$.
Proof. For any choice of points $x_0 < \cdots < x_{g-2}$ in $[-1,1]$, since the family $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system on $[-1,1]$, we have by Proposition 2, assuming without loss of generality a positive sign of the determinant,
$$0 < \det \begin{pmatrix} \varphi_0(x_0) & \varphi_0(x_1) & \cdots & \varphi_0(x_{g-2}) & \varphi_0(c) \\ \varphi_1(x_0) & \varphi_1(x_1) & \cdots & \varphi_1(x_{g-2}) & \varphi_1(c) \\ \vdots & \vdots & & \vdots & \vdots \\ \varphi_{g-2}(x_0) & \varphi_{g-2}(x_1) & \cdots & \varphi_{g-2}(x_{g-2}) & \varphi_{g-2}(c) \\ \varphi_{g-1}(x_0) & \varphi_{g-1}(x_1) & \cdots & \varphi_{g-1}(x_{g-2}) & \varphi_{g-1}(c) \end{pmatrix}.$$
For $i = 0, \ldots, g-2$, the column operations
$$\varphi_j(x_i) \mapsto \varphi_j(x_i) - \frac{\varphi_j(c)}{\varphi_{g-1}(c)}\, \varphi_{g-1}(x_i), \quad j = 0, \ldots, g-1,$$
do not change the value of the determinant. Hence
$$0 < \det \begin{pmatrix} \varphi_0(x_0) & \cdots & \varphi_0(x_{g-2}) & \varphi_0(c) \\ \vdots & & \vdots & \vdots \\ \varphi_{g-1}(x_0) & \cdots & \varphi_{g-1}(x_{g-2}) & \varphi_{g-1}(c) \end{pmatrix} = \det \begin{pmatrix} h_0(x_0) & \cdots & h_0(x_{g-2}) & \varphi_0(c) \\ h_1(x_0) & \cdots & h_1(x_{g-2}) & \varphi_1(c) \\ \vdots & & \vdots & \vdots \\ h_{g-2}(x_0) & \cdots & h_{g-2}(x_{g-2}) & \varphi_{g-2}(c) \\ 0 & \cdots & 0 & \varphi_{g-1}(c) \end{pmatrix}.$$
By the Laplace expansion of the determinant along the last row, we get
$$0 < \det \begin{pmatrix} h_0(x_0) & \cdots & h_0(x_{g-2}) & \varphi_0(c) \\ \vdots & & \vdots & \vdots \\ h_{g-2}(x_0) & \cdots & h_{g-2}(x_{g-2}) & \varphi_{g-2}(c) \\ 0 & \cdots & 0 & \varphi_{g-1}(c) \end{pmatrix} = \varphi_{g-1}(c)\, \det \begin{pmatrix} h_0(x_0) & h_0(x_1) & \cdots & h_0(x_{g-2}) \\ h_1(x_0) & h_1(x_1) & \cdots & h_1(x_{g-2}) \\ \vdots & \vdots & & \vdots \\ h_{g-2}(x_0) & h_{g-2}(x_1) & \cdots & h_{g-2}(x_{g-2}) \end{pmatrix} =: \varphi_{g-1}(c)\, \Delta.$$
Therefore the two real numbers $\varphi_{g-1}(c)$ and $\Delta$ have the same sign. Since $\varphi_{g-1}(c) \neq 0$ we deduce that
$$\det \begin{pmatrix} h_0(x_0) & h_0(x_1) & \cdots & h_0(x_{g-2}) \\ h_1(x_0) & h_1(x_1) & \cdots & h_1(x_{g-2}) \\ \vdots & \vdots & & \vdots \\ h_{g-2}(x_0) & h_{g-2}(x_1) & \cdots & h_{g-2}(x_{g-2}) \end{pmatrix} \neq 0.$$
Hence the family $\{h_0, \ldots, h_{g-2}\}$ is a Chebyshev system in $C([-1,1])$. In the same way we can prove that it is a Chebyshev system in $C([-1,c])$.
5.2 The frequencies of the optimal design
Once the points $x$ in $\operatorname{supp}(\xi^*)$ are characterized, we characterize the values of the $\xi^*(x)$'s. The following Proposition produces a sufficient condition for the measure $\xi^*$ to be optimal, which can be phrased as
$$\min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \xi^*(dx) \ \ge \ \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \xi(dx)$$
for any $\xi$ in $M_d([-1,1])$. Uniqueness might not hold.
Proposition 10 (Kiefer-Wolfowitz) Let $B(\lambda^*)$ be defined as in (11). If $\lambda^*$ is a Chebyshev vector, $\xi^*(B(\lambda^*)) = 1$, and
$$\int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right) h_i(x)\, \xi^*(dx) = 0, \quad \text{for } i = 0, \ldots, g-2,$$
then $\xi^*$ is optimal.
Proof. Let $\xi^* \in M_d([-1,1])$ with $\xi^*(B(\lambda^*)) = 1$. The hypothesis
$$\int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right) h_i(x)\, \xi^*(dx) = 0, \quad \text{for } i = 0, \ldots, g-2,$$
indicates that
$$h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j$$
is orthogonal to the linear space $W$ generated by $\{h_0, \ldots, h_{g-2}\}$. Thus $\sum_{j=0}^{g-2} \lambda_j^* h_j$ is the orthogonal projection of $h_{g-1}$ on $W$, the inner product being
$$\langle v, w \rangle := \int_{[-1,1]} v(x)\, w(x)\, \xi^*(dx).$$
By the minimal projection property,
$$A(\xi^*) := \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \xi^*(dx) = \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right)^2 \xi^*(dx)$$
$$= \sum_{\widetilde{x} \in \operatorname{supp}(\xi^*)} \left( h_{g-1}(\widetilde{x}) - \sum_{j=0}^{g-2} \lambda_j^* h_j(\widetilde{x}) \right)^2 \xi^*(\widetilde{x}) = E^2 \sum_{\widetilde{x} \in \operatorname{supp}(\xi^*)} \xi^*(\widetilde{x}) = E^2,$$
using $\xi^*(B(\lambda^*)) = 1$. Since $\left| h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right| \le E$ on $[-1,1]$, for any $\nu \in M_d([-1,1])$
$$E^2 \ \ge \ \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right)^2 \nu(dx) \ \ge \ \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \nu(dx),$$
and therefore
$$\max_{\nu \in M_d([-1,1])} \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \nu(dx) \ \le \ A(\xi^*).$$
The measures $\nu$ which appear in the above displays are arbitrary measures in $M_d([-1,1])$.
Since by definition the optimal design achieves
$$\max_{\nu \in M_d([-1,1])} \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \nu(dx), \quad (12)$$
while trivially $A(\xi^*)$ is at most this maximum, we get that $A(\xi^*)$ equals the maximum in (12); hence $\xi^*$ is optimal.
6 Identification of the optimal design
In this Section we provide an explicit solution for the optimal design and prove its uniqueness.
By the Borel-Chebyshev Theorem (Theorem 4) there exist at least $g$ points $x_0 < \cdots < x_{g-1}$ in $[-1,1]$ at which the best uniform approximation of $h_{g-1}$, namely $\sum_{j=0}^{g-2} \lambda_j^* h_j$, satisfies the following conditions:
$$h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) = (-1)^i\, E.$$
We now see that there are exactly $g$ points at which the function $\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|$ equals $E$. Since $\{h_0, \ldots, h_{g-1}\}$ is a Chebyshev system, a non-null linear combination
$$\sum_{i=0}^{g-1} a_i h_i$$
cannot have more than $g-1$ roots in $[-1,1]$. Hence the function
$$\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|,$$
which is the absolute value of a linear combination of the Chebyshev system $\{h_0, \ldots, h_{g-1}\}$, cannot have more than $g-1$ roots. Since the sign of $h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j$ alternates at consecutive points where its absolute value is maximal, at least one root lies between two such points; therefore $\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|$ cannot attain its maximal value at more than $g$ points.
As seen previously, the support of the optimal measure consists of the points of maximal value in $[-1,1]$ of the function
$$\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|.$$
Applying the Borel-Chebyshev Theorem, we now determine the support of $\xi^*$. Since $E$ is known, the support is the vector $(x_0, \ldots, x_{g-1})$ which solves the system
$$h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) = (-1)^i\, E, \quad i = 0, \ldots, g-1.$$
We apply the sufficient condition of Kiefer and Wolfowitz stated in Proposition 10. This condition states that the values $\xi^*(x_i)$, $i = 0, \ldots, g-1$, satisfy the system
$$\sum_{i=0}^{g-1} \left( h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) \right) h_r(x_i)\, \xi^*(x_i) = 0, \quad r = 0, \ldots, g-2.$$
At the $x_i$'s it holds
$$E = \left| h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) \right|$$
and
$$\int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right) h_i(x)\, \xi^*(dx) = 0, \quad \text{for } i = 0, \ldots, g-2.$$
Therefore
$$0 = \sum_{i=0}^{g-1} \left( h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) \right) h_r(x_i)\, \xi^*(x_i) = E \sum_{i=0}^{g-1} (-1)^i h_r(x_i)\, \xi^*(x_i), \quad \text{for } r = 0, \ldots, g-2.$$
The optimal extrapolation design $\{(x_i, \xi^*(x_i)) : i = 0, \ldots, g-1\}$ thus solves
$$\begin{cases} h_{g-1}(x_0) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_0) = +E \\ \quad \cdots \\ h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) = (-1)^i E \\ \quad \cdots \\ h_{g-1}(x_{g-1}) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_{g-1}) = (-1)^{g-1} E \\ \sum_{i=0}^{g-1} (-1)^i h_0(x_i)\, \xi^*(x_i) = 0 \\ \quad \cdots \\ \sum_{i=0}^{g-1} (-1)^i h_r(x_i)\, \xi^*(x_i) = 0 \\ \quad \cdots \\ \sum_{i=0}^{g-1} (-1)^i h_{g-2}(x_i)\, \xi^*(x_i) = 0 \end{cases}$$
In practice, we first evaluate $\lambda_j^*$ for $0 \le j \le g-2$ through (8); note that $E$ is then known by (10). The above system consists of $2g-1$ equations in the $2g$ unknown quantities $\{(x_i, \xi^*(x_i)) : i = 0, \ldots, g-1\}$. Add the constraint
$$\sum_{i=0}^{g-1} \xi^*(x_i) = 1$$
to obtain a system with a unique solution.
The first $g$ equations determine the nodes, by the Borel-Chebyshev Theorem. The last $g-1$ ones, together with the constraint, determine the values of the $n_j$'s, by the Proposition of Kiefer and Wolfowitz (Proposition 10). Hence there is a unique optimal design solving the minimal variance problem for the extrapolation.
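The weight equations can be solved directly once the nodes are known. For the monomial system $\{1, x, x^2\}$ ($g = 3$) and $c = 1.5$ (illustrative choices of ours), the nodes are the extrema of $T_2$, and solving the $g-1$ alternating-sign equations together with the normalization of the probability masses recovers the Hoel-Levine weights, proportional to $|\ell_i(c)|$ for the Lagrange basis $\ell_i$ at the nodes:

```python
import numpy as np

c = 1.5
nodes = np.array([-1.0, 0.0, 1.0])        # extrema of T_2: support of the design
phi_c = np.array([1.0, c, c**2])          # (phi_0(c), phi_1(c), phi_2(c))

# h_r(x) = phi_r(x) - (phi_r(c)/phi_2(c)) * phi_2(x), for r = 0, 1
h = [lambda x, r=r: x**r - phi_c[r] / phi_c[2] * x**2 for r in range(2)]

# rows r = 0, 1: sum_i (-1)^i h_r(x_i) w_i = 0; last row: sum_i w_i = 1
A = np.vstack([[(-1.0)**i * h[r](x) for i, x in enumerate(nodes)]
               for r in range(2)] + [np.ones(3)])
w = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))

# Hoel-Levine weights: proportional to |l_i(c)| at the same nodes
ell = np.array([np.prod([(c - nodes[j]) / (nodes[i] - nodes[j])
                         for j in range(3) if j != i]) for i in range(3)])
w_hl = np.abs(ell) / np.abs(ell).sum()
```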
References
[Achieser, 1992] N. I. Achieser. Theory of Approximation. Dover Publications, Inc., New York, 1992. Translated from the Russian and with a preface by Charles J. Hyman; reprint of the 1956 English translation.
[Broniatowski and Celant, 2014] M. Broniatowski and G. Celant. Some overview on unbiased interpolation and extrapolation designs. arXiv:1403.5113, 2014.
[Dzyadyk and Shevchuk, 2008] Vladislav K. Dzyadyk and Igor A. Shevchuk. Theory of Uniform Approximation of Functions by Polynomials. Walter de Gruyter GmbH & Co. KG, Berlin, 2008. Translated from the Russian by Dmitry V. Malyshev, Peter V. Malyshev and Vladimir V. Gorunovich.
[Hoel, 1966] Paul G. Hoel. A simple solution for optimal Chebyshev regression extrapolation. Ann. Math. Statist., 37:720-725, 1966.
[Karlin and Studden, 1966] Samuel Karlin and William J. Studden. Tchebycheff Systems: With Applications in Analysis and Statistics. Pure and Applied Mathematics, Vol. XV. Interscience Publishers John Wiley & Sons, New York-London-Sydney, 1966.
[Kiefer and Wolfowitz, 1965] J. Kiefer and J. Wolfowitz. On a theorem of Hoel and Levine on extrapolation designs. Ann. Math. Statist., 36:1627–1655, 1965.
[Pukelsheim, 1993] Friedrich Pukelsheim. Optimal Design of Experiments. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1993. A Wiley-Interscience Publication.