HAL Id: hal-00984010
https://hal.archives-ouvertes.fr/hal-00984010
Submitted on 26 Apr 2014
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Optimal extrapolation design for the Chebyshev
regression
Michel Broniatowski, Giorgio Celant
To cite this version:
Michel Broniatowski, Giorgio Celant. Optimal extrapolation design for the Chebyshev regression. Annales de l'ISUP, Publications de l'Institut de Statistique de l'Université de Paris, 2015, 59 (3), pp. 3-22. hal-00984010
Optimal extrapolation design for
the Chebyshev regression
Michel Broniatowski (1), Giorgio Celant (2,*)
(1) LSTA, Université Pierre et Marie Curie, Paris, France
(2) Dipartimento di Scienze Statistiche, Università degli Studi di Padova, Italy
(*) Corresponding author: giorgio.celant@stat.unipd.it
Abstract
This paper introduces optimal designs in the context of a regression model when the regression function is assumed to be generated by a Chebyshev system of functions. The criterion for optimality is the variance of a Gauss-Markov estimator for an extrapolated value.
Key words: Chebyshev system; optimal design; extrapolation design; Borel-Chebyshev Theorem
1 Introduction
This paper deals with a natural extension of the Hoel-Levine optimal extrapolation design, as described in [Hoel, 1966]. We recall that this classical design results from the following fact.
A design is defined as a discrete probability measure $\xi$ on a set of measurement points $x_0, \ldots, x_{g-1}$, which for notational convenience belong to the observation set $[-1,1]$. Denoting $n_i/n := \xi(x_i)$ the frequency of replications of the experiment to be performed at point $x_i$, $0 \le i \le g-1$, the $n_i$'s satisfy $n_0 + \cdots + n_{g-1} = n$. The points $x_i$ are the nodes of the design, and $\xi(x_i)$ is the so-called frequency of the design at node $x_i$. Recall that the model writes
$$Y(x) = f(x) + \varepsilon(x)$$
for $x$ in $[-1,1]$; the real-valued function $f$ is unknown but belongs to a specified class of functions, and the random variable $\varepsilon(x)$ is centered with finite variance. Observations are performed under the design, with the constraint $n_0 + \cdots + n_{g-1} = n$ on the global budget of the design. The $n_i$ replications $Y_j(x_i)$, $1 \le j \le n_i$, are independent. Independence also holds from node to node, which amounts to assuming that all measurement errors due to the r.v.'s $\varepsilon(x)$ are independent. The model is supposed to be homoscedastic; hence the variance $\sigma^2$ of $\varepsilon(x)$ does not depend on $x$.
For a given $c$ not in $[-1,1]$, consider an estimate of $f(c)$ with smallest variance among all unbiased estimators of $f(c)$ which are linear functions of the observations $Y_j(x_i)$, $1 \le j \le n_i$, $0 \le i \le g-1$, hence under a given design $\xi$. An optimal design achieves the minimal variance among all such designs. This minimum is achieved by the Hoel-Levine design when the function $f$ is assumed to belong to the class of all polynomials defined on $\mathbb{R}$ with degree less than or equal to $g-1$, hence to the span of the class of monomials $\{1, x, \ldots, x^{g-1}\}$.
The main mathematical argument behind the Hoel-Levine design lies in the solution of the following basic question: find a polynomial which equioscillates in $g+1$ points of $[-1,1]$, assuming maximal absolute values all equal to $1$ at those points. Up to a multiplicative constant, such a polynomial results as the best polynomial approximation of the null function on $[-1,1]$ by polynomials with degree $g-1$. Existence and uniqueness of this polynomial follow from the Borel-Chebyshev Theorem. We refer to [Dzyadyk and Shevchuk, 2008] for details and derivation of these results.
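In the polynomial case the resulting Hoel-Levine design can be sketched numerically: the nodes are the extrema of the Chebyshev polynomial $T_{g-1}$, and the weights are proportional to $|\ell_i(c)|$, where $\ell_i$ is the Lagrange basis polynomial at node $i$. The following Python sketch illustrates this (the function name and the values $g = 4$, $c = 1.5$ are our own illustrative choices):

```python
import numpy as np

def hoel_levine_design(g, c):
    """Nodes and weights of the Hoel-Levine extrapolation design for
    polynomial regression of degree g - 1: nodes are the extrema of the
    Chebyshev polynomial T_{g-1}, weights are proportional to |l_i(c)|,
    l_i being the Lagrange basis polynomial attached to node i."""
    # extrema of T_{g-1} on [-1, 1], listed in increasing order
    nodes = np.cos(np.pi * np.arange(g - 1, -1, -1) / (g - 1))
    # Lagrange basis polynomials evaluated at the extrapolation point c
    ell = np.array([
        np.prod([(c - nodes[j]) / (nodes[i] - nodes[j])
                 for j in range(g) if j != i])
        for i in range(g)
    ])
    weights = np.abs(ell) / np.abs(ell).sum()
    return nodes, weights

nodes, weights = hoel_levine_design(4, 1.5)
```

Note that the endpoints $\pm 1$ always belong to the support, and all weights are strictly positive since $c$ lies outside $[-1,1]$.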
The aim is now to provide a larger context for similar questions, assuming that the function $f$ may belong to some other functional class, still a finitely generated set of functions.
Definition 1 The system of functions $(\varphi_0, \ldots, \varphi_{g-1})$ in $C(\mathbb{R})$ is a Chebyshev (or Haar) system on $[-1,1]$ when
1) $(\varphi_0, \ldots, \varphi_{g-1})$ are linearly independent;
2) any equation
$$a_0 \varphi_0(x) + \cdots + a_{g-1} \varphi_{g-1}(x) = 0$$
with $(a_0, \ldots, a_{g-1}) \neq (0, \ldots, 0)$ has at most $g-1$ roots in $[-1,1]$.
Denote by
$$V := \operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\} \subset C([-1,1])$$
the linear space generated by the Chebyshev system $(\varphi_0, \ldots, \varphi_{g-1})$.
The Haar Theorem (see [Dzyadyk and Shevchuk, 2008]) states that the two following assertions are equivalent:
a) $(\varphi_0, \ldots, \varphi_{g-1})$ is a Chebyshev system on $[-1,1]$;
b) for any $f$ in $C([-1,1])$ there exists a unique best uniform approximation of $f$ in $V$.
In the sequel we assume that the system $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system in $C([-1,1])$ and in $C([-1,c])$ with $c > 1$. This implies that no non-null linear combination of the $\varphi_i$'s may have roots in $(1, c]$.
We also make use of the following result.

Proposition 2 The following properties are equivalent:
1) $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system;
2) for any set of $g$ points $(x_0, \ldots, x_{g-1})$ in $[-1,1]$ such that $x_i \neq x_j$, and for any $(y_0, \ldots, y_{g-1})$ in $\mathbb{R}^g$, there exists a unique function $\psi$ in $V$ such that $\psi(x_k) = y_k$ for all $k$;
3) for any $g$ points $(x_0, \ldots, x_{g-1})$ in $[-1,1]$ such that $x_i \neq x_j$, the determinant
$$\Delta := \det G, \qquad G := \begin{pmatrix} \varphi_0(x_0) & \cdots & \varphi_0(x_j) & \cdots & \varphi_0(x_{g-1}) \\ \vdots & & \vdots & & \vdots \\ \varphi_i(x_0) & \cdots & \varphi_i(x_j) & \cdots & \varphi_i(x_{g-1}) \\ \vdots & & \vdots & & \vdots \\ \varphi_{g-1}(x_0) & \cdots & \varphi_{g-1}(x_j) & \cdots & \varphi_{g-1}(x_{g-1}) \end{pmatrix},$$
does not equal $0$.
Proof. Assume 3) holds. For a set of $g$ points $(x_0, \ldots, x_{g-1})$ in $[-1,1]$ such that $x_i \neq x_j$, $\Delta = 0$ iff the matrix $G$ is not invertible, which is to say that the system of equations defined through $0 = \sum_{i=0}^{g-1} a_i \varphi_i(x_j)$, $j = 0, \ldots, g-1$, admits a solution $a := (a_0, \ldots, a_{g-1})$ different from $(0, \ldots, 0)$ in $\mathbb{R}^g$. Define $\psi := \sum_{i=0}^{g-1} a_i \varphi_i$, an element of $V$ which is not the function $x \mapsto 0$. Since $\sum_{i=0}^{g-1} a_i \varphi_i(x) = 0$ for $x$ in $\{x_0, \ldots, x_{g-1}\}$, it follows that $\psi$ has $g$ distinct roots in $[-1,1]$. Hence whenever $\Delta = 0$, $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is not a Chebyshev system; 3) is therefore equivalent to 1). Now 2) is equivalent to 3): indeed, when $G$ is invertible then for any $(y_0, \ldots, y_{g-1})$ in $\mathbb{R}^g$ the system $\sum_{i=0}^{g-1} a_i \varphi_i(x_j) = y_j$, $j = 0, \ldots, g-1$, has a unique solution, which means that there is a unique $\psi$ in $V$ with $\psi(x_j) = y_j$ for all $j$.
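The determinant criterion 3) is easy to test numerically. The following sketch (our own illustration, with arbitrarily chosen points) contrasts the system $\{1, x\}$, which is Chebyshev on $[-1,1]$, with $\{1, x^2\}$, which is not, since symmetric points produce a singular matrix:

```python
import numpy as np

def chebyshev_determinant(funcs, points):
    """Determinant of the matrix G with entries G[i, j] = phi_i(x_j),
    as in Proposition 2: it vanishes for some choice of distinct
    points iff the family is not a Chebyshev system there."""
    return np.linalg.det(np.array([[f(x) for x in points] for f in funcs]))

# {1, x} is a Chebyshev system on [-1, 1]: the determinant is nonzero
d1 = chebyshev_determinant([lambda x: 1.0, lambda x: x], [-0.5, 0.5])

# {1, x^2} is not: the symmetric pair -0.5, 0.5 gives a zero determinant
d2 = chebyshev_determinant([lambda x: 1.0, lambda x: x**2], [-0.5, 0.5])
```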
We therefore introduce the basic definition.

Definition 3 A regression model
$$Y(x) = f(x) + \varepsilon(x)$$
is a Chebyshev regression model iff $f$ belongs to $V := \operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\}$, where $(\varphi_0, \ldots, \varphi_{g-1})$ is a Chebyshev (or Haar) system of functions.
The following result stands as a generalization of the Borel-Chebyshev Theorem and improves on the Haar Theorem.
Theorem 4 (Generalization of the Borel-Chebyshev Theorem) Let $\{\varphi_0, \ldots, \varphi_{g-1}\}$ be a Chebyshev system on $[-1,1]$, and let $u$ be any function in $C([-1,1])$. Then there exists a unique function $h$ in $V := \operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\}$ defined on $[-1,1]$ which achieves
$$\sup_{x \in [-1,1]} |u(x) - h(x)| = \inf_{f \in V} \sup_{x \in [-1,1]} |u(x) - f(x)|.$$
Furthermore, $h$ is the only function in $V$ such that $p := u - h$ attains its maximal absolute value in at least $g+1$ points in $[-1,1]$, the sign of $p$ alternating on those points.
Proof. See [Achieser, 1992].
Remark 5 The above function $h$ plays a role similar to that of the function $T_{g-1}$ (the Chebyshev polynomial of the first kind) in the polynomial regression case; see [Broniatowski and Celant, 2014].
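Theorem 4 can be illustrated in the simplest polynomial setting: for the Chebyshev system $\{1, x\}$ ($g = 2$), the best uniform approximation of $u(x) = x^2$ on $[-1,1]$ is the constant $h(x) = 1/2$, and the error $p = u - h = \tfrac{1}{2} T_2$ equioscillates at the $g+1 = 3$ points $-1, 0, 1$. A short numerical check (this example is ours, not from the paper):

```python
import numpy as np

# Error of the best uniform approximation of u(x) = x^2 from
# V = span{1, x}: h(x) = 1/2 and p(x) = u(x) - h(x) = (1/2) T_2(x).
x = np.linspace(-1.0, 1.0, 2001)
p = x**2 - 0.5
E = np.max(np.abs(p))                  # uniform norm of the error: 1/2

# p attains |p| = E at the g + 1 = 3 points -1, 0, 1, with alternating signs
extrema = np.array([-1.0, 0.0, 1.0])
p_ext = extrema**2 - 0.5               # values 1/2, -1/2, 1/2
```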
The notation $M_d([-1,1])$ designates the class of all discrete probability measures with support in $[-1,1]$.
The aim of this paper is to present the contribution of Hoel [Hoel, 1966] to the construction of optimal designs for the extrapolated value of the regression function, as treated by Kiefer and Wolfowitz [Kiefer and Wolfowitz, 1965]. The model and the Gauss-Markov estimator are defined in the next Section. An orthogonalization procedure allows us to express the extrapolated value as a parameter in an adequate regression model. Finally, the support of the optimal design is obtained through geometrical arguments; the number of replications of the experiments at the nodes is then deduced.
2 The model and the Gauss-Markov estimator
We consider a Chebyshev system $\{\varphi_0, \ldots, \varphi_{g-1}\}$ on $[-1,1]$. For any $x \in [-1,1]$ we assume that we may observe a r.v. $Y(x)$ such that, denoting $\theta := (\theta_0, \ldots, \theta_{g-1})'$,
$$f(x) := E(Y(x)) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x) = (X(x))' \theta. \quad (1)$$
We notice that the function
$$f : \mathbb{R} \to \mathbb{R}, \quad x \mapsto f(x)$$
is continuous on $\mathbb{R}$. Indeed, since the system of $g$ equations in $\theta$
$$\begin{cases} f(x_0) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_0) \\ \quad \cdots \\ f(x_{g-1}) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_{g-1}) \end{cases}$$
has a unique solution whenever $(f(x_0), \ldots, f(x_{g-1}))'$ is known, for any $(x_0, \ldots, x_{g-1})' \in [-1,1]^g$ with $-1 \le x_0 < \cdots < x_{g-1} \le 1$, the function $f$ can be extended to $\mathbb{R}$; this extension is continuous since so are the $\varphi_i$'s.
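Numerically, the determination of $f$ by $g$ of its values amounts to solving the interpolation system with the nonsingular matrix of Proposition 2. A quick sketch for the monomial system $\{1, x, x^2\}$ (the coefficients and points below are arbitrary illustrative choices of ours):

```python
import numpy as np

# A function f in span{1, x, x^2} is determined by its values at g = 3
# distinct points: the interpolation matrix is the matrix G of
# Proposition 2, which is nonsingular for a Chebyshev system.
theta = np.array([2.0, -1.0, 3.0])          # arbitrary coefficients
pts = np.array([-0.8, 0.1, 0.9])            # arbitrary distinct points
G = np.vander(pts, 3, increasing=True)      # rows (1, x_j, x_j^2)
f_vals = G @ theta                          # observed values f(x_j)
theta_rec = np.linalg.solve(G, f_vals)      # coefficients recovered exactly
```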
Recall that the measurements can be performed only on $[-1,1]$, and not for $|x| > 1$.
2.1 Examples of Chebyshev systems
Here is a short list of classical Chebyshev systems. We refer to the classical treatise of [Karlin and Studden, 1966] for an extensive study of these systems and their applications in analysis and in statistics.
a) $\{\varphi_0(x) = 1, \varphi_1(x) = x^3\}$ is a Chebyshev system on the whole of $\mathbb{R}$;
b) $\{\varphi_0(x) = 1, \varphi_1(x) = x^{1/3}\}$ is a Chebyshev system on $(0, +\infty)$;
c) $\{1, \cos x, \cos 2x, \ldots, \cos nx\}$ is a Chebyshev system on $[0, \pi)$;
d) $\{1, \sin x, \cos x, \sin 2x, \cos 2x, \ldots, \sin nx, \cos nx\}$ is a Chebyshev system on $[0, 2\pi)$;
e) $\{\sin x, \sin 2x, \ldots, \sin nx\}$ is a Chebyshev system on $(0, \pi)$;
f) $\{\varphi_0(x) = x^2 - x, \varphi_1(x) = x^2 + x, \varphi_2(x) = x^2 + 1\}$ is a Chebyshev system on $\mathbb{R}$;
g) $\{x^{a_0}, \ldots, x^{a_n}\}$, where $0 = a_0 < \cdots < a_n$, is a Chebyshev system on $[0, +\infty)$;
h) $\{e^{a_0 x}, \ldots, e^{a_n x}\}$, where $0 = a_0 < \cdots < a_n$, is a Chebyshev system on $\mathbb{R}$;
i) $\{1, \sinh x, \cosh x, \ldots, \sinh nx, \cosh nx\}$ is a Chebyshev system on $\mathbb{R}$;
j) $\{(x + a_0)^{-1}, \ldots, (x + a_n)^{-1}\}$, where $0 < a_0 < \cdots < a_n$, is a Chebyshev system on $[0, +\infty)$;
k) $\{1, \log x, x, x \log x, x^2, x^2 \log x, \ldots, x^n, x^n \log x\}$ is a Chebyshev system on $(0, +\infty)$.
Finally, note that being a Chebyshev system is a linear property; indeed, if $(\varphi_0, \ldots, \varphi_{g-1})$ is a Chebyshev system then any other basis of $\operatorname{span}\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system.
2.2 Description of the dataset coming from the experiment
Given the set of nodes $-1 \le x_0 < \cdots < x_{g-1} \le 1$, the experiment is described through the following measurements:
$$\begin{cases} Y_1(x_0) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_0) + \varepsilon_1(x_0) \\ \quad \cdots \\ Y_{n_0}(x_0) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_0) + \varepsilon_{n_0}(x_0) \\ \quad \cdots \\ Y_1(x_i) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_i) + \varepsilon_1(x_i) \\ \quad \cdots \\ Y_{n_i}(x_i) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_i) + \varepsilon_{n_i}(x_i) \\ \quad \cdots \\ Y_1(x_{g-1}) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_{g-1}) + \varepsilon_1(x_{g-1}) \\ \quad \cdots \\ Y_{n_{g-1}}(x_{g-1}) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x_{g-1}) + \varepsilon_{n_{g-1}}(x_{g-1}) \end{cases}$$
or through the more synthetic form: with
$$Y(x_i) := (Y_1(x_i), \ldots, Y_{n_i}(x_i))', \quad X(x_i) := (\varphi_0(x_i), \ldots, \varphi_{g-1}(x_i))', \quad \varepsilon(x_i) := (\varepsilon_1(x_i), \ldots, \varepsilon_{n_i}(x_i))',$$
$$Y(x_i) = \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \\ \vdots & & \vdots \\ \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix} \begin{pmatrix} \theta_0 \\ \vdots \\ \theta_{g-1} \end{pmatrix} + \varepsilon(x_i), \quad i = 0, \ldots, g-1.$$
Denote
$$X_i := \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \\ \vdots & & \vdots \\ \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix}.$$
The matrix $X_i$ has $n_i$ rows and $g$ columns; all rows of $X_i$ equal $(X(x_i))'$.
Denote
$$H := \operatorname{Im} X := \{X(x) \in \mathbb{R}^g : x \in [-1,1]\}. \quad (2)$$
The set $H$ is called the regression range.
It may at times be convenient to attribute distinct indices to the same $x_j$ when it is repeated $n_j$ times.
The discrete measure defined through the points
$$\underbrace{x_0, \ldots, x_0}_{n_0 \text{ times}}, \ \ldots, \ \underbrace{x_j, \ldots, x_j}_{n_j \text{ times}}, \ \ldots, \ \underbrace{x_{g-1}, \ldots, x_{g-1}}_{n_{g-1} \text{ times}}$$
with
$$n_0 + \cdots + n_{g-1} = n$$
will hence be written as
$$t_1, \ldots, t_n \quad (3)$$
with $t_1 = t_2 = \cdots = t_{n_0} = x_0, \ \ldots, \ t_{n_0 + \cdots + n_{g-2} + 1} = t_{n_0 + \cdots + n_{g-2} + 2} = \cdots = t_{n_0 + \cdots + n_{g-2} + n_{g-1}} = x_{g-1}$; hence $t_1, \ldots, t_{n_0}$ indicate the same point $x_0$ repeated $n_0$ times, etc.
The system which describes the $n$ observations therefore writes as $Y = C\theta + \varepsilon$, where
$$Y := \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad C := \begin{pmatrix} \varphi_0(t_1) & \cdots & \varphi_{g-1}(t_1) \\ \vdots & & \vdots \\ \varphi_0(t_i) & \cdots & \varphi_{g-1}(t_i) \\ \vdots & & \vdots \\ \varphi_0(t_n) & \cdots & \varphi_{g-1}(t_n) \end{pmatrix}, \quad \theta := \begin{pmatrix} \theta_0 \\ \vdots \\ \theta_{g-1} \end{pmatrix}, \quad \varepsilon := \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$
$$E(Y) = C\theta, \qquad \operatorname{var}(\varepsilon) = \sigma^2 I_n,$$
where $I_n$ is the identity matrix of order $n$.
The Gauss-Markov estimator of $f(x) = E(Y(x))$ is obtained from the solution $\widehat{\theta}$ of the linear system
$$X_i' X_i \widehat{\theta} = X_i' Y(x_i), \quad i = 0, \ldots, g-1.$$
It holds
$$X_i' X_i = \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_0(x_i) \\ \vdots & & \vdots \\ \varphi_{g-1}(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix} \begin{pmatrix} \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \\ \vdots & & \vdots \\ \varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i) \end{pmatrix} = n_i M_i,$$
where
$$M_i := \begin{pmatrix} (\varphi_0(x_i))^2 & \cdots & \varphi_0(x_i)\varphi_k(x_i) & \cdots & \varphi_0(x_i)\varphi_{g-1}(x_i) \\ \vdots & & \vdots & & \vdots \\ \varphi_h(x_i)\varphi_0(x_i) & \cdots & \varphi_h(x_i)\varphi_k(x_i) & \cdots & \varphi_h(x_i)\varphi_{g-1}(x_i) \\ \vdots & & \vdots & & \vdots \\ \varphi_{g-1}(x_i)\varphi_0(x_i) & \cdots & \varphi_{g-1}(x_i)\varphi_k(x_i) & \cdots & (\varphi_{g-1}(x_i))^2 \end{pmatrix}.$$
We have $M_i = X(x_i) X'(x_i)$. In
$$n_i X(x_i) X'(x_i)\, \widehat{\theta} = X_i' Y(x_i), \quad i = 0, \ldots, g-1,$$
sum both sides with respect to $i$ to obtain
$$\sum_{i=0}^{g-1} X_i' X_i\, \widehat{\theta} = \sum_{i=0}^{g-1} X_i' Y(x_i).$$
Therefore
$$n \left( \sum_{i=0}^{g-1} \frac{n_i}{n} M_i \right) \widehat{\theta} = \sum_{i=0}^{g-1} X_i' Y(x_i).$$
Denote
$$\xi_i := \xi(x) := \begin{cases} n_i/n & \text{if } x = x_i, \\ 0 & \text{if } x \notin \{x_0, \ldots, x_{g-1}\}. \end{cases}$$
The matrix
$$M(\xi) := \sum_{i=0}^{g-1} \frac{n_i}{n} M_i = \sum_{i=0}^{g-1} \xi_i M_i \quad (4)$$
is the moment matrix of the measure $\xi$. By definition $\operatorname{supp}(\xi) = \{x_0, \ldots, x_{g-1}\}$. Since $M_i = X(x_i) X'(x_i)$ we may write
$$M(\xi) = \sum_{i=0}^{g-1} \xi_i M_i = \sum_{i=0}^{g-1} \xi_i\, X(x_i) X'(x_i) = \int_{[-1,1]} X(x) X'(x)\, \xi(dx).$$
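The moment matrix (4) can be computed directly from its definition. The sketch below uses the monomial system $\{1, x, x^2\}$ and a uniform design on three nodes (both choices are ours, for illustration):

```python
import numpy as np

# Moment matrix M(xi) = sum_i xi_i X(x_i) X(x_i)' of formula (4),
# for the monomial system {1, x, x^2} under a uniform three-point design.
nodes = np.array([-1.0, 0.0, 1.0])
xi = np.array([1/3, 1/3, 1/3])

def X(x):
    """Regression vector X(x) = (phi_0(x), ..., phi_{g-1}(x))'."""
    return np.array([1.0, x, x**2])

M = sum(w * np.outer(X(x), X(x)) for w, x in zip(xi, nodes))
```

The resulting matrix is symmetric and, since the design has $g$ distinct support points and the system is Chebyshev, nonsingular.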
A specific study of this matrix is needed for the estimation of linear forms of the coefficients $\theta_i$. This area has been developed by Elfving (see e.g. [Pukelsheim, 1993]) and is out of the scope of the present paper.
3 An expression of the extrapolated value through an orthogonalization procedure
We will consider an alternative way, developed by Kiefer and Wolfowitz [Kiefer and Wolfowitz, 1965], as follows. It has the main advantage that, up to a coefficient $\gamma_{g-1}$ which depends on the values of $f$ on the $x_j$'s, the estimate of $f(c)$ is proportional to $\varphi_{g-1}(c)$. It follows that only the coefficient $\gamma_{g-1}$ has to be estimated, a clear advantage. Recall that $c$ does not belong to $[-1,1]$.
It is more convenient, at this stage, to introduce the following notation. It will be assumed that $n$ measurements of $Y$ are performed, namely
$$Y(t_1), \ldots, Y(t_n),$$
where the $t_i$'s belong to $[-1,1]$. The points of measurement $t_1, \ldots, t_n$ might be distinct or not, as defined in (3). Obviously, when defining the optimal design with nodes $x_0, \ldots, x_{g-1}$, $n_j$ values of the $t_i$'s coincide with $x_j$ for $0 \le j \le g-1$. In order to define the estimator, and not the design, it is however more convenient to differentiate between all the measurements $Y(t_i)$, $1 \le i \le n$. This allows us to inherit the classical geometric least squares formalism.
We consider the basis of $V$ defined as follows. Set, for all $j$ between $0$ and $g-2$,
$$h_j(x) := \varphi_j(x) - \frac{\varphi_j(c)}{\varphi_{g-1}(c)}\, \varphi_{g-1}(x)$$
and
$$h_{g-1}(x) := \varphi_{g-1}(x).$$
Clearly $(h_0, \ldots, h_{g-1})$ generates $V$. Also, $(h_0, \ldots, h_{g-1})$ is a Chebyshev system on $[-1, c]$.
Denote by $(\gamma_0, \ldots, \gamma_{g-1})$ the coordinates of $f$ on $(h_0, \ldots, h_{g-1})$, namely
$$f(x) = \sum_{j=0}^{g-1} \gamma_j h_j(x).$$
We evaluate the coefficients $\gamma_j$ with respect to the $\theta_k$'s defined in (1). It holds $\gamma_j = \theta_j$ for $j = 0, \ldots, g-2$, and
$$\gamma_{g-1} = \frac{\sum_{j=0}^{g-1} \theta_j \varphi_j(c)}{\varphi_{g-1}(c)},$$
assuming $\varphi_{g-1}(c) \neq 0$; and obviously we have
$$f(x) = \sum_{j=0}^{g-1} \gamma_j h_j(x) = \sum_{j=0}^{g-1} \theta_j \varphi_j(x).$$
At $x = c$ we get
$$f(c) = \sum_{j=0}^{g-1} \gamma_j h_j(c) = \sum_{j=0}^{g-1} \theta_j \varphi_j(c).$$
By the definition of $\gamma_{g-1}$ we have
$$\gamma_{g-1} = \frac{\sum_{j=0}^{g-1} \theta_j \varphi_j(c)}{\varphi_{g-1}(c)},$$
and therefore we have proved

Lemma 6
$$f(c) = \gamma_{g-1}\, \varphi_{g-1}(c). \quad (6)$$
4 The Gauss-Markov estimator of the extrapolated value
It holds
$$f(c) = \sum_{i=0}^{g-1} \theta_i \varphi_i(c),$$
where the $\theta_i$'s are defined through the $g$ equations of the form
$$f(x_j) = \sum_{i=0}^{g-1} \theta_i \varphi_i(x_j)$$
with $-1 \le x_j \le 1$ for all $0 \le j \le g-1$. Replace $f(x_j)$ by its estimate
$$\widehat{f(x_j)} := \frac{1}{n_j} \sum_{i=1}^{n_j} Y_i(x_j).$$
Under the present model, $\widehat{f(x_j)}$ is an unbiased estimate of $f(x_j)$. Determine the $\widehat{\theta}_i$'s through the system defined by
$$\widehat{f(x_j)} = \sum_{i=0}^{g-1} \widehat{\theta}_i \varphi_i(x_j).$$
The resulting $\widehat{\theta}_i$'s are unbiased and so is
$$\widehat{f(c)} = \sum_{i=0}^{g-1} \widehat{\theta}_i \varphi_i(c).$$
The natural optimality criterion associated with this procedure is the variance of the estimate $\widehat{f(c)}$, which depends on the location of the nodes and on the frequencies $n_j$.
We now write the above Gauss-Markov estimator of $f(c)$ in the new basis $(h_0, \ldots, h_{g-1})$. Substituting the function $f$ by its expansion on the basis $(h_0, \ldots, h_{g-1})$, the model writes as
$$\begin{cases} Y(t_1) = \gamma_0 h_0(t_1) + \cdots + \gamma_{g-2} h_{g-2}(t_1) + \gamma_{g-1} \varphi_{g-1}(t_1) + \varepsilon_1 \\ \quad \cdots \\ Y(t_i) = \gamma_0 h_0(t_i) + \cdots + \gamma_{g-2} h_{g-2}(t_i) + \gamma_{g-1} \varphi_{g-1}(t_i) + \varepsilon_i \\ \quad \cdots \\ Y(t_n) = \gamma_0 h_0(t_n) + \cdots + \gamma_{g-2} h_{g-2}(t_n) + \gamma_{g-1} \varphi_{g-1}(t_n) + \varepsilon_n \end{cases}$$
because of (5); i.e. $Y(t) = T\gamma + \varepsilon$ where $t := (t_1, \ldots, t_n)'$,
$$T := \begin{pmatrix} h_0(t_1) & \cdots & h_{g-2}(t_1) & \varphi_{g-1}(t_1) \\ \vdots & & \vdots & \vdots \\ h_0(t_i) & \cdots & h_{g-2}(t_i) & \varphi_{g-1}(t_i) \\ \vdots & & \vdots & \vdots \\ h_0(t_n) & \cdots & h_{g-2}(t_n) & \varphi_{g-1}(t_n) \end{pmatrix}, \quad \gamma := \begin{pmatrix} \gamma_0 \\ \vdots \\ \gamma_{g-2} \\ \gamma_{g-1} \end{pmatrix}, \quad \varepsilon := \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$
Recall that we intend to estimate $\gamma_{g-1}$. We make a further change of the basis of $V$. We introduce a vector $G_{g-1}$ which, together with $h_0, \ldots, h_{g-2}$, will produce a basis $(h_0, \ldots, h_{g-2}, G_{g-1})$ for which the vector $G_{g-1}$ is orthogonal to each of the $h_j$, $0 \le j \le g-2$. The aim of this construction is to express $f(c)$ as a linear combination of the components of $G_{g-1}$. Since $G_{g-1}$ belongs to $V = \operatorname{span}(h_0, \ldots, h_{g-1})$ we write
$$G_{g-1}(t_i) := h_{g-1}(t_i) - \sum_{j=0}^{g-2} \lambda_j h_j(t_i)$$
for some vector $\lambda := (\lambda_0, \ldots, \lambda_{g-2})'$. We impose the following condition:
$$\left\langle \begin{pmatrix} G_{g-1}(t_1) \\ \vdots \\ G_{g-1}(t_n) \end{pmatrix}, \begin{pmatrix} h_j(t_1) \\ \vdots \\ h_j(t_n) \end{pmatrix} \right\rangle = 0, \quad \text{for all } j = 0, \ldots, g-2,$$
where the above symbol $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^n$. The $\lambda_j$'s in $\mathbb{R}$ are to be chosen now. The linear system
$$\sum_{i=1}^{n} G_{g-1}(t_i)\, h_j(t_i) = 0, \quad \text{for } j = 0, \ldots, g-2,$$
with $g-1$ equations has $g-1$ unknown variables $\lambda_j$.
Once the solution $(\lambda_j^*,\ j = 0, \ldots, g-2)$ is obtained, and since
$$h_{g-1}(t) = G_{g-1}(t) + \sum_{j=0}^{g-2} \lambda_j^* h_j(t),$$
we may write, for any $t$,
$$f(t) = \sum_{j=0}^{g-1} \gamma_j h_j(t) = \gamma_0 h_0(t) + \cdots + \gamma_{g-2} h_{g-2}(t) + \gamma_{g-1} G_{g-1}(t) + \gamma_{g-1} \lambda_0^* h_0(t) + \cdots + \gamma_{g-1} \lambda_{g-2}^* h_{g-2}(t)$$
$$= (\gamma_0 + \gamma_{g-1} \lambda_0^*)\, h_0(t) + \cdots + (\gamma_{g-2} + \gamma_{g-1} \lambda_{g-2}^*)\, h_{g-2}(t) + \gamma_{g-1} G_{g-1}(t)$$
$$= \delta_0 h_0(t) + \cdots + \delta_{g-2} h_{g-2}(t) + \delta_{g-1} G_{g-1}(t),$$
where the $\delta_j$'s are defined by
$$\delta_j := \begin{cases} \gamma_j + \gamma_{g-1} \lambda_j^* & \text{for } j = 0, \ldots, g-2, \\ \gamma_{g-1} & \text{for } j = g-1. \end{cases}$$
The point is that $\gamma_{g-1}$ appears as the coefficient of $G_{g-1}$, namely the last term in the regression of $f(t)$ on the regressors $(h_0, \ldots, h_{g-2}, G_{g-1})$. Furthermore, $G_{g-1}$ is orthogonal to the other regressors. The system which describes the data is now written as $Y(t) = \widetilde{T}\widetilde{\delta} + \varepsilon$, where
$$\widetilde{T} := \begin{pmatrix} h_0(t_1) & \cdots & h_{g-2}(t_1) & G_{g-1}(t_1) \\ \vdots & & \vdots & \vdots \\ h_0(t_i) & \cdots & h_{g-2}(t_i) & G_{g-1}(t_i) \\ \vdots & & \vdots & \vdots \\ h_0(t_n) & \cdots & h_{g-2}(t_n) & G_{g-1}(t_n) \end{pmatrix}, \quad \widetilde{\delta} := \begin{pmatrix} \delta_0 \\ \vdots \\ \delta_{g-2} \\ \delta_{g-1} \end{pmatrix}.$$
The least squares estimator of $\gamma_{g-1}$ is obtained through the normal equations, imposing
$$Y(t) - \widetilde{T}\,\widehat{\widetilde{\delta}} \in V^{\perp},$$
where $\widehat{\widetilde{\delta}}$ designates the least squares estimator of the vector of coefficients $\widetilde{\delta}$, and where $V^{\perp}$ is the orthogonal of the linear space $V$. Denoting $\widehat{\gamma}_{g-1}$ the least squares estimator of $\gamma_{g-1}$, and noting that $V = \operatorname{span}\{h_0, \ldots, h_{g-2}, G_{g-1}\}$, we have
$$\left\langle \begin{pmatrix} Y(t_1) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_1) - \widehat{\gamma}_{g-1} G_{g-1}(t_1) \\ \vdots \\ Y(t_n) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_n) - \widehat{\gamma}_{g-1} G_{g-1}(t_n) \end{pmatrix}, \begin{pmatrix} h_j(t_1) \\ \vdots \\ h_j(t_n) \end{pmatrix} \right\rangle = 0, \quad \text{for } j = 0, \ldots, g-2,$$
and
$$\left\langle \begin{pmatrix} Y(t_1) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_1) - \widehat{\gamma}_{g-1} G_{g-1}(t_1) \\ \vdots \\ Y(t_n) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_n) - \widehat{\gamma}_{g-1} G_{g-1}(t_n) \end{pmatrix}, \begin{pmatrix} G_{g-1}(t_1) \\ \vdots \\ G_{g-1}(t_n) \end{pmatrix} \right\rangle = 0.$$
Hence
$$\sum_{i=1}^{n} \left( Y(t_i) - \sum_{j=0}^{g-2} \widehat{\delta}_j h_j(t_i) - \widehat{\gamma}_{g-1} G_{g-1}(t_i) \right) G_{g-1}(t_i) = 0. \quad (7)$$
Inserting the orthogonality condition
$$\sum_{i=1}^{n} G_{g-1}(t_i)\, h_j(t_i) = 0, \quad \text{for } j = 0, \ldots, g-2,$$
in (7), we obtain
$$\sum_{j=1}^{n} Y(t_j)\, G_{g-1}(t_j) - \widehat{\gamma}_{g-1} \sum_{j=1}^{n} G_{g-1}^2(t_j) = 0,$$
and
$$\widehat{\gamma}_{g-1} = \frac{\sum_{j=1}^{n} Y(t_j)\, G_{g-1}(t_j)}{\sum_{j=1}^{n} G_{g-1}^2(t_j)}.$$
Finally we obtain the explicit form of the estimator of $f(c)$. It holds:

Proposition 7 The least squares (Gauss-Markov) estimator of the extrapolated value $f(c)$ is
$$\widehat{f(c)} = \varphi_{g-1}(c)\, \widehat{\gamma}_{g-1} = \varphi_{g-1}(c)\, \frac{\sum_{j=1}^{n} Y(t_j)\, G_{g-1}(t_j)}{\sum_{j=1}^{n} G_{g-1}^2(t_j)}.$$
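Proposition 7 can be checked numerically: the orthogonalization estimator coincides with the value at $c$ of the ordinary least squares fit, since the fitted element of $V$ is the same in both bases and $h_j(c) = 0$ for $j \le g-2$. A Python sketch (the system $\{1, x, x^2\}$, the design, the coefficients and the noise level are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 1.5
phis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
theta = np.array([1.0, -2.0, 0.5])            # true coefficients

# n = 60 observations: 20 replications at each of three nodes in [-1, 1]
t = np.repeat([-1.0, 0.0, 1.0], 20)
A = np.column_stack([f(t) for f in phis])
Y = A @ theta + 0.1 * rng.standard_normal(t.size)

# basis h_j = phi_j - (phi_j(c)/phi_{g-1}(c)) phi_{g-1}, for j <= g-2
phi_c = np.array([float(f(np.array(c))) for f in phis])
H = A[:, :-1] - np.outer(A[:, -1], phi_c[:-1] / phi_c[-1])
h_last = A[:, -1]

# G_{g-1}: residual of h_{g-1} after projection on span{h_0,...,h_{g-2}}
lam, *_ = np.linalg.lstsq(H, h_last, rcond=None)
G = h_last - H @ lam

# Gauss-Markov estimator of f(c) (Proposition 7)
f_c_hat = phi_c[-1] * (Y @ G) / (G @ G)

# agrees with the direct least squares extrapolation X(c)' theta_hat
theta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
f_c_direct = phi_c @ theta_hat
```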
5 The optimal extrapolation design for the Chebyshev regression

5.1 The support of the optimal design
We determine the support of the optimal design for the extrapolation of $f$ at point $c$. Recall that a design is optimal if and only if it produces a Gauss-Markov estimator of $f(c)$ with minimal variance among all such estimators built upon other designs. We have
$$\operatorname{var}\left(\widehat{f(c)}\right) = (\varphi_{g-1}(c))^2\, \frac{\sum_{j=1}^{n} \operatorname{var}(Y(t_j))\, G_{g-1}^2(t_j)}{\left( \sum_{j=1}^{n} G_{g-1}^2(t_j) \right)^2} = \frac{(\sigma\, \varphi_{g-1}(c))^2}{\sum_{j=1}^{n} G_{g-1}^2(t_j)}.$$
The design is defined through a discrete probability measure $\xi \in M_d([-1,1])$ with support $(x_0, \ldots, x_{g-1})$ and $\xi(x_j) := n_j/n$, where $n_j$ equals the number of the $t_i$'s which equal $x_j$, for $0 \le j \le g-1$.
We now determine the support of the optimal design, denoted $\xi^*$:
$$\xi^* := \arg\min_{\xi \in M_d([-1,1])} \frac{1}{\sum_{j=0}^{g-1} n_j G_{g-1}^2(x_j)} = \arg\max_{\xi \in M_d([-1,1])} \sum_{i=0}^{g-1} n_i G_{g-1}^2(x_i) = \arg\max_{\xi \in M_d([-1,1])} \sum_{i=0}^{g-1} n_i \left( h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j h_j(x_i) \right)^2.$$
The solution can be obtained in a simple way through some analysis of the objective function. In order to use simple geometric arguments and to simplify the resulting expressions, it is more convenient to write the derivation of the optimal design in terms of the $t_i$'s. The function
$$\sum_{i=1}^{n} \left( h_{g-1}(t_i) - \sum_{j=0}^{g-2} \lambda_j h_j(t_i) \right)^2 = \left\| \begin{pmatrix} h_{g-1}(t_1) - \sum_{j=0}^{g-2} \lambda_j h_j(t_1) \\ \vdots \\ h_{g-1}(t_n) - \sum_{j=0}^{g-2} \lambda_j h_j(t_n) \end{pmatrix} \right\|^2,$$
minimized over $\lambda$, is the squared distance of the vector
$$h := (h_{g-1}(t_1), \ldots, h_{g-1}(t_n))'$$
from its orthogonal projection on the linear space $W$ generated by the family $\{h_0, \ldots, h_{g-2}\}$. Therefore, by the minimal projection property,
$$\min_{\lambda} \sum_{i=1}^{n} \left( h_{g-1}(t_i) - \sum_{j=0}^{g-2} \lambda_j h_j(t_i) \right)^2 = \min_{\ell \in W} \operatorname{dist}^2(h, \ell).$$
Let $\lambda := (\lambda_0, \ldots, \lambda_{g-2})'$.
The optimal design is obtained through a two-step procedure. Fix the frequencies $n_0, \ldots, n_{g-1}$ with sum $n$, and determine the discrete measure $\xi$ on $[-1,1]$ which minimizes $\operatorname{var}(\widehat{f(c)})$ among all $\xi$'s with support $x := (x_0, \ldots, x_{g-1})$ and masses $\xi(x_j) = n_j/n$, $0 \le j \le g-1$. The optimization is performed upon the $x_j$'s.
= arg max 2Md([ 1;1])min 2V dist (h; ) = arg max x2[ 1;1]g 2Rming 1 g 1 X i=0 ni hg 1(xi) g 2 X j=0 jhj(xi) !2 = arg max 2Md([ 1;1]) 2Rming 1 Z [ 1;1] hg 1(x) g 2 X j=0 jhj(x) !2 (dx) : The integrand hg 1(x) Pgj=02 jhj(x) 2
is always non negative. Henceforth it is enough to minimize its square root w.r.t. x. This optimization turns therefore to be independent of the n0
js:
Denote j; j = 0; :::; g 2; the values which minimize dist (h; ) w.r.t.
j. The optimality condition writes
max x2[ 1;1] hg 1(x) g 2 X j=0 jhj(x) = min 2Rg 1x2[ 1;1]max hg 1(x) g 2 X j=0 jhj(x) (8) = min p2Wx2[ 1;1]max jhg 1(x) p (x)j where W := span fh0; ::; hg 2g : (9)
If we prove that $\{h_0, \ldots, h_{g-2}\}$ is a Chebyshev system on $[-1,1]$, then clearly the support of the optimal measure consists of the points of maximal value in $[-1,1]$ of the function
$$|h_{g-1}(x) - p^*(x)|,$$
where $p^*$ is the best uniform approximation of $h_{g-1}$ in $W$. Indeed, the support of $\xi^*$ consists of the set of points where
$$\left| h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right|$$
in (8) attains its maximal value, $p^* = \sum_{j=0}^{g-2} \lambda_j^* h_j$ being the best uniform approximation of $h_{g-1}$ in $W$. This is the major argument of the present derivation, which justifies all of the uniform approximation theory in this context.
Definition 8 The vector $\lambda^*$ in $\mathbb{R}^{g-1}$ is a Chebyshev vector iff it designates the vector of the coefficients of $p^*$, where $p^*$ is the best uniform approximation of $h_{g-1}$ in $W$ defined in (9). It is defined through (8).
Now, writing
$$\lambda^* := (\lambda_0^*, \ldots, \lambda_{g-2}^*)',$$
we define the set of all points $\widetilde{x}$ in $[-1,1]$ where the distance between $h_{g-1}$ and its best approximation on the $h_k$, $0 \le k \le g-2$, is maximal. These points are precisely the support of the optimal design $\xi^*$. Formally, we define
$$E := \min_{\lambda \in \mathbb{R}^{g-1}} \max_{x \in [-1,1]} \left| h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right| \quad (10)$$
and
$$B(\lambda^*) := \left\{ \widetilde{x} \in [-1,1] : \left| h_{g-1}(\widetilde{x}) - \sum_{j=0}^{g-2} \lambda_j^* h_j(\widetilde{x}) \right| = E \right\}. \quad (11)$$
It holds (see Proposition 10 below) $\xi^*(B(\lambda^*)) = 1$.
We now prove that $\{h_0, \ldots, h_{g-2}\}$ is a Chebyshev system on $[-1,1]$.

Proposition 9 (Hoel) The functions $h_0, \ldots, h_{g-2}$ form a Chebyshev system on $[-1,1]$.
Proof. For any choice of points $x_0 < \cdots < x_{g-2}$ in $[-1,1]$, since the family $\{\varphi_0, \ldots, \varphi_{g-1}\}$ is a Chebyshev system on $[-1,1]$, we have by Proposition 2, assuming without loss of generality a positive sign of the determinant,
$$0 < \det \begin{pmatrix} \varphi_0(x_0) & \varphi_0(x_1) & \cdots & \varphi_0(x_{g-2}) & \varphi_0(c) \\ \varphi_1(x_0) & \varphi_1(x_1) & \cdots & \varphi_1(x_{g-2}) & \varphi_1(c) \\ \vdots & \vdots & & \vdots & \vdots \\ \varphi_{g-2}(x_0) & \varphi_{g-2}(x_1) & \cdots & \varphi_{g-2}(x_{g-2}) & \varphi_{g-2}(c) \\ \varphi_{g-1}(x_0) & \varphi_{g-1}(x_1) & \cdots & \varphi_{g-1}(x_{g-2}) & \varphi_{g-1}(c) \end{pmatrix}.$$
For $i = 0, \ldots, g-2$, the column operations
$$\varphi_j(x_i) \mapsto \varphi_j(x_i) - \frac{\varphi_j(c)}{\varphi_{g-1}(c)}\, \varphi_{g-1}(x_i), \quad j = 0, \ldots, g-1,$$
do not change the value of the determinant. Hence
$$0 < \det \begin{pmatrix} \varphi_0(x_0) & \cdots & \varphi_0(x_{g-2}) & \varphi_0(c) \\ \vdots & & \vdots & \vdots \\ \varphi_{g-1}(x_0) & \cdots & \varphi_{g-1}(x_{g-2}) & \varphi_{g-1}(c) \end{pmatrix} = \det \begin{pmatrix} h_0(x_0) & \cdots & h_0(x_{g-2}) & \varphi_0(c) \\ h_1(x_0) & \cdots & h_1(x_{g-2}) & \varphi_1(c) \\ \vdots & & \vdots & \vdots \\ h_{g-2}(x_0) & \cdots & h_{g-2}(x_{g-2}) & \varphi_{g-2}(c) \\ 0 & \cdots & 0 & \varphi_{g-1}(c) \end{pmatrix}.$$
By the Laplace expansion of the determinant along the last row, we get
$$0 < \det \begin{pmatrix} h_0(x_0) & \cdots & h_0(x_{g-2}) & \varphi_0(c) \\ \vdots & & \vdots & \vdots \\ h_{g-2}(x_0) & \cdots & h_{g-2}(x_{g-2}) & \varphi_{g-2}(c) \\ 0 & \cdots & 0 & \varphi_{g-1}(c) \end{pmatrix} = \varphi_{g-1}(c)\, \det \begin{pmatrix} h_0(x_0) & h_0(x_1) & \cdots & h_0(x_{g-2}) \\ h_1(x_0) & h_1(x_1) & \cdots & h_1(x_{g-2}) \\ \vdots & \vdots & & \vdots \\ h_{g-2}(x_0) & h_{g-2}(x_1) & \cdots & h_{g-2}(x_{g-2}) \end{pmatrix} =: \varphi_{g-1}(c)\, \Delta.$$
Therefore the two real numbers $\varphi_{g-1}(c)$ and $\Delta$ have the same sign. Since $\varphi_{g-1}(c) \neq 0$ we deduce that
$$\det \begin{pmatrix} h_0(x_0) & h_0(x_1) & \cdots & h_0(x_{g-2}) \\ h_1(x_0) & h_1(x_1) & \cdots & h_1(x_{g-2}) \\ \vdots & \vdots & & \vdots \\ h_{g-2}(x_0) & h_{g-2}(x_1) & \cdots & h_{g-2}(x_{g-2}) \end{pmatrix} \neq 0.$$
Hence the family $\{h_0, \ldots, h_{g-2}\}$ is a Chebyshev system in $C([-1,1])$. In the same way we can prove that it is a Chebyshev system in $C([-1,c])$.
5.2 The frequencies of the optimal design
Once the points $x$ in $\operatorname{supp}(\xi^*)$ are characterized, we characterize the values of the $\xi^*(x)$'s. The following Proposition produces a sufficient condition for the measure $\xi^*$ to be optimal, which can be phrased as
$$\min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \xi^*(dx) \ \ge \ \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \xi(dx)$$
for any $\xi$ in $M_d([-1,1])$. Uniqueness might not hold.
Proposition 10 (Kiefer-Wolfowitz) Let $B(\lambda^*)$ be defined as in (11). If $\lambda^*$ is a Chebyshev vector, $\xi^*(B(\lambda^*)) = 1$, and
$$\int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right) h_i(x)\, \xi^*(dx) = 0, \quad \text{for } i = 0, \ldots, g-2,$$
then $\xi^*$ is optimal.
Proof. Let $\xi^* \in M_d([-1,1])$ with $\xi^*(B(\lambda^*)) = 1$. The hypothesis
$$\int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right) h_i(x)\, \xi^*(dx) = 0, \quad \text{for } i = 0, \ldots, g-2,$$
indicates that
$$h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j$$
is orthogonal to the linear space $W$ generated by $\{h_0, \ldots, h_{g-2}\}$. Thus $\sum_{j=0}^{g-2} \lambda_j^* h_j$ is the orthogonal projection of $h_{g-1}$ on $W$, the inner product being
$$\langle v, w \rangle := \int_{[-1,1]} v(x)\, w(x)\, \xi^*(dx).$$
By the minimal projection property,
$$A(\xi^*) := \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \xi^*(dx) = \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right)^2 \xi^*(dx)$$
$$= \sum_{\widetilde{x} \in \operatorname{supp}(\xi^*)} \left( h_{g-1}(\widetilde{x}) - \sum_{j=0}^{g-2} \lambda_j^* h_j(\widetilde{x}) \right)^2 \xi^*(\widetilde{x}) = E^2 \sum_{\widetilde{x} \in \operatorname{supp}(\xi^*)} \xi^*(\widetilde{x}) = E^2,$$
using $\xi^*(B(\lambda^*)) = 1$. Since $\left| h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right| \le E$ on $[-1,1]$, for any $\nu \in M_d([-1,1])$
$$E^2 \ \ge \ \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right)^2 \nu(dx) \ \ge \ \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \nu(dx),$$
and therefore
$$\max_{\nu \in M_d([-1,1])} \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \nu(dx) \ \le \ A(\xi^*).$$
The measures $\nu$ which appear in the above displays are arbitrary measures in $M_d([-1,1])$.
Since by definition the optimal design achieves
$$\max_{\nu \in M_d([-1,1])} \min_{\lambda \in \mathbb{R}^{g-1}} \int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j h_j(x) \right)^2 \nu(dx), \quad (12)$$
while trivially $A(\xi^*)$ is at most this maximum, we get that $A(\xi^*)$ equals the maximum in (12); hence $\xi^*$ is optimal.
6 Identification of the optimal design
In this Section we provide an explicit solution for the optimal design and prove its uniqueness.
By the Borel-Chebyshev Theorem (Theorem 4) there exist at least $g$ points $x_0 < \cdots < x_{g-1}$ in $[-1,1]$ at which the best uniform approximation of $h_{g-1}$, namely $\sum_{j=0}^{g-2} \lambda_j^* h_j$, satisfies the following conditions:
$$h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) = (-1)^i\, E.$$
We now see that there are exactly $g$ points at which the function $\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|$ equals $E$. Since $\{h_0, \ldots, h_{g-1}\}$ is a Chebyshev system, a non-null linear combination
$$\sum_{i=0}^{g-1} a_i h_i$$
cannot have more than $g-1$ roots in $[-1,1]$. Hence the function
$$\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|,$$
which is the absolute value of a linear combination of the Chebyshev system $\{h_0, \ldots, h_{g-1}\}$, cannot have more than $g-1$ roots. Since the sign of $h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j$ alternates at consecutive points where its absolute value is maximal, at least one root lies between two such points; therefore $\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|$ cannot attain its maximal value at more than $g$ points.
As seen previously, the support of the optimal measure consists of the points of maximal value in $[-1,1]$ of the function
$$\left| h_{g-1} - \sum_{j=0}^{g-2} \lambda_j^* h_j \right|.$$
Applying the Borel-Chebyshev Theorem, we now determine the support of $\xi^*$. Since $E$ is known, the support is the vector $(x_0, \ldots, x_{g-1})$ which solves the system
$$h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) = (-1)^i\, E, \quad i = 0, \ldots, g-1.$$
We apply the sufficient condition of Kiefer and Wolfowitz stated in Proposition 10. This condition states that the values $\xi^*(x_i)$, $i = 0, \ldots, g-1$, satisfy the system
$$\sum_{i=0}^{g-1} \left( h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) \right) h_r(x_i)\, \xi^*(x_i) = 0, \quad r = 0, \ldots, g-2.$$
At the $x_i$'s it holds
$$E = \left| h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) \right|$$
and
$$\int_{[-1,1]} \left( h_{g-1}(x) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x) \right) h_i(x)\, \xi^*(dx) = 0, \quad \text{for } i = 0, \ldots, g-2.$$
Therefore
$$0 = \sum_{i=0}^{g-1} \left( h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) \right) h_r(x_i)\, \xi^*(x_i) = E \sum_{i=0}^{g-1} (-1)^i h_r(x_i)\, \xi^*(x_i), \quad \text{for } r = 0, \ldots, g-2.$$
The optimal extrapolation design $\{(x_i, \xi^*(x_i)) : i = 0, \ldots, g-1\}$ thus solves
$$\begin{cases} h_{g-1}(x_0) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_0) = +E \\ \quad \cdots \\ h_{g-1}(x_i) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_i) = (-1)^i E \\ \quad \cdots \\ h_{g-1}(x_{g-1}) - \sum_{j=0}^{g-2} \lambda_j^* h_j(x_{g-1}) = (-1)^{g-1} E \\ \sum_{i=0}^{g-1} (-1)^i h_0(x_i)\, \xi^*(x_i) = 0 \\ \quad \cdots \\ \sum_{i=0}^{g-1} (-1)^i h_r(x_i)\, \xi^*(x_i) = 0 \\ \quad \cdots \\ \sum_{i=0}^{g-1} (-1)^i h_{g-2}(x_i)\, \xi^*(x_i) = 0 \end{cases}$$
In practice, we first evaluate $\lambda_j^*$ for $0 \le j \le g-2$ through (8); note that $E$ is then known by (10). The above system consists of $2g-1$ equations in the $2g$ unknown quantities $\{(x_i, \xi^*(x_i)) : i = 0, \ldots, g-1\}$. Add the constraint
$$\sum_{i=0}^{g-1} \xi^*(x_i) = 1$$
to obtain a system with a unique solution.
The first $g$ equations determine the nodes, by the Borel-Chebyshev Theorem. The last $g-1$ ones, together with the constraint, determine the values of the $n_j$'s, by the Proposition of Kiefer and Wolfowitz (Proposition 10). Hence there is a unique optimal design solving the minimal variance problem for the extrapolation.
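The weight equations can be solved directly once the nodes are known. For the monomial system $\{1, x, x^2\}$ ($g = 3$) and $c = 1.5$ (illustrative choices of ours), the nodes are the extrema of $T_2$, and solving the $g-1$ alternating-sign equations together with the normalization of the probability masses recovers the Hoel-Levine weights, proportional to $|\ell_i(c)|$ for the Lagrange basis $\ell_i$ at the nodes:

```python
import numpy as np

c = 1.5
nodes = np.array([-1.0, 0.0, 1.0])        # extrema of T_2: support of the design
phi_c = np.array([1.0, c, c**2])          # (phi_0(c), phi_1(c), phi_2(c))

# h_r(x) = phi_r(x) - (phi_r(c)/phi_2(c)) * phi_2(x), for r = 0, 1
h = [lambda x, r=r: x**r - phi_c[r] / phi_c[2] * x**2 for r in range(2)]

# rows r = 0, 1: sum_i (-1)^i h_r(x_i) w_i = 0; last row: sum_i w_i = 1
A = np.vstack([[(-1.0)**i * h[r](x) for i, x in enumerate(nodes)]
               for r in range(2)] + [np.ones(3)])
w = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))

# Hoel-Levine weights: proportional to |l_i(c)| at the same nodes
ell = np.array([np.prod([(c - nodes[j]) / (nodes[i] - nodes[j])
                         for j in range(3) if j != i]) for i in range(3)])
w_hl = np.abs(ell) / np.abs(ell).sum()
```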
References
[Achieser, 1992] N. I. Achieser. Theory of Approximation. Dover Publications, Inc., New York, 1992. Translated from the Russian and with a preface by Charles J. Hyman; reprint of the 1956 English translation.
[Broniatowski and Celant, 2014] M. Broniatowski and G. Celant. Some overview on unbiased interpolation and extrapolation designs. arXiv:1403.5113, 2014.
[Dzyadyk and Shevchuk, 2008] Vladislav K. Dzyadyk and Igor A. Shevchuk. Theory of Uniform Approximation of Functions by Polynomials. Walter de Gruyter GmbH & Co. KG, Berlin, 2008. Translated from the Russian by Dmitry V. Malyshev, Peter V. Malyshev and Vladimir V. Gorunovich.
[Hoel, 1966] Paul G. Hoel. A simple solution for optimal Chebyshev regression extrapolation. Ann. Math. Statist., 37:720-725, 1966.
[Karlin and Studden, 1966] Samuel Karlin and William J. Studden. Tchebycheff Systems: With Applications in Analysis and Statistics. Pure and Applied Mathematics, Vol. XV. Interscience Publishers John Wiley & Sons, New York-London-Sydney, 1966.
[Kiefer and Wolfowitz, 1965] J. Kiefer and J. Wolfowitz. On a theorem of Hoel and Levine on extrapolation designs. Ann. Math. Statist., 36:1627–1655, 1965.
[Pukelsheim, 1993] Friedrich Pukelsheim. Optimal Design of Experiments. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1993. A Wiley-Interscience Publication.