HAL Id: hal-00668212

https://hal.archives-ouvertes.fr/hal-00668212

Submitted on 9 Feb 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Classification and regression based on derivatives: a consistency result

Nathalie Villa-Vialaneix, Fabrice Rossi

To cite this version:

Nathalie Villa-Vialaneix, Fabrice Rossi. Classification and regression based on derivatives: a consistency result. II Simposio sobre Modelamiento Estadístico, Dec 2010, Valparaiso, Chile. ⟨hal-00668212⟩

Classification and regression based on derivatives: a consistency result

Nathalie Villa-Vialaneix (joint work with Fabrice Rossi)
http://www.nathalievilla.org

II Simposio sobre Modelamiento Estadístico, Valparaiso, December 3rd, 2010

Outline

1 Introduction and motivations
2 A general consistency result
3 Examples

Introduction and motivations

Regression and classification from an infinite dimensional predictor

Settings

$(X, Y)$ is a random pair of variables where:
- $Y \in \{-1, 1\}$ (binary classification problem) or $Y \in \mathbb{R}$;
- $X \in (\mathcal{X}, \langle\cdot,\cdot\rangle_{\mathcal{X}})$, an infinite dimensional Hilbert space.

We are given a learning set $S_n = \{(X_i, Y_i)\}_{i=1}^n$ of $n$ i.i.d. copies of $(X, Y)$.

Purpose: find $\phi_n : \mathcal{X} \to \{-1, 1\}$ or $\mathbb{R}$ that is universally consistent:
- Classification case: $\lim_{n\to+\infty} P(\phi_n(X) \neq Y) = L^*$ where $L^* = \inf_{\phi:\mathcal{X}\to\{-1,1\}} P(\phi(X) \neq Y)$;
- Regression case: $\lim_{n\to+\infty} E\left([\phi_n(X) - Y]^2\right) = L^*$ where $L^* = \inf_{\phi:\mathcal{X}\to\mathbb{R}} E\left([\phi(X) - Y]^2\right)$.

$L^*$ will also be called the Bayes risk.

Using derivatives

Practically, $X^{(m)}$ is often more relevant than $X$ for the prediction. But $X \mapsto X^{(m)}$ induces information loss and
$$\inf_{\phi: D^m\mathcal{X}\to\{-1,1\}} P\left(\phi(X^{(m)}) \neq Y\right) \ \ge\ \inf_{\phi:\mathcal{X}\to\{-1,1\}} P(\phi(X) \neq Y) = L^*$$
and
$$\inf_{\phi: D^m\mathcal{X}\to\mathbb{R}} E\left(\left[\phi(X^{(m)}) - Y\right]^2\right) \ \ge\ \inf_{\phi:\mathcal{X}\to\mathbb{R}} E\left(\left[\phi(X) - Y\right]^2\right) = L^*.$$

Sampled functions

Practically, $(X_i)_i$ are not perfectly known; only a discrete sampling is given: $X_i^{\tau_d} = (X_i(t))_{t\in\tau_d}$ where $\tau_d = \{t_1^{\tau_d}, \dots, t_{|\tau_d|}^{\tau_d}\}$. The sampling can be non-uniform.

Then, $X_i^{(m)}$ is estimated from $X_i^{\tau_d}$ by $\widehat{X}^{(m)}_{\tau_d}$, which also induces information loss:
$$\inf_{\phi: D^m\mathcal{X}\to\{-1,1\}} P\left(\phi(\widehat{X}^{(m)}_{\tau_d}) \neq Y\right) \ \ge\ \inf_{\phi: D^m\mathcal{X}\to\{-1,1\}} P\left(\phi(X^{(m)}) \neq Y\right) \ \ge\ L^*$$
and
$$\inf_{\phi: D^m\mathcal{X}\to\mathbb{R}} E\left(\left[\phi(\widehat{X}^{(m)}_{\tau_d}) - Y\right]^2\right) \ \ge\ \inf_{\phi: D^m\mathcal{X}\to\mathbb{R}} E\left(\left[\phi(X^{(m)}) - Y\right]^2\right) \ \ge\ L^*.$$

Purpose of the presentation

Find a classifier or a regression function $\phi_{n,\tau_d}$ built from $\widehat{X}^{(m)}_{\tau_d}$ such that the risk of $\phi_{n,\tau_d}$ asymptotically reaches the Bayes risk $L^*$:
$$\lim_{|\tau_d|\to+\infty} \lim_{n\to+\infty} P\left(\phi_{n,\tau_d}(\widehat{X}^{(m)}_{\tau_d}) \neq Y\right) = L^*$$
or
$$\lim_{|\tau_d|\to+\infty} \lim_{n\to+\infty} E\left(\left[\phi_{n,\tau_d}(\widehat{X}^{(m)}_{\tau_d}) - Y\right]^2\right) = L^*.$$

Main idea: use a relevant way to estimate $X^{(m)}$ from $X^{\tau_d}$ (by smoothing splines) and combine the consistency of splines with the consistency of an $\mathbb{R}^{|\tau_d|}$-classifier or regression function.

A general consistency result

Basics about smoothing splines I

Suppose that $\mathcal{X}$ is the Sobolev space
$$\mathcal{H}^m = \left\{ h \in L^2([0,1]) \;:\; \forall j = 1, \dots, m,\ D^j h \text{ exists (weak sense) and } D^m h \in L^2 \right\}$$
equipped with the scalar product
$$\langle u, v \rangle_{\mathcal{H}^m} = \langle D^m u, D^m v \rangle_{L^2} + \sum_{j=1}^m B^j u\, B^j v$$
where $B$ are $m$ boundary conditions such that $\operatorname{Ker} B \cap \mathcal{P}^{m-1} = \{0\}$.

$(\mathcal{H}^m, \langle\cdot,\cdot\rangle_{\mathcal{H}^m})$ is a RKHS: there exist $k_0 : \mathcal{P}^{m-1} \times \mathcal{P}^{m-1} \to \mathbb{R}$ and $k_1 : \operatorname{Ker} B \times \operatorname{Ker} B \to \mathbb{R}$ such that
$$\forall u \in \mathcal{P}^{m-1},\ t \in [0,1], \quad \langle u, k_0(t,\cdot)\rangle_{\mathcal{H}^m} = u(t)$$
and
$$\forall u \in \operatorname{Ker} B,\ t \in [0,1], \quad \langle u, k_1(t,\cdot)\rangle_{\mathcal{H}^m} = u(t).$$

See [Berlinet and Thomas-Agnan, 2004] for further details.

Basics about smoothing splines II

A simple example of boundary conditions: $h(0) = h^{(1)}(0) = \dots = h^{(m-1)}(0) = 0$. Then,
$$k_0(s,t) = \sum_{k=0}^{m-1} \frac{t^k s^k}{(k!)^2} \qquad\text{and}\qquad k_1(s,t) = \int_0^1 \frac{(t-w)_+^{m-1}\,(s-w)_+^{m-1}}{((m-1)!)^2}\, dw.$$
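As a quick check of these formulas in the simplest nontrivial case, take $m = 2$ (cubic smoothing splines); this worked instance is ours, not from the slides:
$$k_0(s,t) = 1 + st, \qquad k_1(s,t) = \int_0^1 (t-w)_+\,(s-w)_+\, dw = \frac{\min(s,t)^2 \max(s,t)}{2} - \frac{\min(s,t)^3}{6},$$
the second identity following by integrating $(t-w)(s-w)$ over $[0, \min(s,t)]$.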

Estimating the predictors with smoothing splines I

Assumption (A1)
- $|\tau_d| \ge m - 1$;
- the sampling points are distinct in $[0, 1]$;
- the $B^j$ are linearly independent from $h \mapsto h(t)$ for all $t \in \tau_d$.

[Kimeldorf and Wahba, 1971]: for $x^{\tau_d}$ in $\mathbb{R}^{|\tau_d|}$, $\hat{x}_{\lambda,\tau_d} \in \mathcal{H}^m$ is the solution of
$$\arg\min_{h\in\mathcal{H}^m}\ \frac{1}{|\tau_d|} \sum_{l=1}^{|\tau_d|} \left(h(t_l) - x^{\tau_d}_l\right)^2 + \lambda \int_{[0,1]} (h^{(m)}(t))^2\, dt,$$
and $\hat{x}_{\lambda,\tau_d} = S_{\lambda,\tau_d}\, x^{\tau_d}$ where $S_{\lambda,\tau_d} : \mathbb{R}^{|\tau_d|} \to \mathcal{H}^m$.

These assumptions are fulfilled by the previous simple example as long as $0 < \min \tau_d$ (i.e., $0 \notin \tau_d$).

Estimating the predictors with smoothing splines II

$S_{\lambda,\tau_d}$ is given by:
$$S_{\lambda,\tau_d} = \omega^T \left(U(K_1+\lambda I_{|\tau_d|})^{-1}U^T\right)^{-1} U (K_1+\lambda I_{|\tau_d|})^{-1} + \eta^T (K_1+\lambda I_{|\tau_d|})^{-1} \left[I_{|\tau_d|} - U^T \left(U(K_1+\lambda I_{|\tau_d|})^{-1}U^T\right)^{-1} U (K_1+\lambda I_{|\tau_d|})^{-1}\right] = \omega^T M_0 + \eta^T M_1$$
with
- $\{\omega_1, \dots, \omega_m\}$ a basis of $\mathcal{P}^{m-1}$, $\omega = (\omega_1, \dots, \omega_m)^T$ and $U = (\omega_i(t))_{i=1,\dots,m;\ t\in\tau_d}$;
- $\eta = (k_1(t,\cdot))^T_{t\in\tau_d}$ and $K_1 = (k_1(t,t'))_{t,t'\in\tau_d}$.

The observations of the predictor $X$ (NIR spectra) are then estimated from their sampling $X^{\tau_d}$ by $\widehat{X}_{\lambda,\tau_d}$.
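As a concrete illustration, here is a minimal numerical sketch of this construction for $m = 1$ with the boundary condition $h(0) = 0$, so that $\mathcal{P}^0$ is spanned by $\omega_1 = 1$ and $k_1(s,t) = \min(s,t)$; the grid `tau`, the value of `lam` and the helper names are our assumptions, not part of the slides:

```python
import numpy as np

tau = np.linspace(0.05, 1.0, 20)   # sampling grid (0 excluded, cf. (A1))
lam = 1e-2                         # smoothing parameter lambda
d = len(tau)

U = np.ones((1, d))                # U = (omega_i(t)): here omega_1 = 1
K1 = np.minimum.outer(tau, tau)    # K1 = (k1(t, t')) with k1(s, t) = min(s, t)
A_inv = np.linalg.inv(K1 + lam * np.eye(d))

# M0 and M1 from the decomposition S_{lambda,tau} = omega^T M0 + eta^T M1
M0 = np.linalg.inv(U @ A_inv @ U.T) @ U @ A_inv
M1 = A_inv @ (np.eye(d) - U.T @ M0)

def spline_eval(x_tau, s):
    """Evaluate the smoothing spline estimate of a sampled curve at points s."""
    return (M0 @ x_tau)[0] * np.ones_like(s) + np.minimum.outer(s, tau) @ (M1 @ x_tau)

# Usage: smooth a noisy sampled curve
x_tau = np.sin(2 * np.pi * tau) + 0.05 * np.random.randn(d)
s = np.linspace(0.0, 1.0, 200)
x_hat = spline_eval(x_tau, s)
```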

Two important consequences

1. No information loss:
$$\inf_{\phi:\mathcal{H}^m\to\{-1,1\}} P\left(\phi(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) = \inf_{\phi:\mathbb{R}^{|\tau_d|}\to\{-1,1\}} P\left(\phi(X^{\tau_d}) \neq Y\right)$$
and
$$\inf_{\phi:\mathcal{H}^m\to\mathbb{R}} E\left(\left[\phi(\widehat{X}_{\lambda,\tau_d}) - Y\right]^2\right) = \inf_{\phi:\mathbb{R}^{|\tau_d|}\to\mathbb{R}} E\left(\left[\phi(X^{\tau_d}) - Y\right]^2\right).$$

2. Easy way to use derivatives:
$$\langle S_{\lambda,\tau_d} u^{\tau_d},\, S_{\lambda,\tau_d} v^{\tau_d}\rangle_{\mathcal{H}^m} = \langle \hat{u}_{\lambda,\tau_d}, \hat{v}_{\lambda,\tau_d}\rangle_{\mathcal{H}^m} = (u^{\tau_d})^T M_{\lambda,\tau_d}\, v^{\tau_d}$$
where $M_{\lambda,\tau_d}$ is symmetric and positive definite, so that
$$(Q_{\lambda,\tau_d} u^{\tau_d})^T (Q_{\lambda,\tau_d} v^{\tau_d}) = \langle \hat{u}_{\lambda,\tau_d}, \hat{v}_{\lambda,\tau_d}\rangle_{\mathcal{H}^m} \simeq \langle \hat{u}^{(m)}_{\lambda,\tau_d}, \hat{v}^{(m)}_{\lambda,\tau_d}\rangle_{L^2}$$
where $Q_{\lambda,\tau_d}$ is the Cholesky triangle of $M_{\lambda,\tau_d}$: $Q^T_{\lambda,\tau_d} Q_{\lambda,\tau_d} = M_{\lambda,\tau_d}$.

Remark: $Q_{\lambda,\tau_d}$ is calculated only from the RKHS, $\lambda$ and $\tau_d$: it does not depend on the data set.
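Continuing the $m = 1$ sketch above, the Gram matrix $M_{\lambda,\tau_d}$ and its Cholesky triangle can be computed as follows; the expression for `M` uses the orthogonality of $\mathcal{P}^{m-1}$ and $\operatorname{Ker} B$ in $\mathcal{H}^m$ and the fact that $\langle \omega_1, \omega_1 \rangle_{\mathcal{H}^1} = 1$ for this particular boundary condition (our derivation, not a formula from the slides):

```python
# Gram matrix M with (u^tau)^T M v^tau = <S u^tau, S v^tau>_{H^m}:
# the P^{m-1} part contributes M0^T <omega, omega> M0 (= M0^T M0 here)
# and the Ker B part contributes M1^T K1 M1.
M = M0.T @ M0 + M1.T @ K1 @ M1

# Cholesky triangle Q with Q^T Q = M (numpy returns the lower factor L = Q^T)
Q = np.linalg.cholesky(M).T

# Q depends only on the RKHS, lambda and tau -- never on the data.
# Mapping a sampled curve through Q makes the Euclidean geometry of R^{|tau|}
# match the H^m geometry of its spline estimate:
z = Q @ x_tau
```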

Classification and regression based on derivatives

Suppose that we know a consistent classifier or regression function in $\mathbb{R}^{|\tau_d|}$ that is based on the $\mathbb{R}^{|\tau_d|}$ scalar product or norm.

Example: Nonparametric kernel regression
$$\Psi : u \in \mathbb{R}^{|\tau_d|} \mapsto \frac{\sum_{i=1}^n T_i\, K\left(\frac{\|u - U_i\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{\|u - U_i\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}$$

The corresponding derivative based classifier or regression function is given by using the norm induced by $Q_{\lambda,\tau_d}$:
$$\phi_{n,d} = \Psi \circ Q_{\lambda,\tau_d} : x \in \mathcal{H}^m \mapsto \frac{\sum_{i=1}^n Y_i\, K\left(\frac{\|Q_{\lambda,\tau_d} x^{\tau_d} - Q_{\lambda,\tau_d} X_i^{\tau_d}\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{\|Q_{\lambda,\tau_d} x^{\tau_d} - Q_{\lambda,\tau_d} X_i^{\tau_d}\|_{\mathbb{R}^{|\tau_d|}}}{h_n}\right)}$$
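A minimal sketch of this derivative-based kernel regression estimate, assuming a Gaussian kernel $K$ and reusing the matrix `Q` from the sketches above (function name and bandwidth are illustrative):

```python
def kernel_regression(x_tau, X_tau, Y, Q, h):
    """Nadaraya-Watson estimate Psi(Q x^tau) with a Gaussian kernel K."""
    z = Q @ x_tau                       # derivative-aware representation of x
    Z = X_tau @ Q.T                     # same transform for the training curves
    dists = np.linalg.norm(Z - z, axis=1)
    w = np.exp(-0.5 * (dists / h) ** 2)
    return np.sum(w * Y) / np.sum(w)
```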

Remark for consistency

Classification case (approximately the same is true for regression):
$$P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) - L^* = \left[P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) - L_d^*\right] + \left[L_d^* - L^*\right]$$
where $L_d^* = \inf_{\phi:\mathbb{R}^{|\tau_d|}\to\{-1,1\}} P(\phi(X^{\tau_d}) \neq Y)$.

1. For all fixed $d$,
$$\lim_{n\to+\infty} P\left(\phi_{n,d}(\widehat{X}_{\lambda,\tau_d}) \neq Y\right) = L_d^*$$
as long as the $\mathbb{R}^{|\tau_d|}$-classifier is consistent, because there is a one-to-one mapping between $X^{\tau_d}$ and $\widehat{X}_{\lambda,\tau_d}$.

2. $L_d^* - L^* \le E\left|E(Y|\widehat{X}_{\lambda,\tau_d}) - E(Y|X)\right|$: with consistency of the spline estimate $\widehat{X}_{\lambda,\tau_d}$ and an assumption on the regularity of $E(Y|X = \cdot)$, consistency would be proved.

Spline consistency

Let $\lambda$ depend on $d$ and denote by $(\lambda_d)_d$ the sequence of regularization parameters. Also introduce
$$\overline{\Delta}_{\tau_d} := \max\{t_1,\, t_2 - t_1,\, \dots,\, 1 - t_{|\tau_d|}\}, \qquad \underline{\Delta}_{\tau_d} := \min_{1\le i<|\tau_d|}\{t_{i+1} - t_i\}.$$

Assumption (A2)
- $\exists\, R$ such that $\overline{\Delta}_{\tau_d}/\underline{\Delta}_{\tau_d} \le R$ for all $d$;
- $\lim_{d\to+\infty} |\tau_d| = +\infty$;
- $\lim_{d\to+\infty} \lambda_d = 0$.
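A two-line numerical check of the quasi-uniformity ratio in (A2), reusing the grid `tau` from the first sketch (variable names are ours):

```python
# Quasi-uniformity check for (A2): largest over smallest gap of the design
gaps = np.diff(tau)
delta_max = max(tau[0], gaps.max(), 1 - tau[-1])   # \overline{\Delta}_{\tau_d}
delta_min = gaps.min()                             # \underline{\Delta}_{\tau_d}
R_ratio = delta_max / delta_min                    # must stay bounded in d
```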

Bayes risk consistency

Assumption (A3a): $E\left(\|D^m X\|^2_{L^2}\right)$ is finite and $Y \in \{-1, 1\}$.

or

Assumption (A3b): $\tau_d \subset \tau_{d+1}$ for all $d$ and $E(Y^2)$ is finite.

Under (A1)-(A3), $\lim_{d\to+\infty} L_d^* = L^*$.

Proof under assumption (A3a)

The proof is based on a result of [Faragó and Györfi, 1975]: for a pair of random variables $(X, Y)$ taking their values in $\mathcal{X} \times \{-1,1\}$, where $\mathcal{X}$ is an arbitrary metric space, and for a sequence of functions $T_d : \mathcal{X} \to \mathcal{X}$ such that
$$E(\delta(T_d(X), X)) \xrightarrow{d\to+\infty} 0,$$
then $\lim_{d\to+\infty} \inf_{\phi:\mathcal{X}\to\{-1,1\}} P(\phi(T_d(X)) \neq Y) = L^*$.

Here:
- $T_d$ is the spline estimate based on the sampling;
- the inequality of [Ragozin, 1983] about this estimate is exactly the assumption of Faragó and Györfi's theorem.

Proof under assumption (A3b)

Under (A3b), $(E(Y|\widehat{X}_{\lambda_d,\tau_d}))_d$ is a uniformly bounded martingale and thus converges in $L^1$ norm. Using the consistency of $(\widehat{X}_{\lambda_d,\tau_d})_d$ towards $X$ ends the proof.

Concluding result (consistency)

Theorem: Under assumptions (A1)-(A3),
$$\lim_{|\tau_d|\to+\infty} \lim_{n\to+\infty} P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda_d,\tau_d}) \neq Y\right) = L^*$$
and
$$\lim_{|\tau_d|\to+\infty} \lim_{n\to+\infty} E\left(\left[\phi_{n,\tau_d}(\widehat{X}_{\lambda_d,\tau_d}) - Y\right]^2\right) = L^*.$$

Proof: for an $\epsilon > 0$, fix $d_0$ such that, for all $d \ge d_0$, $L_d^* - L^* \le \epsilon/2$. Then, by consistency of the $\mathbb{R}^{|\tau_d|}$-classifier or regression function, conclude.

A practical application to SVM I

Recall that, for a learning set $(U_i, T_i)_{i=1,\dots,n}$ in $\mathbb{R}^p \times \{-1,1\}$, the Gaussian SVM is the classifier
$$u \in \mathbb{R}^p \mapsto \operatorname{Sign}\left(\sum_{i=1}^n \alpha_i T_i\, e^{-\gamma \|u - U_i\|^2_{\mathbb{R}^p}}\right)$$
where the $(\alpha_i)_i$ solve the following quadratic optimization problem:
$$\arg\min_w\ \sum_{i=1}^n \left(1 - T_i\, w(U_i)\right)_+ + C\, \|w\|^2_{\mathcal{S}}$$
where $w(u) = \sum_{i=1}^n \alpha_i e^{-\gamma\|u - U_i\|^2_{\mathbb{R}^p}}$, $\mathcal{S}$ is the RKHS associated with the Gaussian kernel and $C$ is a regularization parameter. Under suitable assumptions, [Steinwart, 2002] proves the consistency of SVM classifiers.

A practical application to SVM II

Additional assumptions related to SVM:

Assumptions (A4)
- For all $d$, the regularization parameter depends on $n$ such that $\lim_{n\to+\infty} n\, C_n^d = +\infty$ and $C_n^d = O\left(n^{\beta_d - 1}\right)$ for a $0 < \beta_d < 1/d$.
- For all $d$, there is a bounded subset $B_d$ of $\mathbb{R}^{|\tau_d|}$ such that $X^{\tau_d}$ belongs to $B_d$.

Result: under assumptions (A1)-(A4), the SVM
$$\phi_{n,d} : x \in \mathcal{H}^m \mapsto \operatorname{Sign}\left(\sum_{i=1}^n \alpha_i Y_i\, e^{-\gamma\|Q_{\lambda_d,\tau_d} x^{\tau_d} - Q_{\lambda_d,\tau_d} X_i^{\tau_d}\|^2_{\mathbb{R}^{|\tau_d|}}}\right) \simeq \operatorname{Sign}\left(\sum_{i=1}^n \alpha_i Y_i\, e^{-\gamma\|x^{(m)} - X_i^{(m)}\|^2_{L^2}}\right)$$
is consistent: $\lim_{|\tau_d|\to+\infty} \lim_{n\to+\infty} P\left(\phi_{n,\tau_d}(\widehat{X}_{\lambda_d,\tau_d}) \neq Y\right) = L^*$.
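In practice, this functional SVM reduces to a standard Gaussian SVM fed with the transformed samples $Q_{\lambda_d,\tau_d} X_i^{\tau_d}$. A minimal sketch with scikit-learn, reusing the matrix `Q` from the earlier sketches (the data, `gamma` and `C` values are placeholders, not tuned values from the talk):

```python
import numpy as np
from sklearn.svm import SVC

# X_tau: (n, |tau_d|) matrix of sampled curves; y: labels in {-1, 1}.
# Placeholder data for illustration only:
n = 100
X_tau = np.random.randn(n, d)
y = np.sign(np.random.randn(n))

Z = X_tau @ Q.T                  # rows are Q_{lambda,tau} X_i^tau
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(Z, y)

# Predicting for a new sampled curve x_tau:
label = clf.predict((Q @ x_tau).reshape(1, -1))
```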

Additional remark about the link between $n$ and $|\tau_d|$

Under suitable (and usual) regularity assumptions on $E(Y|X = \cdot)$, and if $n \sim \nu |\tau_d| \log|\tau_d|$, the rate of convergence of this method is of order $d^{-\frac{2\nu}{2\nu+1}}$, where $\nu$ is either equal to $m$ or to a Lipschitz constant related to $E(Y|X = \cdot)$.

Examples

Chosen regression method: kernel ridge regression

Recall that kernel ridge regression in $\mathbb{R}^p$ is given by solving
$$\arg\min_w\ \sum_{i=1}^n (T_i - w(U_i))^2 + C\, \|w\|^2_{\mathcal{S}}$$
where $\mathcal{S}$ is a RKHS induced by a given kernel (such as the Gaussian kernel) and $(U_i, T_i)_i$ is a training sample in $\mathbb{R}^p \times \mathbb{R}$.

In the following examples (see the sketch below), $U_i$ is either:
- the original (sampled) functions $X_i$ (viewed as $\mathbb{R}^{|\tau_d|}$ vectors);
- $Q_{\lambda,\tau_d} X_i^{\tau_d}$ for derivatives of order 1 or 2.
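A minimal scikit-learn sketch of this choice, reusing `Q` and `X_tau` from the earlier sketches (responses and hyper-parameter values are placeholders; in the experiments below they are tuned by CV):

```python
from sklearn.kernel_ridge import KernelRidge

T = np.random.randn(n)            # placeholder responses
U_train = X_tau @ Q.T             # the slides' U_i = Q_{lambda,tau} X_i^tau
# alpha plays the role of the regularization parameter in the criterion above
krr = KernelRidge(kernel="rbf", gamma=1.0, alpha=1.0).fit(U_train, T)
pred = krr.predict(U_train)
```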

Example 1: Predicting yellow berry in durum wheat from NIR spectra

953 wheat samples were analyzed:
- NIR spectrometry: 1049 wavelengths regularly ranged from 400 to 2498 nm;
- Yellow berry: manual count (%) of affected grains.

Methodology for comparison:
- Split the data into train/test sets (50 times);
- Train 50 regression functions on the 50 train sets (hyper-parameters were tuned by CV);
- Evaluate these regression functions by calculating the MSE on the 50 corresponding test sets.
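A sketch of this evaluation loop, reusing `U_train` and `T` from the previous sketch (split ratio and hyper-parameter grids are our placeholders, not the values used in the talk):

```python
from sklearn.model_selection import train_test_split, GridSearchCV

mse = []
for rep in range(50):                             # 50 random train/test splits
    U_tr, U_te, T_tr, T_te = train_test_split(U_train, T,
                                              test_size=0.5, random_state=rep)
    grid = GridSearchCV(KernelRidge(kernel="rbf"),
                        {"alpha": [0.01, 0.1, 1.0], "gamma": [0.1, 1.0, 10.0]},
                        cv=5)                     # hyper-parameters tuned by CV
    grid.fit(U_tr, T_tr)
    mse.append(np.mean((grid.predict(U_te) - T_te) ** 2))
```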

Results:

Kernel (SVM)                            MSE on test (and sd ×10⁻³)
Linear (L)                              0.122 (8.77)
Linear on derivatives (L(1))            0.138 (9.53)
Linear on second derivatives (L(2))     0.122 (1.71)
Gaussian (G)                            0.110 (20.2)
Gaussian on derivatives (G(1))          0.098 (7.92)
Gaussian on second derivatives (G(2))   0.094 (8.35)

Comparison with PLS...

Method                  MSE (mean)   MSE (sd)
PLS                     0.154        0.012
Kernel PLS              0.154        0.013
KRR splines (reg. D2)   0.094        0.008

Error decrease: almost 40%.

[Boxplots of test MSE for SVM-D2, KPLS and PLS, ranging from about 0.08 to 0.18]

Example 2: Simulated noisy spectra

Noisy data: $\widehat{X}_i(t) = X_i(t) + \epsilon_{it}$, $\epsilon_{it} \sim \mathcal{N}(0, 0.01)$, i.i.d.

[Figures: original and noisy simulated spectra]

Examples

Methodology for comparison

Split the datainto train/test sets (250 times);

Train250 regression functions for the 250 train sets

(hyper-parameters were tuned by CV) with the predictors being

the original (sampled) functionsXi(viewed asR|τd|vectors);

Qλ,τdX

τd

i for derivatives of order 1 or 2:smoothing splines derivatives;

Q0,τdX

τd

i for derivatives of order 1 or 2:interpolating splines derivatives;

derivatives of order 1 or 2 evaluated byXi(tj+1)−Xi(tj)

tj+1−tj :finite differences

derivatives;

Evaluatethese regression functions by calculating theMSEfor the

50 corresponding test sets.

29 / 30 Nathalie Villa-Vialaneix
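For comparison, the finite-differences alternative to the spline-based derivatives is straightforward; a rough sketch reusing `X_tau` and `tau` from the earlier sketches (iterating first differences, with no special treatment of the shrinking grid):

```python
def finite_diff(X_tau, tau, order=1):
    """Iterated first differences (X_i(t_{j+1}) - X_i(t_j)) / (t_{j+1} - t_j)."""
    D, t = X_tau, tau
    for _ in range(order):
        D = np.diff(D, axis=1) / np.diff(t)
        t = t[:-1]                       # differences live on a shorter grid
    return D

D1 = finite_diff(X_tau, tau, order=1)    # first derivatives, shape (n, d-1)
D2 = finite_diff(X_tau, tau, order=2)    # second derivatives, shape (n, d-2)
```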

Performances

[Boxplots of test MSE for the different derivative estimates]

References

Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers.

Faragó, T. and Györfi, L. (1975). On the continuity of the error distortion function for multiple-hypothesis decisions. IEEE Transactions on Information Theory, 21(4):458–460.

Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82–95.

Ragozin, D. (1983). Error bounds for derivative estimation based on spline smoothing of exact or noisy data. Journal of Approximation Theory, 37:335–355.

Steinwart, I. (2002). Support vector machines are universally consistent. Journal of Complexity, 18:768–791.
