
Introduction · Linear estimator selection · Penalty calibration (OLS) · Linear estimators · Multi-task · Conclusion

Data-driven calibration of linear estimators with minimal penalties, with an application to multi-task regression

Sylvain Arlot (joint works with F. Bach & M. Solnon)

CNRS; École Normale Supérieure (Paris), LIENS, Équipe Sierra

Cambridge, November 4th, 2011


Regression: data $(x_1, Y_1), \dots, (x_n, Y_n)$

[Figure: scatter plot of the data points, $x \in [0, 1]$, $y \in [-3, 4]$]


Goal: find the signal (denoising)

[Figure: the data points with the underlying signal to recover]


Statistical framework: regression, least-squares loss

Observations: $Y = (Y_1, \dots, Y_n) \in \mathbb{R}^n$,
$$Y_i = F_i + \varepsilon_i \quad (\text{e.g., } F_i = F(x_i))$$
with $Y_i \in \mathbb{R}$ and $(\varepsilon_i)_{1 \le i \le n}$ i.i.d.

Fixed design: the $x_i$ are deterministic.

Least-squares loss of a predictor $t \in \mathbb{R}^n$ ("$t_i = t(x_i)$"):
$$\frac{1}{n} \|t - F\|^2 = \frac{1}{n} \sum_{i=1}^n (t_i - F_i)^2$$

⇒ Estimator $\widehat{F}(Y) \in \mathbb{R}^n$?


Estimators: example: regressogram

[Figure: a regressogram fit plotted over the data]


Least-squares estimators

Natural idea: minimize an estimator of the risk $\frac{1}{n} \|t - F\|^2$.

Least-squares criterion:
$$\frac{1}{n} \|t - Y\|^2 = \frac{1}{n} \sum_{i=1}^n (t_i - Y_i)^2$$

$$\forall t \in \mathbb{R}^n, \quad \mathbb{E}\left[ \frac{1}{n} \|t - Y\|^2 \right] = \frac{1}{n} \|t - F\|^2 + \frac{1}{n} \mathbb{E}\left[ \|\varepsilon\|^2 \right]$$

Model: $S \subset \mathbb{R}^n$ ⇒ least-squares estimator on $S$:
$$\widehat{F}_S \in \arg\min_{t \in S} \left\{ \frac{1}{n} \|t - Y\|^2 \right\} = \arg\min_{t \in S} \left\{ \frac{1}{n} \sum_{i=1}^n (t_i - Y_i)^2 \right\}$$
so that $\widehat{F}_S = \Pi_S(Y)$ (orthogonal projection).


Model examples

histograms on some partition $\Lambda$ of $\mathcal{X}$ ⇒ the least-squares estimator (regressogram) can be written
$$\widehat{F}_m(x_i) = \sum_{\lambda \in \Lambda} \widehat{\beta}_\lambda \mathbf{1}_{x_i \in \lambda}, \qquad \widehat{\beta}_\lambda = \frac{1}{\operatorname{Card}\{x_i \in \lambda\}} \sum_{x_i \in \lambda} Y_i$$

subspace generated by a subset of an orthogonal basis of $L^2(\mu)$ (Fourier, wavelets, and so on)

variable selection: $x_i = (x_i^{(1)}, \dots, x_i^{(p)}) \in \mathbb{R}^p$ gathers $p$ variables that can (linearly) explain $Y_i$:
$$\forall m \subset \{1, \dots, p\}, \quad S_m = \operatorname{span}\left\{ x^{(j)} \text{ s.t. } j \in m \right\}.$$
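The regressogram formula above is easy to implement directly; here is a minimal numpy sketch on a regular partition of $[0, 1]$ (the function and variable names are ours, not the talk's):

```python
import numpy as np

def regressogram(x, y, n_bins):
    """Least-squares estimator on the histogram model: a piecewise-constant
    fit on a regular partition of [0, 1] into n_bins cells."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # cell index of each design point
    cells = np.minimum((x * n_bins).astype(int), n_bins - 1)
    fitted = np.empty_like(y)
    for c in range(n_bins):
        mask = cells == c
        if mask.any():
            # beta_hat_lambda = mean of the Y_i falling in the cell
            fitted[mask] = y[mask].mean()
    return fitted

# toy usage
rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)
f_hat = regressogram(x, y, n_bins=10)
```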


k-nearest-neighbours estimator (k = 20)

[Figure: a 20-nearest-neighbours fit plotted over the data]


Nadaraya-Watson estimator (σ = 0.01)

[Figure: a Nadaraya-Watson fit plotted over the data]


Linear estimators

OLS: $\widehat{F}_m = \Pi_{S_m} Y$ (projection onto $S_m$)

(kernel) ridge regression, spline smoothing:
$$\widehat{F}_i = \widehat{f}(x_i) \quad \text{with} \quad \widehat{f} \in \arg\min_{f \in \mathcal{F}_K} \left\{ \frac{1}{n} \sum_{i=1}^n \left( Y_i - f(x_i) \right)^2 + \lambda \|f\|_{\mathcal{F}_K}^2 \right\}$$
⇒ $\widehat{F}_{\lambda,K} = K (K + \lambda I)^{-1} Y$ where $K = (K(x_i, x_j))_{1 \le i,j \le n}$

k-nearest neighbours

Nadaraya-Watson estimators

⇒ $\widehat{F} = A Y$ where $A$ does not depend on $Y$
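As a concrete illustration of $\widehat{F} = AY$, a short numpy sketch of the kernel ridge smoother matrix from the slide, $A = K(K + \lambda I)^{-1}$ (the Gaussian kernel, its bandwidth, and all names are illustrative choices of ours):

```python
import numpy as np

def gaussian_kernel(x, bandwidth=0.2):
    """Gram matrix K = (K(x_i, x_j)) for a Gaussian kernel (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * bandwidth**2))

def ridge_smoother(K, lam):
    """Kernel ridge regression is linear in Y: F_hat = A Y with A = K (K + lam I)^{-1}."""
    n = K.shape[0]
    return K @ np.linalg.inv(K + lam * np.eye(n))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(size=50))
K = gaussian_kernel(x)
A = ridge_smoother(K, lam=1.0)
Y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=50)
F_hat = A @ Y          # linear estimator: A does not depend on Y
```

Since $K$ is symmetric, $A$ is symmetric and acts linearly on $Y$, which is the defining property used throughout the talk.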


Estimator selection: regular regressograms

[Figure: four regular-regressogram fits of the data]


Estimator selection: k nearest neighbours

[Figure: four k-nearest-neighbours fits of the data]


Estimator selection: Nadaraya-Watson

[Figure: four Nadaraya-Watson fits of the data]


Estimator selection

Estimator collection $(\widehat{F}_m)_{m \in \mathcal{M}}$ ⇒ $\widehat{m}(Y)$? Example: $\widehat{F}_m = A_m Y$

Goal: minimize the risk, i.e., prove an oracle inequality (in expectation or with a large probability):
$$\frac{1}{n} \left\| \widehat{F}_{\widehat{m}} - F \right\|^2 \le C \inf_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| \widehat{F}_m - F \right\|^2 \right\} + R_n$$

Examples:
model selection
calibration (choosing $k$ or the distance for $k$-NN, choice of a regularization parameter, choice of a kernel, etc.)
choice between methods different in nature, e.g., $k$-NN vs. smoothing splines?


Bias-variance trade-off

$$\mathbb{E}\left[ \frac{1}{n} \left\| \widehat{F}_m - F \right\|^2 \right] = \text{Bias} + \text{Variance}$$

Bias, or approximation error: $\frac{1}{n} \|F_m - F\|^2 = \frac{1}{n} \|A_m F - F\|^2$

Variance, or estimation error: $\frac{\sigma^2 \operatorname{tr}(A_m^\top A_m)}{n}$ (OLS: $\frac{\sigma^2 \dim(S_m)}{n}$)

Bias-variance trade-off ⇔ avoid overfitting and underfitting
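The decomposition can be checked numerically for any linear smoother. A minimal Monte Carlo sketch (the moving-average matrix A and all parameters are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 40, 0.5
F = np.sin(np.linspace(0.0, 3.0, n))        # true signal (illustrative)
A = np.zeros((n, n))                        # simple moving-average smoother
for i in range(n):
    lo, hi = max(0, i - 2), min(n, i + 3)
    A[i, lo:hi] = 1.0 / (hi - lo)

bias = np.sum((A @ F - F) ** 2) / n                 # (1/n) ||A_m F - F||^2
variance = sigma**2 * np.trace(A.T @ A) / n         # sigma^2 tr(A^T A) / n

# Monte Carlo estimate of E[(1/n) ||A Y - F||^2] with Y = F + noise
reps = 20000
Y = F + sigma * rng.normal(size=(reps, n))
mc_risk = np.mean(np.sum((Y @ A.T - F) ** 2, axis=1)) / n
```

With enough replications, `mc_risk` matches `bias + variance` up to Monte Carlo error.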


Why should the empirical risk be penalized?

[Figure: bias, excess risk, and expected empirical risk as functions of the model dimension]


Penalization

$$\widehat{m} \in \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| \widehat{F}_m - Y \right\|^2 + \operatorname{pen}(m) \right\}$$

Ideal penalty:
$$\operatorname{pen}_{\mathrm{id}}(m) := \frac{1}{n} \left\| \widehat{F}_m - F \right\|^2 - \frac{1}{n} \left\| \widehat{F}_m - Y \right\|^2 = \text{Risk} - \text{Empirical risk}$$

Mallows' heuristic: $\operatorname{pen}(m) \approx \mathbb{E}\left[ \operatorname{pen}_{\mathrm{id}}(m) \right]$
⇒ oracle inequality if $\operatorname{Card}(\mathcal{M})$ is not too large (+ concentration inequalities)
⇒ OLS: $C_p$: $2 \sigma^2 D_m / n$ (Mallows, 1973)
⇒ Linear estimators: $C_L$: $2 \sigma^2 \operatorname{tr}(A_m) / n$ (Mallows, 1973)
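Mallows' $C_p$ for OLS takes only a few lines; here is a sketch selecting among regular histogram models with $\sigma^2$ assumed known (all names and the model family are illustrative choices of ours):

```python
import numpy as np

def histogram_projection(n, d):
    """Orthogonal projection onto piecewise-constant vectors on d regular cells."""
    P = np.zeros((n, n))
    edges = np.linspace(0, n, d + 1).astype(int)
    for a, b in zip(edges[:-1], edges[1:]):
        P[a:b, a:b] = 1.0 / (b - a)
    return P

def cp_select(Y, projections, sigma2):
    """Mallows' Cp: minimize (1/n)||F_hat_m - Y||^2 + 2 sigma^2 D_m / n,
    with D_m = tr(P) for a projection P (sigma^2 assumed known)."""
    n = len(Y)
    crits = []
    for P in projections:
        F_hat = P @ Y
        crits.append(np.sum((F_hat - Y) ** 2) / n + 2 * sigma2 * np.trace(P) / n)
    return int(np.argmin(crits))

rng = np.random.default_rng(3)
n, sigma = 200, 0.3
F = np.sin(2 * np.pi * np.arange(n) / n)
Y = F + sigma * rng.normal(size=n)
models = [histogram_projection(n, d) for d in (1, 2, 4, 8, 16, 32, 64, 128)]
best = cp_select(Y, models, sigma**2)
```

On this toy signal the criterion typically settles on a mid-size partition, balancing bias against the $2\sigma^2 D_m/n$ penalty.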


Oracle inequality

Theorem (Birgé & Massart 2007, A. & Bach 2009-2011). Assumptions:
$\operatorname{pen}(m) = \frac{2 C \operatorname{tr}(A_m)}{n}$ with $\left| C \sigma^{-2} - 1 \right| \le L_0 \sqrt{\frac{\ln(n)}{n}}$
$\forall m \in \mathcal{M}$, $|||A_m||| \le L_1$ and $\operatorname{tr}(A_m^\top A_m) \le \operatorname{tr}(A_m) \le n$
Gaussian noise: $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$

Then, with probability at least $1 - 3 \operatorname{Card}(\mathcal{M}) n^{-\delta}$, if $n \ge n_0$, for every $\eta \in (0, 1)$,
$$\frac{1}{n} \left\| \widehat{F}_{\widehat{m}} - F \right\|^2 \le (1 + \eta) \inf_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| \widehat{F}_m - F \right\|^2 \right\} + \frac{K(\delta, L_0, L_1) \ln(n) \sigma^2}{\eta n}$$

Note: ridge, $\lambda \in [0, +\infty]$ ⇔ $\operatorname{Card}(\mathcal{M}) \propto n$


Motivation (1): penalties known up to a constant factor

Ex.: $\operatorname{pen}_{C_p}(m) = \frac{2 \sigma^2 D_m}{n}$, $\operatorname{pen}_{C_L}(m) = \frac{2 \sigma^2 \operatorname{tr}(A_m)}{n}$

Classical estimators of $\sigma^2$:
$\widehat{\sigma}^2_{m_0} := \left\| Y - \widehat{F}_{m_0} \right\|^2 / (n - D_{m_0})$ (OLS); problem: choice of $m_0$?
$\widehat{\sigma}^2_m$ ⇒ FPE (Akaike, 1970) and GCV (Craven & Wahba, 1979); problem: avoiding the largest models

Goals: estimation of $\sigma^2$ for model selection, under minimal assumptions, without overfitting
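For comparison, the GCV score mentioned above, $\mathrm{GCV}(m) = \frac{(1/n)\|Y - A_m Y\|^2}{(1 - \operatorname{tr}(A_m)/n)^2}$ (Craven & Wahba, 1979), can be sketched as follows; the kernel ridge family and all names are illustrative choices of ours:

```python
import numpy as np

def gcv_score(A, Y):
    """Generalized cross-validation score of a linear estimator F_hat = A Y:
    GCV = (1/n) ||Y - A Y||^2 / (1 - tr(A)/n)^2."""
    n = len(Y)
    resid = Y - A @ Y
    return (resid @ resid / n) / (1 - np.trace(A) / n) ** 2

# toy comparison over a family of ridge smoothers A_lambda = K (K + lam I)^{-1}
rng = np.random.default_rng(4)
n = 80
x = np.sort(rng.uniform(size=n))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1**2))
F = np.sin(2 * np.pi * x)
Y = F + 0.3 * rng.normal(size=n)
lambdas = np.logspace(-4, 2, 25)
scores = [gcv_score(K @ np.linalg.inv(K + lam * np.eye(n)), Y) for lam in lambdas]
lam_gcv = lambdas[int(np.argmin(scores))]
```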


Motivation (2): “L-curve” and elbow heuristics?

[Figure: empirical and generalization error as functions of the dimension]


$\mathbb{E}[\text{Empirical risk}] + C \times \sigma^2 D_m n^{-1}$ (OLS), for $C = 0, 0.8, 0.9, 1.1, 1.2, 2$

[Figures: the expected penalized empirical risk as a function of the dimension, for each value of $C$]


OLS: Dimension jump

[Figure: dimension of $\widehat{m}(C)$ as a function of $C / \sigma^2$]


OLS: algorithm (Birgé & Massart 2007)

1. for every $C > 0$, compute
$$\widehat{m}(C) \in \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| \widehat{F}_m - Y \right\|^2 + \frac{C D_m}{n} \right\}$$
2. find $\widehat{C}_{\min}$ such that $D_{\widehat{m}(C)}$ is "very large" when $C < \widehat{C}_{\min}$ and "reasonably small" when $C > \widehat{C}_{\min}$
3. select $\widehat{m} = \widehat{m}(2 \widehat{C}_{\min})$

Practical use: Capushe package (Baudry, Maugis & Michel, 2010)
http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html
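The three steps can be sketched in a few lines of numpy; the grid of $C$ values and the jump-detection rule (largest drop of the selected dimension between consecutive grid points) are illustrative choices of ours, not the talk's prescription (see the Capushe package for a careful implementation):

```python
import numpy as np

def dimension_jump(emp_risks, dims, n, grid):
    """Slope-heuristics sketch: for each C on a grid, minimize
    emp_risk_m + C * D_m / n; locate the largest jump of the selected
    dimension to get C_min_hat, then reselect with penalty 2 * C_min_hat * D_m / n."""
    emp_risks, dims = np.asarray(emp_risks), np.asarray(dims)
    selected = np.array([dims[np.argmin(emp_risks + C * dims / n)] for C in grid])
    jump = np.argmax(selected[:-1] - selected[1:])   # biggest drop in dimension
    C_min = grid[jump + 1]
    m_hat = int(np.argmin(emp_risks + 2 * C_min * dims / n))
    return m_hat, C_min

# toy data: nested regular histogram models on a noisy square-wave signal
rng = np.random.default_rng(5)
n, sigma = 512, 0.5
F = np.sign(np.sin(4 * np.pi * np.arange(n) / n))
Y = F + sigma * rng.normal(size=n)
dims, emp_risks = [], []
for d in (2 ** k for k in range(10)):        # D_m = d regular cells
    blocks = Y.reshape(d, n // d)
    fit = blocks.mean(axis=1, keepdims=True)
    emp_risks.append(np.sum((blocks - fit) ** 2) / n)
    dims.append(d)
grid = np.linspace(1e-3, 3 * sigma**2, 200)
m_hat, C_min = dimension_jump(emp_risks, dims, n, grid)
```

On this example the selected dimension collapses near $C \approx \sigma^2 = 0.25$, as the theory predicts.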


Practical qualities of the algorithm

visual checking of the existence of a jump
calibration independent from the choice of some $m_0$
too strong overfitting almost impossible
one remaining parameter: how to localize the jump


Theorem (1): Dimension jump / Minimal penalty

Theorem (Birgé & Massart 2007, A. & Bach 2009-2011). Assumptions:
$\exists m_0 \in \mathcal{M}$, $D_{m_0} \le \sqrt{n}$ and $\frac{1}{n} \| F_{m_0} - F \|^2 \le \sigma^2 \sqrt{\ln(n)/n}$
Gaussian noise: $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$

Then, with probability at least $1 - 3 \operatorname{Card}(\mathcal{M}) n^{-\delta}$, if $n \ge n_0(\delta)$,
$$\forall C < \left( 1 - L_3 \delta \sqrt{\frac{\ln(n)}{n}} \right) \sigma^2, \quad D_{\widehat{m}(C)} \ge \frac{n}{3}$$
$$\forall C > \left( 1 + L_3 \delta \sqrt{\frac{\ln(n)}{n}} \right) \sigma^2, \quad D_{\widehat{m}(C)} \le \frac{n}{10}$$
and in the first case, $\frac{1}{n} \| \widehat{F}_{\widehat{m}(C)} - F \|^2 \gg \inf_{m \in \mathcal{M}} \left\{ \frac{1}{n} \| \widehat{F}_m - F \|^2 \right\}$.


Theorem (2): Oracle inequality

Theorem (Birgé & Massart 2007, A. & Bach 2009-2011). Assumptions:
$\widehat{m} \in \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \| \widehat{F}_m - Y \|^2 + \frac{2 \widehat{C}_{\min} D_m}{n} \right\}$
$\exists m_0 \in \mathcal{M}$, $D_{m_0} \le \sqrt{n}$ and $\frac{1}{n} \| F_{m_0} - F \|^2 \le \sigma^2 \sqrt{\ln(n)/n}$
Gaussian noise: $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$

Then, with probability at least $1 - 3 \operatorname{Card}(\mathcal{M}) n^{-\delta}$, if $n \ge n_0(\delta)$, for every $\eta > 0$,
$$\frac{1}{n} \left\| \widehat{F}_{\widehat{m}} - F \right\|^2 \le (1 + \eta) \inf_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| \widehat{F}_m - F \right\|^2 \right\} + \frac{K(\delta) \ln(n) \sigma^2}{\eta n}$$


Generalization of minimal penalties to linear estimators?

OLS:
$$\operatorname{pen}_{C_p}(m) = \frac{2 \sigma^2 D_m}{n}, \qquad \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \| \widehat{F}_m - Y \|^2 + \frac{C D_m}{n} \right\}$$
⇒ $D_{\widehat{m}(C)}$ "jumps" at $\widehat{C}_{\min} \approx \sigma^2$
⇒ optimal choice with $\widehat{m}(2 \widehat{C}_{\min})$

Linear estimators:
$$\operatorname{pen}_{C_L}(m) = \frac{2 \sigma^2 \operatorname{tr}(A_m)}{n}, \qquad \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \| \widehat{F}_m - Y \|^2 + \frac{C \operatorname{tr}(A_m)}{n} \right\}$$
Does $\operatorname{tr}(A_{\widehat{m}(C)})$ jump at $\widehat{C}_{\min} \approx \sigma^2$?
Optimal choice with $\widehat{m}(2 \widehat{C}_{\min})$?


No dimension jump with a penalty ∝ tr($A_m$)

[Figure: $\operatorname{tr}(A_{\widehat{\lambda}(C)})$ as a function of $\log(C / \sigma^2)$, with the optimal penalty / 2: no jump]


Minimal penalties for linear estimators

$$\mathbb{E}\left[ \frac{1}{n} \left\| \widehat{F}_m - F \right\|^2 \right] = \frac{1}{n} \| (I - A_m) F \|^2 + \frac{\operatorname{tr}(A_m^\top A_m) \sigma^2}{n} = \text{bias} + \text{variance}$$

$$\mathbb{E}\left[ \frac{1}{n} \left\| \widehat{F}_m - Y \right\|^2 \right] = \sigma^2 + \frac{1}{n} \| (I - A_m) F \|^2 - \left( 2 \operatorname{tr}(A_m) - \operatorname{tr}(A_m^\top A_m) \right) \frac{\sigma^2}{n}$$

⇒ optimal penalty: $2 \operatorname{tr}(A_m) \frac{\sigma^2}{n}$
⇒ minimal penalty: $\left( 2 \operatorname{tr}(A_m) - \operatorname{tr}(A_m^\top A_m) \right) \frac{\sigma^2}{n}$

$$\widehat{m}(C) \in \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| \widehat{F}_m - Y \right\|^2 + C \times \frac{2 \operatorname{tr}(A_m) - \operatorname{tr}(A_m^\top A_m)}{n} \right\}$$
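The expected-empirical-risk formula above rests on the trace identity $\operatorname{tr}\left( (I - A)^\top (I - A) \right) = n - 2 \operatorname{tr}(A) + \operatorname{tr}(A^\top A)$, which can be checked numerically for an arbitrary matrix (a sketch; names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
A = rng.normal(size=(n, n)) / n      # arbitrary smoother matrix (illustrative)
I = np.eye(n)

# tr((I - A)^T (I - A)) = n - 2 tr(A) + tr(A^T A),
# which yields the expected empirical risk on the slide
lhs = np.trace((I - A).T @ (I - A))
rhs = n - 2 * np.trace(A) + np.trace(A.T @ A)

# minimal vs optimal penalty, both times n / sigma^2
pen_min = 2 * np.trace(A) - np.trace(A.T @ A)
pen_opt = 2 * np.trace(A)
```

The gap between the two penalties is exactly $\operatorname{tr}(A^\top A) \sigma^2 / n$, i.e., the variance term.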


“Dimension” jump (ridge regression)

[Figure: $\operatorname{tr}(A_{\widehat{\lambda}(C)})$ as a function of $\log(C / \sigma^2)$, with the minimal penalty and the optimal penalty / 2: a jump appears]


Penalty calibration algorithm (A. & Bach 2009)

1. for every $C > 0$, compute
$$\widehat{m}_{\min}(C) \in \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| Y - \widehat{F}_m \right\|^2 + C \, \frac{2 \operatorname{tr}(A_m) - \operatorname{tr}(A_m^\top A_m)}{n} \right\}$$
2. find $\widehat{C}_{\min}$ such that $\operatorname{tr}(A_{\widehat{m}_{\min}(C)})$ is "too large" when $C < \widehat{C}_{\min}$ and "reasonably small" when $C > \widehat{C}_{\min}$
3. select
$$\widehat{m} \in \arg\min_{m \in \mathcal{M}} \left\{ \frac{1}{n} \left\| Y - \widehat{F}_m \right\|^2 + \frac{2 \widehat{C}_{\min} \operatorname{tr}(A_m)}{n} \right\}$$

⇒ $\left| \widehat{C}_{\min} \sigma^{-2} - 1 \right| \le L_4 \sqrt{\ln(n)/n}$ and oracle inequality (same assumptions as before).
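A minimal numpy sketch of the three steps for a family of kernel ridge smoothers (the kernel, the grids, and the jump-detection rule, taking the largest drop of tr(A) along a grid of C values, are illustrative choices of ours, not the talk's prescription):

```python
import numpy as np

def min_pen_calibrate(Y, smoothers, grid):
    """Minimal-penalty calibration sketch for linear estimators:
    step 1 uses pen = C (2 tr A - tr A^T A)/n, step 2 locates the jump of
    tr(A) via its largest drop on the grid, step 3 reselects with
    penalty 2 * C_min_hat * tr(A) / n."""
    n = len(Y)
    emp = np.array([np.sum((A @ Y - Y) ** 2) / n for A in smoothers])
    tr = np.array([np.trace(A) for A in smoothers])
    pen_shape = np.array([2 * np.trace(A) - np.trace(A.T @ A) for A in smoothers])
    sel_tr = np.array([tr[np.argmin(emp + C * pen_shape / n)] for C in grid])
    jump = np.argmax(sel_tr[:-1] - sel_tr[1:])
    C_min = grid[jump + 1]
    m_hat = int(np.argmin(emp + 2 * C_min * tr / n))
    return m_hat, C_min

rng = np.random.default_rng(7)
n, sigma = 200, 0.5
x = np.sort(rng.uniform(size=n))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05**2))
F = np.sin(2 * np.pi * x)
Y = F + sigma * rng.normal(size=n)
lambdas = np.logspace(-6, 2, 30)
smoothers = [K @ np.linalg.inv(K + lam * np.eye(n)) for lam in lambdas]
grid = np.linspace(1e-3, 3 * sigma**2, 200)
m_hat, C_min = min_pen_calibrate(Y, smoothers, grid)
```

`C_min` estimates $\sigma^2$ and the final penalty $2 \widehat{C}_{\min} \operatorname{tr}(A_m)/n$ mimics Mallows' $C_L$ without knowing the noise level.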


Comparison with least-squares

Linear estimators:
$$\operatorname{pen}_{\min}(m) = \sigma^2 \, \frac{2 \operatorname{tr}(A_m) - \operatorname{tr}(A_m^\top A_m)}{n}, \qquad \operatorname{pen}_{\mathrm{opt}}(m) = \sigma^2 \, \frac{2 \operatorname{tr}(A_m)}{n}$$
$$\frac{\operatorname{pen}_{\mathrm{opt}}(m)}{\operatorname{pen}_{\min}(m)} = \frac{2 \operatorname{tr}(A_m)}{2 \operatorname{tr}(A_m) - \operatorname{tr}(A_m^\top A_m)} \in (1, 2]$$

Least-squares case: $A_m^\top A_m = A_m$ ⇒ $\frac{\operatorname{pen}_{\mathrm{opt}}(m)}{\operatorname{pen}_{\min}(m)} = 2$ ⇒ Slope heuristics


The k-nearest neighbours case

$$\forall i, j \in \{1, \dots, n\}, \quad A_{i,j} \in \left\{ 0, \frac{1}{k} \right\}$$
$$\forall i \in \{1, \dots, n\}, \quad A_{i,i} = \frac{1}{k} \quad \text{and} \quad \sum_{j=1}^n A_{i,j} = 1$$
⇒ $\operatorname{tr}(A) = \frac{n}{k} = \operatorname{tr}(A^\top A)$
⇒ $\operatorname{pen}_{\mathrm{opt}} = 2 \operatorname{pen}_{\min}$
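These identities are easy to verify on the smoother matrix of a one-dimensional k-NN estimator (a sketch; names are ours):

```python
import numpy as np

def knn_smoother(x, k):
    """Smoother matrix of the k-NN estimator: row i puts weight 1/k on the k
    nearest neighbours of x_i (each point is its own nearest neighbour)."""
    n = len(x)
    A = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(np.abs(x - x[i]))[:k]
        A[i, nn] = 1.0 / k
    return A

rng = np.random.default_rng(8)
x = rng.uniform(size=60)
A = knn_smoother(x, k=5)
```

Each row has exactly k entries equal to 1/k, so tr(A) = n/k and tr(AᵀA) = n · k · (1/k)² = n/k, as on the slide.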


Simulation study (ridge regression, choice of λ)

[Figure: mean(error / oracle error) as a function of $\log(n)$, for 10-fold CV, GCV, and the minimal penalty]


Multi-task regression

We want to solve $p \ge 2$ regression problems simultaneously.

Observations: $Y_1, \dots, Y_n \in \mathbb{R}^p$,
$$Y_i^j = F_i^j + \varepsilon_i^j, \quad j = 1, \dots, p \quad (\text{e.g., } F_i^j = F^j(x_i))$$
with $(\varepsilon_i)_{1 \le i \le n}$ i.i.d. $\mathcal{N}(0, \Sigma)$, $\Sigma \in \mathcal{S}_p^+(\mathbb{R})$.

Implicit assumption: the $p$ problems are "similar".

Least-squares loss of a predictor $t \in \mathbb{R}^{np}$ ("$t_i^j = t^j(x_i)$"):
$$\frac{1}{np} \| t - F \|^2 = \frac{1}{np} \sum_{i=1}^n \sum_{j=1}^p \left( t_i^j - F_i^j \right)^2$$

⇒ Estimator $\widehat{F}(Y_1, \dots, Y_n) \in \mathbb{R}^{np}$?


Ridge multi-task regression

$\widehat{F} = (\widehat{F}_i^j)_{1 \le i \le n, 1 \le j \le p}$ with $\widehat{F}_i^j = \widehat{f}^j(x_i)$ and $\widehat{f}$ defined by:

If we consider the tasks separately:
$$\arg\min_{f \in \mathcal{F}_K^p} \left\{ \frac{1}{np} \sum_{i=1}^n \sum_{j=1}^p \left( Y_i^j - f^j(x_i) \right)^2 + \sum_{j=1}^p \lambda_j \left\| f^j \right\|_{\mathcal{F}_K}^2 \right\}$$

A possible multi-task approach (Evgeniou et al., 2005):
$$\arg\min_{f \in \mathcal{F}_K^p} \left\{ \frac{1}{np} \sum_{i=1}^n \sum_{j=1}^p \left( Y_i^j - f^j(x_i) \right)^2 + \lambda \sum_{j=1}^p \left\| f^j \right\|_{\mathcal{F}_K}^2 + \mu \sum_{j \ne \ell} \left\| f^j - f^\ell \right\|_{\mathcal{F}_K}^2 \right\}$$

More generally, for $M \in \mathcal{S}_p^+(\mathbb{R})$:
$$\arg\min_{f \in \mathcal{F}_K^p} \left\{ \frac{1}{np} \sum_{i=1}^n \sum_{j=1}^p \left( Y_i^j - f^j(x_i) \right)^2 + \sum_{j,\ell} M_{j,\ell} \left\langle f^j, f^\ell \right\rangle_{\mathcal{F}_K} \right\}$$


Multi-task estimator selection

⇒ Estimator collection $(\widehat{F}_M)_{M \in \mathcal{M}}$, $\mathcal{M} \subset \mathcal{S}_p^+(\mathbb{R})$, with $\widehat{F}_M = A_M Y$ and
$$A_M = \left( M^{-1} \otimes K \right) \left( M^{-1} \otimes K + np \, I_{np} \right)^{-1}$$

Goal: select $\widehat{M} \in \mathcal{M}$ such that $\frac{1}{np} \| \widehat{F}_{\widehat{M}} - F \|^2$ is minimal.

Expectation of the ideal penalty:
$$\mathbb{E}\left[ \operatorname{pen}_{\mathrm{id}}(M) \right] = \frac{2}{np} \operatorname{tr}\left( A_M \left( \Sigma \otimes I_n \right) \right)$$

Problem: how to estimate $\Sigma$?
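The multi-task smoother matrix can be assembled directly with Kronecker products; a small numpy sketch (the kernel, the matrix M, and the choice Σ = I are illustrative assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 25, 3
x = np.sort(rng.uniform(size=n))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2**2))
# a similarity matrix M in S_p^+ (illustrative choice)
M = np.eye(p) + 0.5 * np.ones((p, p))

# multi-task smoother matrix from the slide:
# A_M = (M^{-1} ⊗ K) (M^{-1} ⊗ K + n p I_{np})^{-1}
B = np.kron(np.linalg.inv(M), K)
A_M = B @ np.linalg.inv(B + n * p * np.eye(n * p))

# expected ideal penalty uses tr(A_M (Σ ⊗ I_n)); take Σ = I_p (independent tasks)
Sigma = np.eye(p)
pen_id = 2 / (n * p) * np.trace(A_M @ np.kron(Sigma, np.eye(n)))
```

Since $M^{-1} \otimes K$ is symmetric positive semi-definite, $A_M$ is a symmetric shrinkage operator, exactly like the single-task ridge smoother.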

(72)


Estimating the covariance matrix: idea (p = 2)

[Figure: the cone of 2×2 covariance matrices, showing the true Σ, the one-dimensional variances σ₁, σ₂, σ₃ along three directions, their estimates σ̂₁, σ̂₂, σ̂₃, and the reconstructed estimate Σ̂.]

Estimating the covariance matrix: algorithm

For every j ∈ {1, …, p}, apply the “minimal penalties” algorithm to the data set (Y_i^j)_{1≤i≤n}
    ⇒ estimator a(e_j) of Σ_{j,j}

For every j ≠ ℓ ∈ {1, …, p}, apply the “minimal penalties” algorithm to the data set (Y_i^j + Y_i^ℓ)_{1≤i≤n}
    ⇒ estimator a(e_j + e_ℓ) of Σ_{j,j} + Σ_{ℓ,ℓ} + 2Σ_{j,ℓ}

Recover an estimator Σ̂ of Σ:

    Σ̂ = J(a(e_1), …, a(e_p), a(e_1 + e_2), …, a(e_{p−1} + e_p))

where J is the unique linear map ℝ^{p(p+1)/2} → S_p(ℝ) such that

    Σ = J(Σ_{1,1}, …, Σ_{p,p}, Σ_{1,1} + Σ_{2,2} + 2Σ_{1,2}, …, Σ_{p−1,p−1} + Σ_{p,p} + 2Σ_{p−1,p})
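The map J has a simple explicit form: diagonal entries are read off directly from the a(e_j), and off-diagonal entries follow from the polarization identity Σ_{j,ℓ} = (a(e_j + e_ℓ) − a(e_j) − a(e_ℓ))/2. A sketch of this inversion (the input format is an assumption; in the actual procedure the a(·) values come from the minimal-penalty algorithm):

```python
import numpy as np

def reconstruct_Sigma(a_single, a_pair):
    """Invert the linear map J: diagonal entries come from a(e_j),
    off-diagonal ones from the polarization identity
    Sigma_{j,l} = (a(e_j + e_l) - a(e_j) - a(e_l)) / 2.

    a_single : length-p sequence, a_single[j] estimates Sigma_{j,j}
    a_pair   : dict {(j, l): value} for j < l, estimating
               Sigma_{j,j} + Sigma_{l,l} + 2 Sigma_{j,l}
    """
    Sigma = np.diag(np.asarray(a_single, dtype=float))
    for (j, l), v in a_pair.items():
        Sigma[j, l] = Sigma[l, j] = (v - a_single[j] - a_single[l]) / 2.0
    return Sigma

# sanity check: with exact inputs, J recovers the covariance exactly
true = np.array([[1.0, 0.3], [0.3, 2.0]])
a1 = [true[0, 0], true[1, 1]]
a2 = {(0, 1): true[0, 0] + true[1, 1] + 2 * true[0, 1]}
Sigma_hat = reconstruct_Sigma(a1, a2)
```

Since J is linear, any estimation error in the p(p+1)/2 one-dimensional variance estimates propagates linearly to Σ̂, which is what the next theorem quantifies.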


Theorem: Estimating the covariance matrix

Theorem (Solnon, A. & Bach, 2011)
If for every j = 1, …, p, some λ_j > 0 exists such that

    tr(A_{λ_j}) ≤ √n   and   (1/n) ‖(I_n − A_{λ_j}) F^j‖² ≤ Σ_{j,j} √(ln(n)/n),

where A_{λ_j} = K (K + nλ_j I_n)⁻¹, then, with probability at least 1 − L₅ p² n^{−δ}, if n ≥ n₀(δ),

    (1 − η) Σ ⪯ Σ̂ ⪯ (1 + η) Σ   with   η := L(2 + δ) c(Σ)² p √(ln(n)/n),

where c(Σ) = max(Sp(Σ)) / min(Sp(Σ)).

⇒ sufficient condition for consistency

Theorem: Oracle inequality

Theorem (Solnon, A. & Bach, 2011)
If moreover the matrices M ∈ M can be diagonalized in the same orthogonal basis, and if

    M̂ ∈ argmin_{M ∈ M} { (1/np) ‖F̂_M − Y‖² + (2/np) tr(A_M (Σ̂ ⊗ I_n)) },

then, with probability at least 1 − L₅ p² n^{−δ}, if n ≥ n₀(δ),

    (1/np) ‖F̂_M̂ − F‖² ≤ (1 + 1/ln(n))² inf_{M ∈ M} { (1/np) ‖F̂_M − F‖² } + L(2 + δ)² c(Σ)⁴ (tr(Σ)/p) √(p³ ln(n)³ / n)
