
The degrees of freedom of the group Lasso for a general design


HAL Id: hal-00926929

https://hal.archives-ouvertes.fr/hal-00926929

Submitted on 10 Jan 2014


Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili, Charles-Alban Deledalle, Charles Dossal

To cite this version:

Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili, Charles-Alban Deledalle, Charles Dossal. The degrees of freedom of the group Lasso for a general design. SPARS'13, Jul 2013, Lausanne, Switzerland. 1 page. hal-00926929


Samuel Vaiter and Gabriel Peyré: CEREMADE, CNRS-U. Paris-Dauphine, Place du Maréchal De Lattre De Tassigny, 75775 Paris Cedex 16, France.

Jalal M. Fadili: GREYC, CNRS-ENSICAEN-U. Caen, 6, Bd du Maréchal Juin, 14050 Caen Cedex, France.

Charles Deledalle and Charles Dossal: IMB, CNRS-U. Bordeaux 1, 351, cours de la Libération, 33405 Talence Cedex, France.

Abstract: In this paper, we are concerned with regression problems where covariates can be grouped in nonoverlapping blocks, of which only a few are active. In such a situation, the group Lasso is an attractive method for variable selection since it promotes sparsity of the groups. We study the sensitivity of any group Lasso solution to the observations and provide its precise local parameterization. When the noise is Gaussian, this allows us to derive an unbiased estimator of the degrees of freedom of the group Lasso. This result holds true for any fixed design, no matter whether it is under- or overdetermined. Our results specialize to those of [1], [2] for blocks of size one, i.e. the $\ell_1$ norm. These results allow an objective choice of the regularization parameter through e.g. the Stein unbiased risk estimate (SURE).

I. GROUP LASSO AND DEGREES OF FREEDOM

Consider the linear regression problem $Y = X\beta_0 + \varepsilon$, where $Y$ is the real $n$-dimensional response vector, $\beta_0 \in \mathbb{R}^p$ is the unknown vector of regression coefficients to be estimated, $X \in \mathbb{R}^{n \times p}$ is the design matrix whose columns are the $p$ covariate vectors, and $\varepsilon$ is the error term. In this paper, we do not make any specific assumption on $n$ with respect to $p$.

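Let $\mathcal{B}$ be a partition of the set of indices into blocks, i.e. $\bigcup_{b \in \mathcal{B}} b = \{1, \dots, p\}$ and $b \cap b' = \emptyset$ for all distinct $b, b' \in \mathcal{B}$. For $\beta \in \mathbb{R}^p$ and each $b \in \mathcal{B}$, $\beta_b = (\beta_i)_{i \in b}$ is the subvector of $\beta$ whose entries are indexed by the block $b$, and $|b|$ is the cardinality of $b$. The group Lasso amounts to solving

$$\hat\beta(y) \in \operatorname*{Argmin}_{\beta \in \mathbb{R}^p} \; \frac{1}{2} \|y - X\beta\|^2 + \lambda \sum_{b \in \mathcal{B}} \|\beta_b\|, \qquad (P_\lambda(y))$$

from an observation $y \in \mathbb{R}^n$ of the regression model, where $\lambda > 0$ is the regularization parameter and $\|\cdot\|$ is the $\ell_2$-norm.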
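As an illustration of how a solution of $(P_\lambda(y))$ might be computed numerically, here is a minimal sketch using proximal gradient descent (ISTA) with block soft-thresholding. This solver is not part of the paper. The function names `block_soft_threshold` and `group_lasso_ista`, the step size $1/\|X\|^2$ and the iteration count are illustrative assumptions.

```python
import numpy as np

def block_soft_threshold(v, t):
    """Proximal operator of t * ||.||_2: shrink the whole block toward 0."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso_ista(X, y, blocks, lam, n_iter=1000):
    """Approximate a minimizer of 0.5*||y - X beta||^2 + lam * sum_b ||beta_b||.

    `blocks` is a list of index lists forming a partition of range(p).
    """
    p = X.shape[1]
    beta = np.zeros(p)
    # 1 / Lipschitz constant of the gradient of the quadratic term
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        for b in blocks:
            beta[b] = block_soft_threshold(z[b], step * lam)
    return beta

# Identity design: the solution is the block soft-thresholding of y,
# so the second block (norm 0.5 <= lam) is set to zero.
X = np.eye(4)
y = np.array([3.0, 4.0, 0.3, 0.4])
blocks = [[0, 1], [2, 3]]
beta_hat = group_lasso_ista(X, y, blocks, lam=1.0)
```

With the identity design the iteration reaches its fixed point immediately, which makes the example easy to check by hand. For a general design the number of iterations needed depends on the conditioning of $X$.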

Let $y \mapsto \hat\mu(y) = X\hat\beta(y)$ be the response or the prediction associated to $\hat\beta(y)$, and let $\mu_0 = X\beta_0$. We recall that $\hat\mu(y)$ is always uniquely defined (by strict convexity of the fidelity term), although $\hat\beta(y)$ may not be, as is the case when $X$ is a rank-deficient or underdetermined design matrix. Suppose that $\varepsilon \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_n)$.

Following [3], the DOF is given by $df = \sum_{i=1}^{n} \frac{\operatorname{cov}(Y_i, \hat\mu_i(Y))}{\sigma^2}$. The well-known Stein's lemma asserts that, if $y \mapsto \hat\mu(y)$ is a weakly differentiable function with an essentially bounded gradient, then an unbiased estimator of $df$ is $\widehat{df}(Y) = \operatorname{div} \hat\mu(Y) = \operatorname{tr}(\partial \hat\mu(Y))$ and $\mathbb{E}_\varepsilon(\widehat{df}(Y)) = df$, where $\partial \hat\mu(\cdot)$ is the Jacobian of $\hat\mu(\cdot)$.
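The divergence in Stein's lemma can be checked numerically on a case where it is known in closed form. The sketch below takes the simplest instance, $X = \mathrm{Id}$ with a single block, for which the group Lasso prediction is the block soft-thresholding $\hat\mu(y) = (1 - \lambda/\|y\|)_+ \, y$ and a direct computation gives $\operatorname{div}\hat\mu(y) = n - (n-1)\lambda/\|y\|$ when $\|y\| > \lambda$. The helper names `divergence_fd` and `block_shrink` and the finite-difference step are assumptions made for this illustration.

```python
import numpy as np

def divergence_fd(mu_hat, y, eps=1e-6):
    """Central finite-difference estimate of div mu_hat(y) = tr(Jacobian)."""
    y = np.asarray(y, dtype=float)
    div = 0.0
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = eps
        div += (mu_hat(y + e)[i] - mu_hat(y - e)[i]) / (2.0 * eps)
    return div

def block_shrink(y, lam=1.0):
    """Group Lasso prediction for X = Id and a single block."""
    norm = np.linalg.norm(y)
    if norm == 0.0:
        return np.zeros_like(y)
    return max(0.0, 1.0 - lam / norm) * y

y = np.array([3.0, 4.0])   # ||y|| = 5 > lam = 1, so the block is active
lam = 1.0
df_fd = divergence_fd(block_shrink, y)
df_closed = y.size - (y.size - 1) * lam / np.linalg.norm(y)
```

Both quantities agree up to finite-difference error, and the value lies strictly between 1 and the block size 2, reflecting the shrinkage applied inside the active block.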

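In the sequel, we define the $\mathcal{B}$-support $\operatorname{supp}_{\mathcal{B}}(\beta)$ of $\beta \in \mathbb{R}^p$ as $\operatorname{supp}_{\mathcal{B}}(\beta) = \{ b \in \mathcal{B} \mid \|\beta_b\| \neq 0 \}$. The size of $\operatorname{supp}_{\mathcal{B}}(\beta)$ is $|\operatorname{supp}_{\mathcal{B}}(\beta)| = \sum_{b \in \operatorname{supp}_{\mathcal{B}}(\beta)} |b|$. The set of all $\mathcal{B}$-supports is denoted $\mathcal{I}$, and $X_I$, where $I$ is a $\mathcal{B}$-support, is the matrix formed by the columns $X_i$ where $i$ is an element of some $b \in I$. We also introduce the following block-diagonal operators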

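$$\delta_\beta : v \in \mathbb{R}^{|I|} \mapsto \left( v_b / \|\beta_b\| \right)_{b \in I} \in \mathbb{R}^{|I|} \quad \text{and} \quad P_\beta : v \in \mathbb{R}^{|I|} \mapsto \left( \operatorname{Proj}_{\beta_b^\perp}(v_b) \right)_{b \in I} \in \mathbb{R}^{|I|},$$

where $\operatorname{Proj}_{\beta_b^\perp} = \mathrm{Id} - \beta_b \beta_b^T / \|\beta_b\|^2$ is the orthogonal projector on $\beta_b^\perp$.

II. MAIN CONTRIBUTIONS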

The first difficulty we need to overcome when $X$ is not full column rank is that $y \mapsto \hat\beta(y)$ is set-valued. Toward this goal, we are led to impose the following assumption on $X$ with respect to the block structure.

Assumption $(\mathcal{A}(\beta))$: Given a vector $\beta \in \mathbb{R}^p$ of $\mathcal{B}$-support $I$, we assume that the finite subset of vectors $\{ X_b \beta_b \mid b \in I \}$ is linearly independent.

It is important to notice that $(\mathcal{A}(\beta))$ is weaker than imposing that $X_I$ is full column rank, which is standard when analyzing the Lasso. The two assumptions coincide for the Lasso, i.e. $|b| = 1$, $\forall b \in I$.
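To make the difference between the two conditions concrete, the following sketch checks the assumption on a small underdetermined design where $X_I$ cannot be full column rank. The function name `assumption_A_holds`, the tolerance and the toy matrix are hypothetical choices for illustration only.

```python
import numpy as np

def assumption_A_holds(X, beta, blocks, tol=1e-10):
    """Check that {X_b beta_b : b in the B-support of beta} is linearly independent."""
    support = [b for b in blocks if np.linalg.norm(beta[b]) > 0]
    if not support:
        return True  # an empty family is linearly independent
    # One column per active block: the vector X_b beta_b in R^n.
    V = np.column_stack([X[:, b] @ beta[b] for b in support])
    return bool(np.linalg.matrix_rank(V, tol=tol) == len(support))

# n = 2 observations, p = 4 covariates in two blocks of size 2.
# Both blocks are active, so X_I has 4 columns in R^2 and is rank deficient.
X = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
blocks = [[0, 1], [2, 3]]
beta = np.array([1.0, 0.0, 1.0, 0.0])

holds = assumption_A_holds(X, beta, blocks)
full_column_rank = bool(np.linalg.matrix_rank(X) == X.shape[1])
```

Here the two vectors $X_b\beta_b$ are $(1, 0)$ and $(0, 1)$, so the assumption holds even though $X_I$ is not full column rank.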

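Definition 1: Let $\lambda > 0$. The transition space $\mathcal{H}$ is defined as

$$\mathcal{H} = \bigcup_{I \subset \mathcal{I}} \bigcup_{b \notin I} \mathcal{H}_{I,b}, \quad \text{where} \quad \mathcal{H}_{I,b} = \operatorname{bd}(\pi(\mathcal{A}_{I,b})),$$

where we have denoted

$$\pi : \mathbb{R}^n \times \mathbb{R}^{I,*} \times \mathbb{R}^{I,*} \to \mathbb{R}^n, \qquad \mathbb{R}^{I,*} = \prod_{b \in I} \left( \mathbb{R}^{|b|} \setminus \{0\} \right)$$

the canonical projection on $\mathbb{R}^n$ (with respect to the first component), $\operatorname{bd} C$ is the boundary of the set $C$, and

$$\mathcal{A}_{I,b} = \left\{ (y, \beta_I, v_I) \in \mathbb{R}^n \times \mathbb{R}^{I,*} \times \mathbb{R}^{I,*} \;\middle|\; \|X_b^T (y - X_I \beta_I)\| = \lambda, \;\; X_I^T (X_I \beta_I - y) + \lambda v_I = 0, \;\; \forall g \in I, \; v_g = \frac{\beta_g}{\|\beta_g\|} \right\}.$$

We are now equipped to state our main sensitivity analysis result.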

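Theorem 1: Let $\lambda > 0$. Let $y \notin \mathcal{H}$, and $\hat\beta(y)$ a solution of $(P_\lambda(y))$. Let $I = \operatorname{supp}_{\mathcal{B}}(\hat\beta(y))$ be the $\mathcal{B}$-support of $\hat\beta(y)$ such that $(\mathcal{A}(\hat\beta(y)))$ holds. Then, there exists an open neighborhood $\mathcal{O} \subset \mathbb{R}^n$ of $y$, and a mapping $\tilde\beta : \mathcal{O} \to \mathbb{R}^p$ such that

1) For all $\bar y \in \mathcal{O}$, $\tilde\beta(\bar y)$ is a solution of $(P_\lambda(\bar y))$, and $\tilde\beta(y) = \hat\beta(y)$.

2) the $\mathcal{B}$-support of $\tilde\beta(\bar y)$ is constant on $\mathcal{O}$.

3) the mapping $\tilde\beta$ is $C^1(\mathcal{O})$ and its Jacobian is such that $\forall \bar y \in \mathcal{O}$,

$$\partial \tilde\beta_{I^c}(\bar y) = 0 \quad \text{and} \quad \partial \tilde\beta_I(\bar y) = d(y, \lambda),$$

where

$$d(y, \lambda) = \left( X_I^T X_I + \lambda \, \delta_{\hat\beta(y)} \circ P_{\hat\beta(y)} \right)^{-1} X_I^T \quad \text{and} \quad I^c = \{ b \in \mathcal{B} \mid b \notin I \}.$$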
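A minimal sketch of how the matrix $d(y, \lambda)$ could be assembled from a given solution is shown below. It assumes the projector is normalized, $\mathrm{Id} - \beta_b\beta_b^T/\|\beta_b\|^2$, and that the matrix to invert is nonsingular. The function name `jacobian_on_support` and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def jacobian_on_support(X, beta, blocks, lam):
    """Return (I_idx, d) with d = (X_I^T X_I + lam * delta o P)^{-1} X_I^T."""
    support = [b for b in blocks if np.linalg.norm(beta[b]) > 0]
    I_idx = [i for b in support for i in b]
    X_I = X[:, I_idx]
    # Block-diagonal matrix of the operator delta_beta o P_beta.
    DP = np.zeros((len(I_idx), len(I_idx)))
    pos = 0
    for b in support:
        bb = beta[b]
        nb = np.linalg.norm(bb)
        proj = np.eye(len(b)) - np.outer(bb, bb) / nb**2  # projector on bb^perp
        DP[pos:pos + len(b), pos:pos + len(b)] = proj / nb
        pos += len(b)
    d = np.linalg.solve(X_I.T @ X_I + lam * DP, X_I.T)
    return I_idx, d

# Identity design with y = (3, 4, 0.5) and lam = 1: the solution is the
# block soft-thresholding of y, i.e. beta_hat = (2.4, 3.2, 0).
X = np.eye(3)
blocks = [[0, 1], [2]]
beta_hat = np.array([2.4, 3.2, 0.0])
I_idx, d = jacobian_on_support(X, beta_hat, blocks, lam=1.0)
```

In this example the first two columns of `d` coincide with the Jacobian of the block soft-thresholding map at $(3, 4)$, namely $0.8\,\mathrm{Id} + 0.2\,uu^T$ with $u = (0.6, 0.8)$, and the last column is zero because the inactive coordinate does not influence the active block.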

The next theorem provides a closed-form expression of the local variations of $y \mapsto \hat\mu(y)$. In turn, when $\varepsilon \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_n)$, this will yield an unbiased estimator of the degrees of freedom and of the prediction risk of the group Lasso.

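Theorem 2: Let $\lambda > 0$. For all $y \notin \mathcal{H}$, there exists a solution $\hat\beta(y)$ of $(P_\lambda(y))$ with $\mathcal{B}$-support $I = \operatorname{supp}_{\mathcal{B}}(\hat\beta(y))$ such that $(\mathcal{A}(\hat\beta(y)))$ is fulfilled. The mapping $y \mapsto \hat\mu(y) = X\hat\beta(y)$ is $C^1(\mathbb{R}^n \setminus \mathcal{H})$ and,

$$\operatorname{div}(\hat\mu(y)) = \operatorname{tr}(X_I \, d(y, \lambda)),$$

where $\hat\beta(y)$ is such that $(\mathcal{A}(\hat\beta(y)))$ holds. Moreover, the set $\mathcal{H}$ has Lebesgue measure zero. If $Y = X\beta_0 + \varepsilon$ where $\varepsilon \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_n)$, then $\operatorname{tr}(X_I \, d(Y, \lambda))$ is an unbiased estimate of the DOF of the group Lasso.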
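The estimate $\operatorname{tr}(X_I\, d(y, \lambda))$ can be sketched in a few lines once a solution is available. The code below is an illustration under assumptions: the function name `group_lasso_dof` is hypothetical, the projector is taken as $\mathrm{Id} - uu^T$ with $u = \beta_b/\|\beta_b\|$, and the input `beta` is assumed to be a solution satisfying the assumption of the theorem. The second example uses blocks of size one, where the block projector vanishes and the estimate reduces to the number of active coefficients when $X_I$ has full column rank, in line with the Lasso case mentioned in the abstract.

```python
import numpy as np

def group_lasso_dof(X, beta, blocks, lam):
    """DOF estimate tr(X_I d(y, lam)) computed from a solution beta."""
    X = np.asarray(X, dtype=float)
    support = [b for b in blocks if np.linalg.norm(beta[b]) > 0]
    if not support:
        return 0.0  # empty support: the prediction is locally constant (zero)
    idx = np.concatenate(support)
    X_I = X[:, idx]
    M = X_I.T @ X_I
    pos = 0
    for b in support:
        k = len(b)
        nb = np.linalg.norm(beta[b])
        u = beta[b] / nb
        # add lam * (Id - u u^T) / ||beta_b|| on the diagonal block of b
        M[pos:pos + k, pos:pos + k] += lam * (np.eye(k) - np.outer(u, u)) / nb
        pos += k
    return float(np.trace(X_I @ np.linalg.solve(M, X_I.T)))

# Example 1: identity design, two blocks of size 2, lam = 1.
# beta is the block soft-thresholding of y = (3, 4, 0.3, 0.4).
dof_blocks = group_lasso_dof(np.eye(4), np.array([2.4, 3.2, 0.0, 0.0]),
                             [[0, 1], [2, 3]], lam=1.0)

# Example 2: blocks of size one; only the support of beta matters here.
X = np.array([[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1], [0, 3, 1]])
dof_lasso = group_lasso_dof(X, np.array([1.0, 0.0, -2.0]),
                            [[0], [1], [2]], lam=0.5)
```

In the first example the value is $1 + (|b| - 1)(1 - \lambda/\|y_b\|) = 1.8$ for the active block, and in the second it equals 2, the number of nonzero coefficients.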

REFERENCES

[1] H. Zou, T. Hastie, and R. Tibshirani, “On the “degrees of freedom” of the Lasso,” The Annals of Statistics, vol. 35, no. 5, pp. 2173–2192, 2007.

[2] C. Dossal, M. Kachour, J. Fadili, G. Peyré, and C. Chesneau, “The degrees of freedom of penalized $\ell_1$ minimization,” to appear in Statistica Sinica, 2012. [Online]. Available: http://hal.archives-ouvertes.fr/hal-00638417

[3] B. Efron, “How biased is the apparent error rate of a prediction rule?” Journal of the American Statistical Association, vol. 81, no. 394, pp. 461–470, 1986.
