Regularization methods for Sliced Inverse Regression Caroline Bernard-Michel, Laurent Gardes, Stéphane Girard

(1)

HAL Id: hal-00985822

https://hal.inria.fr/hal-00985822

Submitted on 30 Apr 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Regularization methods for Sliced Inverse Regression

Caroline Bernard-Michel, Laurent Gardes, Stéphane Girard

To cite this version:

Caroline Bernard-Michel, Laurent Gardes, Stéphane Girard. Regularization methods for Sliced Inverse Regression. 8th International Conference on Operations Research„ Feb 2008, La Havane, Cuba. �hal- 00985822�

(2)

Regularization methods for Sliced Inverse Regression

Stéphane Girard

Team Mistis, INRIA Rhône-Alpes, France http ://mistis.inrialpes.fr/˜girard

February 2008

Joint work with Caroline Bernard-Michel and Laurent Gardes

(3)

Outline

1 Sliced Inverse Regression (SIR)

2 Inverse regression without regularization

3 Inverse regression with regularization

4 Validation on simulations

5 Real data study

(4)

Outline

5 Real data study

(5)

SIR : Goal

[Li, 1991]

Infer the conditional distribution of a response r.v. Y ∈R given a predictor X∈R^p.

Whenp is large, curse of dimensionality.

Sufficient dimension reduction aims at replacingX by its projection onto a subspace of smaller dimension without loss of information on the distribution of Y givenX.

The central subspaceis the smallest subspace S such that, conditionally on the projection of X onS,Y andX are independent.

How to estimate a basis of the central subspace ?

(6)

SIR : Basic principle

Assumedim(S) = 1 for the sake of simplicity,i.e. S=span(b), withb∈R^p =⇒ Single index model :

Y =g(b^tX) +ξ whereξ is independent ofX.

Idea:

Find the directionb such thatb^tX best explains Y. Conversely, when Y is fixed,b^tX should not vary.

Find the directionb minimizing the variations ofb^tX given Y. In practice:

The range of Y is partitioned intoh slicesS_j. Minimize the within slice variance ofb^tX under the normalization constraint var(b^tX) = 1.

Equivalent to maximizing the between slice varianceunder the same constraint.

(7)

SIR : Illustration

(8)

SIR : Estimation procedure

Given a sample{(X1, Y₁), . . . ,(X_n, Y_n)}, the directionb is estimated by

ˆb= argmax

b

b^tΓbˆ u.c. b^tΣbˆ = 1. (1)

whereΣˆ is the estimated covariance matrix and Γˆ is the between slice covariance matrix defined by

Γ =ˆ Xh j=1

n_j

n( ¯X_j−X)( ¯¯ X_j−X)¯ ^t, X¯_j = 1 n_j

X

Yi∈Sj

X_i,

with n_j is proportion of observations in sliceS_j. The optimization problem (1) has an explicit solution :ˆbis the eigenvector of Σˆ⁻¹Γˆ associated to its largest eigenvalue.

(9)

SIR : Limitations

Problem :Σˆ can be singular, or at least ill-conditioned, in several situations.

Since rank( ˆΣ)≤min(n−1, p), ifn≤pthen Σˆ is singular.

Even whenn andpare of the same order, Σˆ is ill-conditioned, and its inversion introduces numerical instabilities in the estimation of the central subspace.

Similar phenomena occur when the coordinates ofX are highly correlated.

(10)

SIR : Numerical experiment (1/2)

Experimental set-up.

A sample {(X1, Y₁), . . . ,(X_n, Y_n)}of size n= 100 where Xi∈R^p with p= 50andYi∈R, fori= 1, . . . , n.

X_i∼ Np(0,Σ) with Σ =Q∆Q^t where

∆ =diag(p^θ, . . . ,2^θ,1^θ),

Qis a matrix drawn from the uniform distribution on the set of orthogonal matrices.

=⇒ The condition number ofΣ isp^θ.(Here,θ= 2).

Y_i=g(b^tX_i) +ξ where

g is the link functiong(t) = sin(πt/2),

bis the true directionb= 5⁻¹^/²Q(1,1,1,1,1,0, . . . ,0)^t, ξ∼ N₁(0,9.10⁻⁴)

(11)

SIR : Numerical experiment (2/2)

Blue : Projectionsb^tX_i on the true directionbversus Yi, Red : Projections ˆb^tX_i on the estimated direction ˆb ver- susYi,

Green : b^tX_i versus ˆb^tX_i.

(12)

Outline

5 Real data study

(13)

Single-index inverse regression model

Model introduced in[Cook, 2007].

X =µ+c(Y)V b+ε, (2)

where

µ andbare non-random R^p− vectors, ε∼ N_p(0, V), independent of Y,

c:R→R is a nonrandom coordinate function.

Consequence :The conditional expectation ofX−µgiven Y is a degenerated random vector located in the directionV b.

(14)

Maximum Likelihood estimation (1/3)

Projection estimator of the coordinate function. c(.) is expanded as a linear combination of h basis functions sj(.),

c(.) = Xh j=1

c_js_j(.) =s^t(.)c,

wherec= (c₁, . . . , c_h)^t is unknown and

s(.) = (s1(.), . . . , s_h(.))^t. Model (2) can be rewritten as X =µ+s^t(Y)cV b+ε, ε∼ Np(0, V),

Definition :Signal to Noise Ratio in the direction b.

ρ= b^tΣb−b^tV b b^tV b ,

(15)

Maximum Likelihood estimation (2/3)

Notations

W : the h×hempirical covariance matrix ofs(Y) defined by

W = 1 n

Xn i=1

(s(Y_i)−s)(s(Y¯ _i)−s)¯^t with s¯= 1 n

Xn i=1

s(Y_i).

M : theh×p matrix defined by

M = 1 n

Xn i=1

(s(Y_i)−¯s)(X_i−X)¯ ^t,

(16)

Maximum Likelihood estimation (3/3)

IfW andΣˆ are regular, then the ML estimators are :

Direction:ˆb is the eigenvector associated to the largest eigenvalue λˆ ofΣˆ⁻¹M^tW⁻¹M,

Coordinate:cˆ=W⁻¹Mˆb/ˆb^tVˆˆb, Location parameter :µˆ= ¯X−s¯^tcˆVˆˆb, Covariance matrix :Vˆ = ˆΣ−ˆλΣˆˆbˆb^tΣ/ˆ ˆb^tΣˆˆb, Signal to Noise Ratio :ρˆ= ˆλ/(1−λ).ˆ The inversion ofΣˆ is still necessary.

(17)

SIR : A particular case

In the particular case ofpiecewise constant basis functions s_j(.) =I{.∈S_j}, j= 1, . . . , h,

standard calculations show that

M^tW⁻¹M = ˆΓ

and thus the ML estimatorˆb ofb is the eigenvector associated to the largest eigenvalue ofΣˆ⁻¹Γ.ˆ

=⇒SIR method.

(18)

Outline

5 Real data study

(19)

Gaussian prior

Introduction of a prior information on the projection ofX onb appearing in the inverse regression model

(1 +ρ)⁻^1/2(s(Y)−s)¯^tcb∼ N(0,Ω).

(1 +ρ)⁻^1/2 is introduced for normalization purposes,

permitting to preserve the interpretation of the eigenvalue in terms of signal to noise ratio.

Ω describes which directions inR^p are the most likely to contain b.

(20)

Gaussian regularized estimators

IfW andΩ ˆΣ +I_p are regular, the ML estimators are

Direction :ˆb is the eigenvector associated to the largest eigenvalue λˆ of(Ω ˆΣ +I_p)⁻¹ΩM^tW⁻¹M,

Coordinate : ˆc=W⁻¹Mˆb/((1 +η(ˆb))ˆb^tVˆˆb), with η(ˆb) = ˆb^tΩ⁻¹ˆb/ˆb^tΣˆˆb,

ˆ

µ,Vˆ andρˆare unchanged.

=⇒The inversion ofΣˆ is replaced by the inversion ofΩ ˆΣ +I_p.

=⇒For a properly chosen prior matrix Ω, the numerical instabilities in the estimation ofbdisappear.

(21)

Gaussian regularized SIR (1/2)

GRSIR :In the particular case of piecewise constant basis

functions, the ML estimatorˆb ofb is the eigenvector associated to the largest eigenvalue of(Ω ˆΣ +I_p)⁻¹ΩˆΓ.

Links with existing methods

Ridge [Zhong et al, 2005]:Ω =τ⁻¹Ip. No privileged direction for bin R^p.τ >0 is the regularization parameter.

PCA+SIR[Chiaromonte et al, 2002] : Ω =

Xd j=1

1 ˆδ_jqˆjqˆ_j^t,

whered∈ {1, . . . , p} is fixed, ˆδ₁ ≥ · · · ≥δˆ_d are the dlargest eigenvalues of Σˆ andqˆ1, . . . ,qˆd are the associated

(22)

Gaussian regularized SIR (2/2)

Three new methods PCA+ridge :

Ω = 1 τ

Xd j=1

ˆ q_jqˆ_j^t.

No privileged direction in thed-dimensional eigenspace.

Tikhonov :Ω =τ⁻¹Σ. Directions with large variance are mostˆ likely.

PCA+Tikhonov :

Ω = 1 τ

Xd j=1

δˆ_jqˆ_jqˆ_j^t.

In thed-dimensional eigenspace, directions with large variance are most likely.

(23)

Outline

5 Real data study

(24)

Validation on simulations

Experimental set-up :Same as previously.

Proximity criterionbetween the true directionband the estimated onesˆb^(r) onN = 100 replications :

PC= 1 N

XN r=1

(b^tˆb^(r))²

0≤PC≤1,

a value close to 0 implies a low proximity : Theˆb^(r) are nearly orthogonal to b,

a value close to 1 implies a high proximity : Theˆb^(r) are approximatively collinear with b.

(25)

Influence of the regularization parameter

logτ versus PC. The “cut-off” dimension and the condition number are fixed (d= 20and θ= 2).

Ridge andTikhonov : significant improvement ifτ is large, PCA+SIR: reasonable results compared to SIR,

PCA+ridge andPCA+Tikhonov : small sensitivity to τ.

(26)

Sensitivity with respect to the condition number of the covariance matrix

θversus PC. The “cut-off” dimension is fixed tod= 20. The optimal regularization parameter is used for each value ofθ.

Only SIRis very sensitive to the ill-conditioning, ridge andTikhonov : similar results,

PCA+ridge andPCA+Tikhonov : similar results.

(27)

Sensitivity with respect to the “cut-off” dimension

dversus PC. The condition number is fixed (θ= 2) The optimal regularization parameter is used for each value ofd.

PCA+SIR: very sensitive to d.

(28)

Outline

5 Real data study

(29)

Estimation of Mars surface physical properties from hyperspectral images

Context :

Observation of the south pole of Mars at the end of summer, collected during orbit 61 by the French imaging spectrometer OMEGA on board Mars Express Mission.

3D image : On each pixel, a spectra containing p= 184 wavelengths is recorded.

This portion of Mars mainly contains water ice, CO₂ and dust.

Goal :For each spectraX ∈R^p, estimate the corresponding physical parameterY ∈R(grain size of CO2).

(30)

An inverse problem

Forward problem.

Physical modeling of individual spectra with a surface reflectance model.

Starting from a physical parameter Y, simulateX=F(Y).

Generation of n= 12,000synthetic spectra with the corresponding parameters.

=⇒Learning database.

Inverse problem.

Estimate the fonctional relationship Y =G(X).

Dimension reduction assumption G(X) =g(b^tX).

b is estimated by SIR/GRSIR,g is estimated by a nonparametric one-dimensional regression.

(31)

Estimated functional relationship

Functional relationship between reduced spectraˆb^tX on the first

(32)

Estimated CO

2

maps

Grain size of CO2 estimated by SIR (left) and GRSIR (right) on an hyperspectral image observed on Mars during orbit 61.

(33)

References

[Li, 1991] Li, K.C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association,86, 316–327.

[Cook, 2007]. Cook, R.D. (2007). Fisher lecture : Dimension reduction in regression. Statistical Science,22(1), 1–26.

[Zhong et al, 2005]. Zhong, W., Zeng, P., Ma, P., Liu, J.S.

and Zhu, Y. (2005). RSIR : Regularized Sliced Inverse Regression for motif discovery. Bioinformatics,21(22), 4169–4175.

[Chiaromonte et al, 2002]. Chiaromonte, F. and Martinelli, J.

(2002). Dimension reduction strategies for analyzing global gene expression data with a response. Mathematical Biosciences,176, 123–144.

(34)

References

R. Coudret, S. Girard & J. Saracco. A new sliced inverse regression method for multivariate response,Computational Statistics and Data Analysis, to appear, 2014.

M. Chavent, S. Girard, V. Kuentz, B. Liquet, T.M.N. Nguyen & J.

Saracco. A sliced inverse regression approach for data stream, Computational Statistics, to appear, 2014.

C. Bernard-Michel, S. Douté, M. Fauvel, L. Gardes & S. Girard.

Retrieval of Mars surface physical properties from OMEGA hyperspectral images using Regularized Sliced Inverse Regression, Journal of Geophysical Research - Planets, 114, E06005, 2009.

C. Bernard-Michel, L. Gardes & S. Girard. A Note on Sliced Inverse Regression with Regularizations,Biometrics, 64, 982–986, 2008.

C. Bernard-Michel, L. Gardes & S. Girard. Gaussian Regularized Sliced Inverse Regression, Statistics and Computing, 19, 85–98, 2009.

A. Gannoun, S. Girard, C. Guinot & J. Saracco. Sliced Inverse Regression in reference curves estimation,