Auto-Associative models and generalized Principal Component Analysis

(1)

HAL Id: hal-00985337

https://hal.inria.fr/hal-00985337

Submitted on 29 Apr 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

To cite this version:

Stéphane Girard, Serge Iovleff. Auto-Associative models and generalized Principal Component Anal-

ysis. Workshop on principal manifolds for data cartography and dimension reduction, Aug 2006,

Leicester, United Kingdom. �hal-00985337�

(2)

St´ephane Girard

⋆ INRIA, Universit´e Grenoble 1

Joint work with Serge Iovleff, Universit´e Lille 1

(3)

Outline

1. Principal Component Analysis, 2 points of view, 2. Generalized PCA, theoretical aspects,

3. Implementation aspects,

4. Illustration on simulated datasets,

5. Illustration on real datasets.

(4)

1. Principal Component Analysis

• Background : Multidimensional data analysis ( n observations in a p − dimensional space)

• Goal : Dimension reduction.

◦ Data visualization (dimension less than 3),

◦ To find which variables are important,

◦ Compression.

• Method: Projection on low d− dimensional linear subspaces.

(5)

• Let X be a centered random vector in R ^p .

• Estimate the d − dimensional linear subspace d ∈ {0 , . . . , p } minimizing the mean distance to X .

• Minimize with respect to a ¹ , . . . , a ^d (orthonormal):

E





X −

d

X

k=1

X, a ^k a ^k

2 

 .

Explicit solution

• a ¹ , . . . , a ^d are the eigenvectors associated to the d largest eigenvalues of E [ X ^t X ], the covariance matrix of X .

• The a ^k ’s are called principal axes, the Y ^k =

X, a ^k

the principal variables.

• The associated residual is defined by

R ^d = X −

d

X

k =1

X, a ^k

a ^k ,

(6)

PCA: Projection Pursuit interpretation

Equivalent problem

• Estimate the d− dimensional linear subspace d ∈ {0, . . . , p} maximizing the projected variance.

• Maximize iteratively with respect to a ¹ , . . . , a ^d (orthonormal):

Var

X, a ¹

, . . . , Var

X, a ^d

.

(7)

Algorithm

• For j = 0, let R ⁰ = X .

• For j = 1 , . . . , d :

[A] Estimation of a projection axis.

Determine a ^j = arg max

x∈ R ^p E h

x, R ^j−1 2 i

such that a ^j

= 1 and

a ^j , a ^k

= 0, 1 ≤ k < j . [P] Projection.

Compute the principal variable Y ^j =

a ^j , R ^j−1 . [R] Linear regression.

Determine b ^j = arg min

x∈ R ^p E h

R ^j ⁻¹ − Y ^j x

2 i

such that

b ^j , a ^j

= 1 and

b ^j , a ^k

= 0, 1 ≤ k < j . The solution is b ^j = a ^j , and let the regression function be s ^j ( t ) = ta ^j . [U] Residual update.

Compute R ^j = R ^j−1 − s ^j ( Y ^j ).

(8)

Algorithm output. After d iterations, we have the following expansion:

X =

d

X

k=1

s ^k ( Y ^k ) + R ^d , (1)

with s ^k ( t ) = ta ^k and Y ^k =

a ^k , X

, or equivalently

X =

d

X

k=1

a ^k , X

a ^k + R ^d .

This equation can be rewritten as

F ( X ) = R ^d (2)

where we have defined

F ( x ) = x −

d

X

k=1

a ^k , x a ^k .

The equation F (x) = 0 defines a d− dimensional linear subspace, spanned by a ¹ , . . . , a ^d .

Equation (2) defines a d− dimensional linear auto-associative model for X .

(9)

Goals of a generalized PCA

1. To keep an expansion similar to (2):

F ( X ) = R ^d ,

but with a non necessarily linear function F , such that the equation F ( x ) = 0 could model more general subspaces.

2. To keep an expansion “principal variables + residual” similar to (1):

X =

d

X

k=1

s ^k (Y ^k ) + R ^d ,

but with non necessarily linear functions s ^k .

3. To benefit from the “nice” theoretical properties of PCA.

4. To keep a simple iterative algorithm.

(10)

2. Generalized PCA, theoretical aspects

We adopt the Projection Pursuit point of view. The steps [A] and [R] are generalized:

[A] Estimation of a projection axis.

Introduction of an index I which measures the quality of the projection axis. For instance:

• Dispersion,

• Deviation from normality,

• Clusters detection,

• Outliers detection,...

[R] Regression.

Estimation of the regression function from R to R ^p in a given set:

• Linear functions,

• Splines, kernels,...

(11)

New algorithm.

• For j = 0, let R ⁰ = X .

• For j = 1 , . . . , d :

[A] Estimation of a projection axis.

Determine a ^j = arg max

x ∈ R ^p I (

x, R ^j−1

) such that a ^j

= 1 and

a ^j , a ^k

= 0, 1 ≤ k < j . [P] Projection.

Compute the principal variable Y ^j =

a ^j , R ^j−1 . [R] Regression.

Determine s ^j = arg min

s∈S( R , R ^p ) E h

R ^j−1 − s ( Y ^j )

2 i

such that P _a j ◦ s ^j = Id _R and P _a k ◦ s ^j = 0, 1 ≤ k < j .

[U] Residual update

Compute R ^j = R ^j ⁻¹ − s ^j ( Y ^j ).

(12)

Remark: At the end of iteration j , the residual is given by R ^j = R ^j−1 − s ^j Y ^j

= R ^j ⁻¹ − s ^j

a ^j , R ^j−1

= R ^j ⁻¹ − s ^j ◦ P _a ^j R ^j ⁻¹

= Id _R ^p − s ^j ◦ P _a j

R ^j−1

= Id _R ^p − s ^j ◦ P _a j

◦ Id _R ^p − s ^j ⁻¹ ◦ P _a j−1

R ^j ⁻²

= . . .

= Id _R ^p − s ^j ◦ P _a j

◦ . . . ◦ Id _R ^p − s ¹ ◦ P _a 1

R ⁰

= Id _R ^p − s ^j ◦ P _a j

◦ . . . ◦ Id _R ^p − s ¹ ◦ P _a 1

( X ) .

Auto-associative composite model.

(13)

y=P s(y) s

a

a s(y)

• Important consequence: At the end of iteration j , the residual is given by R ^j = Id _R ^p − s ^j ◦ P _a ^j

R ^j−1

, and thus is projection on a ^j is P _a ^j R ^j = P _a ^j − P _a ^j ◦ s ^j ◦ P _a ^j

R ^j−1

= ( P _a j − P _a j ) R ^j−1

= 0 .

Thus, iteration (j + 1) can be performed on the linear subspace orthogonal to (a ¹ , . . . , a ^j ),

(14)

Goal 1. After d iterations:

• One always has an auto-associative model

F ( X ) = R ^d ,

with

F = Id _R ^p − s ^d ◦ P _a d

◦ . . . ◦ Id _R ^p − s ¹ ◦ P _a 1

=

1 a

k=d

Id _R ^p − s ^k ◦ P _a k

,

and P _a j ( x ) =

a ^j , x .

• The equation F ( x ) = 0 defines a d − dimensional manifold.

(15)

Goal 2. After d iterations:

• One always as the expansion “principal variables + residual” similar to (1):

X =

d

X

k =1

s ^k ( Y ^k ) + R ^d ,

and the functions s ^k are non necessarily linear.

• For d = p , the expansion is exact: R ^p = 0.

• We can still define principal axes a ^k and principal variables Y ^k .

• The residuals are centered: E R ^k

= 0, k = 0 , . . . , d .

(16)

Goal 3. After d iterations, we have:

• Some orthogonality properties

a ^k , a ^j

= 0 , 1 ≤ k < j ≤ d, a ^k , R ^j

= 0 , 1 ≤ k ≤ j ≤ d, a ^k , s ^j ( Y ^j )

= 0 , 1 ≤ k < j ≤ d.

• Since the norm of the residuals is decreasing, we can define, similarly to the PCA case, the information ratio represented by the d − dimensional model as

Q _d = 1 − E h

R ^d

2 i .

Var h

k X k ² i .

One can show that Q ₀ = 0, Q _p = 1 and ( Q _d ) is increasing.

Remark. Except in particular cases, the non-correlation of the principal variables is lost:

E

Y ^k Y ^j

6= 0 , 1 ≤ k < j ≤ d.

(17)

Goal 4.

• We still have an iterative algorithm. It converges at most in p steps.

• Its complexity depends on the two steps [A] et [R].

[A] Estimation of a projection axis.

Determine a ^j = arg max

x∈ R ^p I (

x, R ^j ⁻¹

) such that a ^j

= 1 and

a ^j , a ^k

= 0, 1 ≤ k < j . [R] Regression.

Determine s ^j = arg min

s∈S( R , R ^p ) E h

R ^j−1 − s ( Y ^j )

2 i

such that P _a j ◦ s ^j = Id _R and P _a k ◦ s ^j = 0, 1 ≤ k < j .

• Note that the above theoretical properties do not depend on these steps.

(18)

3. Implementation aspects, step [A]

• Contiguity index. Measure of the neighborhood preservation. Points which are neighbor in R ^p should stay neighbor on the axis.

I (

x, R ^j−1 ) =

n

X

i =1

D

x, R _i ^j−1

E 2 , _n X

k=1 n

X

ℓ=1

m _kℓ D

x, R ^j _k ⁻¹ − R ^j−1 _ℓ E 2

,

where M = ( m _kℓ ) is the contiguity matrix defined by

m _kℓ = 1 if R ^j−1 _ℓ is the closest neighbor of R _k ^j−1 , m _kℓ = 0 otherwise.

• Optimization. Explicit solution.

[A] a ^j is the eigenvector associated to the largest eigenvalue of V _j ^⋆ V _j ⁻¹ , where

V _j =

n

X

k =1

t R ^j−1 _k R ^j _k ⁻¹ , V _j ^⋆ =

n

X

k =1 n

X

ℓ =1

m _kℓ ^t ( R ^j _k ⁻¹ − R ^j−1 _ℓ )( R _k ^j−1 − R ^j−1 _ℓ )

are proportional to the covariance and local covariance matrices of R ^j−1 .

(19)

• Set of L ² functions . The regression step reduces to estimating the conditional expectation:

[R] s ^j ( Y _j ) = E

R ^j ⁻¹ | Y _j .

• Estimation of the conditional expectation.

◦ Classical problem since the constraints P _a j ◦ s ^j = Id and P _a k ◦ s ^j = Id, 1 ≤ k < j are easily taken into account in the a ^k ’s basis. Step [R] reduces to ( p − j ) independent regressions from R to R .

◦ Numerous estimates are available: splines, local polynomials, kernel estimates, ...

◦ For instance, for the coordinate k ∈ { j + 1 , . . . , p }, the kernel estimate of s ^j ( u ) can be written as

˜

s ^j _k ( u ) =

n

X

i=1

R ˜ ^j−1 _i,k K _h ( u − Y _i ^j )

, _n X

i=1

K _h ( u − Y _i ^j ) ,

where h is a smoothing parameter (the bandwidth).

(20)

4. First illustration on a simulated dataset

• n = 100 points in R ³ randomly chosen on the curve x → (x, sin x, cos x).

• One iteration h = 0 . 3 → Q ₁ = 99 . 97%.

1

−0

−1 Z

−1 0 1

Y

9.4

0.0

−9.4

X

Theoretical curve

1

−0

−1 Z

−1 0 1

Y

9.0

−0.2

−9.4

X + +

+

+ +

+

+ +

+

+ +

+

+ + +

+

+ +

+

+ +

+

+ +

+

++

+

+ +

+ + +

++

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+

+ +

+

+ +

+

+ +

+

++

+

+ +

+ + +

++

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+ +

+

+ +

+

Estimated 1− dimensional manifold

(21)

Second illustration on a simulated dataset

• n = 1000 points in R ³ randomly chosen on the surface ( x, y ) → ( x, y, cos( π p

x ² + y ² )(1 − exp{−64( x ² + y ² )})) .

• Two iterations: Q ₁ = 84 . 1% et Q ₂ = 97 . 6%.

1.20 0.35

−0.50 Z

−1

0

1 Y

0.5 0.0

−0.5

X

Theoretical surface

1.20 0.35

−0.50 Z

−1

0

1 Y

0.5 0.0

−0.5

X +

+

+ +

+

+ +

+

+ +

+ + +

+ +

+ + +

+ +

+

+ +

+

+ + +

+ +

+

+ +

+ + +

+

+ +

+

+ +

+

+ +

+

+ + +

+

+ +

+

+ +

+ + + +

+ +

+ + +

+

+ +

+

+ +

+

+ + +

+

+ +

+

+ +

+ + +

+

+ +

+

+ + +

+ +

+

+ +

+ + +

+

+ +

+

+ +

+

+ +

+

+ + +

++

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+ +

+

+ + +

+ +

++

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+ +

+

+ +

+

+ +

+ + +

+ +

+ + +

+ +

+

+ + +

+ +

+

+ + +

+ + + +

+ +

+ + +

+ +

+

+ +

+ + +

+ +

+

+ +

+

+ +

+

+ +

+ + + +

+ + +

+ +

+

+ + +

+ +

+

+ +

+ + +

+ +

+

+ +

++ +

+

+ +

+

++

+ +

+

+ + +

+ +

+ + +

+ +

+ ++

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+ +

+

+ +

+

+ + +

+

+ +

+ + +

+ +

+

+ + +

+

+ +

+ + +

+

+ + +

+ +

+

+ +

+

+ +

+

+ +

+ + +

+ +

+

+ +

+ + +

+ + + +

+

+ +

+

+ + +

+

+ + +

+

+ +

+ + +

+

+ +

+

+ +

+

+ ++

+

+ + + +

+ +

+

+ +

+ + + +

+

+ +

+

+ + +

+

+ +

+

+ +

+

+ +

+

+ + ++

+ + +

+ +

+ + +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+ + +

+ +

+ + +

+

+ +

+

+ +

+

+ + + +

+ + +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

++

+ + +

+ +

+

+ +

+ + + + +

+ +

+ + +

+ +

+

+ +

+

+ +

+

+ + +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+

+ +

+

+ +

+ + +

+

+ +

+

+ +

+ + +

+ +

+ + +

+ +

+

+ +

+ + +

+ +

+

+ +

+

+ +

+

+ + + +

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ + +

+ +

+

+ +

+

+ +

+ + + +

+ +

+ + +

+

+ + +

+ +

+

+ +

+

+ +

+

+ +

+

+ + + +

+ + +

+ +

+ + +

+ +

+ + +

+ +

+ + + +

+ +

+

+ +

+ + +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+

+ +

+ + +

+

+ + +

+ +

+

+ ++

+

+ +

+

+ + +

+

Simulated points

1.20 0.35

−0.50 Z

−1

0

1 Y

0.5 0.0

−0.5

X . . . .. . . .. . .. .

. .. .. .....

......

.. . .. .

. . . . . . .

. . . .. . .. . . .. .

. .......

.....

. .. .. .. . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .. . . .. . . .. .

. .. . . ....

.....

... . .. .. .. . . . . .

. . . .. . .. . . .. .. ......

.....

.. . . .. . .. . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .........

.... . .. .

. . . . . . .

. . . .. . .. . . .. .. .....

........

. .. . . . . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .........

... . .. .. .. . . . . .

. . . .. . .. . . .. .. .....

.......

. .. .. .. . . . . .

. . . .. . .. .. . . .. ........

.... . .. .. .. . . . . .

. . . .. . .. .. . . .. ....

......

.. . .. .. .. . . . . .

. . . .. . .. . . .. .. ......

......

. .. .. .. . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .. ....

......

.. . .. .. . .. . . . .. . . .. . . .. .. .

. .. ..... ........

. .. . . . . . . . .. . . .. . . .. .

. .. . . .......

......

. .. . . . . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. . .. . . . .

. . . .. . .. .. . . .. .........

... . .. .. . .. . . . .. . . .. . . .. .. .

. .. .... ......

... . .. .

. . . . . . .

. . . .. . .. .. . . .. ......

......

. .. .. . .. . . . .

. . . .. . .. . . .. .. ......

.......

. .. . . . . . . . .. . . .. . . .. .

. .. . . ......

.......

. .. . . . . . . . .

. . . .. . .. .. . . .. ......

......

. .. . . . .. . . . .

. . . .. . .. . . .. .

. .......

.....

. .. .. .. . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. .. . . .. .......

.....

. .. . . . .. . . . .. . . .. . . .. .

. .. . . .......

......

. .. . . . . . . . .. . . .. . . .. .

. .. . . .......

.....

. .. .. .. . . . . .

. . . .. . .. .. . . .. .......

.....

. .. .. .. . . . . .

. . . .. . .. .. . . .. ......

.......

. .. . . . . . . . .

. . . .. . .. .. . . .. .....

.....

... . .. .

. . . . . . .

. . . .. . .. . . .. .. ....

.......

.. . . .. . .. . . . .

. . . .. . .. . . .. .

. .........

.... . .. .

. . . . . . .

. . . .. . .. .. . . .. ........

.... . .. .

. . .. . . . .. . . .. . . .. .

. .. . . .......

......

. .. . . . . . . . .. . . .. . . .. .. .

. .. .....

.....

... . .. .

. . . . . . .. . . .. . . .. .. .

. .. .........

... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .......

.....

. .. . . . .. . . . .. . . .. . . .. .

. .. .. ......

.......

. .. . . . . . . . .

. . . .. . .. . . .. .

. .... .....

... . .. .. .. . . . . .. . . .. . . .. .

. .. . . ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .......

......

. .. .. . . . . . .

. . . .. . .. .. . . .. .....

......

.. . . .. . .. . . . .

. . . .. . .. . . .. .

. .........

.... . .. .. . . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. ...... ......

. .. .. .. . . . . .

. . . .. . .. . . .. .. ......

.......

. .. . . . . . . . .

. . . .. . .. .. . . .. ....

......

... . .. .

. . .

Auto-Associative models and generalized Principal Component Analysis

HAL Id: hal-00985337

https://hal.inria.fr/hal-00985337

Submitted on 29 Apr 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

To cite this version:

Stéphane Girard, Serge Iovleff. Auto-Associative models and generalized Principal Component Anal-

ysis. Workshop on principal manifolds for data cartography and dimension reduction, Aug 2006,

Leicester, United Kingdom. �hal-00985337�

St´ephane Girard

⋆ INRIA, Universit´e Grenoble 1

Joint work with Serge Iovleff, Universit´e Lille 1

Outline

1. Principal Component Analysis, 2 points of view, 2. Generalized PCA, theoretical aspects,

3. Implementation aspects,

4. Illustration on simulated datasets,

5. Illustration on real datasets.

1. Principal Component Analysis

• Background : Multidimensional data analysis ( n observations in a p − dimensional space)

• Goal : Dimension reduction.

◦ Data visualization (dimension less than 3),

◦ To find which variables are important,

◦ Compression.

• Method: Projection on low d− dimensional linear subspaces.

• Let X be a centered random vector in R p .

• Estimate the d − dimensional linear subspace d ∈ {0 , . . . , p } minimizing the mean distance to X .

• Minimize with respect to a 1 , . . . , a d (orthonormal):

E





X −

d

X

k=1

X, a k a k

2 

 .

Explicit solution

• a 1 , . . . , a d are the eigenvectors associated to the d largest eigenvalues of E [ X t X ], the covariance matrix of X .

• The a k ’s are called principal axes, the Y k =

X, a k

the principal variables.

• The associated residual is defined by

R d = X −

d

X

k =1

X, a k

a k ,

PCA: Projection Pursuit interpretation

Equivalent problem

• Estimate the d− dimensional linear subspace d ∈ {0, . . . , p} maximizing the projected variance.

• Maximize iteratively with respect to a 1 , . . . , a d (orthonormal):

Var

X, a 1

, . . . , Var

X, a d

.

Algorithm

• For j = 0, let R 0 = X .

• For j = 1 , . . . , d :

[A] Estimation of a projection axis.

Determine a j = arg max

x∈ R p E h

x, R j−1 2 i

such that a j

= 1 and

a j , a k

= 0, 1 ≤ k < j . [P] Projection.

Compute the principal variable Y j =

a j , R j−1 . [R] Linear regression.

Determine b j = arg min

x∈ R p E h

R j −1 − Y j x

2 i

such that

b j , a j

= 1 and

b j , a k

• Let X be a centered random vector in R ^p .

• Minimize with respect to a ¹ , . . . , a ^d (orthonormal):

X, a ^k a ^k

• a ¹ , . . . , a ^d are the eigenvectors associated to the d largest eigenvalues of E [ X ^t X ], the covariance matrix of X .

• The a ^k ’s are called principal axes, the Y ^k =

X, a ^k

R ^d = X −

X, a ^k

a ^k ,

• Maximize iteratively with respect to a ¹ , . . . , a ^d (orthonormal):

X, a ¹

X, a ^d

• For j = 0, let R ⁰ = X .

Determine a ^j = arg max

x∈ R ^p E h

x, R ^j−1 2 i

such that a ^j

a ^j , a ^k

Compute the principal variable Y ^j =

a ^j , R ^j−1 . [R] Linear regression.

Determine b ^j = arg min

x∈ R ^p E h

R ^j ⁻¹ − Y ^j x

b ^j , a ^j

b ^j , a ^k

= 0, 1 ≤ k < j . The solution is b ^j = a ^j , and let the regression function be s ^j ( t ) = ta ^j . [U] Residual update.

Compute R ^j = R ^j−1 − s ^j ( Y ^j ).

s ^k ( Y ^k ) + R ^d , (1)

with s ^k ( t ) = ta ^k and Y ^k =

a ^k , X

a ^k , X

a ^k + R ^d .

F ( X ) = R ^d (2)

a ^k , x a ^k .

The equation F (x) = 0 defines a d− dimensional linear subspace, spanned by a ¹ , . . . , a ^d .

F ( X ) = R ^d ,

s ^k (Y ^k ) + R ^d ,

but with non necessarily linear functions s ^k .

Estimation of the regression function from R to R ^p in a given set:

• For j = 0, let R ⁰ = X .

Determine a ^j = arg max

x ∈ R ^p I (

x, R ^j−1

) such that a ^j

a ^j , a ^k

Compute the principal variable Y ^j =

a ^j , R ^j−1 . [R] Regression.

Determine s ^j = arg min

s∈S( R , R ^p ) E h

R ^j−1 − s ( Y ^j )