Bootstrap of constraint estimators with application to rank estimation.

(1)

François Portier

IRMAR-University of Rennes 1

April 17, 2012

(2)

Introduction to Bootstrap

Goal: To reproduce the asymptotic behavior of some estimators.

Means: Creation of a new sample which “looks like” the previous one.

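• Bootstrap of Efron [Efron(1982)]:

Suppose that $(X_1, \dots, X_n)$ are i.i.d. with law $P$. We draw $(X_1^*, \dots, X_n^*)$ with respect to the law

$$\widehat{P} = n^{-1} \sum_{i=1}^n \delta_{X_i}.$$

Define $\theta_0 = E[X]$, $\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i$ and

$$\overline{X}^* = \frac{1}{n} \sum_{i=1}^n X_i^* = \frac{1}{n} \sum_{i=1}^n N_i X_i \quad \text{with } (N_1, \dots, N_n) \sim \mathrm{mult}(n; 1/n, \dots, 1/n).$$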

• Another bootstrap method:

$$\overline{X}^* = \overline{X} + \frac{1}{n} \sum_{i=1}^n \epsilon_i X_i = \frac{1}{n} \sum_{i=1}^n (\epsilon_i + 1) X_i$$

with $(\epsilon_i)$ any i.i.d. sequence of standard random variables.
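A minimal sketch of one draw from each of the two schemes above, using only the standard library. The function names are hypothetical, and the multiplier weights are taken to be N(0, 1), which is an assumption (the slide only asks for a standard i.i.d. sequence). The sketch only illustrates that the resampled and the weighted forms of each mean coincide.

```python
import random

def efron_mean(x, rng):
    """One Efron bootstrap mean, returned together with the counts N_i."""
    n = len(x)
    idx = [rng.randrange(n) for _ in range(n)]      # n draws with replacement
    counts = [0] * n
    for i in idx:
        counts[i] += 1                              # N_i = number of times X_i is drawn
    resampled = sum(x[i] for i in idx) / n          # (1/n) sum_i X_i^*
    weighted = sum(c * xi for c, xi in zip(counts, x)) / n   # (1/n) sum_i N_i X_i
    return resampled, weighted, counts

def multiplier_mean(x, rng):
    """One draw of the second scheme, written in both forms given on the slide."""
    n = len(x)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]   # assumption: N(0, 1) weights
    xbar = sum(x) / n
    shifted = xbar + sum(e * xi for e, xi in zip(eps, x)) / n
    weighted = sum((e + 1.0) * xi for e, xi in zip(eps, x)) / n
    return shifted, weighted

rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(50)]
resampled, weighted, counts = efron_mean(x, rng)
shifted, weighted_mult = multiplier_mean(x, rng)
```

The counts returned by `efron_mean` sum to n, which is the multinomial structure of the weights $N_i$.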

(4)

Introduction to Bootstrap

Example of results with both previous bootstrap methods

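If $\varphi$ is continuously differentiable on a neighborhood of $\theta_0 = E[X]$ and $P$ has a finite second-order moment, then

$$\sqrt{n}\,(\varphi(\overline{X}^*) - \varphi(\overline{X})) \ \text{bootstraps} \ \sqrt{n}\,(\varphi(\overline{X}) - \varphi(\theta_0)),$$

i.e.

$$\mathcal{L}\big(\sqrt{n}\,(\varphi(\overline{X}^*) - \varphi(\overline{X})) \mid \widehat{P}\big) \underset{n \to \infty}{=} \mathcal{L}\big(\sqrt{n}\,(\varphi(\overline{X}) - \varphi(\theta_0)) \mid P\big).$$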

Why the bootstrap? It is an alternative to the use of the asymptotic law ([Hall(1992)]) for

Building confidence intervals

Hypothesis testing (for the choice of the quantile)
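As an illustration of the first use, here is a sketch of a bootstrap confidence interval for $\varphi(\theta_0)$ built from the quantiles of $\sqrt{n}\,(\varphi(\overline{X}^*) - \varphi(\overline{X}))$. The function name, the choice of Efron resampling, the number of draws `B` and the example $\varphi(m) = m^2$ are all assumptions made for the example.

```python
import math
import random

def bootstrap_ci(x, phi, alpha=0.05, B=2000, seed=0):
    """Interval for phi(theta_0) from the bootstrap law of the root."""
    rng = random.Random(seed)
    n = len(x)
    phi_hat = phi(sum(x) / n)
    roots = []
    for _ in range(B):
        xs = rng.choices(x, k=n)                    # Efron resample
        roots.append(math.sqrt(n) * (phi(sum(xs) / n) - phi_hat))
    roots.sort()
    lo_q = roots[int(math.floor(alpha / 2 * (B - 1)))]
    hi_q = roots[int(math.ceil((1 - alpha / 2) * (B - 1)))]
    # The law of sqrt(n) (phi_hat - phi(theta_0)) is replaced by the law of the root
    return phi_hat - hi_q / math.sqrt(n), phi_hat - lo_q / math.sqrt(n)

rng = random.Random(1)
sample = [rng.gauss(2.0, 1.0) for _ in range(200)]
lower, upper = bootstrap_ci(sample, phi=lambda m: m ** 2)
```

The interval is obtained by inverting the root, so the upper bootstrap quantile gives the lower endpoint.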

(5)

Test of equal means: classical bootstrap works

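Assume $\theta_0 \in \mathbb{R}$,

$$H_0 : \theta_0 = \theta \quad \text{against} \quad H_1 : \theta_0 \neq \theta$$

To arbitrate, $\|\sqrt{n}\,(\overline{X} - \theta)\|^2$ is compared to

$q_\alpha^\infty$, a quantile of the limiting distribution, or

$q_\alpha^*$, a quantile of the bootstrap statistic.

Level and power

$$P_{H_0}\big(\|\sqrt{n}\,(\overline{X} - \theta)\|^2 > q_\alpha^\infty \text{ or } q_\alpha^*\big) \quad \text{and} \quad P_{H_1}\big(\|\sqrt{n}\,(\overline{X} - \theta)\|^2 > q_\alpha^\infty \text{ or } q_\alpha^*\big)$$

For $q_\alpha^\infty$: OK.

For $q_\alpha^*$:

$$\underbrace{\sqrt{n}\,(\overline{X}^* - \overline{X})}_{\text{does not depend on } H_0 \text{ or } H_1} \ \text{bootstraps} \ \sqrt{n}\,(\overline{X} - \theta_0) \ \Rightarrow \ \text{OK.}$$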
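The comparison with $q_\alpha^*$ can be sketched as follows. The function name and the defaults are hypothetical, and Efron resampling is used for the bootstrap draws. The bootstrap statistic is centred at $\overline{X}$ rather than at $\theta$, which is why its law is the same under $H_0$ and $H_1$.

```python
import random

def bootstrap_mean_test(x, theta, alpha=0.05, B=1000, seed=0):
    """Test H0: theta_0 = theta with a bootstrap quantile; returns (stat, q*, reject)."""
    rng = random.Random(seed)
    n = len(x)
    xbar = sum(x) / n
    stat = n * (xbar - theta) ** 2
    # n (Xbar* - Xbar)^2: centred at xbar, so theta plays no role here
    boot = sorted(n * (sum(rng.choices(x, k=n)) / n - xbar) ** 2 for _ in range(B))
    q_star = boot[min(B - 1, int((1 - alpha) * B))]
    return stat, q_star, stat > q_star

rng = random.Random(2)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
stat, q_star, reject = bootstrap_mean_test(x, theta=1.0)   # true mean is 0
```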

(6)

In general classical bootstrap fails

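Assume $\theta_0 \in \mathbb{R}^2$ and $\mathcal{C}$ is the unit circle,

$$H_0 : \theta_0 \in \mathcal{C} \quad \text{against} \quad H_1 : \theta_0 \notin \mathcal{C}$$

$\Rightarrow$ Constraint estimators:

$$\widehat{T}_n = n \min_{g(\theta) = 0} \|\overline{X} - \theta\|^2$$

Does the classical bootstrap work?

Under $H_0$, $\widehat{T}_n = |\sqrt{n}\,(\varphi(\overline{X}) - \varphi(\theta_0))|^2$ with $\varphi : x \mapsto \min_{g(\theta) = 0} \|x - \theta\|$.

Bootstrap candidate:

$$T_n^* = |\sqrt{n}\,(\varphi(\overline{X}^*) - \varphi(\overline{X}))|^2$$

This cannot work because $\varphi$ is not $C^1$.

$\Rightarrow$ Even if we can bootstrap $\sqrt{n}\,(\overline{X} - \theta_0)$, it is not clear that we are able to bootstrap some constraint estimators.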
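A short computation shows where the smoothness fails in the unit-circle example. It assumes the constraint is written as $g(\theta) = \|\theta\|^2 - 1$, a choice the slide does not spell out.

```latex
% Assumption: the unit circle is the zero set of g(\theta) = \|\theta\|^2 - 1.
\varphi(x) = \min_{\|\theta\| = 1} \|x - \theta\| = \bigl|\, \|x\| - 1 \,\bigr|
\qquad (x \neq 0), \qquad \text{attained at } \theta = x / \|x\| .

% Under H_0, \|\theta_0\| = 1, hence \varphi(\theta_0) = 0 and, for any direction h,
\varphi(\theta_0 + t h) = \bigl|\, \|\theta_0 + t h\| - 1 \,\bigr|
                        = |t| \, |\langle \theta_0, h \rangle| + o(t)
\qquad (t \to 0).
```

The directional derivative $h \mapsto |\langle \theta_0, h \rangle|$ is not linear in $h$, so $\varphi$ is not differentiable at any point of the circle, which is exactly where $\theta_0$ lies under $H_0$.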

(7)

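From now on $\theta_0 \in \mathbb{R}^p$ is the parameter of interest, and there exists a consistent estimator $\widehat{\theta}$ of $\theta_0$. Define the random function

$$\widehat{Q}_n(\theta) = (\widehat{\theta} - \theta)^T \widehat{S} (\widehat{\theta} - \theta).$$

Question

If we can bootstrap $\sqrt{n}\,(\widehat{\theta} - \theta_0)$, can the law under $H_0$ of

$$\sqrt{n}\,(\widehat{\theta}_c - \theta_0) \quad \text{with} \quad \widehat{\theta}_c = \operatorname{argmin}_{g(\theta) = 0} \widehat{Q}_n(\theta)$$

be bootstrapped?

Applications

Statistics of the kind $\min_{g(\theta) = 0} \widehat{Q}_n(\theta)$, used to arbitrate between

$$H_0 : g(\theta_0) = 0 \quad \text{and} \quad H_1 : g(\theta_0) \neq 0$$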

(8)

Intuitively

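Define

$$\theta_c^* = \operatorname{argmin}_{g(\theta) = 0} Q^*(\theta) \quad \text{and} \quad Q^*(\theta) = (\theta^* - \theta)^T S^* (\theta^* - \theta).$$

As with the traditional bootstrap, we expect results such as

$$\sqrt{n}\,(\theta_c^* - \widehat{\theta}_c) \ \text{bootstraps} \ \sqrt{n}\,(\widehat{\theta}_c - \theta_0)$$

The idea: a good choice of $\theta^*$

We try to "reproduce" $H_0$ with

$$\theta^* = \widehat{\theta}_c + \text{"something going to 0 with good speed and variance"}$$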

(9)

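$$\widehat{\theta}_c = \operatorname{argmin}_{g(\theta) = 0} (\widehat{\theta} - \theta)^T \widehat{S} (\widehat{\theta} - \theta) \quad \text{and} \quad \theta_c^* = \operatorname{argmin}_{g(\theta) = 0} (\theta^* - \theta)^T S^* (\theta^* - \theta)$$

Assumptions:

1 $\widehat{S} \xrightarrow{P} S$ and $S^* \xrightarrow{P} S$.

2 $S$ is full rank.

3 $g : \mathbb{R}^p \to \mathbb{R}^q$ is $C^1$ on a neighborhood of $\theta_0$ and $J_g(\theta_0)$ is of full rank.

Theorem

Under $H_0$, if $\sqrt{n}\,(\theta^* - \widehat{\theta}_c)$ bootstraps $\sqrt{n}\,(\widehat{\theta} - \theta_0) \xrightarrow{d} X$ (Gaussian), we have

$$\sqrt{n}\,(\theta_c^* - \widehat{\theta}_c) \ \text{bootstraps} \ \underbrace{\sqrt{n}\,(\widehat{\theta}_c - \theta_0) \text{ under } H_0}_{\text{Gaussian limit}} \qquad (1.1)$$

Under $H_1$, we additionally need to assume the existence of $\theta_c$ such that $\widehat{\theta}_c \xrightarrow{a.s.} \theta_c$ with $g(\theta_c) = 0$, to get (1.1).

$\Rightarrow$ $\theta^* = \widehat{\theta}_c + (\theta^*_{\text{classical}} - \widehat{\theta})$, where $\theta^*_{\text{classical}}$ comes from any classical bootstrap method.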
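A sketch of one recentred draw for the unit-circle example of the earlier slide. It assumes $\widehat{S} = S^* = I$, so that the constrained minimiser is the radial projection, takes $\widehat{\theta} = \overline{X}$, and uses an Efron resample for the classical part. All function names are hypothetical.

```python
import math
import random

def project_to_circle(v):
    """argmin over ||theta|| = 1 of ||v - theta||^2 (S = identity, v != 0)."""
    r = math.hypot(v[0], v[1])
    return (v[0] / r, v[1] / r)

def mean2(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def constrained_draw(points, rng):
    """One draw theta* = theta_c_hat + (theta*_classical - theta_hat), and its projection."""
    theta_hat = mean2(points)
    theta_c_hat = project_to_circle(theta_hat)
    theta_classical = mean2(rng.choices(points, k=len(points)))   # Efron resample
    theta_star = (theta_c_hat[0] + theta_classical[0] - theta_hat[0],
                  theta_c_hat[1] + theta_classical[1] - theta_hat[1])
    return theta_star, project_to_circle(theta_star)              # (theta*, theta*_c)

rng = random.Random(0)
data = [(1.0 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]
theta_star, theta_star_c = constrained_draw(data, rng)
```

The draw $\theta^*$ fluctuates around $\widehat{\theta}_c$, which satisfies the constraint, instead of around $\widehat{\theta}$, which in general does not.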

(10)

What about $T^*$?

Corollary

Under the previous set of assumptions, under $H_0$ and $H_1$,

$$T^* = n \min_{g(\theta) = 0} (\theta^* - \theta)^T S^* (\theta^* - \theta) \ \text{bootstraps} \ \underbrace{\widehat{T} \text{ under } H_0}_{\text{weighted chi-squared limit}}$$

Problem

The assumption for convergence under $H_1$, $\widehat{\theta}_c \xrightarrow{a.s.} \theta_c$, needs to be checked in each case.

(11)

Assumptions:

1 $\sqrt{n}\,(\theta^* - \widehat{\theta}_c)$ bootstraps $\sqrt{n}\,(\widehat{\theta} - \theta_0) \xrightarrow{d} X$ (Gaussian).

2 $\widehat{S} \xrightarrow{P} S$ and $S^* \xrightarrow{P} S$.

3 $g : \mathbb{R}^p \to \mathbb{R}^q$ is $C^1$ on a neighborhood of $\theta_0$ and $J_g(\theta_0)$ is of full rank.

4 $S$ is full rank.

Corollary 2

The test of

$$H_0 : g(\theta_0) = 0 \quad \text{against} \quad H_1 : g(\theta_0) \neq 0$$

with statistic $\widehat{T}$ and bootstrap calculation of the quantile is consistent.

For the test procedure, one can draw $T_1^*, \dots, T_B^*$ to estimate $q_\alpha^*$; we do not reject $H_0$ if $\widehat{T} \le q_\alpha^*$, and reject $H_0$ otherwise.

In other words, Corollary 2 means:

$\Rightarrow$ The asymptotic level of the test is $\alpha$.

$\Rightarrow$ The power of the test goes to 1.

Rk: This kind of test is pivotal (chi-squared) when $S = \mathrm{Var}(X)^{-1}$.
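The test procedure can be sketched end to end on the unit-circle example. The sketch assumes $\widehat{S} = S^* = I$ and $\widehat{\theta} = \overline{X}$, so that $\widehat{T} = n(\|\overline{X}\| - 1)^2$ and $T^* = n(\|\theta^*\| - 1)^2$, and it uses Efron resampling for the classical draw. The function name and defaults are hypothetical.

```python
import math
import random

def circle_test(points, alpha=0.05, B=1000, seed=0):
    """Constrained-bootstrap test of H0: theta_0 on the unit circle."""
    rng = random.Random(seed)
    n = len(points)

    def mean2(pts):
        return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

    xbar = mean2(points)
    r = math.hypot(*xbar)
    t_hat = n * (r - 1.0) ** 2                  # n * min_{||theta|| = 1} ||xbar - theta||^2
    theta_c = (xbar[0] / r, xbar[1] / r)        # constrained estimator
    t_star = []
    for _ in range(B):
        xs = mean2(rng.choices(points, k=n))    # classical (Efron) draw
        theta_star = (theta_c[0] + xs[0] - xbar[0], theta_c[1] + xs[1] - xbar[1])
        t_star.append(n * (math.hypot(*theta_star) - 1.0) ** 2)
    t_star.sort()
    q_star = t_star[min(B - 1, int((1 - alpha) * B))]
    return t_hat, q_star, t_hat > q_star        # reject H0 when t_hat exceeds q*

rng = random.Random(3)
far = [(2.0 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]   # H1: ||theta_0|| = 2
t_hat, q_star, reject = circle_test(far)
```

Because $\theta^*$ is recentred at $\widehat{\theta}_c$, the bootstrap quantile stays bounded under the alternative while $\widehat{T}$ grows with $n$, which is what drives the power.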

(12)

Application to rank estimation

(13)

Framework and notation

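Goal: Estimation of the rank of a matrix.

Means: Hypothesis testing.

Assumptions

$\widehat{M}$ and $M$ are matrices in $\mathbb{R}^{p \times H}$ such that

$$\sqrt{n}\,(\mathrm{vec}(\widehat{M}) - \mathrm{vec}(M)) \xrightarrow{d} \mathcal{N}(0, \Gamma), \qquad \widehat{\Gamma} \xrightarrow{P} \Gamma$$

Nothing more, or $\Gamma$ invertible, or $\Gamma = FF^T \otimes GG^T$ invertible.

Notations: $\mathrm{rank}(M) = d_0$, SVD of $M$ and $\widehat{M}$:

$$M = (U_1 \ U_0) \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_0^T \end{pmatrix} \quad \text{and} \quad \widehat{M} = (\widehat{U}_1 \ \widehat{U}_0) \begin{pmatrix} \widehat{D}_1 & 0 \\ 0 & \widehat{D}_0 \end{pmatrix} \begin{pmatrix} \widehat{V}_1^T \\ \widehat{V}_0^T \end{pmatrix}$$

$P_1 = U_1 U_1^T$, $Q_1 = U_0 U_0^T$, $P_2 = V_1 V_1^T$, $Q_2 = V_0 V_0^T$, and similarly $\widehat{P}_1$, $\widehat{P}_2$, $\widehat{Q}_1$, $\widehat{Q}_2$.

$(\widehat{\lambda}_1, \dots, \widehat{\lambda}_p)$ (resp. $(\lambda_1, \dots, \lambda_p)$): singular values of $\widehat{M}$ (resp. $M$) in ascending order.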
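The projectors of this notation can be computed from a full SVD, as in the sketch below. The function name `svd_projectors` is hypothetical. Note that `numpy.linalg.svd` returns singular values in descending order, whereas the slide orders them ascendingly, so the first `d` columns here correspond to $U_1$ and $V_1$.

```python
import numpy as np

def svd_projectors(M, d):
    """Projectors on the leading-d (P) and remaining (Q) left and right singular subspaces."""
    U, s, Vt = np.linalg.svd(M, full_matrices=True)   # s is in descending order
    U1, U0 = U[:, :d], U[:, d:]
    V1, V0 = Vt[:d, :].T, Vt[d:, :].T
    return U1 @ U1.T, U0 @ U0.T, V1 @ V1.T, V0 @ V0.T   # P1, Q1, P2, Q2

rng = np.random.default_rng(0)
p, H, d0 = 4, 6, 2
M = rng.standard_normal((p, d0)) @ rng.standard_normal((d0, H))   # rank d0 by construction
P1, Q1, P2, Q2 = svd_projectors(M, d0)
```

For a matrix of rank exactly $d_0$, $P_1 M P_2 = M$ and $Q_1 M Q_2 = 0$ up to rounding.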

(14)

Short review

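For $d = 0, \dots, d_0$, we test

$$H_0 : d_0 = d \quad \text{against} \quad H_1 : d_0 > d$$

Some statistics

[Li(1991)] $\widehat{T}_1 = n \sum_{k=1}^{p-d} \widehat{\lambda}_k^2 \ \big(= n \|\mathrm{vec}(\widehat{Q}_1 \widehat{M} \widehat{Q}_2)\|^2\big)$

[Bura and Yang(2011)] $\widehat{T}_2 = n\, \mathrm{vec}(\widehat{Q}_1 \widehat{M} \widehat{Q}_2)^T\, \widehat{\Gamma}^{+}\, \mathrm{vec}(\widehat{Q}_1 \widehat{M} \widehat{Q}_2)$

[Cragg and Donald(1997)] $\widehat{T}_3 = n \min_{\mathrm{rank}(M) = d} \mathrm{vec}(\widehat{M} - M)^T\, \widehat{\Gamma}^{-1}\, \mathrm{vec}(\widehat{M} - M)$

By noting that

Lemma (From PCA)

$$\widehat{P}_1 \widehat{M} \widehat{P}_2 = \operatorname{argmin}_{\mathrm{rank}(M) = d} \|\widehat{M} - M\|_F^2 = \operatorname{argmin}_{\mathrm{rank}(M) = d} \|\mathrm{vec}(\widehat{M} - M)\|^2$$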
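The lemma can be checked numerically: the truncated SVD gives the closest rank-$d$ matrix in Frobenius norm, and the squared residual is the sum of the $p - d$ smallest squared singular values, which is $\widehat{T}_1 / n$. The function names and the random test matrix are assumptions of this sketch, which also takes $p \le H$.

```python
import numpy as np

def li_statistic(M_hat, d, n):
    """T1 = n * (sum of the squared singular values beyond the d largest)."""
    s = np.linalg.svd(M_hat, compute_uv=False)       # descending order
    return n * np.sum(s[d:] ** 2)

def best_rank_d(M_hat, d):
    """Truncated SVD: the Frobenius-closest matrix of rank at most d."""
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    return (U[:, :d] * s[:d]) @ Vt[:d, :]

rng = np.random.default_rng(1)
n, d = 500, 2
M_hat = rng.standard_normal((4, 6))
M_c = best_rank_d(M_hat, d)
t1 = li_statistic(M_hat, d, n)
t1_residual = n * np.linalg.norm(M_hat - M_c, "fro") ** 2
```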

(15)

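we get

[Li(1991)] $\widehat{T}_1 = n \min_{\mathrm{rank}(M) = d} \|\mathrm{vec}(\widehat{M} - M)\|^2$

[Bura and Yang(2011)] $\widehat{T}_2 = n\, \mathrm{vec}(\widehat{M} - \widehat{M}_c)^T\, \widehat{\Gamma}^{+}\, \mathrm{vec}(\widehat{M} - \widehat{M}_c)$

[Cragg and Donald(1997)] $\widehat{T}_3 = n \min_{\mathrm{rank}(M) = d} \mathrm{vec}(\widehat{M} - M)^T\, \widehat{\Gamma}^{+}\, \mathrm{vec}(\widehat{M} - M)$

with $\widehat{M}_c = \operatorname{argmin}_{\mathrm{rank}(M) = d} \|\mathrm{vec}(\widehat{M} - M)\|^2$.

Application of the results

$\{\mathrm{rank}(M) = d\}$ is a smooth submanifold.

Example of sufficient conditions for the bootstrap: there exist $\xi_1, \dots, \xi_n$ i.i.d. with $E[\|\xi_1\|_F^2] < +\infty$ such that $\widehat{M} = \frac{1}{n} \sum_{i=1}^n \xi_i$.

$\Rightarrow$ Example for $\widehat{T}_1$ and $\widehat{T}_2$:

$$M^* = \widehat{P}_1 \widehat{M} \widehat{P}_2 + \frac{1}{n} \sum_{i=1}^n \epsilon_i \xi_i$$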
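A sketch of the resulting bootstrap rank test with the statistic $\widehat{T}_1$. To keep the example simple it swaps the multiplier weights of the slide for an Efron resample of the $\xi_i$, which is another classical bootstrap for the recentring $M^* = \widehat{M}_c + (\widehat{M}^*_{\text{classical}} - \widehat{M})$. The function name, defaults and simulated data are assumptions.

```python
import numpy as np

def rank_test_T1(xi, d, alpha=0.05, B=500, seed=0):
    """Bootstrap test of H0: rank(M) = d, where M_hat is the mean of the xi_i."""
    rng = np.random.default_rng(seed)
    n = xi.shape[0]

    def residual(M):
        # Squared Frobenius distance from M to the set of rank-d matrices
        s = np.linalg.svd(M, compute_uv=False)
        return np.sum(s[d:] ** 2)

    M_hat = xi.mean(axis=0)
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    M_c = (U[:, :d] * s[:d]) @ Vt[:d, :]             # P1_hat M_hat P2_hat
    t_hat = n * residual(M_hat)
    t_star = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)             # Efron resample of the xi_i
        M_star = M_c + (xi[idx].mean(axis=0) - M_hat)
        t_star[b] = n * residual(M_star)
    q_star = np.quantile(t_star, 1 - alpha)
    return t_hat, q_star, bool(t_hat > q_star)

rng = np.random.default_rng(3)
M = np.zeros((3, 4))
M[0, 0], M[1, 1] = 3.0, 2.0                          # true rank d0 = 2
xi = M + rng.standard_normal((400, 3, 4))
t_hat, q_star, reject = rank_test_T1(xi, d=1)        # d = 1 is below d0
```

Running the test for $d = 0, 1, \dots$ and stopping at the first non-rejection gives an estimate of the rank.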

(16)

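Example under $H_1$, i.e. $d < d_0$, when $\widehat{\theta}_c$ does not converge

We need to ensure the a.s. convergence of

$$\widehat{\theta}_c = \operatorname{argmin}_{\mathrm{rank}(M) = d} \|\mathrm{vec}(\widehat{M}) - \mathrm{vec}(M)\|^2 = \widehat{P}_1 \widehat{M} \widehat{P}_2$$

$\Rightarrow$ problem of convergence of eigenprojectors.

Riesz formula: $P_\lambda = \frac{1}{2\pi i} \oint_{C_\lambda} (Iz - M)^{-1} \, dz$.

Suppose that $M$ and $\widehat{M}$ are symmetric with $\widehat{M} \xrightarrow{a.s.} M$. If $\lambda_{p-d+1} \neq \lambda_{p-d}$, then, for $n$ large enough,

$$\widehat{P} = \frac{1}{2\pi i} \oint_{\widehat{C}} (Iz - \widehat{M})^{-1} \, dz = \frac{1}{2\pi i} \oint_{C} (Iz - \widehat{M})^{-1} \, dz.$$

If $\lambda_{p-d+1} = \lambda_{p-d}$, then $P$ does not exist.

Rk: Application of the previous results to $\widehat{M}\widehat{M}^T$ and $\widehat{M}^T\widehat{M}$.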
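The Riesz formula can be checked numerically by discretising the contour integral, with the usual normalisation $1/(2\pi i)$, on a circle that encloses a single eigenvalue. The function name, the number of quadrature nodes and the test matrix are assumptions of this sketch.

```python
import numpy as np

def riesz_projector(M, center, radius, n_nodes=200):
    """Approximate (1 / (2 pi i)) * integral of (zI - M)^{-1} dz over a circle."""
    p = M.shape[0]
    angles = 2 * np.pi * np.arange(n_nodes) / n_nodes
    P = np.zeros((p, p), dtype=complex)
    for t in angles:
        z = center + radius * np.exp(1j * t)
        # dz = i * radius * exp(i t) dt, so after the 1 / (2 pi i) factor each node
        # contributes radius * exp(i t) * (zI - M)^{-1} / n_nodes
        P += radius * np.exp(1j * t) * np.linalg.inv(z * np.eye(p) - M)
    return (P / n_nodes).real                        # real for a real symmetric M

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
M = Q @ np.diag([1.0, 2.0, 5.0]) @ Q.T               # symmetric, eigenvalues 1, 2, 5
P_num = riesz_projector(M, center=5.0, radius=1.5)   # contour encloses only 5
P_eig = np.outer(Q[:, 2], Q[:, 2])                   # eigenprojector for eigenvalue 5
```

As long as the eigenvalues inside and outside the contour stay separated, the same fixed contour works for every small perturbation of `M`, which is the argument used on the slide.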

(19)

Conclusion

Concluding remarks

We provide a general bootstrap procedure for constraint estimators associated with a quadratic function.

The associated test procedure is consistent.

Wide range of applications through hypothesis testing.

As an example, it can easily be applied to rank estimation.

Work in progress

Relax the assumption under $H_1$ that $\widehat{\theta}_c \xrightarrow{a.s.} \theta_c$ for the statistic $\widehat{T}$.

Possible extension of these results to M-estimators and Z-estimators.

Simulation study: bootstrap vs asymptotic,

and also constraint bootstrap vs traditional bootstrap.

(20)

E. Bura and J. Yang.

Dimension estimation in sufficient dimension reduction: a unifying approach.

J. Multivariate Anal., 102(1):130–142, 2011.

John G. Cragg and Stephen G. Donald.

Inferring the rank of a matrix.

J. Econometrics, 76(1-2):223–250, 1997.

Bradley Efron.

The jackknife, the bootstrap and other resampling plans, volume 38 of CBMS-NSF Regional Conference Series in Applied Mathematics.

Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa., 1982.

Peter Hall.

The bootstrap and Edgeworth expansion.

Springer Series in Statistics. Springer-Verlag, New York, 1992.

Ker-Chau Li.

Sliced inverse regression for dimension reduction.

J. Amer. Statist. Assoc., 86(414):316–342, 1991.

(21)

SIR

Sufficient dimension reduction (SDR), introduced by [Li(1991)]: we assume the following regression model,

$$Y = g(PZ, \varepsilon), \quad Z \perp\!\!\!\perp \varepsilon$$

where $Y \in \mathbb{R}$, $Z \in \mathbb{R}^p$, $P$ is an orthogonal projector of rank $d_0$ and $g$ is unknown.

Goal of SDR: estimation of $P$.

Motivation: obtain a better rate of convergence when estimating $g$.

Inference on $P$ relies on

$$E[Z \mid Y] \in \mathrm{Im}(P) \quad \text{a.s.}$$

(22)

SIR

We partition the range of $Y$ into $H$ slices, denoted $I(h)$.

Aim of SIR

Estimate the space spanned by the vectors

$$E[Z \mid Y \in I(1)], \dots, E[Z \mid Y \in I(H)]$$

SIR procedure:

1/ Estimate

$$C_h = E[Z \mathbf{1}_{\{Y \in I(h)\}}] \in E_c \quad \text{for } h = 1, \dots, H.$$

2/ Extract a basis of $\mathrm{span}(\widehat{C}_1, \dots, \widehat{C}_H)$: eigen-elements of the matrix

$$\widehat{M}_{SIR} = \sum_h \widehat{p}_h^{-1} \widehat{C}_h \widehat{C}_h^T \quad \text{with } p_h = P(Y \in I(h)).$$
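The two steps can be sketched as follows. The function name is hypothetical, the slices are taken as groups of consecutive order statistics of $Y$ with roughly equal counts, and $Z$ is centred before slicing. Both choices are assumptions that the slide does not state.

```python
import numpy as np

def sir_matrix(Z, Y, n_slices):
    """M_SIR = sum_h C_h C_h^T / p_h, with empirical C_h and p_h."""
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)                          # assumption: centred predictors
    order = np.argsort(Y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):      # slice I(h) of the range of Y
        p_h = len(idx) / n                           # estimate of P(Y in I(h))
        C_h = Zc[idx].sum(axis=0) / n                # estimate of E[Z 1{Y in I(h)}]
        M += np.outer(C_h, C_h) / p_h
    return M

rng = np.random.default_rng(4)
Z = rng.standard_normal((2000, 4))
Y = Z[:, 0] ** 3 + 0.1 * rng.standard_normal(2000)   # Y depends on Z through its first coordinate
M_sir = sir_matrix(Z, Y, n_slices=10)
eigval, eigvec = np.linalg.eigh(M_sir)               # eigenvalues in ascending order
```

In this simulated model the eigenvector attached to the largest eigenvalue, `eigvec[:, -1]`, is close to the first coordinate axis.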

(23)

Finding the dimension

Denoting by $\widehat{\eta}_1, \dots, \widehat{\eta}_p$ the eigenvectors of $\widehat{M}_{SIR}$, in ascending order of the eigenvalues, $P$ can be estimated consistently by

$$\widehat{P} = \sum_{k=1}^{d_0} \widehat{\eta}_k \widehat{\eta}_k^T,$$

but $d_0$ is unknown.

Importance of estimating $d_0$ well

Loss in the explanatory value of the model.

Poor nonparametric rate.
