
Note that
\[
|\sqrt{n}\{\hat{\mu}_j - \hat{b}_j^{\mathrm T}\hat{f}(X_2)\} - \sqrt{n}\{\hat{\mu}_j - b_j^{\mathrm T}\bar{f}\}|
\le \sqrt{n}\,\|\hat{b}_j\|_2\,\|\hat{f}(X_2) - \bar{f}\|_2 + \sqrt{n}\,\|\bar{f}\|_2\,\|\hat{b}_j - b_j\|_2 .
\]
This, together with (28), Theorem 4 and (38), implies that with probability at least $1 - C_2 n^{-1}$,
\[
\max_{1\le j\le p} |\sqrt{n}\{\hat{\mu}_j - \hat{b}_j^{\mathrm T}\hat{f}(X_2)\} - \sqrt{n}\{\hat{\mu}_j - b_j^{\mathrm T}\bar{f}\}|
\lesssim (Kn\log n)^{1/2} p^{-1/2} + (K + \log n)^{1/2}(w_{n,p}^{-1} + p^{-1/2}).
\]

Following the proof of Theorem 5, it can be shown that with probability at least $1 - C_3 n^{-1}$,
\[
\max_{j\in\mathcal{H}_0} |\widehat{T}_j - T_j| \lesssim (Kn\log n)^{1/2} p^{-1/2} + \{K + \log(np)\}\, n^{-1/2}.
\]

The rest of the proof is almost identical to that of Theorem 1 and therefore is omitted.

D Additional proofs

In this section, we prove Propositions 2 and 3 in the main text, as well as Lemmas C.1–C.6 in Section C.

D.1 Proof of Proposition 2

By Weyl's inequality and the decomposition $\widehat{\Sigma} = BB^{\mathrm T} + (\widehat{\Sigma} - \Sigma) + \Sigma_\varepsilon$, we have
\[
\max_{1\le \ell\le K} |\hat{\lambda}_\ell - \lambda_\ell| \le \|\widehat{\Sigma} - \Sigma\|_2 + \|\Sigma_\varepsilon\|_2
\quad\text{and}\quad
\max_{K+1\le \ell\le p} |\hat{\lambda}_\ell| \le \|\widehat{\Sigma} - \Sigma\|_2 + \|\Sigma_\varepsilon\|_2,
\]
where $\hat{\lambda}_1, \ldots, \hat{\lambda}_p$ are the eigenvalues of $\widehat{\Sigma}$ in non-increasing order. Thus, (20) follows immediately. Next, applying Corollary 1 in Yu et al. (2015) to the pair $(\widehat{\Sigma}, BB^{\mathrm T})$ gives that, for every $1\le \ell\le K$,
\[
\|\hat{v}_\ell - v_\ell\|_2 \le \frac{2^{3/2}\,\|(\widehat{\Sigma} - \Sigma) + \Sigma_\varepsilon\|_2}{\min(\lambda_{\ell-1} - \lambda_\ell,\ \lambda_\ell - \lambda_{\ell+1})},
\]
where we put $\lambda_0 = \infty$ and $\lambda_{K+1} = 0$. Under Assumption 2, this proves (21).
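For readability, we spell out the Weyl step behind the first display (a standard argument, recorded here under the reading that $\lambda_1 \ge \cdots \ge \lambda_K$ denote the nonzero eigenvalues of $BB^{\mathrm T}$): for symmetric $A, E \in \mathbb{R}^{p\times p}$, Weyl's inequality gives $\max_{1\le\ell\le p}|\lambda_\ell(A+E) - \lambda_\ell(A)| \le \|E\|_2$. Applying this with $A = BB^{\mathrm T}$ and $E = (\widehat{\Sigma} - \Sigma) + \Sigma_\varepsilon$, and noting that $\lambda_\ell(BB^{\mathrm T}) = \lambda_\ell$ for $1\le\ell\le K$ while $\lambda_\ell(BB^{\mathrm T}) = 0$ for $K+1\le\ell\le p$ since $\operatorname{rank}(BB^{\mathrm T}) \le K$, yields
\[
\max_{1\le\ell\le K} |\hat{\lambda}_\ell - \lambda_\ell| \,\vee\, \max_{K+1\le\ell\le p} |\hat{\lambda}_\ell|
\le \|(\widehat{\Sigma} - \Sigma) + \Sigma_\varepsilon\|_2 \le \|\widehat{\Sigma} - \Sigma\|_2 + \|\Sigma_\varepsilon\|_2 .
\]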

D.2 Proof of Proposition 3

To begin with, we introduce the following notation. Define the loss function $L_\gamma(w) = p^{-1}\sum_{j=1}^p \ell_\gamma(\bar{X}_j - b_j^{\mathrm T} w)$ for $w \in \mathbb{R}^K$, and let $w^* = \bar{f}$ and $\widehat{w} = \operatorname*{argmin}_{w\in\mathbb{R}^K} L_\gamma(w)$. Without loss of generality, we assume $\|B\|_{\max} \le 1$ for simplicity.

Define an intermediate estimator $\widehat{w}_\eta = w^* + \eta(\widehat{w} - w^*)$ such that $\|\widehat{w}_\eta - w^*\|_2 \le r$ for some $r > 0$ to be specified below (72). We take $\eta = 1$ if $\|\widehat{w} - w^*\|_2 \le r$; otherwise, we choose $\eta \in (0,1)$ so that $\|\widehat{w}_\eta - w^*\|_2 = r$. Then, it follows from Lemma A.1 in Sun et al. (2017) that
\[
\langle \nabla L_\gamma(\widehat{w}_\eta) - \nabla L_\gamma(w^*),\, \widehat{w}_\eta - w^* \rangle \le \eta\, \langle \nabla L_\gamma(\widehat{w}) - \nabla L_\gamma(w^*),\, \widehat{w} - w^* \rangle, \tag{66}
\]
where $\nabla L_\gamma(\widehat{w}) = 0$ according to the Karush-Kuhn-Tucker condition. By the mean value theorem for vector-valued functions, we have
\[
\nabla L_\gamma(\widehat{w}_\eta) - \nabla L_\gamma(w^*) = \int_0^1 \nabla^2 L_\gamma\big((1-t)w^* + t\,\widehat{w}_\eta\big)\, dt\; (\widehat{w}_\eta - w^*).
\]

If there exists some constant $a_{\min} > 0$ such that
\[
\min_{w\in\mathbb{R}^K:\ \|w - w^*\|_2 \le r} \lambda_{\min}\{\nabla^2 L_\gamma(w)\} \ge a_{\min}, \tag{67}
\]
then it follows that $a_{\min}\|\widehat{w}_\eta - w^*\|_2^2 \le -\eta\,\langle \nabla L_\gamma(w^*),\, \widehat{w} - w^* \rangle \le \|\nabla L_\gamma(w^*)\|_2\, \|\widehat{w}_\eta - w^*\|_2$, or equivalently,
\[
a_{\min}\|\widehat{w}_\eta - w^*\|_2 \le \|\nabla L_\gamma(w^*)\|_2, \tag{68}
\]
where $\nabla L_\gamma(w^*) = -p^{-1}\sum_{j=1}^p \psi_\gamma(\mu_j + \bar{\varepsilon}_j)\, b_j$.
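The gradient formula above can be read off once $\ell_\gamma$ is differentiated; we record the computation as a sketch, under the assumption (consistent with the Hessian formula below) that $\ell_\gamma$ is the Huber loss with robustification parameter $\gamma$ and that $\psi_\gamma = \ell_\gamma'$:
\[
\ell_\gamma(u) =
\begin{cases}
u^2/2, & |u| \le \gamma,\\
\gamma|u| - \gamma^2/2, & |u| > \gamma,
\end{cases}
\qquad
\psi_\gamma(u) = \ell_\gamma'(u) = \operatorname{sign}(u)\min(|u|, \gamma),
\]
so that $\nabla L_\gamma(w) = -p^{-1}\sum_{j=1}^p \psi_\gamma(\bar{X}_j - b_j^{\mathrm T} w)\, b_j$. Evaluating at $w = w^* = \bar{f}$ and using $\bar{X}_j = \mu_j + b_j^{\mathrm T}\bar{f} + \bar{\varepsilon}_j$ gives $\bar{X}_j - b_j^{\mathrm T} w^* = \mu_j + \bar{\varepsilon}_j$, which is the expression appearing after (68).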

First we verify (67). Write $S = p^{-1}B^{\mathrm T}B$ and note that
\[
\nabla^2 L_\gamma(w) = \frac{1}{p}\sum_{j=1}^p b_j b_j^{\mathrm T}\, I(|\bar{X}_j - b_j^{\mathrm T} w| \le \gamma),
\]

where $\bar{X}_j - b_j^{\mathrm T} w = b_j^{\mathrm T}(w^* - w) + \mu_j + \bar{\varepsilon}_j$. Then, for any $u \in \mathbb{S}^{K-1}$ and $w \in \mathbb{R}^K$ satisfying $\|w - w^*\|_2 \le r$, we obtain a lower bound on $u^{\mathrm T}\nabla^2 L_\gamma(w)\, u$, stated as (69). To bound the last term on the right-hand side of (69), it follows from Hoeffding's inequality that, for any $t > 0$, the required bound holds with probability at least $1 - e^{-t}$. This, together with (69), implies that, with probability greater than $1 - e^{-t}$, the lower bound (70) on $\min_{w:\|w - w^*\|_2 \le r}\lambda_{\min}\{\nabla^2 L_\gamma(w)\}$ holds.

Turning to the quantities $\psi_{j\ell}$ and $\Psi_j$, taking expectation on both sides gives
\[
\mathbb{E}(e^{\psi_{j\ell}}) \le 1 + \gamma^{-1}|\mu_j| + \gamma^{-2}(\mu_j^2 + n^{-1}\sigma_{\varepsilon,jj}).
\]
Moreover, by independence and the inequality $1 + t \le e^t$, we obtain a corresponding bound on $\mathbb{E}(e^{p\Psi_j})$. For any $t > 0$, it follows from Markov's inequality that $P(p\Psi_j \ge 2t) \le e^{-2t}\,\mathbb{E}(e^{p\Psi_j})$, which is in turn bounded by an exponential term. Under the constraint (71), it can be similarly shown that $P(-p\Psi_j \ge 2t) \le e^{1-t}$. Putting the above calculations together, we conclude that $\Psi_j$ obeys an exponential-type tail bound.

With the above preparations, we are now ready to prove the final conclusion. It follows from (70) that, with probability greater than $1 - e^{-t}$, (67) holds with $a_{\min} = c_l/4$, provided the accompanying side condition holds.

note that $h(\theta) \ge \mathbb{E}\{(\tau|X_j - \theta| - \tau^2/2)\, I(|X_j - \theta| > \tau)\}$ for all $\theta \in \mathbb{R}$. Combining these upper and lower bounds on $h(\tilde{\mu}_{j,\tau})$ with Markov's inequality gives
\[
\tau\,\mathbb{E}\{|X_j - \tilde{\mu}_{j,\tau}|\, I(|X_j - \tilde{\mu}_{j,\tau}| > \tau)\}
\le \frac{\tau^2}{2}\, P(|X_j - \tilde{\mu}_{j,\tau}| > \tau) + \frac{1}{2}\sigma_{jj}
\le \frac{\tau}{2}\,\mathbb{E}\{|X_j - \tilde{\mu}_{j,\tau}|\, I(|X_j - \tilde{\mu}_{j,\tau}| > \tau)\} + \frac{1}{2}\sigma_{jj},
\]
which further implies that for every $0 \le \lambda \le 1$,
\[
P(|X_j - \tilde{\mu}_{j,\tau}| > \tau) \le \tau^{-1}\,\mathbb{E}\{|X_j - \tilde{\mu}_{j,\tau}|\, I(|X_j - \tilde{\mu}_{j,\tau}| > \tau)\} \le \sigma_{jj}\tau^{-2}.
\]
Together with (73) and (74), this proves (33).
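For completeness, the step from the first display to the second is elementary: subtracting $\frac{\tau}{2}\,\mathbb{E}\{|X_j - \tilde{\mu}_{j,\tau}|\, I(|X_j - \tilde{\mu}_{j,\tau}| > \tau)\}$ from both ends of the chain gives
\[
\frac{\tau}{2}\,\mathbb{E}\{|X_j - \tilde{\mu}_{j,\tau}|\, I(|X_j - \tilde{\mu}_{j,\tau}| > \tau)\} \le \frac{1}{2}\sigma_{jj},
\quad\text{i.e.}\quad
\mathbb{E}\{|X_j - \tilde{\mu}_{j,\tau}|\, I(|X_j - \tilde{\mu}_{j,\tau}| > \tau)\} \le \sigma_{jj}\tau^{-1},
\]
and the second display then follows from Markov's inequality, $P(|X_j - \tilde{\mu}_{j,\tau}| > \tau) \le \tau^{-1}\mathbb{E}\{|X_j - \tilde{\mu}_{j,\tau}|\, I(|X_j - \tilde{\mu}_{j,\tau}| > \tau)\}$.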

D.4 Proof of Lemma C.3

Throughout the proof, we let $1 \le j \le p$, $a \ge \sigma_{jj}^{1/2}$ and $t \ge 1$ be fixed, and write $\tau = a(n/t)^{1/2}$ with $n \ge 8t$. The dependence of $\tau$ on $(a, n, t)$ is suppressed in the notation. First, we introduce some notation. Define the functions $L(\theta) = -\sum_{i=1}^n \ell_\tau(X_{ij} - \theta)$, $\zeta(\theta) = L(\theta) - \mathbb{E}L(\theta)$ and $w^2(\theta) = -\frac{d^2}{d\theta^2}\mathbb{E}L(\theta)$, so that $\hat{\mu}_j = \operatorname*{argmax}_{\theta\in\mathbb{R}} L(\theta)$. Moreover, we write
\[
w_0^2 := w^2(\mu_j) = \alpha_\tau n \quad\text{with}\quad \alpha_\tau = P(|X_j - \mu_j| \le \tau). \tag{75}
\]
For every $r > 0$, define the parameter set
\[
\Theta_0(r) = \{\theta \in \mathbb{R} : |w_0(\theta - \mu_j)| \le r\}. \tag{76}
\]
Then, it follows from Lemma C.2 that
\[
P\{\hat{\mu}_j \in \Theta_0(r_0)\} \ge 1 - 2\exp(-t), \tag{77}
\]
where $r_0 = 4a(\alpha_\tau t)^{1/2}$. Based on this result, we only need to focus on the local neighborhood $\Theta_0(r_0)$ of $\mu_j$. The rest of the proof is based on Proposition 3.1 in Spokoiny (2013). To this end, we need to check Conditions (L0) and (ED2) there.

Condition (L0): Note that, for every $\theta \in \Theta_0(r)$,
\[
|w_0^{-1} w^2(\theta)\, w_0^{-1} - 1| = |\alpha_\tau^{-1} - 1 - \alpha_\tau^{-1} P(|X_j - \theta| > \tau)|
\le \alpha_\tau^{-1} \max\big[1 - \alpha_\tau,\ \{\sigma_{jj} + (\theta - \mu_j)^2\}\tau^{-2}\big].
\]
By Chebyshev's inequality, we have $1 \ge \alpha_\tau \ge 1 - \sigma_{jj}\tau^{-2} \ge 7/8$. Therefore,
\[
|w_0^{-1} w^2(\theta)\, w_0^{-1} - 1| \le \alpha_\tau^{-1}\{\sigma_{jj} + (\alpha_\tau n)^{-1} r^2\}\tau^{-2}.
\]
This verifies Condition (L0) by taking
\[
\delta(r) = \alpha_\tau^{-1}\sigma_{jj}\tau^{-2} + \alpha_\tau^{-2}\tau^{-2} n^{-1} r^2, \quad r > 0.
\]
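To spell out the Chebyshev step: $1 - \alpha_\tau = P(|X_j - \mu_j| > \tau) \le \sigma_{jj}\tau^{-2}$, and since $\tau = a(n/t)^{1/2}$ with $a \ge \sigma_{jj}^{1/2}$ and $n \ge 8t$,
\[
\sigma_{jj}\tau^{-2} = \frac{\sigma_{jj}\, t}{a^2 n} \le \frac{t}{n} \le \frac{1}{8},
\]
which gives $\alpha_\tau \ge 7/8$ as claimed. Likewise, for $\theta \in \Theta_0(r)$ we have $(\theta - \mu_j)^2 \le r^2/w_0^2 = (\alpha_\tau n)^{-1} r^2$, which is how the second display follows from the first.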

Condition (ED2): Note that $\zeta''(\theta) = -\sum_{i=1}^n \{I(|X_{ij} - \theta| \le \tau) - P(|X_{ij} - \theta| \le \tau)\}$. For every $\lambda \in \mathbb{R}$ satisfying $|\lambda| \le \alpha_\tau\sqrt{n}$, using the inequalities $1 + u \le e^u$ and $e^u \le 1 + u + u^2 e^{u\vee 0}/2$, we deduce that
\[
\begin{aligned}
\mathbb{E}\exp\{\lambda\sqrt{n}\,\zeta''(\theta)/w_0^2\}
&= \prod_{i=1}^n \mathbb{E}\exp\big[-\lambda w_0^{-2}\sqrt{n}\,\{I(|X_{ij} - \theta| \le \tau) - P(|X_{ij} - \theta| \le \tau)\}\big] \\
&\le \prod_{i=1}^n \big\{1 + (1/2)\lambda^2 w_0^{-4} n \exp(|\lambda| w_0^{-2}\sqrt{n})\big\} \\
&\le \prod_{i=1}^n \big\{1 + (e/2)\alpha_\tau^{-2}\lambda^2 n^{-1}\big\} \le \exp\{(e/2)\alpha_\tau^{-2}\lambda^2\}.
\end{aligned}
\]
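The second line of this chain uses the following elementary bound, recorded here for completeness: if $Y$ is a centered random variable with $|Y| \le 1$, then for any $s \in \mathbb{R}$,
\[
\mathbb{E} e^{sY} \le 1 + s\,\mathbb{E}Y + \frac{s^2}{2}\,\mathbb{E}\{Y^2 e^{(sY)\vee 0}\} \le 1 + \frac{s^2}{2}\, e^{|s|},
\]
applied with $Y = I(|X_{ij} - \theta| \le \tau) - P(|X_{ij} - \theta| \le \tau)$ and $s = -\lambda w_0^{-2}\sqrt{n}$. The third line then uses $w_0^2 = \alpha_\tau n$, so that $w_0^{-4} n = \alpha_\tau^{-2} n^{-1}$ and $|s| = |\lambda|/(\alpha_\tau\sqrt{n}) \le 1$ under the constraint $|\lambda| \le \alpha_\tau\sqrt{n}$, whence $\exp(|\lambda| w_0^{-2}\sqrt{n}) \le e$; the final bound follows from $1 + u \le e^u$.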

This verifies Condition (ED2) by taking $\omega = n^{-1/2}$, $\nu_0 = e^{1/2}\alpha_\tau^{-1}$ and $g(r) = \alpha_\tau\sqrt{n}$, $r > 0$.

Now, using Proposition 3.1 in Spokoiny (2013) we obtain that, as long as $\alpha_\tau^2 n \ge 4 + 2t$,
\[
\sup_{\theta\in\Theta_0(r)} \big|\alpha_\tau\sqrt{n}(\theta - \mu_j) + n^{-1/2}\{L'(\theta) - L'(\mu_j)\}\big|
\le \alpha_\tau^{1/2}\delta(r)\, r + 6\alpha_\tau^{-1/2} e^{1/2} (2t + 4)^{1/2} n^{-1/2} r
\]
with probability greater than $1 - e^{-t}$. Under the conditions that $n \ge 8t$ and $t \ge 1$, it is easy to see that $\alpha_\tau^2 n \ge (7/8)^2 \cdot 8t \ge 6t \ge 4 + 2t$. Moreover, observe that
\[
\sup_{\theta\in\Theta_0(r)} |(\alpha_\tau - 1)\sqrt{n}(\theta - \mu_j)| \le \alpha_\tau^{-1/2}\sigma_{jj}\tau^{-2} r.
\]
The last two displays, together with (77) and the fact that $L'(\hat{\mu}_j) = 0$, prove (34) by taking $r = r_0$. The proof of Lemma C.3 is then complete.
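In more detail (a brief expansion of how the two displays combine; the target quantity in (34) is not restated here): writing
\[
\sqrt{n}(\theta - \mu_j) + n^{-1/2}\{L'(\theta) - L'(\mu_j)\}
= \big[\alpha_\tau\sqrt{n}(\theta - \mu_j) + n^{-1/2}\{L'(\theta) - L'(\mu_j)\}\big] - (\alpha_\tau - 1)\sqrt{n}(\theta - \mu_j),
\]
taking $\theta = \hat{\mu}_j$, for which $L'(\hat{\mu}_j) = 0$, and restricting to the event $\{\hat{\mu}_j \in \Theta_0(r_0)\}$ from (77), the triangle inequality bounds $|\sqrt{n}(\hat{\mu}_j - \mu_j) - n^{-1/2} L'(\mu_j)|$ by the sum of the two right-hand sides evaluated at $r = r_0$.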

D.5 Proof of Lemma C.4

Under model (1), we have $\xi_j = b_j^{\mathrm T} f + \varepsilon_j$, where $\mathbb{E}(\varepsilon_j) = 0$ and $\varepsilon_j$ and $f$ are independent. Therefore,
\[
\mathbb{E}_f\psi_\tau(\xi_j) - b_j^{\mathrm T} f
= -\mathbb{E}_f(\varepsilon_j + b_j^{\mathrm T} f - \tau)\, I(\varepsilon_j > \tau - b_j^{\mathrm T} f) + \mathbb{E}_f(-\varepsilon_j - b_j^{\mathrm T} f - \tau)\, I(\varepsilon_j < -\tau - b_j^{\mathrm T} f).
\]
Hence, as long as $\tau > |b_j^{\mathrm T} f|$, we have for any $q \in [2, \kappa]$ that
\[
|\mathbb{E}_f\psi_\tau(\xi_j) - b_j^{\mathrm T} f| \le \mathbb{E}_f\{|\varepsilon_j|\, I(|\varepsilon_j| > \tau - |b_j^{\mathrm T} f|)\} \le (\tau - |b_j^{\mathrm T} f|)^{1-q}\, \mathbb{E}(|\varepsilon_j|^q)
\]
almost surely. This proves (35) by taking $q$ to be 2 and $\kappa$.
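The last inequality uses the elementary truncation bound, stated here for completeness: for any $q \ge 1$ and $c > 0$,
\[
|\varepsilon_j|\, I(|\varepsilon_j| > c) \le |\varepsilon_j|\,\Big(\frac{|\varepsilon_j|}{c}\Big)^{q-1} I(|\varepsilon_j| > c) \le c^{1-q} |\varepsilon_j|^{q},
\]
applied with $c = \tau - |b_j^{\mathrm T} f|$; taking $\mathbb{E}_f$ and using the independence of $\varepsilon_j$ and $f$ replaces $\mathbb{E}_f(|\varepsilon_j|^q)$ by $\mathbb{E}(|\varepsilon_j|^q)$.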

For the conditional variance, observe that
\[
\mathbb{E}_f\{\psi_\tau(\xi_j) - b_j^{\mathrm T} f\}^2 = \mathrm{var}_f\{\psi_\tau(\xi_j)\} + \{\mathbb{E}_f\psi_\tau(\xi_j) - b_j^{\mathrm T} f\}^2 \tag{78}
\]
and that $\psi_\tau(\xi_j) - b_j^{\mathrm T} f$ can be written as
\[
\varepsilon_j\, I(|b_j^{\mathrm T} f + \varepsilon_j| \le \tau) + (\tau - b_j^{\mathrm T} f)\, I(b_j^{\mathrm T} f + \varepsilon_j > \tau) - (\tau + b_j^{\mathrm T} f)\, I(b_j^{\mathrm T} f + \varepsilon_j < -\tau),
\]
which further implies
\[
\{\psi_\tau(\xi_j) - b_j^{\mathrm T} f\}^2 = \varepsilon_j^2\, I(|b_j^{\mathrm T} f + \varepsilon_j| \le \tau) + (\tau - b_j^{\mathrm T} f)^2\, I(b_j^{\mathrm T} f + \varepsilon_j > \tau) + (\tau + b_j^{\mathrm T} f)^2\, I(b_j^{\mathrm T} f + \varepsilon_j < -\tau).
\]
Taking conditional expectation on both sides yields
\[
\mathbb{E}_f\{\psi_\tau(\xi_j) - b_j^{\mathrm T} f\}^2
= \mathbb{E}(\varepsilon_j^2) - \mathbb{E}_f\{\varepsilon_j^2\, I(|b_j^{\mathrm T} f + \varepsilon_j| > \tau)\}
+ (\tau - b_j^{\mathrm T} f)^2\, P_f(\varepsilon_j > \tau - b_j^{\mathrm T} f) + (\tau + b_j^{\mathrm T} f)^2\, P_f(\varepsilon_j < -\tau - b_j^{\mathrm T} f).
\]

Using the equality $u^2 = 2\int_0^u t\, dt$ for $u > 0$, we deduce that, as long as $\tau > |b_j^{\mathrm T} f|$,
\[
\begin{aligned}
\mathbb{E}_f\{\varepsilon_j^2\, I(b_j^{\mathrm T} f + \varepsilon_j > \tau)\}
&= 2\,\mathbb{E}_f \int_0^{\infty} I(\varepsilon_j > t)\, I(\varepsilon_j > \tau - b_j^{\mathrm T} f)\, t\, dt \\
&= 2\,\mathbb{E}_f \int_0^{\tau - b_j^{\mathrm T} f} I(\varepsilon_j > \tau - b_j^{\mathrm T} f)\, t\, dt + 2\,\mathbb{E}_f \int_{\tau - b_j^{\mathrm T} f}^{\infty} I(\varepsilon_j > t)\, t\, dt \\
&= (\tau - b_j^{\mathrm T} f)^2\, P_f(\varepsilon_j > \tau - b_j^{\mathrm T} f) + 2 \int_{\tau - b_j^{\mathrm T} f}^{\infty} P(\varepsilon_j > t)\, t\, dt.
\end{aligned}
\]

Analogously, it can be shown that
\[
\mathbb{E}_f\{\varepsilon_j^2\, I(b_j^{\mathrm T} f + \varepsilon_j < -\tau)\} = (\tau + b_j^{\mathrm T} f)^2\, P_f(\varepsilon_j < -\tau - b_j^{\mathrm T} f) + 2 \int_{\tau + b_j^{\mathrm T} f}^{\infty} P(-\varepsilon_j > t)\, t\, dt.
\]

Together, the last three displays imply
\[
0 \ge \mathbb{E}_f\{\psi_\tau(\xi_j) - b_j^{\mathrm T} f\}^2 - \mathbb{E}(\varepsilon_j^2)
\ge -2 \int_{\tau - |b_j^{\mathrm T} f|}^{\infty} P(|\varepsilon_j| > t)\, t\, dt
\ge -2\,\mathbb{E}(|\varepsilon_j|^\kappa) \int_{\tau - |b_j^{\mathrm T} f|}^{\infty} t^{1-\kappa}\, dt
= -\frac{2}{\kappa - 2}\,\frac{\mathbb{E}(|\varepsilon_j|^\kappa)}{(\tau - |b_j^{\mathrm T} f|)^{\kappa - 2}}.
\]
Combining this with (78) and (35) proves (36).

Finally, we study the covariance $\mathrm{cov}_f(\psi_\tau(\xi_j), \psi_\tau(\xi_k))$ for $j \ne k$. By definition,
\[
\begin{aligned}
\mathrm{cov}_f(\psi_\tau(\xi_j), \psi_\tau(\xi_k))
&= \mathbb{E}_f\big[\{\psi_\tau(\xi_j) - b_j^{\mathrm T} f + b_j^{\mathrm T} f - \mathbb{E}_f\psi_\tau(\xi_j)\}\{\psi_\tau(\xi_k) - b_k^{\mathrm T} f + b_k^{\mathrm T} f - \mathbb{E}_f\psi_\tau(\xi_k)\}\big] \\
&= \underbrace{\mathbb{E}_f\big[\{\psi_\tau(\xi_j) - b_j^{\mathrm T} f\}\{\psi_\tau(\xi_k) - b_k^{\mathrm T} f\}\big]}_{\Pi_1}
- \underbrace{\{\mathbb{E}_f\psi_\tau(\xi_j) - b_j^{\mathrm T} f\}\{\mathbb{E}_f\psi_\tau(\xi_k) - b_k^{\mathrm T} f\}}_{\Pi_2}.
\end{aligned}
\]

Recall that $\psi_\tau(\xi_j) - b_j^{\mathrm T} f = \varepsilon_j\, I(|\xi_j| \le \tau) + (\tau - b_j^{\mathrm T} f)\, I(\xi_j > \tau) - (\tau + b_j^{\mathrm T} f)\, I(\xi_j < -\tau)$. Hence,
\[
\begin{aligned}
\Pi_1 &= \mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_j| \le \tau, |\xi_k| \le \tau)\} + (\tau - b_k^{\mathrm T} f)\,\mathbb{E}_f\{\varepsilon_j\, I(|\xi_j| \le \tau, \xi_k > \tau)\} \\
&\quad - (\tau + b_k^{\mathrm T} f)\,\mathbb{E}_f\{\varepsilon_j\, I(|\xi_j| \le \tau, \xi_k < -\tau)\} + (\tau - b_j^{\mathrm T} f)\,\mathbb{E}_f\{\varepsilon_k\, I(\xi_j > \tau, |\xi_k| \le \tau)\} \\
&\quad + (\tau - b_j^{\mathrm T} f)(\tau - b_k^{\mathrm T} f)\,\mathbb{E}_f I(\xi_j > \tau, \xi_k > \tau) - (\tau - b_j^{\mathrm T} f)(\tau + b_k^{\mathrm T} f)\,\mathbb{E}_f I(\xi_j > \tau, \xi_k < -\tau) \\
&\quad - (\tau + b_j^{\mathrm T} f)\,\mathbb{E}_f\{\varepsilon_k\, I(\xi_j < -\tau, |\xi_k| \le \tau)\} - (\tau + b_j^{\mathrm T} f)(\tau - b_k^{\mathrm T} f)\,\mathbb{E}_f I(\xi_j < -\tau, \xi_k > \tau) \\
&\quad + (\tau + b_j^{\mathrm T} f)(\tau + b_k^{\mathrm T} f)\,\mathbb{E}_f I(\xi_j < -\tau, \xi_k < -\tau).
\end{aligned}
\tag{79}
\]
Note that the first term on the right-hand side of (79) can be written as

\[
\mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_j| \le \tau, |\xi_k| \le \tau)\}
= \mathrm{cov}(\varepsilon_j, \varepsilon_k) - \mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_j| > \tau)\} - \mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_k| > \tau)\} + \mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_j| > \tau, |\xi_k| > \tau)\},
\]
where
\[
|\mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_j| > \tau)\}| \le |\tau - b_j^{\mathrm T} f|^{2-\kappa}\,\mathbb{E}(|\varepsilon_j|^{\kappa-1}|\varepsilon_k|) \le 2^{\kappa-2}\tau^{2-\kappa}(\mathbb{E}|\varepsilon_j|^\kappa)^{(\kappa-1)/\kappa}(\mathbb{E}|\varepsilon_k|^\kappa)^{1/\kappa}
\]
and
\[
|\mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_j| > \tau, |\xi_k| > \tau)\}|
\le |\tau - b_j^{\mathrm T} f|^{2-\kappa}\,\mathbb{E}(|\varepsilon_j|^{\kappa/2}|\varepsilon_k|^{\kappa/2}) \le 2^{\kappa-2}\tau^{2-\kappa}(\mathbb{E}|\varepsilon_j|^\kappa)^{1/2}(\mathbb{E}|\varepsilon_k|^\kappa)^{1/2}
\]
almost surely on $G_{jk}$. The previous three displays together imply
\[
|\mathbb{E}_f\{\varepsilon_j\varepsilon_k\, I(|\xi_j| \le \tau, |\xi_k| \le \tau)\} - \mathrm{cov}(\varepsilon_j, \varepsilon_k)| \lesssim \tau^{2-\kappa}
\]
almost surely on $G_{jk}$. For the remaining terms on the right-hand side of (79), it can be similarly obtained that, almost surely on $G_{jk}$,

\[
|\mathbb{E}_f\{\varepsilon_j\, I(|\xi_j| \le \tau, \xi_k > \tau)\}| \le |\tau - b_k^{\mathrm T} f|^{1-\kappa}\,\mathbb{E}(|\varepsilon_j||\varepsilon_k|^{\kappa-1}), \qquad
|\mathbb{E}_f\{\varepsilon_j\, I(|\xi_j| \le \tau, \xi_k < -\tau)\}| \le |\tau + b_k^{\mathrm T} f|^{1-\kappa}\,\mathbb{E}(|\varepsilon_j||\varepsilon_k|^{\kappa-1}),
\]
and
\[
\mathbb{E}_f I(\xi_j > \tau, \xi_k < -\tau) \le |\tau - b_j^{\mathrm T} f|^{-\kappa/2}|\tau + b_k^{\mathrm T} f|^{-\kappa/2}\,\mathbb{E}(|\varepsilon_j\varepsilon_k|^{\kappa/2}).
\]
Putting together the pieces, we get $|\Pi_1 - \mathrm{cov}(\varepsilon_j, \varepsilon_k)| \lesssim \upsilon_{jk}\tau^{2-\kappa}$ almost surely on $G_{jk}$. For $\Pi_2$, it follows directly from (35) that $|\Pi_2| \lesssim \upsilon_{jk}^2\tau^{2-2\kappa}$ almost surely on $G_{jk}$. These bounds, combined with the fact that $\mathrm{cov}_f(\psi_\tau(\xi_j), \psi_\tau(\xi_k)) = \Pi_1 - \Pi_2$, yield (37).

D.6 Proof of Lemma C.5

For any $u \in \mathbb{R}^K$, by independence we have
\[
\mathbb{E}\exp(u^{\mathrm T} f_i) \le \exp(C_1\|f\|_{\psi_2}^2\|u\|_2^2) \quad\text{for all } i = 1, \ldots, n, \tag{80}
\]
and
\[
\mathbb{E}\exp(\sqrt{n}\, u^{\mathrm T}\bar{f}) = \prod_{i=1}^n \mathbb{E}\exp(u^{\mathrm T} f_i/\sqrt{n}) \le \exp(C_1\|f\|_{\psi_2}^2\|u\|_2^2),
\]

where $C_1 > 0$ is an absolute constant. From Theorem 2.1 in Hsu et al. (2012) we see that, for any $t > 0$,
\[
P\{\|\sqrt{n}\bar{f}\|_2^2 > 2C_1\|f\|_{\psi_2}^2(K + 2\sqrt{Kt} + 2t)\} \le e^{-t}
\quad\text{and}\quad
P\{\|f_i\|_2^2 > 2C_1\|f\|_{\psi_2}^2(K + 2\sqrt{Kt} + 2t)\} \le e^{-t}, \quad i = 1, \ldots, n.
\]
This proves (38) and (39) by the union bound.
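For completeness, we recall the special case of Theorem 2.1 in Hsu et al. (2012) that is being invoked (the general statement covers arbitrary quadratic forms): if a random vector $x \in \mathbb{R}^K$ satisfies $\mathbb{E}\exp(u^{\mathrm T} x) \le \exp(\|u\|_2^2\sigma^2/2)$ for all $u \in \mathbb{R}^K$, then for every $t > 0$,
\[
P\{\|x\|_2^2 > \sigma^2(K + 2\sqrt{Kt} + 2t)\} \le e^{-t}.
\]
Applying this with $x = f_i$ or $x = \sqrt{n}\bar{f}$ and $\sigma^2 = 2C_1\|f\|_{\psi_2}^2$, which matches the moment generating function bounds in (80) and the display following it, gives exactly the two tail bounds above.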

For $\widehat{\Sigma}_f$, applying Theorem 5.39 in Vershynin (2012) yields that, with probability at least $1 - 2e^{-t}$, $\|\widehat{\Sigma}_f - I_K\| \le \max(\delta, \delta^2)$, where $\delta = C_2\|f\|_{\psi_2}^2 n^{-1/2}(K + t)^{1/2}$ and $C_2 > 0$ is an absolute constant. Conclusion (40) then follows immediately.

D.7 Proof of Lemma C.6

For each $1 \le \ell \le K$, as $\lambda_\ell > 0$ and by Weyl's inequality, we have $|\tilde{\lambda}_\ell - \lambda_\ell| \le |\lambda_\ell(\widehat{\Sigma}_{H}) - \lambda_\ell| \le \|\widehat{\Sigma}_{H} - \Sigma\| + \|\Sigma_\varepsilon\|$. Moreover, note that for any matrix $E \in \mathbb{R}^{d_1\times d_2}$,
\[
\|E\|_2 \le \|E\|_1 \vee \|E\|_\infty \le (d_1 \vee d_2)\|E\|_{\max}.
\]
Putting the above calculations together proves (41).
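The matrix-norm inequality quoted here is standard; for completeness,
\[
\|E\|_2^2 \le \|E\|_1\|E\|_\infty \le (\|E\|_1 \vee \|E\|_\infty)^2,
\qquad
\|E\|_1 \le d_1\|E\|_{\max}, \quad \|E\|_\infty \le d_2\|E\|_{\max},
\]
where $\|E\|_1$ and $\|E\|_\infty$ denote the maximum absolute column and row sums, respectively, so that $\|E\|_2 \le \|E\|_1 \vee \|E\|_\infty \le (d_1 \vee d_2)\|E\|_{\max}$.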

Next, note that
\[
\widehat{\Sigma}_{H} = \widehat{\Sigma}_{H} - \Sigma + BB^{\mathrm T} + \Sigma_\varepsilon = \sum_{\ell=1}^K \lambda_\ell v_\ell v_\ell^{\mathrm T} + \widehat{\Sigma}_{H} - \Sigma + \Sigma_\varepsilon.
\]

Under Assumption 2, it follows from Theorem 3 and Proposition 3 in Fan et al. (2018) that
\[
\max_{1\le\ell\le K}\|\hat{v}_\ell - v_\ell\|_\infty \le \frac{C}{p^{3/2}}\,(\|\widehat{\Sigma}_{H} - \Sigma\|_\infty + \|\Sigma_\varepsilon\|_\infty) \le C\,(p^{-1/2}\|\widehat{\Sigma}_{H} - \Sigma\|_{\max} + p^{-1}\|\Sigma_\varepsilon\|),
\]
where we use the inequalities $\|\widehat{\Sigma}_{H} - \Sigma\|_\infty \le p\|\widehat{\Sigma}_{H} - \Sigma\|_{\max}$ and $\|\Sigma_\varepsilon\|_\infty \le p^{1/2}\|\Sigma_\varepsilon\|$ in the last step, and $C > 0$ is a constant independent of $(n, p)$. This proves (42).
