• Aucun résultat trouvé

Suppose that Θ is convex, and that the loss function`: Θ→RO is a1-Lipschitz,

|`(θ1)−`(θ2)| ≤a11−θ2| (20)

and a2-strongly convex: for alls∈[0,1] andθ1, θ2∈Θ,

`(sθ1+ (1−s)θ2)−a22(sθ1+ (1−s)θ2)2≤s

`(θ1)− a22θ1

2

+ (1−s)

`(θ2)− a22θ2

2

(both inequalities above are understood pointwise). Then the modulus of continuity of ` is lower-bounded by ρ7→ a82ρ2 in the sense that, for allθ1, θ2∈Θ,

1

2(`(θ1) +`(θ2))−` 1212)

a821−θ2)2 (21)

(pointwise). Let P be a law on O such that P `(θ) is well defined for all θ ∈ Θ, where we note P f :=R

f dP. Chooseθ ∈Θ such thatP `(θ)≤P `(θ) for allθ∈Θ. Then, for all θ∈Θ,

1

2P(`(θ) +`(θ))≥P `(12(θ+θ)) + a82P(θ−θ)2

≥P `(θ) +a82P(θ−θ)2

≥P `(θ) +8aa22 1

P(`(θ)−`(θ))2,

where the first inequality follows from (21), the second holds by convexity of Θ and choice of θ, and the third one follows from (20). Therefore,

P(`(θ)−`(θ))24aa21

2 P(`(θ)−`(θ)), which concludes the argument.

B Proofs

B.1 Proof of Theorem 1

The proof unfolds in three steps.

Step 1: an algebraic decomposition. For allj ∈JJK,t≥1 andθ∈Θ, let us define

Hej,t:=Rej,t−Ret), Hbj,t :=Rbj,t−Rbt), and

`(θ)( ¯¯ Ot,Z¯t) := ¯`(θ)( ¯Ot,Z¯t)−`(θ¯ )( ¯Ot,Z¯t)

(¯`(θ) is defined in (2)). Fix arbitrarilya >0. An algebraic decomposition at the heart of all studies of the Super Learner [see,e.g,Dudoit and van der Laan,2005,van der Laan et al.,2007,Benkeser et al., 2018]) states that the excess risk of the Super Learner (that is, He

bjt,t) can be bounded by (1 + 2a) times the excess risk of the oracle (that is,He

ejt,t), plus some remainder terms:

He

bjt,t≤(1 + 2a)He

ejt,t+A

bjt,t(a) +B

ejt,t(a)

≤(1 + 2a)Heej

t,t+ max

j∈JJK

{Aj,t(a)}+ max

j∈JJK

{Bj,t(a)} (22)

where

Aj,t(a) := (1 +a)

Hej,t−Hbj,t

−aHej,t and Bj,t(a) := (1 +a)

Hbj,t−Hej,t

−aHej,t.

The first terms in the definitions ofAj,t(a) andBj,t equal±(1 +a) times 1

t

t

X

τ=1

`(θ¯ j,τ−1)( ¯Oτ,Z¯τ)−E

`(θ¯ j,τ−1)( ¯Oτ,Z¯τ)

τ, Fτ−1 ,

that is as the average of the t first terms of a martingale difference sequence. As for the shared second term in the definitions of Aj,t(a) and Bj,t, it satisfies −aHej,t ≤ 0. The second step of the proof consists in exploiting two so-called Bernstein’s inequalities to control the probabilities P[Aj,t(a)≥x] and P[Bj,t(a)≥x] forx≥0.

Step 2: Bounding positive deviations of Aj,t(a) and Bj,t(a). Set arbitrarily two integers N, N0 ≥2 and a real number x≥0. The analysis of P[Bj,t(a) ≥x] is exactly the same as that of P[Aj,t(a) ≥ x], so we present only the latter. The key to the analysis is a so-called stratification argument inspired by Cesa-Bianchi and Gentile[2008].

For every j ∈ JJK and t ≥ 1, recall the definitions (15) and (16) of varj,t and varfj,t. On the one hand, byA4 and because the functions of a real variableu7→u2 and u7→uβ are respectively

convex and concave, it holds that

almost surely. Moreover, it also holds that varfj,t ≤v2 almost surely by Theorem 3. The previous upper bound and (23) play a key role in the first version of Step 2 (Step 2 (v1)) presented below.

On the other hand, byA2,A4, and because the functionu7→uβ is concave it holds almost surely that, for all τ ∈JtK,

Consequently if Hej,t ≤ B (an inequality that holds almost surely when B =b1, by A3), then it also holds that second version of Step 2 (Step 2 (v2)) presented below.

Step 2 (v1). Setv(−1)2 := 0 and, for alli∈JN−1K,v(i)2 := 2i+1−N×v2. In view of (23) and since

≤ A3 and the Fan-Grama-Liu concentration inequality for martingales [Theorem 3.10 inBercu et al.,2015], (25) implies

P[Aj,t(a)≥x]≤

• Moreover if x≥xi, then v(i)2 ≤(x+a(v(i−1)2 /γ)1/β)×b2/(3(1 +a)) hence

Therefore, in light of (26), (28), (29) and the definitions ofC1(a), C2(a) given in the theorem, for all x≥x(a, N), it holds that

By A3 and A4, Theorem 3applies and (31) yields

P[Aj,t(a)≥x]≤

Set arbitrarily i∈JN0−1K∪ {0}and definex0i:= 32e(1 +a)γ(tb(i)1 )β/(15b2)−ab(i−1)1 .

Step 3: end of the proof. In view of (22), a union bound implies that from (36) and the above inequality. This completes the proof of Theorem 1.

B.2 Proof of Corollary 2

Corollary2follows from the straightforward application, twice, of the next technical lemma, based on (7) on the one hand and on (8) on the other hand.

Lemma 4. Let a, b, c > 0, β ∈]0,1] be some constants and (x(N))N≥2 be a sequence of positive numbers that decreases to 0. Let U be a real valued random variable such thatE[|U|]<∞ and, for all integer N ≥2 and all x≥x(N)>0, Proof of Lemma 4. It is well known that

E[U]≤E[U1{U ≥0}] =

Let us denote by L and R the above left-hand side and right-hand side integrals. Choose N ≥ min{n≥2 :x(n)≤b1/(2−β),log(an)≥1}.

Bounding L. LetxLbe chosen so thataNexp(−x2−βL /b) = 1,i.e.,xL:= (blog(aN))1/(2−β). Now, thanks to the change of variable u=x2−β/band because u7→u1/(2−β)−1 is nonincreasing,

L=xL+aN Z

xL

exp(−x2−β/b)dx

=xL+aN b1/(2−β) Z

log(aN)

exp(−u)u1/(2−β)−1du

≤xL+aN(blog(aN))1/(2−β) log(aN)

Z

0

exp(−u)du

=xL(1 + 1/log(aN))≤2(blog(aN))1/(2−β). (40)

Bounding R. Let xR be chosen so that aNexp(−xR/c) = 1, i.e., xR := clog(aN). It is readily seen that

R=xR+aN Z

xR

exp(−x/c)dx=xR+acNexp(−xR/c) =c(1 + log(aN)). (41)

In view of (39), (40), (41), and by choice ofN, we obtain

E[U]≤b1/(2−β)+ 2(blog(aN))1/(2−β)+c(1 + log(aN))

≤3(blog(aN))1/(2−β)+ 2clog(aN).

This completes the proof.

Set t≥1 and a∈]0,1]. In view of (7), Lemma4 yields (9) under the sufficient condition that N ≥ 2 also satisfy (10). Moreover, in view of (8), Lemma 4 also yields (11) under the sufficient condition thatN ≥2 also satisfy (12). This completes the proof of the corollary.

B.3 Proof of Theorem 3

The proof of Theorem 3 hinges on a Bernstein-like concentration inequality for sums of partly dependent random variables shown by Janson [2004, Theorem 2.3]. Janson emphasizes that his

theorem uses the independence of suitable (large) subsets of (ζα)α∈A, not any other information on the dependencies, so that the result must be suboptimal when the dependencies that exist are weak. We recall the theorem for completeness.

Theorem 5 (Janson [2004]). Let (ζα)α∈A be a collection of random variables with dependency graph G such that ζα−E[ζα]≤B for some B >0 and all α∈ A. DefineZ:=|A|−1P

Note that (18) from Theorem 3also writes as

P

Following the line of proof of the Rosenthal inequality byPetrov[1995, page 59] (see also the proof of Theorem 5.2 in [Baraud,2000]), we use (42) to controlE[|Z−E[Z]|p] henceE[|Hbj,t−Hej,t|p] (by convexity) for all p≥2. The bound (18) follows as in [Dedecker, 2001, proof of Corollary 3(b)], a method inspired by the proof of Theorem 6 in [Doukhan et al.,1984].

We first prove this corollary of Theorem 5. The constants are in no way optimal.

Corollary 6. In the context of Theorem 5, for all p≥2,

E[|Z−E[Z]|p]≤ 3π any nonnegative random variable U. Let r > 0 be a constant that we will carefully choose later on. Set arbitrarilys≥0, definem:=s/r, and introduce

m:=|A|−1X

α∈A

α−E[ζα])1{|ζα−E[ζα]|< m}.

It holds that and (44), the rightmost term in (44), say Ip, satisfies

Ip

Because h(u)≥ u2 log(1 +u) for allu≥0, (45) yields bound ˜V ≤V thus imply that the rightmost term in (46) is smaller than

2p

We now chooser:= 5(p+ 1) deg(G)/|A|to guarantee the convergence of the above integral, toπ/2, and conclude that (44) and (46) imply

E[|Z−E[Z]|p]≤rpE[max

α∈Aα−E[ζα]|p] +π(rV)p/2(2p/2+p(5/4)p/2)

≤(rB)p+π(p+ 1)(2rV)p/2. (47)

Finally, since (p+ 1)/p≤3/2 andp2/p≤e2/e ≈2.61, we can simplify (47) to (43), thus completing the proof of Corollary6.

Fix arbitrarilyj ∈JJK,t≥1,V >0, andp≥2. To save space introduce, for eachτ ∈JtK, Zj,τ := ∆`(θ¯ j,τ−1)( ¯Oτ,Z¯τ)−E

`(θ¯ j,τ−1)( ¯Oτ,Z¯τ)

τ, Fτ−1

.

In view of A1, A2, A3 and A5, Corollary 6 applies and guarantees that almost surely, for all τ ∈JtK,

It is now easy to show that varft,j ≤v2 almost surely (see (6) for the definition of v2). By A5,

Therefore Markov’s inequality implies that, for allx >0,

P

By the technical Lemma 7, there existspx >0 such that

x|A|= 15eb2deg(G)px/2 +p

Otherwise, P[|Hbj,t −Hej,t| ≥ x,FeV] ≤ exp(2−qx) holds trivially. This completes the proof of

Theorem3.

Lemma 7. For anya, b, c >0, there existsp >0such thatc=b√

p+ap. Moreover,c2 ≤(b2+2ac)p.

Proof of Lemma 7. The quadratic equation c = bX+aX2 has a positive solution, so there does exist p > 0 such that c = b√

p+ap. Moreover, c2/p = b2+ 2ab√

p+a2p on the one hand and 2ac = 2ab√

p+ 2a2p ≥ 2ab√

p+a2p on the other hand, implying that c2/p ≤ b2 + 2ac. This completes the proof.

Documents relatifs