CONVERGENCE PROPERTIES OF OPTIMAL PREDICTORS

GH. ZBĂGANU

In our paper [8] we studied conditions which imply that the optimal predictor is unique. Here we study the convergence of such optimal predictors. We answer two questions:

(i) when does Xn → X imply AL(Xn) → AL(X)? and (ii) is it true that, as in the martingale case, AL(X|Fn) converges almost surely to AL(X|F) if (Fn)n is a filtration? We also make a conjecture concerning the iterativity of optimal predictors.

AMS 2000 Subject Classification: 60A05, 62A99.

Key words: optimal predictor, iterativity, conditioning.

1. DEFINITIONS AND STATEMENT OF THE PROBLEMS

Loss functions and optimal predictors have been studied by several authors (see the references). However, it seems that the continuity properties of optimal predictors have not received attention.

Let L : R² → [0, ∞) be a loss function. We shall often write Lx(y) instead of L(x, y). Precisely, we assume that L has the following properties:

a. L is measurable and L(x, x) = 0 for any x;

b. Lx ∈ C¹(R) ∀x ∈ R; the derivative ∂L/∂y(x, y) is denoted by F(x, y) or Fx(y) (differentiability);

c. (y − x)Fx(y) ≥ 0 ∀x, y and (y − x)Fx(y) = 0 iff x = y (strict unimodality);

d. Lx(±∞) = ∞ ∀x ∈ R (unboundedness);

e. ∀y1 < y2 ∃C = C(y1, y2) > 0 such that

y1 ≤ y ≤ y2 ⇒ |Fx(y)| ≤ C(|Fx(y1)| + |Fx(y2)|);

f. for any a < b the mapping Λa,b : (a, b) → (−∞, 0) given by Λa,b(y) = F(a, y)/F(b, y) is decreasing and 1–1 (unicity condition).
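To see what conditions a.–f. ask for in the simplest case, here is a small numerical check for the quadratic loss L(x, y) = (x − y)², for which F(x, y) = 2(y − x) and Λa,b(y) = (y − a)/(y − b). The helper names are ours and the check is only a sketch of the definitions, not part of the paper's argument.

```python
# Sanity check of conditions a., c. and f. for the quadratic loss
# L(x, y) = (x - y)^2, whose derivative in y is F(x, y) = 2(y - x).

def L(x, y):
    return (x - y) ** 2

def F(x, y):
    return 2.0 * (y - x)

def Lam(a, b, y):
    # Lambda_{a,b}(y) = F(a, y) / F(b, y), defined for a < y < b
    return F(a, y) / F(b, y)

# condition a: L(x, x) = 0
assert all(L(x, x) == 0 for x in [-3.0, 0.0, 2.5])

# condition c: (y - x) F(x, y) >= 0, with equality iff x == y
for x in [-1.0, 0.0, 2.0]:
    for y in [-2.0, 0.5, 3.0]:
        assert (y - x) * F(x, y) >= 0
        assert ((y - x) * F(x, y) == 0) == (x == y)

# condition f: Lambda_{a,b} is negative, strictly decreasing (hence 1-1) on (a, b)
a, b = 0.0, 1.0
ys = [0.01 * k for k in range(1, 100)]            # grid inside (a, b)
vals = [Lam(a, b, y) for y in ys]
assert all(v < 0 for v in vals)                   # maps into (-inf, 0)
assert all(u > v for u, v in zip(vals, vals[1:])) # strictly decreasing
```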

Let us denote by LL = LL(Ω, K, P) the set of random variables X with the property that hX(y) < ∞ ∀y ∈ R, where hX(y) = EL(X, y). The notation is inspired by the fact that if L(x, y) = |x − y|^p for some p > 1,

MATH. REPORTS9(59),2 (2007), 223–241


then LL(Ω, K, P) = Lp(Ω, K, P) is the usual space of p-integrable random variables.

It was proved in [7] (Theorems 3.6 and 3.7) that if L fulfills conditions a–f, then hX : R → R is a strictly unimodal function: it decreases on the interval (−∞, α] for some α ∈ R and increases on the interval [α, ∞). This α is unique and is called the optimal predictor of X given by L. We denote it by α = AL(X). In short,

(1.1) α = arg min hX.

Definition 1.1. A function L which fulfills properties a.–f. above is called a UDULF (Uniqueness Property Differentiable Loss Function). A function f : R → R is called strictly unimodal at α if it decreases on the interval (−∞, α] and increases on [α, ∞). Therefore, x < α < y ⇒ f(x) > f(α) < f(y).

Remark 1.1. At this stage, we do not know whether condition d. is really necessary. We were able to produce examples of DULFs and of lotteries X ∈ LL such that hX is decreasing/increasing, hence has no minimizer, but all of them failed to fulfill the unicity condition f. We believe that hX is strictly unimodal even without condition d., but we were unable to prove it.

In this paper we study convergence properties of the optimal predictors.

Namely, we focus on three questions:

1. Law of large numbers. How can one estimate AL(X) from a sample of i.i.d. random variables (Xn)n ?

2. When does Xn → X imply that AL(Xn) converges to AL(X)?

3. Is it true that if (Fn)n is a filtration and F = σ(∪Fn) then, as in the martingale case, AL(X|Fn) converges a.s. to AL(X|F)?

2. THE LAW OF LARGE NUMBERS FOR OPTIMAL PREDICTORS

We start with

Theorem 2.1. Let L be a UDULF and let (Xn)n be a sequence of i.i.d. lotteries from LL. Let hn(y, ω) = [L(X1(ω), y) + L(X2(ω), y) + · · · + L(Xn(ω), y)]/n. Then arg min hn(·, ω) converges almost surely to AL(X1). Or, in statistical terms: the random variables αn := arg min hn(·, ω) are a consistent estimator of AL(X1).

The proof will rely on

Lemma 2.2. Let hn, h be real functions defined on R. Suppose that hn are strictly unimodal at αn and h is strictly unimodal at α. Suppose also that hn(x) → h(x) for any x ∈ Γ, where Γ ⊂ R is a dense subset. Then


(i) αn → α;

(ii) if hn(αn) → h(α) and, moreover, h is continuous, then hn → h uniformly on compact sets.

Proof of Lemma 2.2. (i) So, hn are strictly unimodal at αn, h is strictly unimodal at α and hn(x) → h(x) for any x from some dense subset Γ ⊂ R.

Suppose that αn does not converge to α. Then, for some ε > 0, αn < α − 2ε infinitely often or αn > α + 2ε (i.o.). Let us examine the first variant. Let s, t ∈ Γ be such that αn < s < α − 2ε < t < α − ε. Then hn(s) < hn(α − 2ε) < hn(t) < hn(α − ε) < hn(α) (i.o.), since on [αn, ∞) the mappings hn are increasing.

As hn(s) → h(s) and hn(t) → h(t), this would imply the absurd inequality h(s) ≤ h(t), which contradicts the definition of α (h is strictly decreasing on (−∞, α]). The second possibility (i.e. αn > α + 2ε infinitely often) implies a similar absurd inequality if we choose proper s and t.

(ii) Notice that if hn and h are monotone or convex, this is a standard piece of knowledge, used to prove Glivenko's theorem (see, for instance, [5]). Our proof is similar. The task is to prove that sup{|hn(x) − h(x)|; a ≤ x ≤ b} → 0 as n → ∞ for any a < b. As Γ is dense, this is obviously equivalent to

(2.1) sup{|hn(x) − h(x)|; a ≤ x ≤ b} → 0 as n → ∞ for any a < b, a, b ∈ Γ.

Let then a, b ∈ Γ, a < b. If α < a < b or a < b < α then, according to (i), αn < a < b for n large enough or a < b < αn for n large enough; the proof is standard. If a ≤ α ≤ b, it is obviously enough to prove that

(2.2) sup{|hn(x) − h(x)|; α ≤ x ≤ b} → 0 as n → ∞

and

(2.3) sup{|hn(x) − h(x)|; a ≤ x ≤ α} → 0 as n → ∞.

As the proofs are similar, we shall check only (2.2). Let ε > 0 be arbitrary and let δ = δ(ε) > 0 be small enough to ensure that x, x′ ∈ [a, b], |x − x′| < δ ⇒ |h(x) − h(x′)| < ε. Let x−1 ≤ α ≤ x0 < x1 < · · · < xk ≤ b ≤ xk+1 be a division of the interval [x−1, xk+1] such that xi − xi−1 < δ, 0 ≤ i ≤ k + 1, and x1 − x−1 < δ. Let nε be large enough in order that

(2.4) n > nε ⇒ αn < x1, |hn(xi) − h(xi)| < ε ∀ −1 ≤ i ≤ k, |hn(αn) − h(α)| < ε.

Now, let n > nε and x ∈ [α, b]. There are two cases.

Case 1. xi ≤ x ≤ xi+1, 1 ≤ i ≤ k − 1. On this small interval both hn and h are increasing, thus

hn(x) − h(x) ≤ hn(xi+1) − h(xi) = [hn(xi+1) − h(xi+1)] + [h(xi+1) − h(xi)]

and

hn(x) − h(x) ≥ hn(xi) − h(xi+1) = [hn(xi) − h(xi)] + [h(xi) − h(xi+1)],


thus

|hn(x) − h(x)| ≤ max{|hn(xi+1) − h(xi+1)| + |h(xi+1) − h(xi)|, |hn(xi) − h(xi)| + |h(xi) − h(xi+1)|} < 2ε.

Case 2. x−1 ≤ x ≤ x1. Now we have no monotonicity. However, both hn and h are unimodal, hence

hn(x) − h(x) ≤ hn(x1) ∨ hn(x−1) − h(α) ≤ h(x1) ∨ h(x−1) + ε − h(α) ≤ 2ε

and

hn(x) − h(x) ≥ hn(αn) − h(x1) ∨ h(x−1) ≥ h(α) − ε − h(x1) ∨ h(x−1) = min[h(α) − h(x1), h(α) − h(x−1)] − ε ≥ −ε − ε = −2ε

(both differences exceed −ε by the choice of δ). It comes that |hn(x) − h(x)| ≤ 2ε in this case, too. As a result, n > nε ⇒ sup{|hn(x) − h(x)|; α ≤ x ≤ b} ≤ 2ε, which proves (2.2). The proof is complete.

Proof of Theorem 2.1. First, notice that the claim makes sense: for any fixed ω ∈ Ω the functions hn(·, ω) are indeed strictly unimodal (by Theorem 3.7 from [7]) at some αn(ω). According to the strong law of large numbers, the random variables hn(y, ·) converge almost surely to h(y) := EL(X1, y) for any fixed y ∈ R. Notice that the function h is strictly unimodal at α = AL(X1). Let Ω0 = {ω ∈ Ω : hn(y, ω) → h(y) for any rational number y}. Then P(Ω0) = 1 and, by Lemma 2.2, αn(ω) → α for any ω ∈ Ω0. Or, in other words,

(2.6) ω ∈ Ω0 ⇒ arg min hn(·, ω) → arg min h = AL(X1).

This completes the proof.

Remark. Usually, this estimator is biased. For instance, if L(x, y) = (f(y) − f(x))², with some continuous monotone f, we have AL(X) = f−1(Ef(X)) and αn = f−1([f(X1) + f(X2) + · · · + f(Xn)]/n); the expected value Eαn has no reason to coincide with AL(X). Or, if L(x, y) = f(x)(x − y)², with f > 0, then AL(X) = E(Xf(X))/Ef(X) and αn = [X1f(X1) + · · · + Xnf(Xn)]/[f(X1) + · · · + f(Xn)].
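To illustrate Theorem 2.1 and the Remark numerically, the sketch below takes L(x, y) = (f(y) − f(x))² with f = exp, so that AL(X) = ln Ee^X and αn = ln[(e^{X1} + · · · + e^{Xn})/n], and compares the closed form of αn with a brute-force minimization of the empirical loss hn. The sample, grid and tolerances are our choices, not the paper's.

```python
import math
import random

# Consistent estimation of A_L(X) for L(x, y) = (exp(y) - exp(x))^2,
# where A_L(X) = ln E e^X (a concrete instance chosen for illustration).

def alpha_hat(sample):
    # closed form from the Remark: alpha_n = f^{-1}((f(X_1)+...+f(X_n))/n)
    return math.log(sum(math.exp(x) for x in sample) / len(sample))

def alpha_grid(sample, lo=-0.5, hi=1.0, steps=1500):
    # brute-force arg min of the empirical loss h_n(y) over a grid
    def h(y):
        return sum((math.exp(y) - math.exp(x)) ** 2 for x in sample) / len(sample)
    return min((lo + (hi - lo) * k / steps for k in range(steps + 1)), key=h)

random.seed(0)
sample = [random.uniform(-1.0, 1.0) for _ in range(2000)]
a_closed = alpha_hat(sample)
a_brute = alpha_grid(sample)
assert abs(a_closed - a_brute) < 2e-3        # grid minimum matches closed form
# true value for U ~ Uniform[-1, 1]: ln E e^U = ln(sinh(1)) ~ 0.1614
true_alpha = math.log((math.e - 1 / math.e) / 2)
assert abs(a_closed - true_alpha) < 0.05     # consistency, up to sampling error
```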

3. LOSS FUNCTIONS WITH THE CONTINUITY PROPERTY

Here we answer the second question: when does Xn → X imply that AL(Xn) converges to AL(X)?

In the usual case, when the loss function is quadratic, thus the optimal predictor is the expectation, it is well known that the necessary and sufficient condition in order that the above problem have a positive answer is that the sequence of lotteries (Xn)n be uniformly integrable, namely,

(3.1) ∀ε > 0 ∃C = C(ε) such that E(|Xn|; |Xn| > C) < ε ∀n ≥ 1.

The necessity is to be understood in the sense that we can always produce examples of sequences Xn → X for which EXn does not converge to EX; such sequences are never uniformly integrable. It is also well known that a sufficient condition of uniform integrability is the Lp-boundedness of the sequence (Xn)n: if E|Xn|^p < M < ∞ ∀n for some p > 1, then (Xn)n is uniformly integrable. The strongest condition is L∞-boundedness: ess sup|Xn| ≤ M < ∞ ∀n for some M.
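The role of (3.1) can be seen on the classical counterexample below (our illustration, not from the paper): Xn = n·1_{[0,1/n]} on ([0, 1], λ) converges to 0 a.s., yet EXn = 1 for all n, so EXn does not converge to EX = 0.

```python
from fractions import Fraction

# X_n = n * 1_{[0, 1/n]} on Omega = [0, 1] with Lebesgue measure:
# X_n -> 0 a.s., but E X_n = 1 for all n; (X_n) is not uniformly integrable.

def EX(n):
    # E X_n = n * P([0, 1/n]) = n * (1/n), computed exactly with fractions
    return n * Fraction(1, n)

def tail_mass(n, C):
    # E(|X_n|; |X_n| > C): the whole mass sits above C as soon as n > C
    return EX(n) if n > C else Fraction(0)

assert all(EX(n) == 1 for n in range(1, 1000))
assert tail_mass(100, 50) == 1          # condition (3.1) fails for every fixed C
# pointwise limit: for fixed omega > 0, X_n(omega) = 0 once 1/n < omega
omega = Fraction(37, 100)
assert all((n if omega <= Fraction(1, n) else 0) == 0 for n in range(3, 100))
```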

Definition 3.1. Let L be a UDULF. We say that L has the continuity property if

(3.2) Xn → X and (Xn)n bounded in L∞ imply AL(Xn) → AL(X).

Remark 3.2. Notice that the continuity property implies that L∞ is included in LL.

We shall prove

Theorem 3.1. Let L be a UDULF and F its derivative. Then

(i) if the mapping x ↦ Lx(y) is continuous for any fixed y ∈ R, then L has the continuity property;

(ii) if L has the continuity property, then the mapping x ↦ F(x, y) is continuous for any fixed y ∈ R;

(iii) if x ↦ Fx(y) is continuous for any y ∈ R, then x ↦ Lx(y) is also continuous for any fixed y ∈ R.

In other words, the assertions “L is continuous in x”, “F is continuous in x” and “L has the continuity property” are equivalent.

The proof of (i) relies on Lemma 2.2. Let Xn → X be such that −M ≤ Xn ≤ M (a.s.). Then −M ≤ X ≤ M (a.s.), too. On the compact interval [−M, M] the mapping x ↦ Lx(y) is uniformly continuous and bounded: L(x, y) ≤ C(y) ∀x ∈ [−M, M] for some C(y) < ∞. Then L(Xn, y) ≤ C(y). As Xn → X (a.s.), it follows that L(Xn, y) → L(X, y) (a.s.). By Lebesgue's theorem we see that EL(Xn, y) → EL(X, y). By Theorem 3.7 from [7], the mappings hn(y) := EL(Xn, y) and h(y) := EL(X, y) are strictly unimodal at αn := AL(Xn) and α := AL(X). By (i) from Lemma 2.2, it follows that αn → α, thus L has the continuity property.

Another consequence of Lemma 2.2 is


Corollary 3.2. Let L be a UDULF with the property that x ↦ Lx(y) is continuous for any y. Then L is continuous.

Notation. We shall abbreviate a UDULF L which has the continuity property by saying “L is a CUDULF”. According to Remark 3.2, any CUDULF has the property that L∞ is included in LL, provided that Theorem 3.1 is true.

Proof of Corollary 3.2. We know that the mappings x ↦ Lx(y) are continuous for any given y and that xn → x implies Lxn → Lx uniformly on compact sets (apply Lemma 2.2 with hn = Lxn and h = Lx). The task is to prove that xn → x, yn → y imply L(xn, yn) → L(x, y). Let ε > 0. Remark that

(3.3) |L(x, y) − L(xn, yn)| ≤ |L(x, y) − L(x, yn)| + |L(x, yn) − L(xn, yn)|.

Let M > 0 be such that yn ∈ [−M, M] for any n. Then y is in [−M, M], too. On the compact interval [−M, M] the sequence (Lxn)n converges uniformly to Lx; for n greater than some n1 the inequality |L(x, y′) − L(xn, y′)| < ε/2 holds for any y′ ∈ [−M, M], hence

(3.4) n > n1 ⇒ |L(x, yn) − L(xn, yn)| < ε/2.

On the other hand, the function Lx is continuous; thus, for n greater than some n2 we have |L(x, y) − L(x, yn)| < ε/2. It means that

n > n1 ∨ n2 ⇒ |L(x, y) − L(xn, yn)| < ε/2 + ε/2 = ε,

therefore L is continuous.

The proof of (ii) relies on the following fact.

Lemma 3.3. Let fn, f : I → R be strictly monotone functions, where I is some interval. Let Jn = Range(fn) and J = Range(f). Suppose that fn → f.

(i) Let (xn)n be a sequence of reals with the property that fn(xn) → f(x). Then xn → x.

(ii) Moreover, if all the mappings fn and f are continuous (hence Jn and J are intervals, too), then

a. Int(J) ⊂ lim inf_n Jn := ∪_m ∩_n J_{m+n} (hence fn−1(y) makes sense for any y ∈ Int(J) if n is large enough), and

b. fn−1(y) → f−1(y) for any y ∈ Int(J); here we agree that we take into account only those n large enough for which fn−1(y) makes sense.

(iii) In the particular case when J ⊂ ∩_{n≥1} Jn, the convergence fn−1(y) → f−1(y) holds for any y ∈ J without any proviso.

Accept for the moment the truth of Lemma 3.3.


The task is to prove that the mapping x ↦ Fx(y) is continuous. Let us choose a non-atomic probability space (Ω, K, P): here we can find random variables with any distribution on the real line.

Let (xn)n be a convergent sequence of reals and x = lim xn. Let also a ∈ R and p ∈ (0, 1) be arbitrary. Obviously, there exist lotteries Xn and X with distributions given by P(Xn = xn) = p, P(Xn = a) = 1 − p and P(X = x) = p, P(X = a) = 1 − p such that Xn → X (a.s.) (for instance Xn = xn·1A + a·1B where B = Aᶜ and P(A) = p).

This sequence is bounded in L∞, thus AL(Xn) → AL(X), since we assumed that L has the continuity property. Recall that

(3.6) AL(Xn) = arg min(pL(xn, y) + (1 − p)L(a, y)), AL(X) = arg min(pL(x, y) + (1 − p)L(a, y)).

As L is a UDULF, AL(Xn) and AL(X) are the unique solutions of the equations

(3.7) pF(xn, y) + (1 − p)F(a, y) = 0, pF(x, y) + (1 − p)F(a, y) = 0.

Let αn = AL(Xn), α = AL(X), Λn(y) = F(xn, y)/F(a, y) and Λ(y) = F(x, y)/F(a, y). There are two cases.

Case 1. The sequence (xn)n is increasing. Choose a > x. The mappings Λn : [xn, a) → (−∞, 0] and Λ : [x, a) → (−∞, 0] are continuous and 1–1. Then

(3.8) αn = Λn−1((p − 1)/p), α = Λ−1((p − 1)/p).

So, our hypothesis becomes

(3.9) Λn−1 → Λ−1.

According to Lemma 3.3 (iii) with Jn = [xn, a) and J = [x, a), relation (3.9) implies that Λn → Λ, hence F(xn, y)/F(a, y) → F(x, y)/F(a, y) ∀y ∈ [x, a) ⇒ F(xn, y) → F(x, y) ∀y ∈ [x, a). If we let a → ∞ we conclude that

(3.10) if the sequence (xn)n is increasing, then F(xn, y) → F(x, y) ∀y ≥ x.

Now, choose a < x1. This time Λn(y) = F(a, y)/F(xn, y) and Λ(y) = F(a, y)/F(x, y), thus Λn−1 : (−∞, 0] → [a, xn) and Λ−1 : (−∞, 0] → [a, x) are decreasing and onto. As Λn and Λ are continuous, these functions are continuous, too. According to Lemma 3.3 (ii), their inverses converge, too. Precisely,

(3.11) Λn(y) → Λ(y) ∀y ∈ [a, x), n large enough.


Or,

(3.12) if n is large enough, then F(xn, y)/F(a, y) → F(x, y)/F(a, y) ∀y ∈ [a, x).

Letting a → −∞, we arrive at

(3.13) if the sequence (xn)n is increasing, then F(xn, y) → F(x, y) ∀y < x.

Combined with (3.10), the result is that

(3.14) if the sequence (xn)n is increasing, then F(xn, y) → F(x, y) ∀y ∈ R.

Case 2. The sequence (xn)n is decreasing. This time (3.10) and (3.13) become

(3.15) if the sequence (xn)n is decreasing, then F(xn, y) → F(x, y) ∀y ≤ x and ∀y > x.

We proved that

(3.16) if (xn)n is monotone and xn → x, then F(xn, y) → F(x, y) ∀y ∈ R.

Now, suppose that (xn)n is not monotone. But any subsequence of it contains a monotone sub-subsequence; thus from any subsequence of (F(xn, y))n we can extract a sub-subsequence which converges to F(x, y). This means that the sequence (F(xn, y))n itself converges to F(x, y).

Proof of (iii) of Theorem 3.1. The task is to prove that if L is a UDULF such that x ↦ F(x, y) is continuous ∀y, then L is continuous in x. The connection between F and L is

(3.17) L(x, y) = ∫_x^y F(x, t) dt.

Let y be greater than x and let xn → x.

Case 1. (xn)n is decreasing. Let a > x1, Λn(y) = F(xn, y)/F(a, y) and Λ(y) = F(x, y)/F(a, y). We know that Λn → Λ. Let ε > 0 be arbitrary, x′ = x + ε and nε be such that

(3.18) n > nε ⇒ xn < x′, |Λn(x′)| ≤ |Λ(x′)| + 1.

Let also A = |Λ(x′)| + 1 and C = max(sup{|F(a, t)|; x ≤ t ≤ a}, sup{|F(x, t)|; x ≤ t ≤ a}). Let now y ∈ [x′, a) be arbitrary. Then

|L(x, y) − L(xn, y)| = |∫_x^y F(x, t) dt − ∫_{xn}^y F(xn, t) dt|

≤ ∫_x^{x′} |F(x, t)| dt + ∫_{xn}^{x′} |F(xn, t)| dt + ∫_{x′}^y |F(x, t) − F(xn, t)| dt.


The first integral is smaller than C(x′ − x) = Cε. For the second,

∫_{xn}^{x′} |F(xn, t)| dt = ∫_{xn}^{x′} |Λn(t)| |F(a, t)| dt ≤ C ∫_{xn}^{x′} |Λn(t)| dt ≤ C ∫_{xn}^{x′} |Λn(x′)| dt

(since t ↦ Λn(t) is decreasing and negative!) = C(x′ − xn)|Λn(x′)| ≤ ACε (as n > nε).

For the third one,

∫_{x′}^y |F(x, t) − F(xn, t)| dt = ∫_{x′}^y |Λ(t) − Λn(t)| |F(a, t)| dt ≤ C ∫_{x′}^y |Λ(t) − Λn(t)| dt.

Now, remark that on the compact interval [x′, y] the sequence Λn converges uniformly to Λ (since Λ is continuous and decreasing and the Λn are decreasing and converge to Λ). Thus, the third integral vanishes as n → ∞. As a consequence of these evaluations,

(3.20) lim sup_{n→∞} |L(x, y) − L(xn, y)| ≤ C(1 + A)ε

and, as ε is arbitrary, it means that L(xn, y) → L(x, y) for any y ∈ (x, a). Letting a → ∞, we conclude that

(3.21) if xn ↓ x, y > x, then L(xn, y) → L(x, y).

Case 2. (xn)n is increasing. Now things are even simpler, since on the compact interval [x, y] the mappings Λn converge uniformly to Λ. We do not need any x′. We have thus proved that

(3.22) if xn → x monotonically and y > x, then L(xn, y) → L(x, y).

If we use the same trick as in the proof of (ii), we see that (3.22) is equivalent to the (apparently stronger) assertion

(3.23) xn → x and y > x ⇒ L(xn, y) → L(x, y).

The proof is complete if y > x. If y < x, the proof is similar. If y = x we have to prove that ∫_{xn}^x F(xn, t) dt = ∫_{xn}^x Λn(t)F(a, t) dt converges to 0, and the proof is similar: two cases.

Corollary 3.4. Let L be a UDULF and let F be its derivative. The following assertions are equivalent:

(i) x ↦ L(x, y) is continuous for any fixed y;

(ii) L is continuous;

(iii) L has the continuity property;

(iv) x ↦ F(x, y) is continuous for any fixed y.


Proof. This is a reformulation of Theorem 3.1 combined with Corollary 3.2.

To be on the safe side we have to prove Lemma 3.3.

Proof of Lemma 3.3. (i) Suppose that fn and f are increasing and that xn does not converge to x in spite of the fact that fn(xn) → f(x). Then there exists ε > 0 such that |xn − x| ≥ ε (i.o.). It follows that either xn < x − ε (i.o.) or xn > x + ε (i.o.). In the first case, our hypothesis would imply that fn(xn) < fn(x − ε) (i.o.) hence, passing to the limit, that f(x) ≤ f(x − ε); this contradicts the fact that f is increasing. In the second case we find a similar contradiction, that f(x) ≥ f(x + ε).

(ii) We prove that Int(J) ⊂ lim inf_n Jn. Choose, for instance, the case when fn and f are all increasing. Let ak < bk be such that I = ∪_{k≥1} [ak, bk]. Take the two sequences such that the union is increasing, i.e., (ak)k is non-increasing and (bk)k is non-decreasing. Then Jn = ∪_{k≥1} [fn(ak), fn(bk)], J = ∪_{k≥1} [f(ak), f(bk)] and Int(J) = ∪_{k≥1} (f(ak), f(bk)). Let y ∈ Int(J). Then there exists k such that y ∈ (f(ak), f(bk)). Let ε be such that y ∈ (f(ak) + ε, f(bk) − ε). We know that fn(ak) → f(ak) and that fn(bk) → f(bk) as n → ∞. Let nε be such that fn(ak) < f(ak) + ε and fn(bk) > f(bk) − ε for any n > nε. Then (f(ak) + ε, f(bk) − ε) is included in (fn(ak), fn(bk)) for any n > nε. Otherwise stated, y ∈ (f(ak) + ε, f(bk) − ε) ⊂ lim inf_n (fn(ak), fn(bk)) ⊂ lim inf_n Jn. Now, let us prove the second statement. Let y ∈ Int(J). Let ny be such that n ≥ ny ⇒ y ∈ Jn. Let xn, x be such that y = fn(xn) ∀n > ny and y = f(x). According to (i), xn → x, since of course fn(xn) → f(x). But xn = fn−1(y) and x = f−1(y).

Now, (iii) is obvious: we do not need to worry about fn−1(y); it makes sense for any n.

So, we have a partial answer to the question of convergence of optimal predictors. In the weakest form, L must be a CUDULF. How much can we extend this result?

Theorem 3.5. Let L be a CUDULF; thus L∞ ⊂ LL. Let Xn and X be lotteries such that Xn → X. Let also hn(y) = EL(Xn, y) and h(y) = EL(X, y). Suppose that Xn ∈ LL ∀n.

(i) If X ∈ LL and hn → h, then AL(Xn) → AL(X).

(ii) If (L(Xn, y))n is uniformly integrable for any y ∈ R, then AL(Xn) → AL(X).

(iii) If for any y ∈ R there exists p(y) > 1 such that the sequence (L(Xn, y))n is bounded in L^{p(y)}, then AL(Xn) → AL(X).

(iv) If the sequence (L(Xn, y))n is bounded in L2, then AL(Xn) → AL(X).

Proof. (i) If X ∈ LL then h is strictly unimodal at AL(X). The claim is a consequence of Lemma 2.2 (i). (ii) is a consequence of the following fact: if (Yn)n is uniformly integrable and Yn → Y (a.s.), then Y ∈ L1 and the convergence takes place even in L1. Thus, h(y) < ∞ ∀y, hence X ∈ LL and hn → h. So, (ii) is a consequence of (i). (iii) is a consequence of (ii), since any sequence of random variables bounded in Lp, for some p > 1, is uniformly integrable. Finally, (iv) is an obvious consequence of (iii). It can serve as a rule of thumb: maybe we are lucky and the sequence (L(Xn, y))n is bounded in L2; if so, then the optimal predictors converge.

Remark 3.3. Actually, AL(X) does not depend on X itself, but only on its distribution µ. In terms of distributions, we can write AL(µ) = arg min hµ, where hµ(y) = ∫ L(x, y) dµ(x). The continuity property of L can then be stated as follows.

Theorem 3.7. Let L be a UDULF. Then L is a CUDULF if and only if for any sequence of distributions on the real line (µn)n such that Supp(µn) is included in some compact K ⊂ R for any n, the weak convergence µn ⇒ µ implies AL(µn) → AL(µ).

Proof. Suppose that L is a CUDULF, i.e., if Xn → X and sup|Xn| < M < ∞ then AL(Xn) → AL(X). Let µn ⇒ µ and Supp(µn) ⊂ K ⊂ [−M, M] for some M large enough. On the standard probability space Ω = (0, 1), K = B((0, 1)), P = λ := Lebesgue measure, the random variables Xn(ω) = Fn−1(ω) are uniformly bounded by M and converge to X(ω) = F−1(ω). Here Fn(x) = µn((−∞, x]) and F(x) = µ((−∞, x]) are the distribution functions of µn and µ, and Fn−1, F−1 are their pseudoinverses. It is a standard piece of knowledge that Xn ∼ µn, X ∼ µ (see, for instance, [6]). Conversely, suppose that L is a UDULF which has the stated property. Let Xn → X be lotteries such that |Xn| < M < ∞ a.s. for every n. Let µn and µ be the distributions of Xn, X. Then Supp(µn) is included in the compact K = [−M, M] and µn ⇒ µ. Moreover, AL(Xn) = AL(µn) converges to AL(µ) = AL(X). It means that L is a CUDULF.
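The coupling used in the proof can be made concrete. The toy distributions below (uniform laws, our choice) realize µn ⇒ µ as almost sure convergence of the quantile transforms Fn−1 on Ω = (0, 1).

```python
# Quantile coupling: realize mu_n = Uniform[0, 1 + 1/n] and mu = Uniform[0, 1]
# on Omega = (0, 1) via X_n(omega) = F_n^{-1}(omega), X(omega) = F^{-1}(omega).

def Xn(n, omega):
    # F_n(x) = x / (1 + 1/n) on [0, 1 + 1/n], so F_n^{-1}(omega) = omega*(1 + 1/n)
    return omega * (1.0 + 1.0 / n)

def X(omega):
    # F(x) = x on [0, 1], so F^{-1}(omega) = omega
    return omega

# pointwise (hence a.s.) convergence of the coupled variables
for omega in [0.1, 0.5, 0.9]:
    gaps = [abs(Xn(n, omega) - X(omega)) for n in (10, 100, 1000)]
    assert gaps[0] > gaps[1] > gaps[2]
    assert gaps[2] < 1e-3

# uniform boundedness: |X_n| <= 2 for every n, as required in Theorem 3.7
assert all(Xn(n, 0.999) <= 2.0 for n in range(1, 50))
```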

4. CONDITIONED OPTIMAL PREDICTORS AND BAYESIAN ANALYSIS

In this section we answer the last question. The main problem is

Problem. A decision maker with loss function L has some previous information about a lottery X. How does this fact change his optimal predictor AL(X)?


Or: suppose that the random variable X depends somehow on an observable random variable Z. What meaning should the conditioned optimal predictor AL(X|Z = z) have?

First, the quantity AL(X|Z = z) should be some function α(z). Second, α(z) should be the minimizer of the conditioned expected loss hX(y|Z = z) defined by

(4.1) hX(y|Z = z) = E(L(X, y)|Z = z).

It is known that the meaning of E(Y|Z = z) is E(Y|Z)(ω) when Z(ω) = z, where E(Y|Z) is the conditioned expectation of Y given σ(Z) – the smallest σ-algebra with respect to which Z is measurable. It follows that the natural question is to give a meaning to the random variable AL(X|F), where it is understood that we are in a complete probability space (Ω, K, P) and F is a complete sub-σ-algebra of K. (We need the completeness hypothesis in order to be sure that if X is a random variable and Y = X (a.s.), then Y is measurable, too.)

So, we arrive at

Definition 4.1. Let L be a loss function, X ∈ LL a lottery and F a sub-σ-algebra of K. The random variable Y is called the conditioned optimal predictor of X given F (and it will be denoted by AL(X|F)) iff

a. Y is F-measurable;

b. E(L(X, Y)|F) ≤ E(L(X, Z)|F) for any F-measurable random variable Z.

It is not clear at all why such a mathematical object should exist. That is why we prove the following result.

Theorem 4.1. Suppose that L is a UDULF, i.e., a loss function which satisfies conditions a.–f. from Section 1. Then AL(X|F) does exist and is unique (mod P) for any X from LL.

It will be more convenient to work in terms of distributions, rather than random variables. If µ is a distribution on the real line, then we shall denote by hµ(y) the integral ∫ L(x, y) dµ(x) and by AL(µ) its minimizer. We shall say that µ ∈ LL if hµ(y) < ∞ ∀y ∈ R. We shall also need the following piece of standard knowledge: the existence of the regular conditioned distribution of a random variable. The reader could consult any advanced handbook of probability theory; however, since we shall need the technique in the sequel, we shall sketch the proof in the simplest case.

Theorem RCD (existence of regular conditioned distributions). Let (Ω, K, P) be a complete probability space, X a random variable and F a complete sub-σ-algebra of K. Then there exists a transition probability Q from (Ω, F) to (R, B(R)) such that the equality

(4.2) E(f(X)|F)(ω) = ∫ f(x) Qω(dx)

holds for P-almost all ω ∈ Ω and every bounded or non-negative measurable f.

The precise meaning is that Qω is a probability measure on (R, B(R)) for every ω and that the mapping ω ↦ Qω(B) is F-measurable for every B ∈ B(R).

Notice that in the particular case when f = 1B, with B a Borel set on the real line, (4.2) becomes

(4.3) P(X ∈ B|F)(ω) = Qω(B).

This is an explanation of the name “distribution”.

Sketch of proof of Theorem RCD. Let

(4.4) G(x, ω) = P(X ≤ x|F)(ω).

It is known that the conditioned expectation is only defined P-almost surely; so, the meaning of (4.4) is that we choose an arbitrary version of E(1(−∞,x](X)|F). The random variables Gx defined by Gx(ω) = G(x, ω) have the following properties:

(i) Gx are F-measurable for every x ∈ R;

(ii) x < y ⇒ Gx ≤ Gy (a.s.) (since 1(−∞,x] ≤ 1(−∞,y]!);

(iii) xn ↑ ∞ ⇒ Gxn → 1 (a.s.) and xn ↓ −∞ ⇒ Gxn → 0 (conditioned Beppo Levi theorem!);

(iv) xn ↓ x ⇒ Gxn → Gx (a.s.) (since 1(−∞,xn] ↓ 1(−∞,x]!).

We shall now consider only the random variables Gx where x ∈ Q is rational. For rationals x, y such that x < y consider the null sets Ax,y = {ω ∈ Ω | Gx(ω) > Gy(ω)}. Let A be their union. Let also Bx = {ω ∈ Ω | G_{x+1/n}(ω) does not converge to Gx(ω)} and let B = ∪_{x∈Q} Bx. Finally, let C = {ω ∈ Ω | Gn(ω) does not converge to 1 or G−n(ω) does not converge to 0}. The set N := A ∪ B ∪ C is a null set, too. Let Ω0 = Ω \ N. For ω ∈ Ω0 the mappings from Q to R defined by x ↦ G(x, ω) then have the following properties:

(i) x < y ⇒ Gx(ω) ≤ Gy(ω);

(ii) xn ↑ ∞, xn ∈ Q ∀n ⇒ Gxn(ω) → 1 and xn ↓ −∞, xn ∈ Q ∀n ⇒ Gxn(ω) → 0;

(iii) xn ↓ x, xn ∈ Q ∀n, x ∈ Q ⇒ Gxn(ω) → Gx(ω).

Now, let

(4.5) Fx(ω) = F(x, ω) = inf{G(t, ω) | t ∈ [x, ∞) ∩ Q} if ω ∈ Ω0, and Fx(ω) = 1[0,∞)(x) if ω ∉ Ω0.

Let ω ∈ Ω0. If x ∈ Q then F(x, ω) = G(x, ω) (so x ↦ Fx(ω) is an extension of x ↦ Gx(ω)!), the mapping x ↦ Fx(ω) is right-continuous (there is a small proof here!) and F(−∞, ω) = 0, F(∞, ω) = 1. It means that x ↦ F(x, ω) is a distribution function. By Carathéodory's theorem, there exists a unique probability distribution on the real line, denoted by Qω, such that F(x, ω) = Qω((−∞, x]).

If ω ∉ Ω0 then obviously F(·, ω) is the distribution function of Dirac's distribution ε0. The claim is that equality (4.3) holds for every Borel set B on the real line. By (4.4), this is true for sets B of the form (−∞, x] with x a rational number. By standard arguments (either with monotone classes or with Dynkin systems) the claim holds for any Borel set. It is again a standard fact that (4.2) and (4.3) are equivalent (the so-called “transport formula”).

Now, we can prove our Theorem 4.1.

Let X be from LL and let Q be its regular conditioned distribution given F. The transport formula points out that the equality

(4.6) E(L(X, y)|F)(ω) = ∫ L(x, y) Qω(dx)

holds for P-almost all ω ∈ Ω. In terms of distributions, (4.6) can also be written as

(4.7) E(L(X, y)|F)(ω) = hQω(y).

We also know that X ∈ LL, i.e., L(X, y) ∈ L1 for any y. Then hQω(y) < ∞ for any fixed y, for P-almost all ω. The set

(4.8) D = {ω ∈ Ω | hQω(y) = ∞ for some y ∈ Q}

is a null set. Thus, if ω ∈ Ω1 := Ω \ D then hQω(y) < ∞ for all y ∈ Q. But we know that the mappings hQω are unimodal. So,

(4.9) ω ∈ Ω1 ⇒ hQω(y) < ∞ for all y ∈ R,

or, in other terms, Qω ∈ LL for P-almost all ω ∈ Ω. As L is a UDULF, the mappings hQω have a unique minimizer α(ω) for any ω ∈ Ω1. It follows that we can define

(4.10) AL(X|F)(ω) = α(ω) if ω ∈ Ω1, and 0 elsewhere.

We have to prove that this random variable satisfies conditions a. and b. from Definition 4.1.

Measurability. Notice that {ω ∈ Ω | AL(X|F)(ω) ≤ x} = {ω ∈ Ω | α(ω) ≤ x} (a.s.) = {ω ∈ Ω | hQω is increasing on the interval [x, ∞)} (as hQω is strictly unimodal at α(ω)!) = ∩_{s,t∈Q, x≤s<t} {ω ∈ Ω | hQω(s) < hQω(t)}, and the last set is in F (because for any fixed s the mapping ω ↦ hQω(s) is a version of E(L(X, s)|F), hence it is F-measurable).

Optimum property. We shall use the following result, with a standard proof, left to the reader.

Lemma 4.2. Let L be non-negative and measurable. Let Z be F-measurable, X a random variable and Q its regular conditioned distribution given F. Then the equality

(4.11) E(L(X, Z)|F)(ω) = ∫ L(x, Z(ω)) Qω(dx)

holds P-a.s.

(Hint. Check first (4.11) for L(x, y) = f(x)g(y), then for L = 1C, C a Borel subset of the plane – here you need an argument with monotone classes or Dynkin systems – then for L with finitely many values (L a simple function) and then, by Beppo Levi, for non-negative L.)

In our case it goes as follows: for ω ∈ Ω1 we have

E(L(X, Z)|F)(ω) = ∫ L(x, Z(ω)) Qω(dx) ≥ ∫ L(x, α(ω)) Qω(dx) (since α(ω) is the minimizer of hQω!) = E(L(X, α)|F)(ω) = E(L(X, AL(X|F))|F)(ω),

and the proof is complete.

Suppose now that (Fn)n is an increasing sequence of σ-algebras (a so-called filtration) and F = σ(∪Fn). Is it true (as in the martingale case) that AL(X|Fn) converges a.s. to AL(X|F)?

Yes, sometimes it is true.

Theorem 4.3. Let L be a UDULF, X ∈ LL a lottery, (Fn)n a filtration and F = σ(∪Fn). Then AL(X|Fn) converges a.s. to AL(X|F).

Proof. Let Q[n] be the regular conditioned distribution of X given Fn and Q its regular conditioned distribution given F. Let

hn(y)(ω) = ∫ L(x, y) Q[n]ω(dx) and h(y)(ω) = ∫ L(x, y) Qω(dx).

Then hn(y) = E(L(X, y)|Fn) (a.s.) and h(y) = E(L(X, y)|F) (a.s.). These are good versions of the conditioned expectations because they are continuous and strictly unimodal at αn := AL(X|Fn) and at α := AL(X|F) for any fixed ω ∈ Ω.

Moreover, the usual martingale convergence theorem says that for any fixed y the sequence (hn(y))n converges almost surely to h(y). Let Ω2 = {ω ∈ Ω | hn(y)(ω) → h(y)(ω) for any rational y}. Then P(Ω2) = 1 and for any fixed ω ∈ Ω2 the sequence of strictly unimodal real functions fn(y) := hn(y)(ω) converges to f(y) := h(y)(ω). According to Lemma 2.2 their modes converge too: arg min fn → arg min f as n → ∞. But arg min fn = αn(ω) = AL(X|Fn)(ω) and arg min f = α(ω) = AL(X|F)(ω). Thus AL(X|Fn)(ω) → AL(X|F)(ω) ∀ω ∈ Ω2.

Corollary 4.3. Let (Yn)n be a sequence of random variables, L a UDULF and X a lottery from LL. Then the sequence

(4.12) (AL(X|Y1, . . . , Yn))n converges a.s.

Proof. Standard arguments: if Fn is the σ-algebra σ(Y1, . . . , Yn), then (Fn)n is a filtration with F = σ(Y1, Y2, . . .) = σ(∪Fn); apply Theorem 4.3.

Examples. a. If L(x, y) = f(x)(x − y)² with f > 0 measurable, then AL(X|F) = E(Xf(X)|F)/E(f(X)|F); this is the conditioned generalized Esscher premium.

b. If L(x, y) = (f(y) − f(x))² with f increasing and continuous, then AL(X|F) = f−1(E(f(X)|F)).
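Example b can be checked mechanically on a finite probability space, where E(·|F) is just a weighted average over the atoms of a partition. The space, partition and values below are our toy choices.

```python
import math

# Conditioned optimal predictor for Example b with f = exp, on a finite
# probability space; the partition-based conditional expectation is our sketch.

# Omega = {0, 1, 2, 3}, uniform P; F generated by the partition {0,1} | {2,3}
P = {w: 0.25 for w in range(4)}
X = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
partition = [{0, 1}, {2, 3}]

def cond_predictor(X, partition, f, finv):
    # A_L(X|F)(w) = f^{-1}( E(f(X) | atom containing w) )
    out = {}
    for atom in partition:
        mass = sum(P[w] for w in atom)
        m = sum(P[w] * f(X[w]) for w in atom) / mass
        for w in atom:
            out[w] = finv(m)
    return out

Y = cond_predictor(X, partition, math.exp, math.log)
# on the atom {0, 1}: f^{-1}((e^0 + e^1)/2) = ln((1 + e)/2)
assert abs(Y[0] - math.log((1 + math.e) / 2)) < 1e-9
assert Y[0] == Y[1] and Y[2] == Y[3]          # F-measurability
```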

Remark. The loss functions of form b. have the following

Iterativity property. If F, G are two σ-algebras such that F ⊂ G, then AL(AL(X|G)|F) = AL(X|F). Indeed,

AL(AL(X|G)|F) = f−1(E(f(AL(X|G))|F)) = f−1(E(f(f−1(E(f(X)|G)))|F)) = f−1(E(E(f(X)|G)|F)) = f−1(E(f(X)|F)).
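The iterativity computation above can be replayed numerically for f = exp on a finite space with nested partitions F ⊂ G; all choices below are ours.

```python
import math

# Numerical check of the iterativity property A_L(A_L(X|G)|F) = A_L(X|F)
# for f = exp on a toy finite probability space.

P = {w: 0.125 for w in range(8)}
X = {w: float(w) for w in range(8)}
G_part = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]      # finer partition (G)
F_part = [{0, 1, 2, 3}, {4, 5, 6, 7}]          # coarser partition (F), F in G

def predictor(vals, part):
    # A_L(vals | sigma(part))(w) = ln( E(e^vals | atom of w) )
    out = {}
    for atom in part:
        mass = sum(P[w] for w in atom)
        m = sum(P[w] * math.exp(vals[w]) for w in atom) / mass
        for w in atom:
            out[w] = math.log(m)
    return out

lhs = predictor(predictor(X, G_part), F_part)  # A_L(A_L(X|G)|F)
rhs = predictor(X, F_part)                     # A_L(X|F)
assert all(abs(lhs[w] - rhs[w]) < 1e-9 for w in range(8))
```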

We believe that the following converse holds.

Iterativity conjecture. If a UDULF L has the iterativity property, then L is equivalent to some L1 of the form L1(x, y) = (f(x) − f(y))² for some strictly monotone differentiable f.

We were unable to prove or disprove this in the general case. However, we shall check that in the class of predictors with the additivity property – called by us the generalized Esscher predictors in [7] – our iterativity conjecture holds.

Theorem 4.4. Let λ ≥ 0, α ∈ R. Define, as in Sections 7 and 8 from [7],

(4.13) Aα,λ(X) = (1/(2λ)) ln[mX(λ − α)/mX(−λ − α)] if λ > 0, Aα,0(X) = E(Xe−αX)/E(e−αX),

where mX(t) = EetX is the moment generating function of X. Then Aα,λ has the iterativity property if and only if α = ±λ. In that case, the loss functions can be chosen as Lα(x, y) = (fα(y) − fα(x))² where fα(θ) = eαθ if α ≠ 0, f0(θ) = θ. Or, corroborating with Theorem 7.1: any
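Formula (4.13) can be evaluated directly for a discrete lottery (our example); the check below also confirms numerically that Aα,λ approaches Aα,0 as λ → 0, which is the consistency between the two formulas.

```python
import math

# Generalized Esscher predictor of Theorem 4.4 for a discrete lottery
# (the lottery values and weights below are our example).

xs = [0.0, 1.0, 2.0]
ps = [0.5, 0.3, 0.2]

def mgf(t):
    # moment generating function m_X(t) = E e^{tX}
    return sum(p * math.exp(t * x) for x, p in zip(xs, ps))

def A(alpha, lam):
    if lam > 0:
        return (1.0 / (2 * lam)) * math.log(mgf(lam - alpha) / mgf(-lam - alpha))
    # A_{alpha,0}(X) = E(X e^{-alpha X}) / E(e^{-alpha X})
    num = sum(p * x * math.exp(-alpha * x) for x, p in zip(xs, ps))
    return num / mgf(-alpha)

# alpha = 0, lambda = 0 recovers the plain expectation E X = 0.7
assert abs(A(0.0, 0.0) - 0.7) < 1e-12
# A_{alpha,lambda} -> A_{alpha,0} as lambda -> 0
assert abs(A(0.5, 1e-6) - A(0.5, 0.0)) < 1e-5
# a positive alpha down-weights large outcomes, lowering the predictor
assert A(1.0, 0.0) < A(0.0, 0.0)
```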
