Associated natural exponential families and elliptic functions


HAL Id: hal-01977671

https://hal.archives-ouvertes.fr/hal-01977671

Submitted on 10 Jan 2019


Gérard Letac

To cite this version:

Gérard Letac. Associated natural exponential families and elliptic functions. Conference on Probability, statistics and their applications, Jun 2015, Aarhus, Denmark. hal-01977671


To Ole Barndorff-Nielsen for his 80th birthday.

Abstract This paper studies the variance functions of the natural exponential families (NEF) on the real line of the form $(Am^4+Bm^2+C)^{1/2}$, where $m$ denotes the mean. Surprisingly enough, most of them are discrete families concentrated on $\lambda\mathbb{Z}$ for some constant $\lambda$, and the Laplace transforms of their elements are expressed by elliptic functions. The concept of association of two NEF is an auxiliary tool for their study: two families $F$ and $G$ are associated if they are generated by symmetric probabilities and if the analytic continuations of their variance functions satisfy $V_F(m)=V_G(m\sqrt{-1})$. We give some properties of the association before its application to these elliptic NEF. The paper is completed by the study of NEF with variance functions $m(Cm^4+Bm^2+A)^{1/2}$. They are easier to study and they are concentrated on $a\mathbb{N}$.

Primary: 62E10 , 60G51 Secondary: 30E10

Key words: Variance functions, exponential dispersion models, function $\wp$ of Weierstrass.

Gérard Letac, Equipe de Statistique et Probabilités, Université de Toulouse, 118 route de Narbonne, 31062 Toulouse, France, [email protected]

1 Foreword

Ole and I met for the first time in the Summer School of Saint Flour in 1986. Having been converted to statistics by V. Seshadri two years before, I had learnt about exponential families through Ole's book (1978) and I had fought with cuts and steepness. Marianne Mora had just completed her thesis in Toulouse and was one of the Saint Flour participants: she was the first from Toulouse to make the pilgrimage to Aarhus the year after, followed by many other researchers from Toulouse: Evelyne Bernadac, Célestin Kokonendji, Dhafer Malouche, Muriel Casalis, Abdelhamid Hassairi, Angelo Koudou and myself. Over the years all of us were in contact with the ever-flowing ideas of Ole. During these Aarhus days (and Ole's visits to Toulouse) we gained a better understanding of Lévy processes, of generalized inverse Gaussian distributions and their matrix versions, and of differential geometry applied to statistics. Among all the topics which have interested Ole, the choice today is the one for which he may be the least enthusiastic (see the discussion of Letac 1991), namely the classification of exponential families through their variance functions: Ole thought correctly that although the results were satisfactory for the mind, one could not see many real practical applications; on the other hand, Mendeleiev is universally admired for his prophetic views of chemical elements which had not yet been discovered. Descriptions of natural exponential families with more and more sophisticated variance functions $V$ have been given: when $V$ is a second degree polynomial in the mean (Morris 1982), a power (Tweedie 1984, Jørgensen 1987), a third degree polynomial (Letac-Mora 1990), the Babel class $P+Q\sqrt{R}$ where the polynomials $P,Q,R$ have degrees not bigger than 2, 1, 2 (Letac 1992, Jørgensen 1997).

This is for univariate NEF: even more important works have been done for multivariate NEF, but the present paper will confine itself to one dimensional distributions only.

Having forgotten variance functions during the last twenty years and having turned to random matrices and Bayesian theory, our interest in the topic has been rejuvenated by the paper by S. Bar-Lev and F. Van de Duyn Schouten (2004). The authors consider exponential dispersion models $G$ such that one of the transformations

$$T(P)(dx)=\frac{xP(dx)}{\int_{\mathbb{R}}xP(dx)},\qquad T_2(P)(dx)=\frac{x^2P(dx)}{\int_{\mathbb{R}}x^2P(dx)}$$

maps $G$ into itself or one of its translates. For $T(P)$ they obtain exactly the exponential dispersion models concentrated on the positive line with quadratic variance functions: gamma, Poisson, negative binomial and binomial and no others. For $T_2(P)$ they obviously obtain the previous ones, but they observe that new variance functions appear, in particular $(m^4+c^2)^{1/2}$, without being able to decide whether these natural exponential families (NEF) exist after all (it should be mentioned here that their formula (11) is not correct and this fact greatly invalidates their paper). As a result, our initial motivation was to address this particular question: is $(m^4+c^2)^{1/2}$ a variance function? As we shall see, the answer is yes for a discrete set of $c$. To solve this particular problem, we have to design methods based on elliptic functions, and these methods appear to have a wider domain of applicability. For this reason, the aim of the present paper is the classification of the variance functions of the form $(Am^4+Bm^2+C)^{1/2}$ and their reciprocals in the sense of Letac-Mora (1990), namely the variance functions of the form $m(Cm^4+Bm^2+A)^{1/2}$.

Section 2 recalls general facts and methods for dealing with NEF's. Section 3 opens a long parenthesis on pairs of associated NEF: if $F$ and $G$ are NEF generated by symmetric probabilities, we say that they are associated roughly if we can write $V_F(m)=f(m^2)$ and $V_G(m)=f(-m^2)$. This definition seems to be a mere curiosity of distribution theory, but appears to be illuminating when applied to our elliptic NEF. Since this concept of association has several interesting aspects, I have provided here several detailed examples (Section 3.1) that the reader should skip if he is only interested in elliptic NEF.

Section 4 rules out trivial values of the parameters for elliptic NEF. Sections 5-7 investigate the various cases according to the parameters $(A,B,C)$. Section 8 considers the reciprocal families of the previous elliptic NEF: these reciprocal NEF are interesting distributions on the positive integers. Section 9 makes brief comments on the variance functions $(\alpha m+\beta)\sqrt{P(m)}$ where $P$ is an arbitrary polynomial of degree $\leq 4$, actually a completely new field of research. While the statements of the present paper are understandable without knowledge of elliptic functions, the proofs of Sections 5-7 make heavy use of them, and we shall constantly refer to the magnificent book by Sansone and Gerretsen (1960) that we frequently quote by SG.

2 Retrieving an NEF from its variance function

The concept of exponential family is obviously the backbone of Ole's book (Barndorff-Nielsen (1978)) or of Brown (1986), but the notations for the simpler object called natural exponential family are rather to be found in Morris (1982), Jørgensen (1987) and Letac and Mora (1990).

If $\mu$ is a positive non-Dirac Radon measure on $\mathbb{R}$, consider its Laplace transform

$$L_\mu(\theta)=\int_{-\infty}^{\infty}e^{\theta x}\mu(dx)\leq\infty.$$

Assume that the interior $\Theta(\mu)$ of the interval $D(\mu)=\{\theta\in\mathbb{R}\,;\ L_\mu(\theta)<\infty\}$ is not empty. The set of positive measures $\mu$ on $\mathbb{R}$ such that $\Theta(\mu)$ is not empty and such that $\mu$ is not concentrated on one point is denoted by $\mathcal{M}(\mathbb{R})$. We denote by $\mathcal{M}_1(\mathbb{R})$ the set of probabilities $\mu$ contained in $\mathcal{M}(\mathbb{R})$. In other terms, the elements of $\mathcal{M}_1(\mathbb{R})$ are the probability laws on $\mathbb{R}$ which have a non trivial Laplace transform.

Write $k_\mu=\log L_\mu$. Then the family of probabilities $F=F(\mu)=\{P(\theta,\mu)\,;\ \theta\in\Theta(\mu)\}$, where

$$P(\theta,\mu)(dx)=e^{\theta x-k_\mu(\theta)}\mu(dx),$$

is called the natural exponential family generated by $\mu$. Two basic results are

$$m=k'_\mu(\theta)=\int_{-\infty}^{\infty}xP(\theta,\mu)(dx)$$

and the fact that $k'_\mu$ is increasing (or that $k_\mu$ is convex). The set $k'_\mu(\Theta(\mu))=M_F$ is called the domain of the means. We denote by $\psi_\mu: M_F\to\Theta(\mu)$ the reciprocal function of $m=k'_\mu(\theta)$. Thus $F=F(\mu)$ can be parametrized by $M_F$ by the map from $M_F$ to $F$ which is

$$m\mapsto P(\psi_\mu(m),\mu)=P(m,F).$$

In other terms an element of $F(\mu)$ can be identified by the value of the mean $m$ rather than by the value of the canonical parameter $\theta$. One can prove that the variance $V_F(m)$ of $P(m,F)$ is

$$V_F(m)=k''_\mu(\psi_\mu(m))=\frac{1}{\psi'_\mu(m)}.\tag{1}$$

The map $m\mapsto V_F(m)$ from $M_F$ to $(0,\infty)$ is called the variance function and characterizes $F$. The Jørgensen set of $\mu$ is the set $\Lambda(\mu)$ of positive numbers $t$ such that there exists a positive measure $\mu_t$ such that $\Theta(\mu_t)=\Theta(\mu)$ and such that $L_{\mu_t}=(L_\mu)^t$. Obviously, $\Lambda(\mu)$ is an additive semigroup which contains all positive integers. If $t\in\Lambda(\mu)$ we denote $F_t=F(\mu_t)$ and it is easily checked that $M_{F_t}=tM_F$ and that

$$V_{F_t}(m)=tV_F\left(\frac{m}{t}\right).\tag{2}$$

The union $G=G(\mu)=\bigcup_{t\in\Lambda(\mu)}F(\mu_t)$ is called the exponential dispersion model generated by $\mu$.

If $F$ is a NEF and if $h(x)=ax+b$ (with $a\neq 0$) then the family $h(F)$ of images of elements of $F$ by $h$ is still a NEF with $M_{h(F)}=h(M_F)$ and

$$V_{h(F)}(m)=a^2V_F\left(\frac{m-b}{a}\right).\tag{3}$$

In spite of the similarity between (2) and (3), the last formula is much more useful for dealing with a NEF $F$ which is known only by its variance function: the reason is that the Jørgensen set $\Lambda(F)$ of $F$ is unknown in many circumstances. In fact $\Lambda(F)$ is a closed additive semigroup of $[0,\infty)$ which can be rather complicated (see Letac, Malouche and Maurer (2002) for an example). On the other hand an affinity is always defined, and this fact can be used to reduce the number of parameters of a family of variance functions. For instance, if we consider the variance function $\sqrt{Am^4+Bm^2+C}$ such that $C>0$, without loss of generality we could assume that $C=1$ by using the dilation $x\mapsto x/C^{1/4}$, since by (3) the image variance function is $\sqrt{Am^4+Ba^2m^2+Ca^4}$ with $a=C^{-1/4}$.
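The effect of a dilation on a variance function of this type can be checked numerically from formula (3). The sketch below is only an illustration: the function names and the sample coefficients are assumptions, not taken from the paper. It applies $x\mapsto ax$ to $V(m)=\sqrt{Am^4+Bm^2+C}$ and compares the result with the normalized form whose constant term is 1.

```python
import math

def elliptic_variance(A, B, C):
    """Return the function m -> sqrt(A m^4 + B m^2 + C)."""
    return lambda m: math.sqrt(A * m**4 + B * m**2 + C)

def dilate(V, a):
    """Variance function of the image family under x -> a*x (formula (3) with b = 0)."""
    return lambda m: a * a * V(m / a)

A, B, C = 2.0, 3.0, 5.0            # arbitrary sample coefficients (assumption)
a = C ** (-0.25)                   # dilation factor that sends the constant term to 1
V_new = dilate(elliptic_variance(A, B, C), a)
V_expected = elliptic_variance(A, B / math.sqrt(C), 1.0)

# Largest discrepancy between the dilated and the normalized variance function
max_gap = max(abs(V_new(m) - V_expected(m)) for m in [0.0, 0.5, 1.0, 2.0])
```

The coefficient of $m^4$ is unchanged by the dilation, while the coefficient of $m^2$ is rescaled, so only two free parameters remain.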

An important fact for the sequel is that $V_F$ is real analytic, which means that for any $m_0\in M_F$ there exists a positive number $r$ such that for $m_0-r<m<m_0+r$ we have

$$V_F(m)=\sum_{n=0}^{\infty}\frac{(m-m_0)^n}{n!}V_F^{(n)}(m_0),$$

which implies that $V_F$ is analytically extendable to a connected open set of the complex plane containing the real segment $M_F$. If $\mu\in\mathcal{M}_1(\mathbb{R})$, the Laplace transform $L_\mu$ defined on the open interval $\Theta(\mu)$ is extendable analytically in a unique way to the strip $\Theta(\mu)+i\mathbb{R}$ of the complex plane. This extension is also denoted $L_\mu$, and $\theta\mapsto L_\mu(i\theta)$ is the Fourier transform of the probability $\mu$. The function $k_\mu(\theta)=\log L_\mu(\theta)$ could also be extended to an analytic function on the same strip, but it would be a multivalued function if $L_\mu(\theta)$ has zeros in the strip.

To conclude this section, recall the four steps allowing us to pass from the variance function $V_F$ of a NEF $F$ to a measure $\mu$ such that $F=F(\mu)$.

1. Writing $d\theta=\psi'_\mu(m)\,dm=\frac{dm}{V_F(m)}$, we compute $\theta=\psi_\mu(m)$ as a function of $m$ by a quadrature;

2. we deduce from this the parameter $m$ as a function $m=k'_\mu(\theta)$ (this is generally a difficult point);

3. we compute $k_\mu(\theta)$ by a second quadrature and obtain $L_\mu=e^{k_\mu}$;

4. we use dictionary, creativity, or Fourier inversion formulas to retrieve $\mu$ from its Laplace transform.

We keep these four steps in mind for dealing with $V_F(m)=\sqrt{Am^4+Bm^2+C}$ in the sequel. It is worthwhile to sketch here an example with

$$V(m)=\sqrt{1+4m^4}.$$

For $0<w<1$ we do the change of variable and perform the first step:

$$m=\frac{\sqrt{1-w^4}}{2w},\qquad w^2=-2m^2+\sqrt{1+4m^4},\qquad d\theta=-\frac{dw}{\sqrt{1-w^4}},\qquad \theta=\int_{w(m)}^{1}\frac{dw}{\sqrt{1-w^4}}.$$

The second step introduces a function $C(\theta)$ defined on the interval $[0,K]$, where $K=\int_0^1\frac{dw}{\sqrt{1-w^4}}=1.3110\ldots$, by

$$\theta=\int_{C(\theta)}^{1}\frac{dw}{\sqrt{1-w^4}}.\tag{4}$$

Since $w(m)=C(\theta)$, up to the knowledge of $C(\theta)$, and taking the derivative of both sides of (4), the second step is performed since

$$m=k'(\theta)=\frac{\sqrt{1-C(\theta)^4}}{2C(\theta)}=-\frac{C'(\theta)}{2C(\theta)}.$$

The third step is easy and we get $k(\theta)=-\frac{1}{2}\log C(\theta)$ and the Laplace transform $L(\theta)=\frac{1}{\sqrt{C(\theta)}}$. The fourth step needs to be explicit about $C(\theta)$ and the theory of elliptic functions becomes necessary: details about this particular example are in Theorem 5.1 when doing $k^2=-1$. The function $L$ will be the Laplace transform of a discrete distribution concentrated on the set of numbers of the form $n/a$, where $n$ is a relative integer and where $a$ is the complicated number $\frac{2K}{\pi}=0.8346\ldots$ If we use formula (3) we get the following surprising result: the function $\sqrt{a^4+4m^4}$ is the variance function of a NEF concentrated on $\mathbb{Z}$.
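The first step of this example can be verified numerically. The sketch below (helper names and the sample value of `m` are assumptions) evaluates $K=\int_0^1 dw/\sqrt{1-w^4}$ after the substitution $w=\sin\varphi$, which removes the endpoint singularity, compares it with the Beta-function closed form $\Gamma(1/4)^2/(4\sqrt{2\pi})$, and checks that the two expressions of $\theta$, as an integral in $m$ and as an integral in $w$, agree.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# With w = sin(phi): dw / sqrt(1 - w^4) = dphi / sqrt(1 + sin(phi)^2)
integrand = lambda phi: 1.0 / math.sqrt(1.0 + math.sin(phi) ** 2)

K = simpson(integrand, 0.0, math.pi / 2)
K_gamma = math.gamma(0.25) ** 2 / (4.0 * math.sqrt(2.0 * math.pi))
a = 2.0 * K / math.pi                      # lattice constant of the example

# theta computed from m (integral of dm / V) and from w(m) (integral in w)
m = 0.8
w = math.sqrt(-2.0 * m * m + math.sqrt(1.0 + 4.0 * m ** 4))
theta_from_m = simpson(lambda u: 1.0 / math.sqrt(1.0 + 4.0 * u ** 4), 0.0, m)
theta_from_w = simpson(integrand, math.asin(w), math.pi / 2)
```

Printing `K` and `a` gives the two constants of the example to full floating-point precision.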


3 Associated natural exponential families

The source of this concept is the pair of identities (5) below: if

$$\mu(dx)=\frac{dx}{2\cosh\frac{\pi x}{2}}$$

and $\nu=\frac{1}{2}(\delta_{-1}+\delta_1)$ is the symmetric Bernoulli distribution, then

$$\int_{-\infty}^{+\infty}e^{\theta x}\mu(dx)=\frac{1}{\cos\theta}\ \ (\text{for }|\theta|<\pi/2),\qquad \int_{-\infty}^{+\infty}e^{i\theta x}\nu(dx)=\cos\theta,\tag{5}$$

which could be as well presented by reversing the roles of Fourier and Laplace transforms:

$$\int_{-\infty}^{+\infty}e^{i\theta x}\mu(dx)=\frac{1}{\cosh\theta},\qquad \int_{-\infty}^{+\infty}e^{\theta x}\nu(dx)=\cosh\theta.\tag{6}$$

This is an example of what we are going to call an associated pair $(\mu,\nu)$ of probabilities on $\mathbb{R}$. Here is the definition:

Definition 3.1 Let $\mu$ and $\nu$ be in $\mathcal{M}_1(\mathbb{R})$ such that $\mu$ and $\nu$ are symmetric. We say that $(\mu,\nu)$ is an associated pair if for all $\theta\in\Theta(\mu)$ the Fourier transform of $\nu$ is $1/L_\mu(\theta)$. In other terms for $\theta\in\Theta(\mu)$ we have

$$\int_{-\infty}^{+\infty}e^{\theta x}\mu(dx)=L_\mu(\theta),\qquad \int_{-\infty}^{+\infty}e^{i\theta x}\nu(dx)=\frac{1}{L_\mu(\theta)}.\tag{7}$$

The corresponding natural exponential families $F=F(\mu)$ and $G=F(\nu)$ are also said to be associated.

We now describe the easy consequences of this definition:

Proposition 3.1 Let $(\mu,\nu)$ in $\mathcal{M}_1(\mathbb{R})$ be an associated pair. Then

1. (Symmetry) The pair $(\nu,\mu)$ is also associated;

2. (Uniqueness) If $(\mu,\nu_1)$ is also associated, then $\nu_1=\nu$.

3. (Convolution) If $(\mu',\nu')$ is an associated pair then $(\mu*\mu',\nu*\nu')$ is also an associated pair.

4. (Zeros) Denote $z_\mu=\inf\{\theta>0\,;\ L_\mu(i\theta)=0\}$. Then $\Theta(\nu)=(-z_\mu,z_\mu)$.

5. (Variance functions) Consider the associated pair $F=F(\mu)$ and $G=F(\nu)$ of NEF. If $V_F$ and $V_G$ are extended as analytic functions to the complex plane in a neighborhood of zero, then $V_F(m)=V_G(im)$.

Comments

1. Clearly, since the Fourier transform $\frac{1}{L_\mu(\theta)}$ is real, the probability $\nu$ must be symmetric.

2. Symmetry of $\nu$ implies that $\Theta(\nu)$ is a symmetric interval, as well as the mean domain of $F(\nu)$.

3. Because of the uniqueness of Part 2, we shall also write $\tilde\mu$ for indicating that $(\mu,\tilde\mu)$ is an associated pair. In this case $\tilde\mu$ is called the associated probability to $\mu$ (when it exists). We also observe that

$$\tilde{\tilde\mu}=\mu,\qquad \widetilde{\mu*\mu'}=\tilde\mu*\tilde{\mu}'.$$

4. It is not correct to think that if $\mu$ is in $\mathcal{M}_1(\mathbb{R})$ then the associated probability $\tilde\mu$ always exists. An example is given by the first Laplace distribution (also called the bilateral exponential distribution)

$$\mu(dx)=\frac{1}{2}e^{-|x|}dx,\qquad \Theta(\mu)=(-1,1),\qquad L_\mu(\theta)=\frac{1}{1-\theta^2}.$$

Suppose that $\nu=\tilde\mu$ exists. Then its Fourier transform on $(-1,1)$ is $1-\theta^2$. This implies that its Laplace transform is $L_\nu(\theta)=1+\theta^2$ and therefore $\Theta(\nu)=\mathbb{R}$. But if $k_\nu=\log L_\nu$ it is easy to see that the sign of $k''_\nu(\theta)$ is the sign of $1-\theta^2$, which implies that $k_\nu$ is not convex, a contradiction.

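A more complicated example is given by

$$\mu_t(dx)=\frac{2^{t-2}}{\pi\Gamma(t)}\left|\Gamma\left(\frac{t+ix}{2}\right)\right|^2dx\tag{8}$$

for $t>0$. We will see in Section 3.1 that $\mu_t\in\mathcal{M}_1(\mathbb{R})$ satisfies $\Theta(\mu_t)=(-\frac{\pi}{2},\frac{\pi}{2})$ and

$$\frac{2^{t-2}}{\pi\Gamma(t)}\int_{-\infty}^{+\infty}e^{\theta x}\left|\Gamma\left(\frac{t+ix}{2}\right)\right|^2dx=\frac{1}{(\cos\theta)^t}.\tag{9}$$

If $t$ is not an integer, then $\tilde\mu_t=\nu$ does not exist (Proposition 3.5). An obvious case is $t=1/2$: if $X,Y$ are iid such that $\Pr(X+Y=\pm 1)=1/2$ then $\Pr(X=\pm 1/2)=1/4$ and $\Pr(X+Y=0)\geq 1/16>0$, a contradiction.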

5. In Definition 3.1, suppose that we relax the constraint on $\nu$ to have a Laplace transform. Consider the example

$$\mu(dx)=\frac{dx}{2\cosh(\pi x/2)}$$

with Laplace transform $1/\cos\theta$ defined on $\Theta(\mu)=(-\frac{\pi}{2},\frac{\pi}{2})$. A possible associated $\nu$ is the Bernoulli distribution $\frac{1}{2}(\delta_{-1}+\delta_1)$, which satisfies $\int_{-\infty}^{+\infty}e^{i\theta x}\nu(dx)=\cos\theta$, in particular on $|\theta|<\pi/2$. However it is not excluded that there exist other probabilities $\nu$ fulfilling the same property on $|\theta|<\pi/2$. Imposing $\nu\in\mathcal{M}_1(\mathbb{R})$ rules out this phenomenon, from Part 2 of Proposition 3.1.

6. Here is the simplest example illustrating Part 5 of Proposition 3.1. We use once more the associated pair (5). In this case $M_F=\mathbb{R}$, $a_F=\infty$, $V_F(m)=1+m^2$, $M_G=(-1,1)$, $a_G=1$, $V_G(m)=1-m^2$. See Proposition 3.5 below.

7. SELF ASSOCIATED PAIRS AND NEF: A trivial example is $\mu=N(0,1)$ since $\tilde\mu=\mu$. More generally, $V_F$ is a function of $m^4$ if and only if $\tilde\mu=\mu$. Another important example will be found in Theorem 5.1 below, which is $V_F(m)=\sqrt{1+4m^4}$. Note that the symmetry of $\mu$ is essential: if $V_F(m)=m^4$, with $M_F=(0,\infty)$, we have $V_F(m)=V_F(im)$ but the concept of association does not make sense here.

8. Part 5 also provides a way to decide quickly from the examination of the variance function that $\tilde\mu$ does not exist. For instance, if $\mu\sim X-Y$ where $X$ and $Y$ are iid with the Poisson distribution of mean $1/2$, then $F=F(\mu)$ has variance function $V_F(m)=\sqrt{1+m^2}$ (see Proposition 4.1). Would $G=F(\tilde\mu)$ exist, its variance function would be $V_G(m)=\sqrt{1-m^2}$. The domain of the means of $G$ would be $(-1,1)$, from the principle of analytic continuation of variance functions (Theorem 3.1 in Letac and Mora (1990)). However around the point $m=1$, the function $V_G$ would be equivalent to $\sqrt{2}\,(1-m)^{1/2}$. This is forbidden by the principle of Jørgensen, Martinez and Tsao (1993): this principle says that if $M_G=(a,b)$ with $b<\infty$ and if

$$V_G(m)\sim_{m\to b}A\times(b-m)^p\tag{10}$$

then $p\notin(0,1)$.

Similarly consider the variance function $V_F(m)=(1+m^2)^{3/2}$ defined on $M_F=\mathbb{R}$. One can consult Letac (1991), chapter 5, example 1.2, for a probabilistic interpretation. It is generated by a $\mu$ such that $\Theta(\mu)=(-1,1)$ and $k_\mu(\theta)=1-\sqrt{1-\theta^2}$.

For seeing that $V_G(m)=(1-m^2)^{3/2}$ cannot be a variance function we observe the following. If $\nu=\tilde\mu$ exists then

$$k_\nu(\theta)=\sqrt{1+\theta^2}-1.$$

Therefore, by using the principle of maximal analytic continuation (see Proposition 3.2 below), we have $\Theta(\nu)=\mathbb{R}$. As a consequence $L_\nu(\theta)=e^{\sqrt{1+\theta^2}-1}$ is an entire function, which is clearly impossible.
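Two of the comments above lend themselves to a quick numerical check. The sketch below is an illustration only (function names and sample points are assumptions). It first shows that the candidate cumulant function $\log(1+\theta^2)$ of Comment 4 has a second derivative that changes sign at $|\theta|=1$, and then verifies on the pair of Comment 6 that $V_F(m)=V_G(im)$, with means and variances computed directly from $k=-\log\cos\theta$ and $k=\log\cosh\theta$.

```python
import math

# Comment 4: a hypothetical associate of the Laplace law would have
# cumulant function log(1 + theta^2), whose second derivative changes sign.
def k_nu(theta):
    return math.log(1.0 + theta * theta)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

inside = second_derivative(k_nu, 0.5)    # theta inside (-1, 1)
outside = second_derivative(k_nu, 2.0)   # theta outside [-1, 1]

# Comment 6: V_F(m) = 1 + m^2 and V_G(m) = 1 - m^2 satisfy V_F(m) = V_G(i m).
def V_F(m):
    return 1 + m * m

def V_G(m):
    return 1 - m * m

theta = 0.4
m_F, var_F = math.tan(theta), 1.0 / math.cos(theta) ** 2      # from k = -log cos(theta)
m_G, var_G = math.tanh(theta), 1.0 / math.cosh(theta) ** 2    # from k = log cosh(theta)
gap = abs(V_F(m_F) - V_G(1j * m_F))
```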

Proof of Proposition 3.1. 1) Suppose that $\nu$ is in $\mathcal{M}_1(\mathbb{R})$. Then the knowledge of the Fourier transform of $\nu$ on the interval $\Theta(\mu)$ gives the knowledge of the Laplace transform $L_\nu$ on $\Theta(\nu)$. Now the Fourier transform of $\mu$ restricted to $\Theta(\nu)$ is $L_\mu(i\theta)=1/L_\nu(\theta)$ from the relation (7) extended by analyticity and the symmetry of $\nu$.

2) If $\nu_1$ exists, its Fourier transform coincides with the Fourier transform of $\nu$ on the interval $\Theta(\mu)$. By analyticity, the two coincide everywhere and $\nu=\nu_1$.

3) is obvious.

4) Since the Fourier transform of $\nu$ restricted to $\Theta(\mu)$ is $1/L_\mu(\theta)$, then in a neighborhood of $\theta=0$ the Laplace transform of $\nu$ satisfies $L_\nu(\theta)=1/L_\mu(i\theta)$. Now we use the following result:

Proposition 3.2. (Principle of maximal analyticity) If $\nu\in\mathcal{M}(\mathbb{R})$ and if $\Theta(\nu)=(a,b)$, suppose that there exist $(a_1,b_1)\supset(a,b)$ and a real analytic function $f$ on $(a_1,b_1)$ which is strictly positive and such that $f(\theta)=L_\nu(\theta)$ for $a<\theta<b$. Then $a=a_1$ and $b=b_1$.

Proof. Use the method of proof of Theorem 3.1 of Letac and Mora (1990) or Kawata (1972), chapter 7.

We now return to the proof of Proposition 3.1, Part 4). Write $\Theta(\nu)=(-b,b)$. Clearly $b>z_\mu$ is impossible since it would imply that $L_\nu(z_\mu)$ would be finite, a contradiction with $L_\mu(iz_\mu)=0$. We apply Proposition 3.2 to the present $\nu$, to $(a_1,b_1)=(-z_\mu,z_\mu)$ and to the positive analytic function on this interval $f(\theta)=1/L_\mu(i\theta)$. As a consequence $b=b_1=z_\mu$ and the result 4) is proved.

5) Consider the functions $L_\mu$ and $L_\nu$. They are analytic on the strips $\Theta(\mu)+i\mathbb{R}$ and $\Theta(\nu)+i\mathbb{R}$, and from Part 4) $\Theta(\mu)+i\Theta(\nu)$ is the open rectangle with vertices $\pm z_\nu\pm iz_\mu$. Let $Z$ be the set of zeros of the analytic function $\theta\mapsto L_\mu(\theta)$ restricted to the rectangle $\Theta(\mu)+i\Theta(\nu)$. From the principle of isolated zeros, $Z$ contains only a finite number of points in the compact set $[-a,a]\times[-b,b]$ when $a<z_\nu$ and $b<z_\mu$. Also $Z$ has no points on the set $S=(-z_\nu,z_\nu)\cup(-iz_\mu,iz_\mu)$. Consider now the part $Z_{++}$ of $Z$ contained in the first quadrant, and its closed convex hull $C_{++}$. Similarly consider $C_{\pm,\pm}$, the closed set $C=C_{++}\cup C_{+-}\cup C_{-+}\cup C_{--}$ and the open set $U=(\Theta(\mu)+i\Theta(\nu))\setminus C$. Then $U$ is a simply connected set and is a neighborhood of $S$.

We are in position to define $\log L_\mu=k_\mu$ on the open set $U$ as an analytic function. On this set $U$ we have

$$k_\mu(\theta)=-k_\nu(i\theta),\qquad k'_\mu(\theta)=-ik'_\nu(i\theta),\qquad k''_\mu(\theta)=k''_\nu(i\theta).\tag{11}$$

Since

$$V_F(k'_\mu(\theta))=k''_\mu(\theta),\qquad V_G(k'_\nu(\theta))=k''_\nu(\theta)$$

we get finally

$$V_G(ik'_\mu(i\theta))=k''_\mu(i\theta)$$

and this is saying that for $m$ in the open set $k'_\mu(U)$ we have $V_F(m)=V_G(im)$, which is the desired result.

Proposition 3.3. (Convolution of Bernoulli's). Let $(a_n)_{n=1}^{\infty}$ be a real sequence such that $\sum_{n=1}^{\infty}a_n^2<\infty$. Let $(X_n)_{n=1}^{\infty}$ and $(Y_n)_{n=1}^{\infty}$ be two iid sequences such that

$$X_n\sim\frac{dx}{2\cosh(\pi x/2)},\qquad Y_n\sim\frac{1}{2}(\delta_{-1}+\delta_1).$$

Then the distributions $\mu$ of $\sum_{n=1}^{\infty}a_nX_n$ and $\nu$ of $\sum_{n=1}^{\infty}a_nY_n$ are associated.

Proof. Easy, from (5) and Part 3) of Proposition 3.1. Note that for $a_n=1/3^n$, $\nu$ is the purely singular Cantor distribution on $(-1/2,1/2)$, while $\mu$ has a density.


3.1 Examples of associated probabilities

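Here are 3 groups of examples. It can be observed that they offer three different generalizations of (5). We start with the classical formula for $t>0$, correct for $\theta\in(-t,t)$:

$$\int_{-\infty}^{+\infty}e^{\theta x}\frac{dx}{(\cosh x)^t}=\frac{2^{t-1}}{\Gamma(t)}\,\Gamma\left(\frac{t+\theta}{2}\right)\Gamma\left(\frac{t-\theta}{2}\right)\tag{12}$$

with $\Theta(\mu_t)=(-t,t)$. In particular, using the duplication formula $\sqrt{\pi}\,\Gamma(t)=2^{t-1}\Gamma(\frac{t}{2})\Gamma(\frac{t+1}{2})$, we get the Laplace transform of the probability $\alpha_t$ below:

$$\alpha_t(dx)=\frac{\Gamma(\frac{t+1}{2})}{\sqrt{\pi}\,\Gamma(\frac{t}{2})}\frac{dx}{(\cosh x)^t},\qquad L_{\alpha_t}(\theta)=\frac{1}{\Gamma(\frac{t}{2})^2}\times\Gamma\left(\frac{t+\theta}{2}\right)\Gamma\left(\frac{t-\theta}{2}\right)\tag{13}$$

with $\Theta(\alpha_t)=(-t,t)$. It is worthwhile to mention that if $X$ and $Y$ are iid with distribution

$$\beta_{\frac{t}{2},1}(dx)=\frac{t}{2}x^{\frac{t}{2}-1}\mathbf{1}_{(0,1)}(x)\,dx$$

and if $U=\sqrt{X/Y}$ then $\log U\sim\alpha_t$.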
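Formula (12) can be checked numerically with the standard library only. In this sketch the integration window `L`, the grid size `n` and the sample values of `t` and `theta` are assumptions; they are chosen so that the integrand, which decays like $e^{-(t-|\theta|)|x|}$, is negligible outside the window.

```python
import math

def lhs(theta, t, L=40.0, n=8000):
    """Simpson approximation of the integral of exp(theta x) / cosh(x)^t over [-L, L]."""
    f = lambda x: math.exp(theta * x) / math.cosh(x) ** t
    h = 2.0 * L / n
    s = f(-L) + f(L)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(-L + i * h)
    return s * h / 3.0

def rhs(theta, t):
    """Right-hand side of (12), written with math.gamma."""
    return (2.0 ** (t - 1) / math.gamma(t)
            * math.gamma((t + theta) / 2.0) * math.gamma((t - theta) / 2.0))

t, theta = 1.5, 0.5                      # sample values with |theta| < t
gap = abs(lhs(theta, t) - rhs(theta, t))
```

For $t=2$ and $\theta=0$ both sides reduce to $\int(\cosh x)^{-2}dx=2$, which gives a second easy check.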

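Formula (12) is easily proven by the change of variable $u=e^{2x}$ and the formula $\int_0^{\infty}\frac{u^{p-1}du}{(1+u)^{p+q}}=B(p,q)$ for $p,q>0$. The Fourier version of (12) is

$$\int_{-\infty}^{+\infty}e^{ix\theta}\frac{dx}{(\cosh x)^t}=\frac{2^{t-1}}{\Gamma(t)}\left|\Gamma\left(\frac{t+i\theta}{2}\right)\right|^2\tag{14}$$

leading by Fourier inversion to

$$\frac{2^{t-1}}{2\pi\Gamma(t)}\int_{-\infty}^{+\infty}e^{i\theta x}\left|\Gamma\left(\frac{t+ix}{2}\right)\right|^2dx=\frac{1}{(\cosh\theta)^t}\tag{15}$$

and by analyticity to (9). For a while, let us specialize these formulas to $t=2p+1$ and to $t=2p+2$ where $p$ is a non negative integer. From the complements formula $\Gamma(z)\Gamma(1-z)=\pi/\sin(\pi z)$ and $\Gamma(z+1)=z\Gamma(z)$ we have for $t=1,2$

$$\Gamma\left(\frac{1+\theta}{2}\right)\Gamma\left(\frac{1-\theta}{2}\right)=\frac{\pi}{\cos\frac{\pi\theta}{2}},\qquad \Gamma\left(1+\frac{\theta}{2}\right)\Gamma\left(1-\frac{\theta}{2}\right)=\frac{\pi\theta}{2\sin\frac{\pi\theta}{2}}$$

and more generally

$$\Gamma\left(\frac{2p+1+\theta}{2}\right)\Gamma\left(\frac{2p+1-\theta}{2}\right)=\frac{1}{2^{2p}}(1-\theta^2)(9-\theta^2)\cdots((2p-1)^2-\theta^2)\,\frac{\pi}{\cos\frac{\pi\theta}{2}},\tag{16}$$

$$\Gamma\left(p+1+\frac{\theta}{2}\right)\Gamma\left(p+1-\frac{\theta}{2}\right)=\frac{1}{2^{2p}}(4-\theta^2)(16-\theta^2)\cdots(4p^2-\theta^2)\,\frac{\pi\theta}{2\sin\frac{\pi\theta}{2}}.\tag{17}$$

Proposition 3.4. If $\alpha_t$ is defined by (13) then the associated probability $\tilde\alpha_t$ exists if and only if $t\geq 1$. In particular $\tilde\alpha_1=\frac{1}{2}(\delta_{-\pi/2}+\delta_{\pi/2})$ is a Bernoulli distribution and for $t>1$ we have

$$\tilde\alpha_t(dx)=\frac{\Gamma(\frac{t}{2})}{\sqrt{\pi}\,\Gamma(\frac{t-1}{2})}(\cos x)^{t-2}\mathbf{1}_{(-\pi/2,\pi/2)}(x)\,dx.$$

In particular for $t=2p+1$ and $t=2p+2$ where $p$ is a non negative integer, (16) and (17) give $(\varphi_t)^{-1}$ when $\varphi_t$ is the Fourier transform of $\tilde\alpha_t$.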

Comments. For this example, the explicit calculation of the variance functions of $F=F(\alpha_t)$ and $G=F(\tilde\alpha_t)$ is not possible. For instance if $t=2$ the probability $\tilde\alpha_2$ is the uniform distribution on the segment $(-\pi/2,\pi/2)$. In this case $L_{\tilde\alpha_2}(\theta)=\frac{\sinh(\pi\theta/2)}{\pi\theta/2}$: there is no way to compute $\theta=\psi_{\tilde\alpha_2}(m)$ in a closed formula when

$$m=k'_{\tilde\alpha_2}(\theta)=\frac{\pi}{2}\left(\operatorname{cotanh}\left(\frac{\pi\theta}{2}\right)-\frac{2}{\pi\theta}\right).$$

Shanbhag (1979) and, in their Proposition 4, Bar-Lev and Letac (2012), have other proofs of the 'only if' condition of existence of $\tilde\alpha_t$.

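Proof. For $t>1$ we just rely on entry 3.631, 9 of Gradshteyn and Ryzhik (2007).

If $t<1$ we show that $\tilde\alpha_t$ does not exist by showing that $\theta\mapsto k''_{\alpha_t}(i\theta)$ is not positive. We obtain

$$k''_{\alpha_t}(i\theta)=\sum_{n=0}^{\infty}\frac{(n+\frac{t}{2})^2-\frac{\theta^2}{4}}{\left[(n+\frac{t}{2})^2+\frac{\theta^2}{4}\right]^2}$$

and a careful calculation shows that

$$\lim_{\theta\to\infty}\theta^2k''_{\alpha_t}(i\theta)=2(t-1).$$

If $t<1$ then $\theta\mapsto k''_{\alpha_t}(i\theta)$ cannot be positive for all $\theta\in\mathbb{R}$, and this ends the proof.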

Proposition 3.5. If $\mu_t$ is defined by (9) then the associated probability $\tilde\mu_t$ exists if and only if $t$ is a positive integer $N$. In this case $\tilde\mu_N$ is the image of the binomial distribution $B(N,1/2)$ by $x\mapsto 2x-N$.

Comments. The most interesting particular case corresponds to $t=2$ since in this case we meet the uniform distribution on a segment with the associated pair

$$\mu_2(dx)=\frac{x}{4\sinh(\pi x/2)}dx,\qquad \tilde\mu_2(dy)=\frac{1}{2}\mathbf{1}_{(-1,1)}(y)\,dy.$$

This is also an illustration of Proposition 3.3 applied to $a_n=1/2^n$ since $\sum_{n=1}^{\infty}\frac{Y_n}{2^n}$ is uniform on $(-1,1)$ when $(Y_n)_{n=1}^{\infty}$ is an iid sequence of symmetric Bernoulli random variables. For this example, the explicit calculation of the variance functions of $F=F(\mu_t)$ and $G=F(\tilde\mu_N)$ gives

$$V_F(m)=t+\frac{m^2}{t},\qquad V_G(m)=N-\frac{m^2}{N}.$$

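Proof of Proposition 3.5. $\Leftarrow$ is obvious. To prove $\Rightarrow$ suppose that there exists a positive integer $n_0$ such that $n_0-1<t<n_0$ and suppose that $\tilde\mu_t$ exists. Taking the image $\tau$ of $\tilde\mu_t$ by the map $x\mapsto x'=x-t$, choosing $\theta>0$ and denoting $z=e^{-2\theta}$ we get

$$\int_{-\infty}^{+\infty}e^{\theta x'}\tau(dx')=\int_{-\infty}^{+\infty}e^{\theta(x-t)}\tilde\mu_t(dx)=\frac{1}{2^t}\sum_{n=0}^{\infty}\frac{t(t-1)\cdots(t-n+1)}{n!}z^n.$$

Since $t(t-1)\cdots(t-n+1)<0$ when $n=n_0+1$ this shows that $\tau(\{-2n_0-2\})<0$, a contradiction.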
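The sign argument at the end of this proof is easy to illustrate. The sketch below (the helper name and the sample value of `t` are assumptions) computes the products $t(t-1)\cdots(t-n+1)$ for a non-integer $t$ and locates the first negative one, which occurs at $n=n_0+1$ where $n_0-1<t<n_0$.

```python
from math import floor

def falling_product(t, n):
    """t (t-1) ... (t-n+1), the numerator of the generalized binomial coefficient."""
    p = 1.0
    for j in range(n):
        p *= (t - j)
    return p

t = 2.5                      # non-integer sample value (assumption)
n0 = floor(t) + 1            # the integer with n0 - 1 < t < n0
signs = [falling_product(t, n) > 0 for n in range(n0 + 2)]
first_negative = signs.index(False)
```

For an integer $t=N$ the same products vanish for $n>N$ instead of becoming negative, which is consistent with the binomial case.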

The third example is obtained by considering the Babel class of NEF, namely the set of exponential families such that the variance function has the form $V_F=P\Delta+Q\sqrt{\Delta}$ where $\Delta$, $P$ and $Q$ are polynomials with respective degrees less or equal to 2, 1, 2. Looking for possible pairs $(F,G)$ in this class such that $V_F(m)=V_G(im)$ and such that $F$ and $G$ are generated by associated distributions $(\mu,\nu)$, and therefore symmetric, implies that $\Delta(m)=Am^2+C$, $P$ is a constant and $Q(m)=A'm^2+C'$. The case $C=0$ is excluded since the domains of the means $M_F$ and $M_G$ are symmetric intervals and $V_F$ and $V_G$ are real analytic on them. As a consequence either $F$ or $G$ must be such that $\Delta(m)=1-m^2$ (up to affinities). But there is only one type of NEF in the Babel class such that $\Delta(m)=1-m^2$ and it is generated by the trinomial distributions defined for $0<a<1$ by

$$\mu_a=\frac{1}{a+1}\left(a\delta_0+\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_1\right)\tag{18}$$

and their integer powers of convolution. Of course the limit cases are related to Bernoulli, since

$$\mu_0=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_1,\qquad \mu_1=\left(\frac{1}{2}\delta_{-1/2}+\frac{1}{2}\delta_{1/2}\right)*\left(\frac{1}{2}\delta_{-1/2}+\frac{1}{2}\delta_{1/2}\right).$$

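Proposition 3.6. If $\mu_a$ is defined by (18) with $a\in(0,1)$ then the associated probability $\tilde\mu_a$ exists and is

$$\tilde\mu_a=\tau_b*\tau_{-b},$$

where $a=\cos 2b$ with $0<b<\pi/4$ and

$$\tau_{\pm b}(dx)=\frac{\cos b}{\cosh(\pi x)}e^{\pm 2bx}dx.$$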

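Proof. We have

$$L_{\mu_a}(\theta)=\frac{a+\cosh\theta}{a+1},\qquad V_{F(\mu_a)}(m)=\frac{1}{1-a^2}-m^2-\frac{a}{\sqrt{1-a^2}}\sqrt{\frac{1}{1-a^2}-m^2}.$$

Therefore, if $\tilde\mu_a$ does exist it must satisfy

$$L_{\tilde\mu_a}(\theta)=\frac{a+1}{a+\cos\theta},\qquad V_{F(\tilde\mu_a)}(m)=\frac{1}{1-a^2}+m^2-\frac{a}{\sqrt{1-a^2}}\sqrt{\frac{1}{1-a^2}+m^2}$$

with $\Theta(\tilde\mu_a)=(-z_{\mu_a},z_{\mu_a})$ where $z_{\mu_a}$ is the smallest positive solution of $\cos\theta=-a$. Such a $\tilde\mu_a$ actually exists. To see this we write $a=\cos 2b$ with $0<b<\pi/4$ and by simple trigonometry and the help of formula (6):

$$\frac{\cos 2b+1}{\cos 2b+\cos\theta}=\frac{\cos b}{\cos(\frac{\theta}{2}-b)}\times\frac{\cos b}{\cos(\frac{\theta}{2}+b)}=L_{\tau_{-b}}(\theta)L_{\tau_b}(\theta)$$

where

$$\tau_{\pm b}(dx)=\frac{\cos b}{\cosh(\pi x)}e^{\pm 2bx}dx.$$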

4 Discussion and easy cases for $(Am^4+Bm^2+C)^{1/2}$

In this section we recall known and not so well known results about a few particular cases. The cases where only one of the three numbers $A,B,C$ is not zero are classical: we get respectively the gamma, Poisson or normal case. We now investigate three more interesting particular cases (they are all described in Letac 1992 as elements of the Babel class).

4.1 The case A = 0

The useful results are contained in the following proposition:

Proposition 4.1. Let $t>0$. Let $N_1$ and $N_2$ be two independent standard Poisson random variables with expectation $t/2$. Then the exponential family $F_t$ with domain of the means $\mathbb{R}$ and variance function $(m^2+t^2)^{1/2}$ exists and is generated by the distribution $\mu_t$ of $N_1-N_2$. Furthermore

$$\mu_t(dx)=\sum_{n\in\mathbb{Z}}e^{-t}I_{|n|}(t)\,\delta_n(dx)$$

where

$$I_x(t)=\sum_{n=0}^{\infty}\frac{1}{n!\,\Gamma(n+x+1)}\left(\frac{t}{2}\right)^{2n+x}.$$

Proof. Since $E(e^{\theta(N_1-N_2)})=e^{t(\cosh\theta-1)}$ we get that $\Theta(\mu_t)=\mathbb{R}$ and that

$$k_{\mu_t}(\theta)=t(\cosh\theta-1),\qquad k'_{\mu_t}(\theta)=t\sinh\theta,\qquad k''_{\mu_t}(\theta)=t\cosh\theta=(k'_{\mu_t}(\theta)^2+t^2)^{1/2}.$$

Thus $V_{F(\mu_t)}(m)=(m^2+t^2)^{1/2}$ as desired, and the domain of the means is $\mathbb{R}$. A consequence of this proposition and of formulas (3) and (2) is that $(Bm^2+C)^{1/2}$ is always a variance function for $B$ and $C>0$.
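Proposition 4.1 can be illustrated by building the tilted distribution $P(\theta,\mu_t)$ directly from the Bessel weights and computing its first two moments. This is a sketch under stated assumptions: the truncation levels `terms` and `N`, the helper names and the sample values of `t` and `theta` are illustrative choices, and the Bessel series is only implemented for integer order.

```python
import math

def bessel_I(x, t, terms=40):
    """Truncated series for the modified Bessel function I_x(t), integer x >= 0."""
    return sum((t / 2.0) ** (2 * n + x) / (math.factorial(n) * math.factorial(n + x))
               for n in range(terms))

def tilted_moments(t, theta, N=40):
    """Total mass, mean and variance of exp(theta x) mu_t(dx), support cut to |n| <= N."""
    weights = {n: math.exp(theta * n - t) * bessel_I(abs(n), t)
               for n in range(-N, N + 1)}
    total = sum(weights.values())
    mean = sum(n * w for n, w in weights.items()) / total
    var = sum((n - mean) ** 2 * w for n, w in weights.items()) / total
    return total, mean, var

t, theta = 2.0, 0.3
total, mean, var = tilted_moments(t, theta)
# Proposition 4.1 predicts: total = exp(t (cosh(theta) - 1)), mean = t sinh(theta),
# and var = sqrt(mean^2 + t^2).
```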

4.2 The case C = 0

Proposition 4.2. Let $t>0$. Then the exponential family $F_t$ with domain of the means $(0,\infty)$ and variance function $m\left(1+\frac{m^2}{t^2}\right)^{1/2}$ exists. In particular $F_1$ is generated by $\mu_1=\delta_0+2\sum_{n=1}^{\infty}\delta_n$. More specifically, $P$ is in $F_1$ if and only if there exists $q\in(0,1)$ such that $P$ is the convolution of the Bernoulli distribution $\frac{1}{1+q}\delta_0+\frac{q}{1+q}\delta_1$ with the geometric distribution $(1-q)\sum_{n=0}^{\infty}q^n\delta_n$.

Proof. Writing for $\theta<0$

$$L_{\mu_1}(\theta)=\frac{1+e^{\theta}}{1-e^{\theta}}$$

it is easily seen that it generates a natural exponential family with domain of the means $(0,\infty)$ and variance function $m(1+m^2)^{1/2}$. The only non trivial point of the proposition is the fact that the elements of $F_1$ are infinitely divisible. For this we write

$$k_{\mu_1}(\theta)=\sum_{n=1}^{\infty}\frac{1}{n}\left(1+(-1)^{n+1}\right)e^{n\theta}.$$

Since the coefficient $\frac{1}{n}(1+(-1)^{n+1})$ of $e^{n\theta}$ is $\geq 0$ the result is proved (although it is difficult to compute $\mu_t$ explicitly when $t$ is not an integer).

A consequence of this proposition is that $(Am^4+Bm^2)^{1/2}$ is a variance function for $A$ and $B>0$ with domain of the means $(0,\infty)$.
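The description of $F_1$ can be checked numerically. The sketch below (truncation level `N`, helper names and the sample value of `theta` are assumptions) tilts $\mu_1=\delta_0+2\sum_{n\geq 1}\delta_n$ by $e^{\theta x}$ with $\theta<0$, compares the result with the convolution of a Bernoulli and a geometric law with $q=e^{\theta}$, and evaluates the mean and the variance.

```python
import math

def tilted_pmf(theta, N=400):
    """P(theta, mu_1) for mu_1 = delta_0 + 2 sum_{n>=1} delta_n, truncated at N (theta < 0)."""
    w = [1.0] + [2.0 * math.exp(theta * n) for n in range(1, N + 1)]
    total = sum(w)
    return [x / total for x in w]

def bernoulli_geometric(q, N=400):
    """Convolution of (delta_0 + q delta_1) / (1 + q) with the geometric law (1 - q) q^n."""
    geo = [(1.0 - q) * q ** n for n in range(N + 1)]
    return [geo[n] / (1.0 + q) + (q / (1.0 + q)) * (geo[n - 1] if n else 0.0)
            for n in range(N + 1)]

theta = -0.5
p = tilted_pmf(theta)
conv = bernoulli_geometric(math.exp(theta))
mean = sum(n * x for n, x in enumerate(p))
var = sum((n - mean) ** 2 * x for n, x in enumerate(p))
max_gap = max(abs(x - y) for x, y in zip(p, conv))
# The proposition predicts var = mean * sqrt(1 + mean^2).
```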

4.3 The case $B^2-4AC=0$

Here is a well known fact (see Morris (1982)):

Proposition 4.3. Let $t>0$. The natural exponential family $F_t$ with domain of the means $\mathbb{R}$ and variance function $t\left(1+\frac{m^2}{t^2}\right)$ is generated by the probability $\mu_t$ defined by (8).

This settles the case $B^2-4AC=0$ such that $Ax^2+Bx+C$ has a negative double root with $A>0$.

Proposition 4.4. Let $t>0$. The natural exponential family $F_t$ with domain of the means $(t,\infty)$ and variance function $\frac{t}{2}\left(\frac{m^2}{t^2}-1\right)$ exists. In particular $F_1$ is generated by $\mu_1=\sum_{n=1}^{\infty}n\delta_n$.

Proof. We do not give the details about $\mu_1$ which are standard. Since the elements of $F_1$ are negative binomial distributions shifted by 1, they are still infinitely divisible and $F_t$ does exist for all $t>0$.

This settles the case $B^2-4AC=0$ such that $Ax^2+Bx+C$ has a positive double root $x_0$ with $A>0$ and domain of the means $(\sqrt{x_0},\infty)$.

Proposition 4.5. Let $N>0$ be an integer. The natural exponential family $F_N$ with domain of the means $(-N,N)$ and variance function $N\left(1-\frac{m^2}{N^2}\right)$ exists. It is generated by $(\delta_{-1}+\delta_1)^{*N}$.

Proof. This is an easy and classical fact.

4.4 $Ax^2+Bx+C$ cannot have simple roots on $(0,\infty)$

We discard some values of $(A,B,C)$. Suppose that $Ax^2+Bx+C$ has a positive simple root $x_0>0$. Then $(Am^4+Bm^2+C)^{1/2}$ cannot be a variance function. Indeed, by the principle of maximal analyticity, the domain of the means will have $m_0=\sqrt{x_0}$ as boundary point. Since $x_0$ is a simple root, the variance function around $m_0$ will be equivalent to $k|m-m_0|^{1/2}$ for some positive constant $k$. But this is forbidden by the principle of Jørgensen, Martinez and Tsao (1994) mentioned in (10).

4.5 The splitting of the elliptic variances in three cases

The only cases that we are left to consider in order to have a classification of the variance functions of the form(Am4+Bm2+C)1/2are now the cases where Ax2+Bx+C is strictly positive on[0,∞)and has no double negative root. Of course this implies that A>0 and C>0.To simplify the matters, we choose C=1 and we introduce the function V(m) = (Am4+Bm2+1)1/2and, for t>0, the function Vt(m) =tV(m/t).A simple analysis shows that Ax2+Bx+1 has no positive roots and no double negative roots if and only if there exists a non zero real number a and a positive number b such that

Ax2+Bx+1= (1+ax)2+2b2x.

Let us insist on the fact that $a$ can be negative. Finally we introduce a complex number $k$ through its square in order to use the standard notations of elliptic functions:

$$k^2=1+\frac{2a}{b^2}. \qquad (17)$$

This leads to three cases:

1. The case $-1\le k^2<0$. It corresponds to the fact that $P(x)=(1+ax)^2+2b^2x$ has no real roots and that the minimum of $P$ on $[0,\infty)$ is reached at 0.

2. The case $k^2<-1$. Here $P$ has no real roots and reaches its minimum on $[0,\infty)$ at $-(a+b^2)/a^2$.

3. The case $k^2>0$. Here $P$ has two distinct negative real roots. Taking $A=1$ instead of $C=1$ and $P(x)=(x+a^2)(x+b^2)$ is convenient.

We investigate these cases in the next three sections.
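The splitting can be made concrete with a small sketch, given here only as an illustration. It computes $k^2$ from $(a,b)$ as in (17), returns the number of the case, and uses the fact that the discriminant of $P(x)=a^2x^2+2(a+b^2)x+1$ equals $4b^4k^2$, so that the sign of $k^2$ decides whether $P$ has real roots. The function names `k_squared`, `classify` and `discriminant` are hypothetical.

```python
def k_squared(a, b):
    """k^2 = 1 + 2a/b^2 as in (17)."""
    return 1 + 2 * a / b ** 2

def classify(a, b):
    """Return 1, 2 or 3 according to the three cases; a != 0 and b > 0 are assumed."""
    k2 = k_squared(a, b)
    if k2 == 0:
        # 2a + b^2 = 0: P has a double negative root, which is excluded here.
        raise ValueError("k^2 = 0 is excluded")
    if -1 <= k2 < 0:
        return 1
    if k2 < -1:
        return 2
    return 3

def discriminant(a, b):
    """Discriminant of P(x) = a^2 x^2 + 2(a + b^2) x + 1, equal to 4 b^4 k^2."""
    return 4 * (a + b ** 2) ** 2 - 4 * a ** 2
```

For instance, with $b=\sqrt{2}$ the value $a=-1.5$ gives $k^2=-0.5$ (case 1) and $a=-3$ gives $k^2=-2$ (case 2), while $a=b=1$ gives $k^2=3$ (case 3).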

5 The elliptic cases: The case $-1 \le k^2 < 0$

We write $k^2=-1+p$ with $0\le p<1$ and we introduce the following two constants:

$$K=\int_0^1 (1-x^2)^{-1/2}(2-p-x^2)^{-1/2}\,dx,\qquad K'=\int_0^1 (1-x^2)^{-1/2}\bigl(1+(1-p)x^2\bigr)^{-1/2}\,dx. \qquad (19)$$

Here is our first serious result:

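Theorem 5.1. Suppose that $k^2=1+\frac{2a}{b^2}=-1+p\in[-1,0)$. For $b=\sqrt{2}$ and $a=-2+p$ there exists a natural exponential family $G_t$ with domain of the means $\mathbb{R}$ and variance function

$$t\sqrt{\left(1+a\frac{m^2}{t^2}\right)^2+2b^2\frac{m^2}{t^2}}$$

when $t$ is a multiple of $|a|$. It is concentrated on $\frac{\pi}{2K}\mathbb{Z}$. The family $G_{|a|}$ is generated by a symmetric probability measure $\mu_{|a|}$ which is the convolution of the Bernoulli distribution $\frac{1}{2}\left(\delta_{-\pi/2K}+\delta_{\pi/2K}\right)$ with an infinitely divisible distribution $\alpha_{|a|}$ concentrated on $\frac{\pi}{K}\mathbb{Z}$. We denote $q=e^{-\pi K'/K}$ and, for a positive integer $\nu$, we denote

$$c_\nu=c_{-\nu}=\frac{q^{\nu}-(-1)^{\nu}q^{2\nu}}{\nu\,(1-q^{2\nu})}>0.$$

Then the Laplace transform of $\alpha_t$ is

$$\int e^{\theta x}\,\alpha_t(dx)=\exp\left(\frac{t}{|a|}\sum_{\nu\in\mathbb{Z}\setminus\{0\}}c_\nu\left(e^{\nu\pi\theta/K}-1\right)\right). \qquad (18)$$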

Finally the characteristic function of $\mu_{2|a|}$ is

$$\frac{1-p}{\wp(s+K)-\frac{1}{3}p}$$

where $\wp$ is the Weierstrass elliptic function satisfying

$$\wp'^2=4\left(\wp-1+\frac{2p}{3}\right)\left(\wp-\frac{p}{3}\right)\left(\wp+1-\frac{p}{3}\right),$$

which is doubly periodic with primitive periods $2K$ and $2iK'$. In particular it has zeros and $G_t$ cannot be infinitely divisible.

Comments. Taking $b=\sqrt{2}$ is not really a restriction. Using the formula $a^2V_F(m/a)$ for the variance function of the image of $F$ by $x\mapsto ax$ gives the description of $F$ for an arbitrary $b>0$.
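One consequence of Theorem 5.1 can be tested numerically. Since the variance function of $G_{|a|}$ at the mean $m=0$ equals $|a|=2-p$, the variance of $\mu_{|a|}$ should be $2-p$. It is the sum of the variance $(\pi/2K)^2$ of the Bernoulli part and of the variance $\sum_{\nu\neq 0}c_\nu(\nu\pi/K)^2$ of the compound Poisson part $\alpha_{|a|}$. The sketch below is a consistency check under stated assumptions, not a proof: it reads $K$ and $K'$ as the integrals of $(1-x^2)^{-1/2}(2-p-x^2)^{-1/2}$ and $(1-x^2)^{-1/2}(1+(1-p)x^2)^{-1/2}$ over $[0,1]$, and $c_\nu$ as $(q^\nu-(-1)^\nu q^{2\nu})/(\nu(1-q^{2\nu}))$. The function names, the midpoint rule after the substitution $x=\sin\varphi$ and the truncation of the series are illustrative choices.

```python
import math

def half_periods(p, n=4000):
    """Midpoint-rule values of K and K' after the substitution x = sin(phi)."""
    h = (math.pi / 2) / n
    K = Kp = 0.0
    for j in range(n):
        s2 = math.sin((j + 0.5) * h) ** 2
        K += h / math.sqrt(2 - p - s2)
        Kp += h / math.sqrt(1 + (1 - p) * s2)
    return K, Kp

def c(nu, q):
    """Coefficient c_nu for a positive integer nu (assumed reading of Theorem 5.1)."""
    return (q ** nu - (-1) ** nu * q ** (2 * nu)) / (nu * (1 - q ** (2 * nu)))

def variance_at_zero(p, terms=200):
    """Variance of mu_{|a|}: Bernoulli part plus compound Poisson part."""
    K, Kp = half_periods(p)
    q = math.exp(-math.pi * Kp / K)
    bernoulli = (math.pi / (2 * K)) ** 2
    # c_nu = c_{-nu}, hence the factor 2 for the sum over nonzero integers.
    poisson = 2 * sum(c(nu, q) * (nu * math.pi / K) ** 2 for nu in range(1, terms + 1))
    return bernoulli + poisson
```

For $p=0$ one has $K=K'$ and $q=e^{-\pi}$, and `variance_at_zero(0.0)` should agree with $2$ up to quadrature error; more generally `variance_at_zero(p)` should agree with $2-p$.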

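Proof. We apply the standard procedure for computing the Laplace transform of a generating measure when the variance function is given. We shall use the following change of variable: $u^2=(1+am^2)^2+2b^2m^2$ for $u\ge 1$. This implies that

$$m^2=\frac{1}{a^2}\left[-a-b^2+\sqrt{b^4+2ab^2+a^2u^2}\right].$$

We consider now the new change of variable

$$u=\frac{1}{2}\left(\left(2+\frac{b^2}{a}\right)w^2-\frac{b^2}{aw^2}\right)=\frac{b^2}{2a}\left(k^2w^2-\frac{1}{w^2}\right)$$

with $0<w<1$. This choice is designed in order to have $b^4+2ab^2+a^2u^2=b^4k^2+a^2u^2$ transformed into the perfect square of a rational function of $w$:

$$\sqrt{b^4+2ab^2+a^2u^2}=\frac{a}{2}\left(\left(2+\frac{b^2}{a}\right)w^2+\frac{b^2}{aw^2}\right)=\frac{b^2}{2}\left(k^2w^2+\frac{1}{w^2}\right).$$

This leads to

$$m^2=\frac{b^2}{2a^2w^2}(1-w^2)(1-k^2w^2) \qquad (20)$$

but also to a surprising result:

$$a+b^2+a^2m^2=\left(a+\frac{b^2}{2}\right)w^2+\frac{b^2}{2w^2}=\frac{b^2}{2}\left(k^2w^2+\frac{1}{w^2}\right),$$

$$du=\left[\left(a+\frac{b^2}{2}\right)w^2+\frac{b^2}{2w^2}\right]\frac{2}{aw}\,dw=\frac{b^2}{a}\left(k^2w^2+\frac{1}{w^2}\right)\frac{dw}{w},$$

$$\frac{du}{a+b^2+a^2m^2}=\frac{2}{aw}\,dw. \qquad (21)$$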
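The algebra behind (20) and (21) is easy to get wrong by a sign, so a numerical spot check may be useful. The sketch below is an illustration only: for given $p$ and $w\in(0,1)$ it evaluates $u$, the square root and $m^2$ from the formulas above, and returns the largest residual among the perfect-square identity, the relation $u^2=(1+am^2)^2+2b^2m^2$, the identity for $a+b^2+a^2m^2$, and (21) tested with a centred finite difference. The function name `check` and the step `h` are hypothetical choices.

```python
import math

def check(p, w, b=math.sqrt(2)):
    """Largest residual of the identities leading to (20) and (21)."""
    a = -2 + p                   # as in Theorem 5.1
    k2 = 1 + 2 * a / b ** 2      # (17)

    def u_at(x):
        return b ** 2 / (2 * a) * (k2 * x ** 2 - 1 / x ** 2)

    u = u_at(w)
    root = b ** 2 / 2 * (k2 * w ** 2 + 1 / w ** 2)
    m2 = b ** 2 / (2 * a ** 2 * w ** 2) * (1 - w ** 2) * (1 - k2 * w ** 2)  # (20)
    residuals = [
        root ** 2 - (b ** 4 + 2 * a * b ** 2 + a ** 2 * u ** 2),
        u ** 2 - ((1 + a * m2) ** 2 + 2 * b ** 2 * m2),
        (a + b ** 2 + a ** 2 * m2) - root,
    ]
    # (21): du/dw should equal (2 / (a w)) * (a + b^2 + a^2 m^2).
    h = 1e-6
    du_dw = (u_at(w + h) - u_at(w - h)) / (2 * h)
    residuals.append(du_dw - 2 / (a * w) * root)
    return max(abs(r) for r in residuals)
```

The first three residuals are exact identities and should vanish up to rounding; the last one is limited by the accuracy of the finite difference.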

Recall that $a<0$ and that $w\mapsto u$ is decreasing. Thus we get, gathering (21) and (20)
