Associated natural exponential families and elliptic functions


HAL Id: hal-01977671

https://hal.archives-ouvertes.fr/hal-01977671

Submitted on 10 Jan 2019


Gérard Letac

To cite this version:

Gérard Letac. Associated natural exponential families and elliptic functions. Conference on Probability, statistics and their applications, Jun 2015, Aarhus, Denmark. ⟨hal-01977671⟩


To Ole Barndorff-Nielsen for his 80th birthday.

Abstract This paper studies the variance functions of the natural exponential families (NEF) on the real line of the form $(Am^4+Bm^2+C)^{1/2}$, where $m$ denotes the mean. Surprisingly enough, most of them are discrete families concentrated on $\lambda\mathbb{Z}$ for some constant $\lambda$, and the Laplace transforms of their elements are expressed by elliptic functions. The concept of association of two NEF is an auxiliary tool for their study: two families $F$ and $G$ are associated if they are generated by symmetric probabilities and if the analytic continuations of their variance functions satisfy $V_F(m)=V_G(m\sqrt{-1})$. We give some properties of the association before its application to these elliptic NEF. The paper is completed by the study of NEF with variance functions $m(Cm^4+Bm^2+A)^{1/2}$. They are easier to study and they are concentrated on $a\mathbb{N}$.

Primary: 62E10 , 60G51 Secondary: 30E10

Key words: Variance functions, exponential dispersion models, function $\wp$ of Weierstrass.

1 Foreword

Ole and I met for the first time in the Summer School of Saint Flour in 1986. Having been converted to statistics by V. Seshadri two years before, I had learnt about exponential families through Ole’s book (1978) and I had fought with cuts and steepness. Marianne Mora had just completed her thesis in Toulouse and was one of the Saint Flour participants: she was the first from Toulouse to make the pilgrimage to Aarhus the year after, followed by many other researchers from Toulouse: Evelyne Bernadac, Célestin Kokonendji, Dhafer Malouche, Muriel Casalis, Abdelhamid Hassairi, Angelo Koudou and myself. Over the years all of us were in contact with the everflowing ideas of Ole. During these Aarhus days (and Ole’s visits to Toulouse) we gained a better understanding of Lévy processes, of generalized inverse Gaussian distributions and their matrix versions, and of differential geometry applied to statistics. Among all the topics which have interested Ole, the choice today is the one for which he may be the least enthusiastic (see the discussion of Letac 1991), namely the classification of exponential families through their variance functions: Ole thought correctly that although the results were satisfactory for the mind, one could not see many real practical applications. On the other hand, Mendeleiev is universally admired for his prophetic views of chemical elements which had not yet been discovered. Descriptions of natural exponential families with more and more sophisticated variance functions $V$ have been given: when $V$ is a second degree polynomial in the mean (Morris 1982), a power (Tweedie 1984, Jørgensen 1987), a third degree polynomial (Letac-Mora 1990), the Babel class $P+Q\sqrt{R}$ where the polynomials $P,Q,R$ have degrees not bigger than 2,1,2 (Letac 1992, Jørgensen 1997). This is for univariate NEF: even more important works have been done for multivariate NEF, but the present paper will confine itself to one dimensional distributions only.

Gérard Letac, Equipe de Statistique et Probabilités, Université de Toulouse, 118 route de Narbonne, 31062 Toulouse, France.

Having forgotten variance functions during the last twenty years and having turned to random matrices and Bayesian theory, our interest for the topic has been rejuvenated by the paper by S. Bar-Lev and F. Van de Duyn Schouten (2004). The authors consider exponential dispersion models $G$ such that one of the transformations

$$T(P)(dx)=\frac{xP(dx)}{\int_{\mathbb{R}}xP(dx)},\qquad T^2(P)(dx)=\frac{x^2P(dx)}{\int_{\mathbb{R}}x^2P(dx)}$$

maps $G$ into itself or one of its translates. For $T(P)$ they obtain exactly the exponential dispersion models concentrated on the positive line with quadratic variance functions: gamma, Poisson, negative binomial and binomial and no others. For $T^2(P)$ they obviously obtain the previous ones, but they observe that new variance functions appear, in particular $(m^4+c^2)^{1/2}$, without being able to decide whether these natural exponential families (NEF) exist after all (it should be mentioned here that their formula (11) is not correct and this fact greatly invalidates their paper). As a result, our initial motivation was to address this particular question: is $(m^4+c^2)^{1/2}$ a variance function? As we shall see, the answer is yes for a discrete set of $c$. To solve this particular problem, we have to design methods based on elliptic functions, and these methods appear to have a wider domain of applicability. For this reason, the aim of the present paper is the classification of the variance functions of the form $(Am^4+Bm^2+C)^{1/2}$ and their reciprocals in the sense of Letac-Mora (1990), namely the variance functions of the form $m(Cm^4+Bm^2+A)^{1/2}$.

Section 2 recalls general facts and methods for dealing with NEF’s. Section 3 opens a long parenthesis on pairs of associated NEF: if $F$ and $G$ are NEF generated by symmetric probabilities, we say that they are associated, roughly, if we can write $V_F(m)=f(m^2)$ and $V_G(m)=f(-m^2)$. This definition seems to be a mere curiosity of distribution theory, but appears to be illuminating when applied to our elliptic NEF. Since this concept of association has several interesting aspects, I have provided here several detailed examples (Section 3.1) that the reader should skip if he is only interested in elliptic NEF.

Section 4 rules out trivial values of the parameters for elliptic NEF. Sections 5-7 investigate the various cases according to the parameters $(A,B,C)$. Section 8 considers the reciprocal families of the previous elliptic NEF: these reciprocal NEF are interesting distributions on the positive integers. Section 9 makes brief comments on the variance functions $(\alpha m+\beta)\sqrt{P(m)}$ where $P$ is an arbitrary polynomial of degree $\le 4$, actually a completely new field of research. While the statements of the present paper are understandable without knowledge of elliptic functions, the proofs of Sections 5-7 make heavy use of them, and we shall constantly refer to the magnificent book by Sansone and Gerretsen (1960) that we frequently quote by SG.

2 Retrieving an NEF from its variance function

The concept of exponential family is obviously the backbone of Ole’s book (Barndorff-Nielsen (1978)) or of Brown (1986), but the notations for the simpler object called natural exponential family are rather to be found in Morris (1982), Jørgensen (1987) and Letac and Mora (1990).

If $\mu$ is a positive non Dirac Radon measure on $\mathbb{R}$, consider its Laplace transform

$$L_\mu(\theta)=\int_{-\infty}^{\infty}e^{\theta x}\mu(dx)\le\infty.$$

Assume that the interior $\Theta(\mu)$ of the interval $D(\mu)=\{\theta\in\mathbb{R}\,;\ L_\mu(\theta)<\infty\}$ is not empty. The set of positive measures $\mu$ on $\mathbb{R}$ such that $\Theta(\mu)$ is not empty and such that $\mu$ is not concentrated on one point is denoted by $\mathcal{M}(\mathbb{R})$. We denote by $\mathcal{M}_1(\mathbb{R})$ the set of probabilities $\mu$ contained in $\mathcal{M}(\mathbb{R})$. In other terms, the elements of $\mathcal{M}_1(\mathbb{R})$ are the probability laws on $\mathbb{R}$ which have a non trivial Laplace transform.

Write $k_\mu=\log L_\mu$. Then the family of probabilities

$$F=F(\mu)=\{P(\theta,\mu)\,;\ \theta\in\Theta(\mu)\},\qquad\text{where}\quad P(\theta,\mu)(dx)=e^{\theta x-k_\mu(\theta)}\mu(dx),$$

is called the natural exponential family generated by $\mu$. Two basic results are

$$m=k'_\mu(\theta)=\int_{-\infty}^{\infty}xP(\theta,\mu)(dx)$$

and the fact that $k'_\mu$ is increasing (or that $k_\mu$ is convex). The set $k'_\mu(\Theta(\mu))=M_F$ is called the domain of the means. We denote by $\psi_\mu:M_F\to\Theta(\mu)$ the reciprocal function of $m=k'_\mu(\theta)$. Thus $F=F(\mu)$ can be parametrized by $M_F$ by the map from $M_F$ to $F$ which is

$$m\mapsto P(\psi_\mu(m),\mu)=P(m,F).$$

In other terms an element of $F(\mu)$ can be identified by the value of the mean $m$ rather than by the value of the canonical parameter $\theta$. One can prove that the variance $V_F(m)$ of $P(m,F)$ is

$$V_F(m)=k''_\mu(\psi_\mu(m))=\frac{1}{\psi'_\mu(m)}. \tag{1}$$

The map $m\mapsto V_F(m)$ from $M_F$ to $(0,\infty)$ is called the variance function and characterizes $F$. The Jørgensen set of $\mu$ is the set $\Lambda(\mu)$ of positive numbers $t$ such that there exists a positive measure $\mu_t$ such that $\Theta(\mu_t)=\Theta(\mu)$ and such that $L_{\mu_t}=(L_\mu)^t$. Obviously, $\Lambda(\mu)$ is an additive semigroup which contains all positive integers. If $t\in\Lambda(\mu)$ we denote $F_t=F(\mu_t)$ and it is easily checked that $M_{F_t}=tM_F$ and that

$$V_{F_t}(m)=tV_F\left(\frac{m}{t}\right). \tag{2}$$

The union $G=G(\mu)=\cup_{t\in\Lambda(\mu)}F(\mu_t)$ is called the exponential dispersion model generated by $\mu$.

If $F$ is a NEF and if $h(x)=ax+b$ (with $a\neq0$) then the family $h(F)$ of images of elements of $F$ by $h$ is still a NEF with $M_{h(F)}=h(M_F)$ and

$$V_{h(F)}(m)=a^2V_F\left(\frac{m-b}{a}\right). \tag{3}$$

In spite of the similarity between (2) and (3), the last formula is much more useful for dealing with a NEF $F$ which is known only by its variance function: the reason is that the Jørgensen set $\Lambda(F)$ of $F$ is unknown in many circumstances. In fact $\Lambda(F)$ is a closed additive semigroup of $[0,\infty)$ which can be rather complicated (see Letac, Malouche and Maurer (2002) for an example). On the other hand an affinity is always defined, and this fact can be used to diminish the number of parameters of a family of variance functions. For instance, if we consider the variance function $\sqrt{Am^4+Bm^2+C}$ such that $C>0$, without loss of generality we could assume that $C=1$ by using the dilation $x\mapsto x/C^{1/4}$.
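As a small numerical illustration of formula (3), the following sketch tilts the symmetric Bernoulli law on $\{-1,1\}$ (variance function $1-m^2$), takes its image by the affinity $h(x)=(x+1)/2$, and compares the variance of the image with $a^2V_F((m-b)/a)$. The function names and the sample value of $\theta$ are illustrative choices, not notation from the paper.

```python
import math

def tilted_bernoulli(theta, points=(-1.0, 1.0)):
    """Element P(theta, nu) of the NEF generated by the uniform law on `points`."""
    weights = [math.exp(theta * x) for x in points]
    total = sum(weights)
    return [(x, w / total) for x, w in zip(points, weights)]

def mean_var(law):
    """Mean and variance of a finite discrete law given as (atom, mass) pairs."""
    m = sum(x * p for x, p in law)
    v = sum((x - m) ** 2 * p for x, p in law)
    return m, v

def V_F(m):
    """Variance function of the symmetric Bernoulli family on {-1, 1}."""
    return 1.0 - m * m

def image(law, a, b):
    """Image of the law by the affinity h(x) = a x + b."""
    return [(a * x + b, p) for x, p in law]

a, b = 0.5, 0.5                      # h maps {-1, 1} onto {0, 1}
law = tilted_bernoulli(0.3)          # arbitrary sample value of theta
m, v = mean_var(law)
m_h, v_h = mean_var(image(law, a, b))
predicted = a * a * V_F((m_h - b) / a)   # right-hand side of formula (3)
print(v_h, predicted, m_h * (1.0 - m_h))
```

The three printed numbers coincide: the image family is the usual Bernoulli family on $\{0,1\}$, whose variance function is $m(1-m)$.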

An important fact for the sequel is that $V_F$ is real analytic, which means that for any $m_0\in M_F$ there exists a positive number $r$ such that for $m_0-r<m<m_0+r$ we have

$$V_F(m)=\sum_{n=0}^{\infty}\frac{(m-m_0)^n}{n!}V_F^{(n)}(m_0),$$

which implies that $V_F$ is analytically extendable to a connected open set of the complex plane containing the real segment $M_F$. If $\mu\in\mathcal{M}_1(\mathbb{R})$, the Laplace transform $L_\mu$ defined on the open interval $\Theta(\mu)$ is extendable analytically in a unique way to the strip $\Theta(\mu)+i\mathbb{R}$ of the complex plane. This extension is also denoted $L_\mu$, and $\theta\mapsto L_\mu(i\theta)$ is the Fourier transform of the probability $\mu$. The function $k_\mu(\theta)=\log L_\mu(\theta)$ could also be extended to an analytic function on the same strip, but it would be a multivalued function if $L_\mu(\theta)$ has zeros in the strip.

To conclude this section, recall the four steps allowing us to pass from the variance function $V_F$ of a NEF $F$ to a measure $\mu$ such that $F=F(\mu)$.

1. Writing $d\theta=\psi'_\mu(m)\,dm=\frac{dm}{V_F(m)}$, we compute $\theta=\psi_\mu(m)$ as a function of $m$ by a quadrature;

2. we deduce from this the parameter $m$ as a function $m=k'_\mu(\theta)$ (this is generally a difficult point);

3. we compute $k_\mu(\theta)$ by a second quadrature and obtain $L_\mu=e^{k_\mu}$;

4. we use a dictionary, creativity, or Fourier inversion formulas to retrieve $\mu$ from its Laplace transform.
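The first three steps can be mimicked numerically. The sketch below is only an illustration under explicit assumptions: it uses the test case $V(m)=1+m^2$, for which the answer $k(\theta)=-\log\cos\theta$ is known, it fixes the integration constants by $\psi(0)=0$ and $k(0)=0$, and it replaces the difficult inversion of step 2 by a plain bisection. The helper names are hypothetical.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def V(m):
    """Test variance function with a known answer."""
    return 1.0 + m * m

def psi(m):
    """Step 1: theta = psi(m), the quadrature of dm / V(m), with psi(0) = 0."""
    return simpson(lambda s: 1.0 / V(s), 0.0, m)

def mean(theta, lo=-10.0, hi=10.0):
    """Step 2: invert psi by bisection (psi is increasing because V > 0)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if psi(mid) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def k(theta, n=40):
    """Step 3: second quadrature, k(theta) = integral of the mean, with k(0) = 0."""
    return simpson(mean, 0.0, theta, n)

theta = 0.5
m_num, k_num = mean(theta), k(theta)
print(m_num, math.tan(theta))             # m = k'(theta) = tan(theta)
print(k_num, -math.log(math.cos(theta)))  # k(theta) = -log cos(theta)
```

Step 4 is not automated here: recognizing $1/\cos\theta$ as the Laplace transform of $dx/(2\cosh(\pi x/2))$ is exactly the "dictionary" part of the procedure.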

We keep these four steps in mind for dealing with $V_F(m)=\sqrt{Am^4+Bm^2+C}$ in the sequel. It is worthwhile to sketch here an example with

$$V(m)=\sqrt{1+4m^4}.$$

For $0<w<1$ we do the change of variable and perform the first step:

$$m=\frac{\sqrt{1-w^4}}{2w},\quad w^2=-2m^2+\sqrt{1+4m^4},\quad d\theta=-\frac{dw}{\sqrt{1-w^4}},\quad \theta=\int_{w(m)}^{1}\frac{dw}{\sqrt{1-w^4}}.$$

The second step introduces a function $C(\theta)$ defined on the interval $[0,K]$, where $K=\int_0^1\frac{dw}{\sqrt{1-w^4}}=1.3110\ldots$, as

$$\theta=\int_{C(\theta)}^{1}\frac{dw}{\sqrt{1-w^4}}. \tag{4}$$

Since $w(m)=C(\theta)$, up to the knowledge of $C(\theta)$, and taking the derivative of both sides of (4), the second step is performed since

$$m=k'(\theta)=\frac{\sqrt{1-C(\theta)^4}}{2C(\theta)}=-\frac{C'(\theta)}{2C(\theta)}.$$

The third step is easy and we get $k(\theta)=-\frac{1}{2}\log C(\theta)$ and the Laplace transform $L(\theta)=\frac{1}{\sqrt{C(\theta)}}$. The fourth step needs to be explicit about $C(\theta)$ and the theory of elliptic functions becomes necessary: details about this particular example are in Theorem 5.1 when taking $k^2=-1$. The function $L$ will be the Laplace transform of a discrete distribution concentrated on the set of numbers of the form $n/a$, where $n$ is a relative integer and where $a$ is the complicated number $\frac{2K}{\pi}=0.8346\ldots$ If we use formula (3) we get the following surprising result: the function $\sqrt{a^4+4m^4}$ is the variance function of a NEF concentrated on $\mathbb{Z}$.
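The change of variable of this example can be checked numerically. The sketch below assumes the substitution $w=\sin\varphi$, which turns $dw/\sqrt{1-w^4}$ into the smooth integrand $d\varphi/\sqrt{1+\sin^2\varphi}$; it evaluates $K$ and $2K/\pi$ by quadrature (about 1.3110 and 0.8346) and verifies $d\theta/dm=1/V(m)$ by a finite difference. All function names are illustrative.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def V(m):
    return math.sqrt(1.0 + 4.0 * m ** 4)

def m_of_w(w):
    return math.sqrt(1.0 - w ** 4) / (2.0 * w)

def w_of_m(m):
    return math.sqrt(-2.0 * m * m + V(m))

def theta_of_w(w):
    """theta = integral from w to 1 of dw / sqrt(1 - w^4), computed with w = sin(phi)."""
    return simpson(lambda phi: 1.0 / math.sqrt(1.0 + math.sin(phi) ** 2),
                   math.asin(w), math.pi / 2)

K = theta_of_w(0.0)
a = 2.0 * K / math.pi

# d(theta)/dm should equal 1 / V(m); centred finite difference at a sample point
m0, eps = 0.7, 1e-5
slope = (theta_of_w(w_of_m(m0 + eps)) - theta_of_w(w_of_m(m0 - eps))) / (2 * eps)
print(K, a, slope, 1.0 / V(m0))
```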


3 Associated natural exponential families

The source of this concept is the pair of identities (5) below: if

$$\mu(dx)=\frac{dx}{2\cosh\frac{\pi x}{2}}$$

and $\nu=\frac{1}{2}(\delta_{-1}+\delta_{1})$ is the symmetric Bernoulli distribution, then

$$\int_{-\infty}^{+\infty}e^{\theta x}\mu(dx)=\frac{1}{\cos\theta}\ \ (\text{for } |\theta|<\pi/2),\qquad \int_{-\infty}^{+\infty}e^{i\theta x}\nu(dx)=\cos\theta, \tag{5}$$

which could be as well presented by reversing the roles of Fourier and Laplace transforms:

$$\int_{-\infty}^{+\infty}e^{i\theta x}\mu(dx)=\frac{1}{\cosh\theta},\qquad \int_{-\infty}^{+\infty}e^{\theta x}\nu(dx)=\cosh\theta. \tag{6}$$

This is an example of what we are going to call an associated pair $(\mu,\nu)$ of probabilities on $\mathbb{R}$. Here is the definition:

Definition 3.1 Let $\mu$ and $\nu$ be in $\mathcal{M}_1(\mathbb{R})$ such that $\mu$ and $\nu$ are symmetric. We say that $(\mu,\nu)$ is an associated pair if for all $\theta\in\Theta(\mu)$ the Fourier transform of $\nu$ is $1/L_\mu(\theta)$. In other terms for $\theta\in\Theta(\mu)$ we have

$$\int_{-\infty}^{+\infty}e^{\theta x}\mu(dx)=L_\mu(\theta),\qquad \int_{-\infty}^{+\infty}e^{i\theta x}\nu(dx)=\frac{1}{L_\mu(\theta)}. \tag{7}$$

The corresponding natural exponential families $F=F(\mu)$ and $G=F(\nu)$ are also said to be associated.
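The defining relation (7) can be tested numerically on the pair of (5). The following sketch integrates against the density $1/(2\cosh(\pi x/2))$ with a truncated Simpson rule; the cutoff, the number of panels and the sample value of $\theta$ are arbitrary choices made for this illustration.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def density_mu(x):
    return 1.0 / (2.0 * math.cosh(math.pi * x / 2.0))

def laplace_mu(theta, cutoff=60.0, n=12000):
    """Laplace transform of mu, valid for |theta| < pi / 2."""
    return simpson(lambda x: math.exp(theta * x) * density_mu(x), -cutoff, cutoff, n)

def fourier_mu(theta, cutoff=60.0, n=12000):
    """Fourier transform of mu (real because mu is symmetric)."""
    return simpson(lambda x: math.cos(theta * x) * density_mu(x), -cutoff, cutoff, n)

def fourier_nu(theta):
    """Fourier transform of the symmetric Bernoulli law on {-1, 1}."""
    return 0.5 * (math.cos(-theta) + math.cos(theta))

theta = 0.8
product = laplace_mu(theta) * fourier_nu(theta)   # relation (7): should be 1
print(product, fourier_mu(theta), 1.0 / math.cosh(theta))
```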

We describe now the easy consequences of this definition:

Proposition 3.1 Let $(\mu,\nu)$ in $\mathcal{M}_1(\mathbb{R})$ be an associated pair. Then

1. (Symmetry) The pair $(\nu,\mu)$ is also associated;

2. (Uniqueness) If $(\mu,\nu_1)$ is also associated, then $\nu_1=\nu$.

3. (Convolution) If $(\mu',\nu')$ is an associated pair then $(\mu*\mu',\nu*\nu')$ is also an associated pair.

4. (Zeros) Denote $z_\mu=\inf\{\theta>0\,;\ L_\mu(i\theta)=0\}$. Then $\Theta(\nu)=(-z_\mu,z_\mu)$.

5. (Variance functions) Consider the associated pair $F=F(\mu)$ and $G=F(\nu)$ of NEF. If $V_F$ and $V_G$ are extended as analytic functions to the complex plane in a neighborhood of zero, then $V_F(m)=V_G(im)$.

Comments

1. Clearly, since the Fourier transform $\frac{1}{L_\mu(\theta)}$ is real, the probability $\nu$ must be symmetric.

2. Symmetry of $\nu$ implies that $\Theta(\nu)$ is a symmetric interval, as well as the mean domain of $F(\nu)$.

3. Because of the uniqueness of Part 2, we shall also write $\mu^*$ for indicating that $(\mu,\mu^*)$ is an associated pair. In this case $\mu^*$ is called the associated probability to $\mu$ (when it exists). We also observe that

$$(\mu^*)^*=\mu,\qquad (\mu*\mu')^*=\mu^**(\mu')^*.$$

4. It is not correct to think that if $\mu$ is in $\mathcal{M}_1(\mathbb{R})$ then $\mu^*$ always exists. An example is given by the first Laplace distribution (also called the bilateral exponential distribution)

$$\mu(dx)=\frac{1}{2}e^{-|x|}dx,\qquad \Theta(\mu)=(-1,1),\qquad L_\mu(\theta)=\frac{1}{1-\theta^2}.$$

Suppose that $\nu=\mu^*$ exists. Then its Fourier transform on $(-1,1)$ is $1-\theta^2$. This implies that its Laplace transform is $L_\nu(\theta)=1+\theta^2$ and therefore $\Theta(\nu)=\mathbb{R}$. But if $k_\nu=\log L_\nu$ it is easy to see that the sign of $k''_\nu(\theta)$ is the sign of $1-\theta^2$, which implies that $k_\nu$ is not convex, a contradiction.

A more complicated example is given by

$$\mu_t(dx)=\frac{2^{t-2}}{\pi\Gamma(t)}\left|\Gamma\left(\frac{t+ix}{2}\right)\right|^2dx \tag{8}$$

for $t>0$. We will see in Section 3.1 that $\mu_t\in\mathcal{M}_1(\mathbb{R})$ satisfies $\Theta(\mu_t)=(-\frac{\pi}{2},\frac{\pi}{2})$ and

$$\frac{2^{t-2}}{\pi\Gamma(t)}\int_{-\infty}^{+\infty}e^{\theta x}\left|\Gamma\left(\frac{t+ix}{2}\right)\right|^2dx=\frac{1}{(\cos\theta)^t}. \tag{9}$$

If $t$ is not an integer, then $\mu_t^*=\nu$ does not exist (Proposition 3.5). An obvious case is $t=1/2$: if $X,Y$ are iid such that $\Pr(X+Y=\pm1)=1/2$ then $\Pr(X=\pm1/2)=1/4$ and $\Pr(X+Y=0)\ge 1/16>0$, a contradiction.

5. In Definition 3.1, suppose that we relax the constraint on $\nu$ to have a Laplace transform. Consider the example $\mu(dx)=\frac{dx}{2\cosh(\pi x/2)}$ with Laplace transform $1/\cos\theta$ defined on $\Theta(\mu)=(-\frac{\pi}{2},\frac{\pi}{2})$. A possible associated $\nu$ is the Bernoulli $\frac{1}{2}(\delta_{-1}+\delta_{1})$, which satisfies $\int_{-\infty}^{+\infty}e^{i\theta x}\nu(dx)=\cos\theta$ (in particular on $|\theta|<\pi/2$). However it is not excluded that there exist other probabilities $\nu$ fulfilling the same property on $|\theta|<\pi/2$. Imposing $\nu\in\mathcal{M}_1(\mathbb{R})$ rules out this phenomenon, from Part 2 of Proposition 3.1.

6. Here is the simplest example illustrating Part 5 of Proposition 3.1. We use once more the associated pair (5). In this case $M_F=\mathbb{R}$, $a_F=\infty$, $V_F(m)=1+m^2$, $M_G=(-1,1)$, $a_G=1$, $V_G(m)=1-m^2$. See Proposition 3.5 below.

7. SELF ASSOCIATED PAIRS AND NEF: A trivial example is $\mu=N(0,1)$ since $\mu=\mu^*$. More generally, $V_F$ is a function of $m^4$ if and only if $\mu=\mu^*$. Another important example will be found in Theorem 5.1 below, which is $V_F(m)=\sqrt{1+4m^4}$. Note that the symmetry of $\mu$ is essential: if $V_F(m)=m^4$, with $M_F=(0,\infty)$, we have $V_F(m)=V_F(im)$ but the concept of association does not make sense here.

8. Part 5 also provides a way to decide quickly from the examination of the variance function that $\mu^*$ does not exist. For instance, if $\mu\sim X-Y$ where $X$ and $Y$ are iid with the Poisson distribution of mean $1/2$, then $F=F(\mu)$ has variance function $V_F(m)=\sqrt{1+m^2}$ (see Proposition 4.1). If $G=F(\mu^*)$ existed, its variance function would be $V_G(m)=\sqrt{1-m^2}$. The domain of the means of $G$ would be $(-1,1)$, from the principle of analytic continuation of variance functions (Theorem 3.1 in Letac and Mora (1990)). However, around the point $m=1$, the function $V_G$ would be equivalent to $\sqrt{2}\,(1-m)^{1/2}$. This is forbidden by the principle of Jørgensen, Martinez and Tsao (1993): this principle says that if $M_G=(a,b)$ with $b<\infty$ and if

$$V_G(m)\sim_{m\to b}A\times(b-m)^p \tag{10}$$

then $p\notin(0,1)$.

Similarly consider the variance function $V_F(m)=(1+m^2)^{3/2}$ defined on $M_F=\mathbb{R}$. One can consult Letac (1991) chapter 5 example 1.2 for a probabilistic interpretation. It is generated by a $\mu$ such that $\Theta(\mu)=(-1,1)$ and $k_\mu(\theta)=1-\sqrt{1-\theta^2}$. For seeing that $V_G(m)=(1-m^2)^{3/2}$ cannot be a variance function we observe the following. If $\nu=\mu^*$ exists then

$$k_\nu(\theta)=\sqrt{1+\theta^2}-1.$$

Therefore, by using the principle of maximal analytic continuation (see Proposition 3.2 below), we have $\Theta(\nu)=\mathbb{R}$. As a consequence $L_\nu(\theta)=e^{\sqrt{1+\theta^2}-1}$ is an entire function, which is clearly impossible.
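The argument of Comment 4 for the Laplace distribution rests on the sign of the second derivative of $\log(1+\theta^2)$. The sketch below compares the closed form $2(1-\theta^2)/(1+\theta^2)^2$, obtained by direct differentiation, with a central second difference; the step size and the sample points are arbitrary choices.

```python
import math

def k_nu(theta):
    """Candidate log Laplace transform log(1 + theta^2) of Comment 4."""
    return math.log(1.0 + theta * theta)

def k_nu_second(theta):
    """Second derivative in closed form: 2 (1 - theta^2) / (1 + theta^2)^2."""
    return 2.0 * (1.0 - theta * theta) / (1.0 + theta * theta) ** 2

def second_difference(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

for theta in (0.5, 1.0, 2.0):
    print(theta, k_nu_second(theta), second_difference(k_nu, theta))
```

The second derivative is positive at $\theta=0.5$ and negative at $\theta=2$, so $\log(1+\theta^2)$ is not convex on $\mathbb{R}$ and cannot be the log Laplace transform of a probability.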

Proof of Proposition 3.1. 1) Suppose that $\nu$ is in $\mathcal{M}_1(\mathbb{R})$. Then the knowledge of the Fourier transform of $\nu$ on the interval $\Theta(\mu)$ gives the knowledge of the Laplace transform $L_\nu$ on $\Theta(\nu)$. Now the Fourier transform of $\mu$ restricted to $\Theta(\nu)$ is $L_\mu(i\theta)=1/L_\nu(\theta)$, from the relation (7) extended by analyticity and the symmetry of $\nu$.

2) If $\nu_1$ exists, its Fourier transform coincides with the Fourier transform of $\nu$ on the interval $\Theta(\mu)$. By analyticity, the two coincide everywhere and $\nu=\nu_1$.

3) is obvious.

4) Since the Fourier transform of $\nu$ restricted to $\Theta(\mu)$ is $1/L_\mu(\theta)$, then in a neighborhood of $\theta=0$ the Laplace transform of $\nu$ satisfies $L_\nu(\theta)=1/L_\mu(i\theta)$. Now we use the following result:

Proposition 3.2. (Principle of maximal analyticity) If $\nu\in\mathcal{M}(\mathbb{R})$ and if $\Theta(\nu)=(a,b)$, suppose that there exists $(a_1,b_1)\supset(a,b)$ and a real analytic function $f$ on $(a_1,b_1)$ which is strictly positive and such that $f(\theta)=L_\nu(\theta)$ for $a<\theta<b$. Then $a=a_1$ and $b=b_1$.

Proof. Use the method of proof of Theorem 3.1 of Letac and Mora (1990) or Kawata (1972), chapter 7.

We now return to the proof of Proposition 3.1, Part 4). Write $\Theta(\nu)=(-b,b)$. Clearly $b>z_\mu$ is impossible since it would imply that $L_\nu(z_\mu)$ would be finite, a contradiction with $L_\mu(iz_\mu)=0$. We apply Proposition 3.2 to the present $\nu$, to $(a_1,b_1)=(-z_\mu,z_\mu)$ and to the positive analytic function on this interval $f(\theta)=1/L_\mu(i\theta)$. As a consequence $b=b_1=z_\mu$ and the result 4) is proved.

5) Consider the functions $L_\mu$ and $L_\nu$. They are analytic on the strips $\Theta(\mu)+i\mathbb{R}$ and $\Theta(\nu)+i\mathbb{R}$, and from Part 4) $\Theta(\mu)+i\Theta(\nu)$ is the open rectangle with vertices $\pm z_\nu\pm iz_\mu$. Let $Z$ be the set of zeros of the analytic function $\theta\mapsto L_\mu(\theta)$ restricted to the rectangle $\Theta(\mu)+i\Theta(\nu)$. From the principle of isolated zeros, $Z$ contains only a finite number of points in the compact set $[-a,a]\times[-b,b]$ when $a<z_\nu$ and $b<z_\mu$. Also $Z$ has no points on the set $S=(-z_\nu,z_\nu)\cup(-iz_\mu,iz_\mu)$. Consider now the part $Z_{++}$ of $Z$ contained in the first quadrant, and its closed convex hull $C_{++}$. Similarly consider $C_{\pm,\pm}$, the closed set $C=C_{++}\cup C_{+-}\cup C_{-+}\cup C_{--}$ and the open set $U=(\Theta(\mu)+i\Theta(\nu))\setminus C$. Then $U$ is a simply connected set and is a neighborhood of $S$.

We are in position to define $\log L_\mu=k_\mu$ on the open set $U$ as an analytic function. On this set $U$ we have

$$k_\mu(\theta)=-k_\nu(i\theta),\qquad k'_\mu(\theta)=-ik'_\nu(i\theta),\qquad k''_\mu(\theta)=k''_\nu(i\theta). \tag{11}$$

Since

$$V_F(k'_\mu(\theta))=k''_\mu(\theta),\qquad V_G(k'_\nu(\theta))=k''_\nu(\theta)$$

and since (11) gives $k'_\nu(i\theta)=ik'_\mu(\theta)$, we get finally

$$V_G(ik'_\mu(\theta))=k''_\mu(\theta)$$

and this is saying that for $m$ in the open set $k'_\mu(U)$ we have $V_F(m)=V_G(im)$, which is the desired result.
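The relations between the cumulant functions of an associated pair can be checked in complex arithmetic on the pair of (5), where $k_\mu=-\log\cos$ and $k_\nu=\log\cosh$. This is only a sanity check under the assumption that finite differences with a real step approximate the complex derivatives of these analytic functions; the names below are illustrative.

```python
import cmath

def k_mu(z):
    """log Laplace transform of dx / (2 cosh(pi x / 2)), extended to complex z."""
    return -cmath.log(cmath.cos(z))

def k_nu(z):
    """log Laplace transform of the symmetric Bernoulli law on {-1, 1}."""
    return cmath.log(cmath.cosh(z))

def d1(f, z, h=1e-5):
    """First derivative by a centred difference (f is analytic near z)."""
    return (f(z + h) - f(z - h)) / (2 * h)

def d2(f, z, h=1e-4):
    """Second derivative by a centred second difference."""
    return (f(z + h) - 2 * f(z) + f(z - h)) / (h * h)

theta = 0.4
lhs = (k_mu(theta), d1(k_mu, theta), d2(k_mu, theta))
rhs = (-k_nu(1j * theta), -1j * d1(k_nu, 1j * theta), d2(k_nu, 1j * theta))

m = lhs[1].real            # m = k_mu'(theta) = tan(theta)
V_F = 1 + m * m            # variance function of F(mu)
V_G_at_im = 1 - (1j * m) ** 2   # V_G(z) = 1 - z^2 evaluated at z = i m
print(lhs, rhs, V_F, V_G_at_im)
```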

Proposition 3.3. (Convolution of Bernoulli’s). Let $(a_n)_{n=1}^{\infty}$ be a real sequence such that $\sum_{n=1}^{\infty}a_n^2<\infty$. Let $(X_n)_{n=1}^{\infty}$ and $(Y_n)_{n=1}^{\infty}$ be two iid sequences such that

$$X_n\sim\frac{dx}{2\cosh(\pi x/2)},\qquad Y_n\sim\frac{1}{2}(\delta_{-1}+\delta_{1}).$$

Then the distributions $\mu$ of $\sum_{n=1}^{\infty}a_nX_n$ and $\nu$ of $\sum_{n=1}^{\infty}a_nY_n$ are associated.

Proof. Easy, from (5) and Part 3) of Proposition 3.1. Note that for $a_n=1/3^n$, $\nu$ is the purely singular Cantor distribution on $(-1/2,1/2)$, while $\mu$ has a density.


3.1 Examples of associated probabilities

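Here are 3 groups of examples. It can be observed that they offer three different generalizations of (5). We start with the classical formula for $t>0$, correct for $\theta\in(-t,t)$:

$$\int_{-\infty}^{+\infty}\frac{e^{x\theta}\,dx}{(\cosh x)^t}=\frac{2^{t-1}}{\Gamma(t)}\,\Gamma\left(\frac{t+\theta}{2}\right)\Gamma\left(\frac{t-\theta}{2}\right). \tag{12}$$

In particular, using the duplication formula $\sqrt{\pi}\,\Gamma(t)=2^{t-1}\Gamma(\frac{t}{2})\Gamma(\frac{t+1}{2})$, we get the Laplace transform of the probability $\alpha_t$ below:

$$\alpha_t(dx)=\frac{\Gamma(\frac{t+1}{2})}{\sqrt{\pi}\,\Gamma(\frac{t}{2})}\times\frac{dx}{(\cosh x)^t},\qquad L_{\alpha_t}(\theta)=\frac{1}{\Gamma(\frac{t}{2})^2}\times\Gamma\left(\frac{t+\theta}{2}\right)\Gamma\left(\frac{t-\theta}{2}\right) \tag{13}$$

with $\Theta(\alpha_t)=(-t,t)$. It is worthwhile to mention that if $X$ and $Y$ are iid with distribution

$$\beta(\tfrac{t}{2},1)(dx)=\frac{t}{2}\,x^{\frac{t}{2}-1}1_{(0,1)}(x)\,dx$$

and if $U=\sqrt{X/Y}$ then $\log U\sim\alpha_t$.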

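Formula (12) is easily proven by the change of variable $u=e^{2x}$ and the formula $\int_0^{\infty}\frac{u^{p-1}\,du}{(1+u)^{p+q}}=B(p,q)$ for $p,q>0$. The Fourier version of (12) is

$$\int_{-\infty}^{+\infty}\frac{e^{ix\theta}\,dx}{(\cosh x)^t}=\frac{2^{t-1}}{\Gamma(t)}\left|\Gamma\left(\frac{t+i\theta}{2}\right)\right|^2 \tag{14}$$

leading by Fourier inversion to

$$\frac{2^{t-1}}{2\pi\Gamma(t)}\int_{-\infty}^{+\infty}e^{i\theta x}\left|\Gamma\left(\frac{t+ix}{2}\right)\right|^2dx=\frac{1}{(\cosh\theta)^t} \tag{15}$$

and by analyticity to (9). For a while, let us specialize these formulas to $t=2p+1$ and to $t=2p+2$ where $p$ is a non negative integer. From the complements formula $\Gamma(z)\Gamma(1-z)=\pi/\sin(\pi z)$ and $\Gamma(z+1)=z\Gamma(z)$ we have for $t=1,2$

$$\Gamma\left(\frac{1+\theta}{2}\right)\Gamma\left(\frac{1-\theta}{2}\right)=\frac{\pi}{\cos\frac{\pi\theta}{2}},\qquad \Gamma\left(1+\frac{\theta}{2}\right)\Gamma\left(1-\frac{\theta}{2}\right)=\frac{\pi\theta}{2\sin\frac{\pi\theta}{2}}$$

and more generally

$$\Gamma\left(\frac{2p+1+\theta}{2}\right)\Gamma\left(\frac{2p+1-\theta}{2}\right)=\frac{1}{2^{2p}}(1-\theta^2)(9-\theta^2)\cdots((2p-1)^2-\theta^2)\times\frac{\pi}{\cos\frac{\pi\theta}{2}}, \tag{16}$$

$$\Gamma\left(p+1+\frac{\theta}{2}\right)\Gamma\left(p+1-\frac{\theta}{2}\right)=\frac{1}{2^{2p}}(4-\theta^2)(16-\theta^2)\cdots(4p^2-\theta^2)\times\frac{\pi\theta}{2\sin\frac{\pi\theta}{2}}. \tag{17}$$

Proposition 3.4. If $\alpha_t$ is defined by (13) then $\alpha_t^*$ exists if and only if $t\ge1$. In particular $\alpha_1^*=\frac{1}{2}(\delta_{-\pi/2}+\delta_{\pi/2})$ is a Bernoulli distribution and for $t>1$ we have

$$\alpha_t^*(dx)=\frac{\Gamma(\frac{t}{2})}{\sqrt{\pi}\,\Gamma(\frac{t-1}{2})}(\cos x)^{t-2}\,1_{(-\pi/2,\pi/2)}(x)\,dx.$$

In particular for $t=2p+1$ and $t=2p+2$ where $p$ is a non negative integer, (16) and (17) give $(\varphi_t)^{-1}$ when $\varphi_t$ is the Fourier transform of $\alpha_t^*$.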
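Formula (12) and the case $t=1$ of Proposition 3.4 lend themselves to a quick numerical check. The sketch below compares a truncated Simpson quadrature of the left-hand side of (12) with the gamma-function expression, then checks that the Fourier transform $\cos(\pi\theta/2)$ of the Bernoulli law on $\pm\pi/2$ is the inverse of $L_{\alpha_1}(\theta)$. The cutoff, panel count and sample values of $t$ and $\theta$ are arbitrary assumptions of this illustration.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def lhs_12(t, theta, cutoff=40.0, n=8000):
    """Left-hand side of (12), truncated to [-cutoff, cutoff]."""
    return simpson(lambda x: math.exp(theta * x) / math.cosh(x) ** t,
                   -cutoff, cutoff, n)

def rhs_12(t, theta):
    """Right-hand side of (12)."""
    return (2.0 ** (t - 1) / math.gamma(t)
            * math.gamma((t + theta) / 2) * math.gamma((t - theta) / 2))

def laplace_alpha(t, theta):
    """Laplace transform of alpha_t as given in (13)."""
    return (math.gamma((t + theta) / 2) * math.gamma((t - theta) / 2)
            / math.gamma(t / 2) ** 2)

t, theta = 1.5, 0.7
print(lhs_12(t, theta), rhs_12(t, theta))

# Case t = 1: Fourier transform of (delta_{-pi/2} + delta_{pi/2}) / 2 times L_{alpha_1}
check_t1 = math.cos(math.pi * 0.3 / 2) * laplace_alpha(1.0, 0.3)
print(check_t1)
```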

Comments. For this example, the explicit calculation of the variance functions of $F=F(\alpha_t)$ and $G=F(\alpha_t^*)$ is not possible. For instance if $t=2$ the probability $\alpha_2^*$ is the uniform distribution on the segment $(-\pi/2,\pi/2)$. In this case $L_{\alpha_2^*}(\theta)=\frac{\sinh(\pi\theta/2)}{\pi\theta/2}$: there is no way to compute $\theta=\psi_{\alpha_2^*}(m)$ in a closed formula when

$$m=k'_{\alpha_2^*}(\theta)=\frac{\pi}{2}\left(\coth\left(\frac{\pi\theta}{2}\right)-\frac{2}{\pi\theta}\right).$$

Shanbhag (1979) and, in their Proposition 4, Barlev and Letac (2012), have other proofs of the ’only if’ condition of existence of $\alpha_t^*$.

Proof. For $t>1$ we just rely on entry 3.631, 9 of Gradshteyn and Ryzhik (2007). If $t<1$ we show that $\alpha_t^*$ does not exist by showing that $\theta\mapsto k''_{\alpha_t}(i\theta)$ is not positive. We obtain

$$k''_{\alpha_t}(i\theta)=\sum_{n=0}^{\infty}\frac{(n+\frac{t}{2})^2-\frac{\theta^2}{4}}{\left[(n+\frac{t}{2})^2+\frac{\theta^2}{4}\right]^2}$$

and a careful calculation shows that

$$\lim_{\theta\to\infty}\theta^2k''_{\alpha_t}(i\theta)=2(t-1).$$

If $t<1$ then $\theta\mapsto k''_{\alpha_t}(i\theta)$ cannot be positive for all $\theta\in\mathbb{R}$, and this ends the proof.

Proposition 3.5. If $\mu_t$ is defined by (8) then $\mu_t^*$ exists if and only if $t$ is a positive integer $N$. In this case $\mu_N^*$ is the image of the binomial distribution $B(N,1/2)$ by $x\mapsto 2x-N$.

Comments. The most interesting particular case corresponds to $t=2$ since in this case we meet the uniform distribution on a segment with the associated pair

$$\mu_2(dx)=\frac{x}{4\sinh(\pi x/2)}\,dx,\qquad (\mu_2)^*(dy)=\frac{1}{2}1_{(-1,1)}(y)\,dy.$$

This is also an illustration of Proposition 3.3 applied to $a_n=1/2^n$ since $\sum_{n=1}^{\infty}\frac{Y_n}{2^n}$ is uniform on $(-1,1)$ when $(Y_n)_{n=1}^{\infty}$ is an iid sequence of symmetric Bernoulli random variables. For this example, the explicit calculation of the variance functions of $F=F(\mu_t)$ and $G=F(\mu_N^*)$ gives

$$V_F(m)=t+\frac{m^2}{t},\qquad V_G(m)=N-\frac{m^2}{N}.$$

Proof of Proposition 3.5. $\Leftarrow$ is obvious. To prove $\Rightarrow$ suppose that there exists a positive integer $n_0$ such that $n_0-1<t<n_0$ and suppose that $\mu_t^*$ exists. Taking the image $\tau$ of $\mu_t^*$ by the map $x\mapsto x'=x-t$, choosing $\theta>0$ and denoting $z=e^{-2\theta}$ we get

$$\int_{-\infty}^{+\infty}e^{\theta x'}\tau(dx')=\int_{-\infty}^{+\infty}e^{\theta(x-t)}\mu_t^*(dx)=\frac{1}{2^t}\sum_{n=0}^{\infty}\frac{t(t-1)\cdots(t-n+1)}{n!}z^n.$$

Since $t(t-1)\cdots(t-n+1)<0$ when $n=n_0+1$ this shows that $\tau(\{-2n_0-2\})<0$, a contradiction.
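Both directions of this proof can be illustrated numerically. The sketch below builds the image of $B(N,1/2)$ by $x\mapsto 2x-N$ and checks that its Fourier transform is $(\cos\theta)^N$, that is $1/L_{\mu_N}(\theta)$ by (9); it then computes the generalized binomial coefficients of $(1+z)^t$ and exhibits the negative one at $n=n_0+1$ for a non integer $t$. The function names and the sample values $N=3$, $t=2.5$ are illustrative choices.

```python
import math

def binomial_image(N):
    """Image of B(N, 1/2) under x -> 2x - N, as a dict {atom: mass}."""
    return {2 * j - N: math.comb(N, j) / 2 ** N for j in range(N + 1)}

def fourier(law, theta):
    """Fourier transform of a symmetric discrete law (hence real valued)."""
    return sum(p * math.cos(theta * x) for x, p in law.items())

def generalized_binomial(t, n):
    """Coefficient t (t-1) ... (t-n+1) / n! of z^n in (1 + z)^t."""
    c = 1.0
    for j in range(n):
        c *= (t - j) / (j + 1)
    return c

theta = 0.7
print(fourier(binomial_image(3), theta), math.cos(theta) ** 3)

t, n0 = 2.5, 3                      # n0 - 1 < t < n0
coeffs = [generalized_binomial(t, n) for n in range(6)]
print(coeffs)                       # the coefficient of index n0 + 1 = 4 is negative
```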

The third example is obtained by considering the Babel class of NEF, namely the set of exponential families such that the variance function has the form $V_F=P\Delta+Q\sqrt{\Delta}$ where $\Delta$, $P$ and $Q$ are polynomials with respective degrees less than or equal to 2,1,2. Looking for possible pairs $(F,G)$ in this class such that $V_F(m)=V_G(im)$ and such that $F$ and $G$ are generated by associated (and therefore symmetric) distributions $(\mu,\nu)$ implies that $\Delta(m)=Am^2+C$, $P$ is a constant and $Q(m)=A'm^2+C'$. The case $C=0$ is excluded since the domains of the means $M_F$ and $M_G$ are symmetric intervals and $V_F$ and $V_G$ are real analytic on them. As a consequence either $F$ or $G$ must be such that $\Delta(m)=1-m^2$ (up to affinities). But there is only one type of NEF in the Babel class such that $\Delta(m)=1-m^2$, and it is generated by the trinomial distributions defined for $0<a<1$ by

$$\mu_a=\frac{1}{a+1}\left(a\delta_0+\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{1}\right) \tag{18}$$

and their integer powers of convolution. Of course the limit cases are related to Bernoulli, since

$$\mu_0=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{1},\qquad \mu_1=\left(\frac{1}{2}\delta_{-1/2}+\frac{1}{2}\delta_{1/2}\right)*\left(\frac{1}{2}\delta_{-1/2}+\frac{1}{2}\delta_{1/2}\right).$$

Proposition 3.6. If $\mu_a$ is defined by (18) with $a\in(0,1)$ then $\mu_a^*$ exists and is

$$\mu_a^*=\tau_b*\tau_{-b},$$

where $a=\cos 2b$ with $0<b<\pi/4$ and

$$\tau_{\pm b}(dx)=\frac{\cos b}{\cosh\frac{\pi x}{4}}\,e^{\pm bx}\,dx.$$

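Proof. We have

$$L_{\mu_a}(\theta)=\frac{a+\cosh\theta}{a+1},\qquad V_{F(\mu_a)}(m)=\frac{1}{1-a^2}-m^2-\frac{a}{\sqrt{1-a^2}}\sqrt{\frac{1}{1-a^2}-m^2}.$$

Therefore, if $\mu_a^*$ does exist it must satisfy

$$L_{\mu_a^*}(\theta)=\frac{a+1}{a+\cos\theta},\qquad V_{F(\mu_a^*)}(m)=\frac{1}{1-a^2}+m^2-\frac{a}{\sqrt{1-a^2}}\sqrt{\frac{1}{1-a^2}+m^2}$$

with $\Theta(\mu_a^*)=(-z_{\mu_a},z_{\mu_a})$ where $z_{\mu_a}$ is the smallest positive solution of $\cos\theta=-a$. Such a $\mu_a^*$ actually exists. To see this we write $a=\cos 2b$ with $0<b<\pi/4$ and by simple trigonometry and the help of formula (6):

$$\frac{\cos 2b+1}{\cos 2b+\cos\theta}=\frac{\cos b}{\cos(\frac{\theta}{2}-b)}\times\frac{\cos b}{\cos(\frac{\theta}{2}+b)}=L_{\tau_b}(\theta)\,L_{\tau_{-b}}(\theta)$$

where

$$\tau_{\pm b}(dx)=\frac{\cos b}{\cosh\frac{\pi x}{4}}\,e^{\pm bx}\,dx.$$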

4 Discussion and easy cases for $(Am^4+Bm^2+C)^{1/2}$

In this section we recall known and not so well known results about a few particular cases. The cases where only one of the three numbers $A,B,C$ is not zero are classical: we get respectively the gamma, Poisson or normal case. We now investigate three more interesting particular cases (they are all described in Letac 1992 as elements of the Babel class).

4.1 The case A = 0

The useful results are contained in the following proposition:

Proposition 4.1. Let $t>0$. Let $N_1$ and $N_2$ be two independent Poisson random variables with expectation $t/2$. Then the exponential family $F_t$ with domain of the means $\mathbb{R}$ and variance function $(m^2+t^2)^{1/2}$ exists and is generated by the distribution $\mu_t$ of $N_1-N_2$. Furthermore

$$\mu_t(dx)=\sum_{n\in\mathbb{Z}}e^{-t}I_{|n|}(t)\,\delta_n(dx)$$

where

$$I_x(t)=\sum_{n=0}^{\infty}\frac{1}{n!\,\Gamma(n+x+1)}\left(\frac{t}{2}\right)^{2n+x}.$$

Proof. Since $\mathbb{E}(e^{\theta(N_1-N_2)})=e^{t(\cosh\theta-1)}$ we get that $\Theta(\mu_t)=\mathbb{R}$ and that

$$k_{\mu_t}(\theta)=t(\cosh\theta-1),\qquad k'_{\mu_t}(\theta)=t\sinh\theta,\qquad k''_{\mu_t}(\theta)=t\cosh\theta=(k'_{\mu_t}(\theta)^2+t^2)^{1/2}.$$

Thus $V_{F(\mu_t)}(m)=(m^2+t^2)^{1/2}$ as desired, and the domain of the means is $\mathbb{R}$.

A consequence of this proposition and of (3) and (2) is that $(Bm^2+C)^{1/2}$ is always a variance function for $B$ and $C>0$.

4.2 The case C = 0

Proposition 4.2. Let $t>0$. Then the exponential family $F_t$ with domain of the means $(0,\infty)$ and variance function $m\left(1+\frac{m^2}{t^2}\right)^{1/2}$ exists. In particular $F_1$ is generated by $\mu_1=\delta_0+2\sum_{n=1}^{\infty}\delta_n$. More specifically, $P$ is in $F_1$ if and only if there exists $q\in(0,1)$ such that $P$ is the convolution of the Bernoulli distribution $\frac{1}{1+q}\delta_0+\frac{q}{1+q}\delta_1$ with the geometric distribution $(1-q)\sum_{n=0}^{\infty}q^n\delta_n$.

Proof. Writing for $\theta<0$

$$L_{\mu_1}(\theta)=\frac{1+e^{\theta}}{1-e^{\theta}}$$

it is easily seen that $\mu_1$ generates a natural exponential family with domain of the means $(0,\infty)$ and variance function $m(1+m^2)^{1/2}$. The only non trivial point of the proposition is the fact that the elements of $F_1$ are infinitely divisible. For this we write

$$k_{\mu_1}(\theta)=\sum_{n=1}^{\infty}\frac{1}{n}\left(1-(-1)^n\right)e^{n\theta}.$$

Since the coefficient $\frac{1}{n}(1-(-1)^n)$ of $e^{n\theta}$ is $\ge0$ the result is proved (although it is difficult to compute $\mu_t$ explicitly when $t$ is not an integer).

A consequence of this proposition is that $(Am^4+Bm^2)^{1/2}$ is a variance function for $A$ and $B>0$ with domain of the means $(0,\infty)$.
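The description of $F_1$ in Proposition 4.2 can be verified on truncated probability mass functions. The sketch below, with $q=e^{\theta}$, compares the convolution of the Bernoulli and geometric laws with the tilted measure $e^{\theta x}\mu_1(dx)/L_{\mu_1}(\theta)$, and then checks the variance function $m(1+m^2)^{1/2}$. The truncation level `n_max` and the value of $q$ are arbitrary choices.

```python
import math

def bernoulli_geometric(q, n_max):
    """pmf on {0, ..., n_max} of the convolution of the Bernoulli and geometric laws."""
    bern = {0: 1.0 / (1.0 + q), 1: q / (1.0 + q)}
    return [sum(bern[j] * (1.0 - q) * q ** (n - j) for j in (0, 1) if n - j >= 0)
            for n in range(n_max + 1)]

def tilted_mu1(q, n_max):
    """Element of F_1 with q = exp(theta): masses q^n mu_1({n}) / L(theta)."""
    L = (1.0 + q) / (1.0 - q)
    return [(1.0 if n == 0 else 2.0) * q ** n / L for n in range(n_max + 1)]

def mean_var(pmf):
    m = sum(n * p for n, p in enumerate(pmf))
    v = sum((n - m) ** 2 * p for n, p in enumerate(pmf))
    return m, v

q, n_max = 0.4, 200
conv = bernoulli_geometric(q, n_max)
tilt = tilted_mu1(q, n_max)
gap = max(abs(x - y) for x, y in zip(conv, tilt))
m, v = mean_var(tilt)
print(gap, m, v, m * math.sqrt(1.0 + m * m))
```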

4.3 The case $B^2-4AC=0$

Here is a well known fact (see Morris (1982)):

Proposition 4.3. Let $t>0$. The natural exponential family $F_t$ with domain of the means $\mathbb{R}$ and variance function $t\left(1+\frac{m^2}{t^2}\right)$ is generated by the probability $\mu_t$ defined by (8).

This settles the case $B^2-4AC=0$ such that $Ax^2+Bx+C$ has a negative double root with $A>0$.

Proposition 4.4. Let $t>0$. The natural exponential family $F_t$ with domain of the means $(t,\infty)$ and variance function $\frac{t}{2}\left(\frac{m^2}{t^2}-1\right)$ exists. In particular $F_1$ is generated by $\mu_1=\sum_{n=1}^{\infty}n\,\delta_n$.

Proof. We do not give the details about $\mu_1$, which are standard. Since the elements of $F_1$ are negative binomial distributions shifted by 1, they are still infinitely divisible and $F_t$ does exist for all $t>0$.

This settles the case $B^2-4AC=0$ such that $Ax^2+Bx+C$ has a positive double root $x_0$ with $A>0$ and domain of the means $(\sqrt{x_0},\infty)$.

Proposition 4.5. Let $N>0$ be an integer. The natural exponential family $F_N$ with domain of the means $(-N,N)$ and variance function $N\left(1-\frac{m^2}{N^2}\right)$ exists. It is generated by $(\delta_1+\delta_{-1})^{*N}$.

Proof. This is an easy and classical fact.

4.4 $Ax^2+Bx+C$ cannot have simple roots on $(0,\infty)$

We discard some values of $(A,B,C)$. Suppose that $Ax^2+Bx+C$ has a positive simple root $x_0>0$. Then $(Am^4+Bm^2+C)^{1/2}$ cannot be a variance function. Indeed, by the principle of maximal analyticity, the domain of the means will have $m_0=\sqrt{x_0}$ as a boundary point. Since $x_0$ is a simple root, the variance function around $m_0$ will be equivalent to $k|m-m_0|^{1/2}$ for some positive constant $k$. But this is forbidden by the principle of Jørgensen, Martinez and Tsao (1994) mentioned in (10).

4.5 The splitting of the elliptic variances in three cases

The only cases that we are left to consider in order to have a classification of the variance functions of the form $(Am^4+Bm^2+C)^{1/2}$ are now the cases where $Ax^2+Bx+C$ is strictly positive on $[0,\infty)$ and has no double negative root. Of course this implies that $A>0$ and $C>0$. To simplify matters, we choose $C=1$ and we introduce the function $V(m)=(Am^4+Bm^2+1)^{1/2}$ and, for $t>0$, the function $V_t(m)=tV(m/t)$. A simple analysis shows that $Ax^2+Bx+1$ has no positive roots and no double negative roots if and only if there exists a non zero real number $a$ and a positive number $b$ such that

$$Ax^2+Bx+1=(1+ax)^2+2b^2x.$$

Let us insist on the fact that $a$ can be negative. Finally we introduce a complex number $k$ through its square in order to use the standard notations of elliptic functions:

$$k^2=1+\frac{2a}{b^2}.$$

This leads to three cases:

1. The case $-1\le k^2<0$. It corresponds to the fact that $P(x)=(1+ax)^2+2b^2x$ has no roots and that the minimum of $P$ on $[0,\infty)$ is reached at 0.

2. The case $k^2<-1$. Here $P$ has no roots and reaches its minimum on $[0,\infty)$ at $-(a+b^2)/a^2$.

3. The case $k^2>0$. Here $P$ has two distinct negative real roots. Taking $A=1$ instead of $C=1$ and $P(x)=(x+a^2)(x+b^2)$ is convenient.

We investigate these cases in the next three sections.
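The splitting into three cases can be made mechanical. The sketch below is a hypothetical helper, not part of the paper: given $A>0$ and $B$ (with $C=1$) it solves $A=a^2$, $B=2a+2b^2$ using the root $a=-\sqrt{A}$, which is an assumption of this illustration (when $B>2\sqrt{A}$ the root $a=+\sqrt{A}$ is also admissible), computes $k^2=1+2a/b^2$ and returns the case number.

```python
import math

def classify(A, B):
    """Return (a, b, k2, case) for the polynomial A x^2 + B x + 1 with A > 0.

    Uses the root a = -sqrt(A) of A = a^2 (a choice made for this sketch),
    then b^2 = B / 2 - a from B = 2a + 2b^2, and k^2 = 1 + 2a / b^2.
    """
    if A <= 0:
        raise ValueError("A must be positive")
    a = -math.sqrt(A)
    b2 = B / 2.0 - a
    if b2 <= 0:
        raise ValueError("A x^2 + B x + 1 has a positive root")
    k2 = 1.0 + 2.0 * a / b2
    if k2 == 0:
        raise ValueError("double negative root: see Section 4.3")
    if k2 > 0:
        case = 3
    elif k2 >= -1:
        case = 1
    else:
        case = 2
    return a, math.sqrt(b2), k2, case

for A, B in [(4.0, 0.0), (1.0, -1.0), (1.0, 3.0)]:
    a, b, k2, case = classify(A, B)
    x = 0.37   # arbitrary point at which the identity is checked
    residual = (1 + a * x) ** 2 + 2 * b * b * x - (A * x * x + B * x + 1)
    print(A, B, a, b, k2, case, residual)
```

For $A=4$, $B=0$ (the variance function $\sqrt{1+4m^4}$ of Section 2) this gives $a=-2$, $b=\sqrt{2}$ and $k^2=-1$, the boundary of the first case.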

5 The elliptic cases: The case $-1\le k^2<0$

We write $k^2=-1+p$ with $0\le p<1$ and we introduce the following two constants:

$$K=\int_0^1(1-x^2)^{-1/2}(2-p-x^2)^{-1/2}\,dx,\qquad K'=\int_0^1(1-x^2)^{-1/2}(1+(1-p)x^2)^{-1/2}\,dx. \tag{19}$$

Here is our first serious result:

Theorem 5.1. Suppose that $k^2=1+\frac{2a}{b^2}=-1+p\in[-1,0)$. For $b=\sqrt{2}$ and $a=-2+p$ there exists a natural exponential family $G_t$ with domain of the means $\mathbb{R}$ and variance function

$$t\sqrt{\left(1+a\frac{m^2}{t^2}\right)^2+2b^2\frac{m^2}{t^2}}$$

when $t$ is a multiple of $a$. It is concentrated on $\frac{\pi}{2K}\mathbb{Z}$. The family $G_{|a|}$ is generated by a symmetric probability measure $\mu_{|a|}$ which is the convolution of the Bernoulli distribution $\frac{1}{2}\left(\delta_{-\frac{\pi}{2K}}+\delta_{\frac{\pi}{2K}}\right)$ by an infinitely divisible distribution $\alpha_{|a|}$ concentrated on $\frac{\pi}{K}\mathbb{Z}$. We denote $q=e^{-\pi K'/K}$ and for a positive integer $\nu$ we denote

$$c_\nu=c_{-\nu}=\frac{q^{\nu}-(-1)^{\nu}q^{2\nu}}{1-q^{2\nu}}>0.$$

Then the Laplace transform of $\alpha_t$ is

$$\int_{-\infty}^{\infty}e^{\theta x}\alpha_t(dx)=\exp\left(\frac{t}{|a|}\sum_{\nu\in\mathbb{Z}\setminus\{0\}}c_\nu\left(e^{\nu\pi\theta/K}-1\right)\right).$$

Finally the characteristic function of $\mu_{2|a|}$ is $\frac{1}{\wp(s+K)-\frac{p}{3}}$ where $\wp$ is the elliptic Weierstrass function satisfying

$$\wp'^2=4\left(\wp-1+\frac{2p}{3}\right)\left(\wp-\frac{p}{3}\right)\left(\wp+1-\frac{p}{3}\right)$$

which is doubly periodic with primitive periods $2K$ and $2iK'$. In particular it has zeros and $G_t$ cannot be infinitely divisible.

Comments. Taking $b=\sqrt{2}$ is not really a restriction. Using the formula $a^2V_F(m/a)$ for the image of $F$ by $x\mapsto ax$ gives the description of $F$ for an arbitrary $b>0$.
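The constants of the theorem are easy to evaluate numerically. The following sketch assumes the substitution $x=\sin\varphi$ in (19), which removes the endpoint singularity $(1-x^2)^{-1/2}$, and reads $c_\nu$ as the quotient $(q^{\nu}-(-1)^{\nu}q^{2\nu})/(1-q^{2\nu})$; that reading of the formula, the function names and the sample value $p=0$ are assumptions of this illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def K_and_Kprime(p):
    """Constants K and K' of (19), for 0 <= p < 1, after the substitution x = sin(phi)."""
    K = simpson(lambda phi: 1.0 / math.sqrt(2.0 - p - math.sin(phi) ** 2),
                0.0, math.pi / 2)
    Kp = simpson(lambda phi: 1.0 / math.sqrt(1.0 + (1.0 - p) * math.sin(phi) ** 2),
                 0.0, math.pi / 2)
    return K, Kp

def nome(p):
    """q = exp(-pi K' / K)."""
    K, Kp = K_and_Kprime(p)
    return math.exp(-math.pi * Kp / K)

def c(nu, q):
    """Coefficient c_nu = c_{-nu}, as read from the statement of the theorem."""
    nu = abs(nu)
    return (q ** nu - (-1) ** nu * q ** (2 * nu)) / (1.0 - q ** (2 * nu))

p = 0.0                      # the case k^2 = -1, i.e. V(m) = sqrt(1 + 4 m^4)
K, Kp = K_and_Kprime(p)
q = nome(p)
print(K, Kp, q, math.pi / (2 * K), [c(nu, q) for nu in (1, 2, 3)])
```

For $p=0$ the two constants coincide, so $q=e^{-\pi}$, and the lattice step $\pi/(2K)$ is the reciprocal of the number $2K/\pi$ met in the example of Section 2.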

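Proof. We apply the standard procedure for computing the Laplace transform of a generating measure when the variance function is given. We shall use the following change of variable $u^2=(1+am^2)^2+2b^2m^2$ for $u\ge1$. This implies that

$$m^2=\frac{1}{a^2}\left[-a-b^2+\sqrt{b^4+2ab^2+a^2u^2}\right].$$

We consider now the new change of variable

$$u=\frac{1}{2}\left(\left(2+\frac{b^2}{a}\right)w^2-\frac{b^2}{aw^2}\right)=\frac{b^2}{2a}\left(k^2w^2-\frac{1}{w^2}\right)$$

with $0<w<1$. This choice is designed in order to have $b^4+2ab^2+a^2u^2=b^4k^2+a^2u^2$ transformed into the perfect square of a rational function of $w$:

$$\sqrt{b^4+2ab^2+a^2u^2}=\frac{a}{2}\left(\left(2+\frac{b^2}{a}\right)w^2+\frac{b^2}{aw^2}\right)=\frac{b^2}{2}\left(k^2w^2+\frac{1}{w^2}\right).$$

This leads to

$$m^2=\frac{b^2}{2a^2w^2}(1-w^2)(1-k^2w^2) \tag{20}$$

but also to a surprising result:

$$a+b^2+a^2m^2=\left(a+\frac{b^2}{2}\right)w^2+\frac{b^2}{2w^2}=\frac{b^2}{2}\left(k^2w^2+\frac{1}{w^2}\right),$$

$$du=\left[\left(a+\frac{b^2}{2}\right)w^2+\frac{b^2}{2w^2}\right]\frac{2}{aw}\,dw=\frac{b^2}{a}\left(k^2w^2+\frac{1}{w^2}\right)\frac{dw}{w},$$

$$\frac{du}{a+b^2+a^2m^2}=\frac{2}{aw}\,dw. \tag{21}$$

Recall that $a<0$ and that $w\mapsto u$ is decreasing. Thus we get, gathering (21) and (20)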