Beta and Dirichlet sub-Gaussianity

(1)

HAL Id: hal-01950665

https://hal.archives-ouvertes.fr/hal-01950665

Submitted on 11 Dec 2018

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Beta and Dirichlet sub-Gaussianity

Olivier Marchal, Julyan Arbel

To cite this version:

(2)

Beta and Dirichlet sub-Gaussianity

Olivier Marchal, Univ. Lyon, Univ. Monnet, Institut Camille Jordan, France

Julyan Arbel, Inria, Mistis, Grenoble, France,

www.julyanarbel.com

Abstract

• _{Optimal proxy variance σ}2_opt _{for the sub-Gaussianity of Beta}

distribu-tion, improves recent conjecture σ2₀ by Elder (2016): σ2_opt ≤ σ2

0 =

1

4(α+β+1) .

• _{Provide different proof techniques for}

(i) symmetrical case (α = β): direct coef. comparison of entire series

(ii) non-symmetrical case (α , β): ordinary differential equation satisfied

by moment-generating function, aka confluent hypergeometric function

• _{Derive optimal proxy variance for Dirichlet.}

Sub-Gaussianity & optimal proxy variance

• _{A random variable X with finite mean µ} = E[X] is sub-Gaussian if there

is a positive number σ2 (called a proxy variance) such that:

E[exp(x(X − µ))] ≤ exp

x2σ2

2

!

for all x ∈ R. (1)

• _{If X is sub-Gaussian, the optimal proxy variance is:}

σ2

opt(X) = min{σ

2

≥ 0 _{such that X is σ}2_{-sub-Gaussian}.}

Variance lower bounds optimal proxy variance: Var[X] ≤ σ2_opt(X)_{. When}

σ2

opt(X) = Var[X], X is said to be strictly sub-Gaussian.

Optimal proxy variance illustrated & compared

0.0 0.2 0.4 0.6 0.8 1.0 0.00 0.04 0.08 0.12 α+β=1 α 0 2 4 6 8 10 0.000 0.005 0.010 0.015 0.020 α+β=10 α 2 3 4 1 1 ₂ ₃ 4 0.05 0.1 0.15

• _Top_{: varying α on x-axis, α} + β = 1

(left) and α + β = 10 (right).

• _Left_{: α, β ∈ [0.2, 4].} ◦ _{Magenta: σ}2_opt_{(α, β)} ◦ _{Green: Var[Beta(α, β)]} ◦ _{Black (dots): σ}2 0 = 1 4(α+β+1)

Looking for connections with literature

◦ _{Compare with} _{Bernoulli setting} ₍_{Berend and Kontorovich}_, ₂₀₁₃_;

Buldygin and Moskvichova, 2013)

◦ _{Possible links of our non-uniform sub-Gaussian result with}

◦ _{transportation inequalities}

◦ _{logarithmic Sobolev inequalities}

Optimal proxy variance for the Beta

Theorem Beta(α, β) _{is σ}2_opt(α, β)_{-sub-Gaussian} _with:

                   σ2 opt(α, β) = α (α + β)x₀ 1F1(α + 1; α + β + 1; x0) 1F1(α; α + β; x0) − 1 !

where x₀ is the unique solution of the equation

log(₁F₁(α; α + β; x₀)) = αx0 2(α + β) 1 + 1F1(α + 1; α + β + 1; x0) 1F1(α; α + β; x0) ! .

Simple explicit upper bound to σ2_opt(α, β) _{is σ}2₀(α, β) = _4(α_+β+1)1 _:

Var[Beta(α, β)] ≤ σ2_opt(α, β) ≤ σ2₀ _{(strict when α , β)}

Sketch of proof

• α = β: direct coef. comparison of entire series representations of (₁₎

• α , β: study the MGF aka confluent hypergeometric function or

Kummer’s function

y(x) def= E[exp(xX)] = ₁F₁(α; α + β; λ) =

∞ X j=0 Γ(α + j)Γ(α + β) ( j!)Γ(α)Γ(α + β + j) x j.

using the ordinary differential equation it satisfies

xy00(x) + (α + β − x)y0(x) − αy(x) = 0

via the difference

u_t(x) def= exp µx + σ2_t x2/2 _{− E[exp(xX)], σ}2_t = t Var[X] + (1 − t)σ2₀

0 1 2 3 −1 0 1 2 x10−3 ● x₀ u₀ u_t_opt u_t_non_opt u₁

◦ _{For t} = 0 (σ2₀_{), dotted black curve remains > 0}

◦ _For t = t_opt _(σ2_opt_{), magenta curve has minimum} = 0 at x₀

◦ _For _t = 1 (Var[X]), dashed green curve has negative second derivative

at x = 0, directly negative around 0

◦ _{For t}_{non opt} _{∈ (t}_opt, 1), orange, dash and dots curve is first positive, then

negative, and positive again

References

Berend, D. and Kontorovich, A. (2013). On the concentration of the missing mass. Electronic Communications in Probability, 18(3):1–7.

Buldygin, V. V. and Moskvichova, K. (2013). The sub-Gaussian norm of a binary random variable. Theory of probability and mathematical statistics, 86:33–49.

Elder, S. (2016). Bayesian adaptive data analysis guarantees from subgaussianity. arXiv preprint: arXiv:1611.00065.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American statistical association, 58(301):13–30.