HAL Id: hal-00795132
https://hal.archives-ouvertes.fr/hal-00795132
Submitted on 27 Feb 2013
Exact marginals and normalizing constant for Gibbs
distributions
Cécile Hardouin, Xavier Guyon
To cite this version:
Cécile Hardouin, Xavier Guyon. Exact marginals and normalizing constant for Gibbs distributions.
C. R. Acad. Sci. Paris, Ser. I., 2010, 348, pp. 199-201. ⟨hal-00795132⟩
Statistics
Exact marginals and normalizing constant for Gibbs distributions
C. Hardouin a,1, X. Guyon a
a CES/SAMOS-MATISSE/Université de Paris 1
Abstract
We present a recursive algorithm for the calculation of the marginals of a Gibbs distribution π. A direct consequence is the calculation of the normalizing constant of π.
Résumé
Recursions and normalizing constant for Gibbs models. We propose in this work a recursion on the marginal distributions of a Gibbs distribution π. A direct consequence is the exact computation of the normalizing constant of π.
1. Introduction
Usually, obtaining the marginals and/or the normalizing constant C of a discrete probability distribution π involves high-dimensional summation: for example, for the binary Ising model on a simple 10 × 10 grid, the calculation of C involves $2^{100}$ terms. One way to avoid this problem is to replace the distribution of interest with an alternative; for example, in spatial statistics, one replaces the likelihood with the conditional pseudo-likelihood ([1]). Another solution consists of estimating the normalizing constant; see for example Pettitt et al. ([8]) and Moeller et al. ([7]) for efficient Monte Carlo methods, Bartolucci and Besag ([2]) for a recursive algorithm computing the exact likelihood of a Markov random field, and Reeves and Pettitt ([9]) for an efficient computation of the normalizing constant for a factorisable model.
We present specific results for a Gibbs distribution π. We build on results of Khaled ([5,6]), who gives an original linear recursion on the marginals of π, the law of $Z = (Z_1, Z_2, \cdots, Z_T) \in E^T$; this recursion eases the calculation of π's normalizing constant. We generalize Khaled's results by noticing that if π is a Gibbs distribution on $\mathcal{T} = \{1, 2, \cdots, T\}$, then π is a Markov field on $\mathcal{T}$, so it is easy to manipulate its conditional distributions, which are the basic tools of our forward recursions.

Email addresses: Cecile.Hardouin@univ-paris1.fr (C. Hardouin), guyon@univ-paris1.fr (X. Guyon).
1 Correspondence address: C. Hardouin, SAMOS-MATISSE/Université de Paris 1, 90 rue de Tolbiac, 75634 Paris
2. Markov representations of a Gibbs field
Let T > 0 be a fixed positive integer, $E = \{e_1, e_2, \cdots, e_N\}$ a finite state space, and $Z = (Z_1, Z_2, \cdots, Z_T) \in E^T$ a temporal sequence with distribution π. Let us denote $z(t) = (z_1, z_2, \cdots, z_t)$. We assume that π is a Gibbs distribution with energy and potentials:

$$\pi(z(T)) = C \exp U_T(z(T)) \quad \text{with} \quad C^{-1} = \sum_{z(T) \in E^T} \exp U_T(z(T)), \qquad (1)$$

where

$$U_t(z(t)) = \sum_{s=1}^{t} \theta_s(z_s) + \sum_{s=2}^{t} \Psi_s(z_{s-1}, z_s) \ \text{ for } 2 \le t \le T, \quad \text{and} \quad U_1(z_1) = \theta_1(z_1).$$
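For concreteness, here is the smallest non-trivial case worked out (our own illustration, with parameter names α, β anticipating the example of Section 3.2): take $T = 2$, $E = \{0, 1\}$, $\theta_t(z) = \alpha z$ and $\Psi_2(z, z') = \beta z z'$. Then

$$C^{-1} = \sum_{z_1, z_2 \in \{0,1\}} \exp\{\alpha z_1 + \alpha z_2 + \beta z_1 z_2\} = 1 + 2e^{\alpha} + e^{2\alpha + \beta}.$$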
So π is a bilateral 2-nearest-neighbours Markov field ([4,3]),

$$\pi(z_t \mid z_s,\ 1 \le s \le T \text{ and } s \ne t) = \pi(z_t \mid z_{t-1}, z_{t+1}), \qquad (2)$$

but Z is also a Markov chain:

$$\pi(z_t \mid z_s,\ s \le t-1) = \pi(z_t \mid z_{t-1}) \ \text{ if } 1 < t \le T. \qquad (3)$$

An important difference appears between formulas (3) and (2): (2) is computationally feasible, whereas (3) is not.
3. Recursion over marginal distributions

3.1. Future-conditional contribution Γ_t(z(t))
For $t \le T-1$, the distribution $\pi(z_1, z_2, \cdots, z_t \mid z_{t+1}, z_{t+2}, \cdots, z_T)$ conditionally on the future depends only on $z_{t+1}$:

$$\pi(z_1, z_2, \cdots, z_t \mid z_{t+1}, z_{t+2}, \cdots, z_T) = \frac{\pi(z_1, z_2, \cdots, z_T)}{\sum_{u_1^t \in E^t} \pi(u_1^t, z_{t+1}, \ldots, z_T)} = \pi(z_1, z_2, \cdots, z_t \mid z_{t+1}).$$

We can also write $\pi(z_1, z_2, \cdots, z_t \mid z_{t+1}) = C_t(z_{t+1}) \exp U_t^*(z_1, z_2, \cdots, z_t; z_{t+1})$, where $U_t^*$ is the future-conditional energy:

$$U_t^*(z_1, z_2, \cdots, z_t; z_{t+1}) = U_t(z_1, z_2, \cdots, z_t) + \Psi_{t+1}(z_t, z_{t+1}), \qquad (4)$$

and $C_t(z_{t+1})^{-1} = \sum_{u_1^t \in E^t} \exp\{U_t^*(u_1, \ldots, u_t; z_{t+1})\}$. Then, for $i = 1, \ldots, N$:

$$\pi(z_1, z_2, \cdots, z_t \mid z_{t+1} = e_i) = C_t(e_i)\,\gamma_t(z_1, z_2, \cdots, z_t; e_i) \quad \text{where} \quad \gamma_t(z(t); e_i) = \exp U_t^*(z(t); e_i).$$
With the convention $\Psi_{T+1} \equiv 0$, we define for $t \le T$ the vector $\Gamma_t(z(t)) \in \mathbb{R}^N$ of the future-conditional contributions by

$$(\Gamma_t(z(t)))_i = \gamma_t(z(t); e_i), \quad 1 \le i \le N,$$

and the recursion matrix $A_t$ by

$$A_t(i, j) = \exp\{\theta_t(e_j) + \Psi_{t+1}(e_j, e_i)\}, \quad i, j = 1, \ldots, N. \qquad (5)$$
Then we get the following fundamental recurrence.
Proposition 3.1 For all $2 \le t \le T$, $z(t) = (z_1, z_2, \cdots, z_t) \in E^t$ and $e_i \in E$, we have:

$$\gamma_t(z(t-1), e_j; e_i) = A_t(i, j) \times \gamma_{t-1}(z(t-1); e_j), \qquad (6)$$

and

$$\sum_{z_t \in E} \Gamma_t(z(t-1), z_t) = A_t\, \Gamma_{t-1}(z(t-1)). \qquad (7)$$
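The recurrence is easy to verify numerically on a small instance. The sketch below (our own illustration; the potentials are drawn at random and all variable names are ours) builds the matrices $A_t$ of (5) and checks identity (7) against a direct evaluation of the contributions from the energy.

# A minimal numerical check of the recurrence (6)-(7) on a small instance;
# the random potentials and all names below are illustrative, not from the note.
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 5
theta = rng.normal(size=(T + 1, N))     # theta[t][j] = theta_t(e_j), t = 1..T (index 0 unused)
psi = rng.normal(size=(T + 2, N, N))    # psi[t][j, k] = Psi_t(e_j, e_k), t = 2..T+1
psi[T + 1] = 0.0                        # convention Psi_{T+1} == 0

def U(z):                               # energy U_t(z(t)) of formula (1), with t = len(z)
    t = len(z)
    return (sum(theta[s][z[s - 1]] for s in range(1, t + 1))
            + sum(psi[s][z[s - 2], z[s - 1]] for s in range(2, t + 1)))

def gamma(z, i):                        # gamma_t(z(t); e_i) = exp U*_t(z(t); e_i), formula (4)
    return np.exp(U(z) + psi[len(z) + 1][z[-1], i])

def A(t):                               # recursion matrix (5): A_t(i, j) = exp{theta_t(e_j) + Psi_{t+1}(e_j, e_i)}
    return np.exp(theta[t][None, :] + psi[t + 1].T)

t, z_past = 3, (0, 2)                   # check (7) at time t for some z(t-1)
lhs = sum(np.array([gamma(z_past + (zt,), i) for i in range(N)]) for zt in range(N))
rhs = A(t) @ np.array([gamma(z_past, j) for j in range(N)])
print(np.allclose(lhs, rhs))            # True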
3.2. Forward recursions on marginals and normalization constant
Let us define the following 1 × N row vectors: $E_1 = B_T = (1, 0, \cdots, 0)$, and the $(B_t)_{t = T, \ldots, 2}$ defined by the forward recursion $B_{t-1} = B_t A_t$ for $t \le T$; we also denote $K_1 = \sum_{z_1 \in E} \Gamma_1(z_1) \in \mathbb{R}^N$. We give below the main result of this work.
Proposition 3.2 Marginal distributions $\pi_t$ and calculation of the normalization constant C.

(1) For $1 \le t \le T$:

$$\pi_t(z(t)) = C \times B_t\, \Gamma_t(z(t)). \qquad (8)$$

(2) The normalization constant C of the joint distribution π satisfies:

$$C^{-1} = E_1 A_T A_{T-1} \cdots A_2 K_1. \qquad (9)$$

Formula (9) reduces to $C^{-1} = E_1 A_T A^{T-2} K_1$ for time-invariant potentials.
As a basic example, let us consider $E = \{0, 1\}$, $\theta_t(z_t) = \alpha z_t$, and $\Psi_{t+1}(z_t, z_{t+1}) = \beta z_t z_{t+1}$; the analytic expressions of $A$ and $K_1$ are trivially derived. We computed $C^{-1} = E_1 A_T A^{T-2} K_1$ for increasing values of T; the computing time is always negligible for $T \le 700$, whereas computing $C^{-1}$ by direct summation needs 750 seconds for T = 20, 6 hours for T = 25, and the method becomes impractical for T > 25.
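A minimal sketch of this computation, assuming the parametrisation above (the function names, the chosen values of α, β and the brute-force check are illustrative additions, not taken from the note):

# Sketch of C^{-1} = E_1 A_T A^{T-2} K_1 for the binary example, with a
# brute-force check on small T; parameter values and names are illustrative.
import itertools
import numpy as np

def C_inv_recursive(alpha, beta, T):
    E = np.array([0.0, 1.0])                              # state space {0, 1}
    theta = alpha * E                                     # theta(e_j) = alpha * e_j
    A = np.exp(theta[None, :] + beta * np.outer(E, E))    # A(i, j) = exp{theta(e_j) + Psi(e_j, e_i)}, formula (5)
    A_T = np.exp(np.tile(theta, (2, 1)))                  # at time T, Psi_{T+1} == 0
    K1 = np.array([np.exp(theta + beta * E * e_i).sum() for e_i in E])  # K1[i] = sum_{z_1} exp{theta(z_1)+Psi(z_1,e_i)}
    v = K1
    for _ in range(T - 2):                                # apply A for times t = 2, ..., T-1
        v = A @ v
    return (A_T @ v)[0]                                   # E_1 picks the first component

def C_inv_bruteforce(alpha, beta, T):
    return sum(np.exp(alpha * sum(z) + beta * sum(z[s - 1] * z[s] for s in range(1, T)))
               for z in itertools.product([0, 1], repeat=T))

alpha, beta = 0.3, 0.8
for T in (5, 10, 15):
    print(T, C_inv_recursive(alpha, beta, T), C_inv_bruteforce(alpha, beta, T))

On small T the matrix route and the direct summation agree, while the matrix route only performs T − 1 products of 2 × 2 matrices with a vector.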
4. Extensions to general Gibbs fields
There are various generalisations of the preceding results.

4.1. Temporal Gibbs model
Let us give the following example as an illustration of possible extensions. Coming back to the previous model (1), we add the interaction potentials $\Psi_{2,s}(z_{s-2}, z_s)$. Then π is a 4-nearest-neighbours Markov field but also a Markov chain of order 2. Conditionally on the future, we get $\pi(z_1, z_2, \cdots, z_t \mid z_{t+1}, z_{t+2}, \cdots, z_T) = \pi(z(t) \mid z_{t+1}, z_{t+2}) = C_t(z_{t+1}, z_{t+2}) \exp U_t^*(z(t); z_{t+1}, z_{t+2})$, with

$$U_t^*(z(t); z_{t+1}, z_{t+2}) = U_t(z(t)) + \Psi_{1,t+1}(z_t, z_{t+1}) + \Psi_{2,t+1}(z_{t-1}, z_{t+1}) + \Psi_{2,t+2}(z_t, z_{t+2}).$$
Then, for $a$, $b$ and $c \in E$,

$$U_t^*(z(t-1), a; (b, c)) = U_{t-1}^*(z(t-1); (a, b)) + \theta_t(a) + \Psi_{1,t+1}(a, b) + \Psi_{2,t+2}(a, c);$$

analogously to the previous example, we define the future-conditional contributions and the $N^2 \times N^2$ matrices $A_t$ by

$$\gamma_t(z(t); (z_{t+1}, z_{t+2})) = \exp U_t^*(z(t); (z_{t+1}, z_{t+2})).$$

Similarly to 3.1, we get the following recursion:

$$\gamma_t(z(t-1), e_k; (e_i, e_j)) = A_t((i, j), (k, i)) \times \gamma_{t-1}(z(t-1); (e_k, e_i)).$$

We thus obtain a recurrence (7) on the contributions $\Gamma_t(z(t))$ and results analogous to (8) and (9) for the bivariate Markov chain $(Z_{t-1}, Z_t)$, $t = 1, \ldots, T$.
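For the computation, the pair matrices can be stored as ordinary $N^2 \times N^2$ arrays through the index map $(i, j) \mapsto iN + j$. The sketch below is our own illustration: setting the entries outside the admissible pattern $((i, j), (k, i))$ to zero is our filling convention, chosen so that a matrix-vector product reproduces the sum over $z_t$; it is not stated in the note.

# Assembling the N^2 x N^2 matrix A_t of Section 4.1 with the pair index
# (i, j) -> i*N + j; names and the zero-filling convention are illustrative.
import numpy as np

def pair_matrix(theta_t, psi1_next, psi2_next2, N):
    """theta_t[k] = theta_t(e_k), psi1_next[a, b] = Psi_{1,t+1}(e_a, e_b),
    psi2_next2[a, c] = Psi_{2,t+2}(e_a, e_c)."""
    A = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            for k in range(N):
                # row (i, j), column (k, i): the only admissible columns
                A[i * N + j, k * N + i] = np.exp(
                    theta_t[k] + psi1_next[k, i] + psi2_next2[k, j])
    return A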
4.2. Spatial Gibbs fields
For $t \in \mathcal{T} = \{1, 2, \ldots, T\}$, let us consider $Z_t = (Z_{(t,i)}, i \in I)$, where $I = \{1, 2, \cdots, m\}$ and $Z_{(t,i)} \in F$. Then $Z = (Z_s, s = (t, i) \in S)$ is a spatial field on $S = \mathcal{T} \times I$. We note again $z_t = (z_{(t,i)}, i \in I)$, $z(t) = (z_1, \ldots, z_t)$, $z = z(T)$, and we suppose that the distribution π of Z is a Gibbs distribution with translation-invariant potentials $\Phi_{A_k}(\bullet)$, $k = 1, \ldots, K$, associated to a family of subsets $\{A_k, k = 1, \ldots, K\}$ of S. For $A \subseteq S$, let us define $H(A) = \sup\{|u - v| : \exists (u, i) \text{ and } (v, j) \in A\}$ and $H = \sup\{H(A_k), k = 1, \ldots, K\}$. With this notation, we write the Gibbs energy

$$U(z) = \sum_{h=0}^{H} \sum_{t=h+1}^{T} \Psi(z_{t-h}, \cdots, z_t) \quad \text{with} \quad \Psi(z_{t-h}, \cdots, z_t) = \sum_{k : H(A_k) = h} \ \sum_{s \in S_t(k)} \Phi_{A_k + s}(z),$$

where $S_t(k) = \{s = (u, i) : A_k + s \subseteq S \text{ and } t - H(A_k) \le u \le t\}$. Then $(Z_t)$ is a Markov process of order H and $Y_t = (Z_{t-H}, Z_{t-H+1}, \cdots, Z_t)$, $t > H$, is a Markov chain on $E^H$ for which we get the results (8) and (9).
We applied the result to the calculation of the normalization constant for an Ising model. For m = 10 and T = 100, the computing time is less than 20 seconds.
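A transfer-matrix style sketch of this computation, in the spirit of formula (9): each column $Z_t \in \{0, 1\}^m$ is treated as a single variable with $N = 2^m$ states, vertical bonds are absorbed into θ and horizontal bonds into Ψ. The energy parametrisation (α, β), the free boundary conditions, the rescaling trick and all names are our own illustrative choices, not taken from the note.

# Normalizing constant of an Ising model on an m x T grid via formula (9),
# treating each column as one variable with 2^m states; a sketch under the
# assumptions stated above (the log-rescaling is ours, to avoid overflow).
import itertools
import numpy as np

def ising_log_C_inv(alpha, beta, m, T):
    states = np.array(list(itertools.product([0, 1], repeat=m)))      # the 2^m column configurations
    theta = alpha * states.sum(axis=1) \
            + beta * (states[:, :-1] * states[:, 1:]).sum(axis=1)     # singletons + vertical bonds
    psi = beta * states @ states.T                                    # horizontal bonds: Psi(x, y) = beta * sum_i x_i y_i
    A = np.exp(theta[None, :] + psi)                                  # A(i, j) = exp{theta(e_j) + Psi(e_j, e_i)}, formula (5)
    A_T = np.exp(np.tile(theta, (len(states), 1)))                    # at time T, Psi_{T+1} == 0
    v = np.exp(theta[:, None] + psi).sum(axis=0)                      # K1[i] = sum_{z_1} exp{theta(z_1) + Psi(z_1, e_i)}
    log_scale = 0.0
    for _ in range(T - 2):                                            # apply A for t = 2, ..., T-1
        v = A @ v
        s = v.max()
        log_scale += np.log(s)
        v /= s                                                        # rescale to keep the recursion in range
    return float(np.log((A_T @ v)[0]) + log_scale)                    # log C^{-1} = log(E_1 A_T A^{T-2} K_1)

print(ising_log_C_inv(0.2, 0.5, m=4, T=20))   # small demo; m = 10, T = 100 stays manageable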
References
[1] J. Besag, 1974. Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. B 36, 192-236.
[2] F. Bartolucci, J. Besag, 2002. A recursive algorithm for Markov random fields. Biometrika 89 (3), 724-730.
[3] X. Guyon, 1995. Random Fields on a Network: Modeling, Statistics, and Applications. Springer-Verlag, New York.
[4] R. Kindermann, J.L. Snell, 1980. Markov Random Fields and Their Applications. Contemporary Mathematics, Vol. 1, American Mathematical Society, Providence.
[5] M. Khaled, 2008. A multivariate generalization of a Markov switching model. Working paper, C.E.S., Université Paris 1, http://mkhaled.chez-alice.fr/khaled.html
[6] M. Khaled, 2008. Estimation bayésienne de modèles espace-état non linéaires. Ph.D. Thesis, Université Paris 1, http://mkhaled.chez-alice.fr/khaled.html
[7] J. Moeller, A.N. Pettitt, R. Reeves, K.K. Berthelsen, 2006. An efficient Markov chain Monte Carlo method for distributions with intractable normalizing constants. Biometrika 93 (2), 451-458.
[8] A.N. Pettitt, N. Friel, R. Reeves, 2003. Efficient calculation of the normalizing constant of the autologistic and related models on the cylinder and lattice. J. Roy. Statist. Soc. B 65 (1), 235-246.
[9] R. Reeves, A.N. Pettitt, 2004. Efficient recursions for general factorisable models. Biometrika 91 (3), 751-757.