On conditional independence and log-convexity


www.imstat.org/aihp 2012, Vol. 48, No. 4, 1137–1147

DOI:10.1214/11-AIHP431

On conditional independence and log-convexity ¹

František Matúš

Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 4, 182 08 Prague, Czech Republic. E-mail: [email protected]

Received 20 November 2010; revised 21 April 2011; accepted 26 April 2011

Abstract. If conditional independence constraints define a family of positive distributions that is log-convex then this family turns out to be a Markov model over an undirected graph. This is proved for the distributions on products of finite sets and for the regular Gaussian ones. As a consequence, the assertion known as Brook factorization theorem, Hammersley–Clifford theorem or Gibbs–Markov equivalence is obtained.

Résumé. Si des contraintes d’indépendance conditionnelle définissent une famille de distributions positives qui est log-convexe, alors cette famille doit être un modèle de Markov sur un graphe non-dirigé. Ceci est démontré pour les distributions sur les produits d’ensembles finis et pour les distributions gaussiennes régulières. Par conséquent, l’assertion connue comme le théorème de factorisation de Brook, le théorème de Hammersley–Clifford ou l’équivalence de Gibbs–Markov est obtenue.

MSC: Primary 62H05; 62M40; secondary 62H17; 62J10; 05C50; 11C20; 15A15

Keywords: Conditional independence; Markov properties; Factorizable distributions; Graphical Markov models; Log-convexity; Gibbs–Markov equivalence; Markov fields; Hammersley–Clifford theorem; Contingency tables; Gibbs potentials; Multivariate Gaussian distributions; Positive definite matrices; Covariance selection model

1. Main result

Let $N$ be a finite set, $X_i$, $i \in N$, finite nonempty state spaces, $X_I$ the product of $X_i$ over $i \in I$, and $\pi_I$ the coordinate projection of $X_N$ to $X_I$, $I \subseteq N$. The marginal of a probability measure (p.m.) $P$ on $X_N$ to $I$ is the p.m. $\pi_I P$ on $X_I$ given by

\[ \pi_I P(\pi_I x) = \sum_{y \in X_N} P(y)\,\delta^I_{x,y}, \qquad x \in X_N, \]

where $\delta^I_{x,y}$ equals one if $\pi_I x = \pi_I y$ and zero otherwise.

A p.m. $P$ on $X_N$ satisfies a conditional independence (CI-) constraint if

\[ \pi_{ijK}P(\pi_{ijK}x) \cdot \pi_K P(\pi_K x) = \pi_{iK}P(\pi_{iK}x) \cdot \pi_{jK}P(\pi_{jK}x), \qquad x \in X_N, \tag{1} \]

where $i, j \in N$ are different and $K \subseteq N \setminus ij$. By convention, an element $i \in N$ is not distinguished from the singleton $\{i\}$ and the sign $\cup$ for unions of subsets of $N$ is omitted.
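The constraint (1) can be checked numerically on a small table. The following is a minimal sketch, assuming that a p.m. on $X_N$ is stored as a NumPy array with one axis per element of $N$; the helper names `marginal` and `satisfies_ci` and the example table are illustrative choices, not part of the paper.

```python
import numpy as np

def marginal(p, keep):
    """Sum the table p over all axes not in `keep`, keeping dims for broadcasting."""
    drop = tuple(a for a in range(p.ndim) if a not in keep)
    return p.sum(axis=drop, keepdims=True) if drop else p

def satisfies_ci(p, i, j, K, atol=1e-12):
    """Check constraint (1) for the couple (ij|K) at every x."""
    K = tuple(K)
    lhs = marginal(p, (i, j) + K) * marginal(p, K)
    rhs = marginal(p, (i,) + K) * marginal(p, (j,) + K)
    return bool(np.allclose(lhs, rhs, rtol=0.0, atol=atol))

# Three binary variables; P(x0, x1, x2) is proportional to a(x0, x2) * b(x1, x2).
a = np.array([[1.0, 2.0], [3.0, 1.0]]).reshape(2, 1, 2)
b = np.array([[1.0, 3.0], [2.0, 1.0]]).reshape(1, 2, 2)
p = a * b
p /= p.sum()

print(satisfies_ci(p, 0, 1, (2,)))  # couple (01|2): True
print(satisfies_ci(p, 0, 1, ()))    # couple (01|empty set): False
```

The table factorizes through the third variable, so the constraint given by (01|2) holds, while the first two variables are not marginally independent.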

¹ Supported in part by Grant Agency of Academy of Sciences of the Czech Republic, Grant IAA 100750603, and by Grant Agency of the Czech Republic, Grant 201/08/0539.


The family of all ordered couples $(ij|K)$ with $i$, $j$ and $K$ as above is denoted by $\mathcal{R}$. For $\mathcal{L} \subseteq \mathcal{R}$ let $\mathcal{P}_{\mathcal{L}}$ denote the family of p.m.’s on $X_N$ that satisfy the CI-constraint given by each $(ij|K) \in \mathcal{L}$, and let $\mathcal{P}^+_{\mathcal{L}} = \mathcal{P}_{\mathcal{L}} \cap \mathcal{P}^+$. Here, $\mathcal{P}^+$ is the family of p.m.’s on $X_N$ that are positive in the sense $P(x) > 0$ for all $x \in X_N$. Given a p.m. $P$, the projections $\pi_i$, $i \in N$, can be interpreted as random variables jointly distributed according to $P$. Their distribution belongs to $\mathcal{P}_{\mathcal{L}}$ if and only if $\pi_i$ and $\pi_j$ are stochastically independent given $(\pi_k)_{k \in K}$ for all $(ij|K) \in \mathcal{L}$.

A nonempty subfamily of $\mathcal{P}^+$ is log-convex if it contains together with p.m.’s $P$ and $Q$ also the p.m. proportional to $x \mapsto P^t(x)Q^{1-t}(x)$, $x \in X_N$, for all $0 < t < 1$. The general definition of log-convexity is recalled and discussed in Section 5.3.

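A hypergraph $(N, \mathcal{A})$ consists of the vertex set $N$ and a nonempty family $\mathcal{A}$ of subsets of $N$, called hyperedges.

A p.m. $P$ on $X_N$ is factorizable w.r.t. the hypergraph if for each $I \in \mathcal{A}$ there exists a real-valued function $\psi_I$ on $X_I$ such that

\[ P(x) = \prod_{I \in \mathcal{A}} \psi_I(\pi_I x), \qquad x \in X_N. \tag{2} \]

The family of all p.m.’s on $X_N$ that factorize w.r.t. the hypergraph is denoted by $\mathcal{F}_{\mathcal{A}}$, and $\mathcal{F}^+_{\mathcal{A}} = \mathcal{F}_{\mathcal{A}} \cap \mathcal{P}^+$. An undirected graph $G = (N, E)$ has a vertex set $N$ and an edge set $E$, contained in the family $\binom{N}{2}$ of all two-element subsets of $N$. A set $L$ of vertices is a clique of $G$ if $\binom{L}{2} \subseteq E$. If $\mathcal{A}$ is the family of cliques of $G$, the notation $\mathcal{F}^+_G$ is preferred to $\mathcal{F}^+_{\mathcal{A}}$ and one speaks about the factorization w.r.t. $G$.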

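Theorem 1. If $\mathcal{L} \subseteq \mathcal{R}$ and $\mathcal{P}^+_{\mathcal{L}}$ is log-convex then this family coincides with $\mathcal{F}^+_G$ where $G = (N, E)$ is the graph with $ij \in E$ if and only if $(ij|K) \notin \mathcal{L}$ for all $K \subseteq N \setminus ij$.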

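In other words, given a family $\mathcal{L}$ the graph $G$ is constructed on $N$ by removing from $\binom{N}{2}$ all $ij$ such that $(ij|K) \in \mathcal{L}$ for some $K \subseteq N \setminus ij$; Theorem 1 asserts that the log-convexity of $\mathcal{P}^+_{\mathcal{L}}$ implies $\mathcal{P}^+_{\mathcal{L}} = \mathcal{F}^+_G$.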
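The construction of the graph from a family of couples is purely combinatorial and can be sketched in a few lines. The code below is only an illustration: the encoding of a couple $(ij|K)$ as a tuple `(i, j, K)` with vertices numbered from 0, and the function names, are assumptions made for this example.

```python
from itertools import combinations

def graph_from_couples(n, couples):
    """Edge set of G on N = {0, ..., n-1}: ij is an edge iff no couple (ij|K) is listed."""
    removed = {frozenset((i, j)) for i, j, _K in couples}
    return {frozenset(e) for e in combinations(range(n), 2)} - removed

def cliques(n, edges):
    """All vertex sets whose two-element subsets are edges (empty set and singletons included)."""
    return [set(c) for r in range(n + 1) for c in combinations(range(n), r)
            if all(frozenset(pair) in edges for pair in combinations(c, 2))]

# Couples (02|13) and (13|02) on four vertices leave the four-cycle 0-1-2-3-0.
couples = [(0, 2, frozenset({1, 3})), (1, 3, frozenset({0, 2}))]
edges = graph_from_couples(4, couples)
print(sorted(tuple(sorted(e)) for e in edges))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
print(len(cliques(4, edges)))                   # 9
```

The nine cliques are the empty set, the four singletons and the four edges; these are the sets over which a p.m. factorizing w.r.t. this graph carries its factors.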

If $\mathcal{L}$ consists exclusively of the couples $(ij|K)$ having $K = N \setminus ij$ then $\mathcal{P}^+_{\mathcal{L}}$ is log-convex. This follows easily from a well-known equivalent definition of CI-constraints, see (6) below. In this special case Theorem 1 implies what is called by statisticians the Brook factorization theorem [5,16] or the Hammersley–Clifford theorem [3,7], and by physicists the Gibbs–Markov equivalence [19,26].
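The log-convexity in this special case can be observed numerically. The sketch below assumes three binary variables stored as a NumPy array with axes (x0, x1, x2) and the single couple (01|2), for which $K = N \setminus ij$; the two tables and all function names are hypothetical examples.

```python
import numpy as np

def log_convex_combination(p, q, t):
    """The p.m. proportional to p**t * q**(1 - t), for positive tables p and q."""
    r = p**t * q**(1.0 - t)
    return r / r.sum()

def ci_gap_full_conditional(p):
    """Largest violation of (1) for the couple (01|2) on a table with axes (x0, x1, x2).

    Here K is N minus {0, 1}, so the marginal to ijK is p itself.
    """
    m2 = p.sum(axis=(0, 1), keepdims=True)
    m02 = p.sum(axis=1, keepdims=True)
    m12 = p.sum(axis=0, keepdims=True)
    return float(np.abs(p * m2 - m02 * m12).max())

def product_table(a, b):
    """Normalized table proportional to a(x0, x2) * b(x1, x2)."""
    p = a.reshape(2, 1, 2) * b.reshape(1, 2, 2)
    return p / p.sum()

p = product_table(np.array([[1.0, 2.0], [3.0, 1.0]]), np.array([[1.0, 3.0], [2.0, 1.0]]))
q = product_table(np.array([[2.0, 1.0], [1.0, 4.0]]), np.array([[3.0, 1.0], [1.0, 2.0]]))

for t in (0.25, 0.5, 0.75):
    r = log_convex_combination(p, q, t)
    print(t, ci_gap_full_conditional(r) < 1e-12)
```

Both tables satisfy the constraint, and so does every combination, since the product structure in (x0, x2) and (x1, x2) survives taking powers and pointwise products.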

Corollary 1. If $G = (N, E)$ is an undirected graph and $\mathcal{L}$ consists of all couples $(ij|N \setminus ij)$ such that $ij$ is not an edge of $G$ then $\mathcal{P}^+_{\mathcal{L}} = \mathcal{F}^+_G$.

A proof of Theorem 1 is deferred to Section 4. It is based on properties of interaction spaces, summarized in Section 2, and new observations on the behaviour of the family $\mathcal{P}_{\mathcal{L}}$ around the uniform p.m., presented in Section 3.

Discussion and remarks are collected in Section 5, which also contains a short independent proof of Corollary 1 of geometric flavor. Section 6 is devoted to a version of Theorem 1 involving regular multidimensional Gaussian p.m.’s, formulated as Theorem 2.

2. Interaction spaces and factorizability

The factorization (2) of positive p.m.’s w.r.t. a hypergraph can be equivalently described on the exponential scale by a linear space. The space decomposes orthogonally to interaction spaces. The material of this section is standard, see [10] and [20], Appendix B.2. The aim is to prepare arguments used in Sections 3 and 4, and supply self-contained proofs.

For $I \subseteq N$ let $V_I$ denote the linear space of those functions $v$ of $x \in X_N$ that depend on $x$ only through $\pi_I x$, thus $v(x) = v(y)$ once $\pi_I x = \pi_I y$, $x, y \in X_N$. For $v \in V_I$

\[ \pi_I v(\pi_I x) = |X_{N \setminus I}|\, v(x), \qquad x \in X_N, \tag{3} \]

where the marginalization of functions on $X_N$ is defined analogously to that of p.m.’s, thus $\pi_I v(\pi_I x)$ is the sum of $v(y)$ over $y \in X_N$ satisfying $\pi_I x = \pi_I y$.


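Lemma 1. A function $u$ on $X_N$ is orthogonal to $V_J$, $J \subseteq N$, if and only if $\pi_J u = 0$.

Proof. A function $v$ belongs to $V_J$ if and only if it can be written as the composition $v = v_J \pi_J$ where $v_J: X_J \to \mathbb{R}$. Since

\[ \langle u, v \rangle = \sum_{x \in X_N} u(x) \cdot v(x) = \sum_{x_J \in X_J} \pi_J u(x_J) \cdot v_J(x_J) = \langle \pi_J u, v_J \rangle \]

and $v_J$ is arbitrary, the assertion follows.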

For $I \subseteq N$ let $U_I$ denote the orthogonal complement in $V_I$ to the sum of $V_J$ over $J \subsetneq I$; it is referred to as an interaction space.

Lemma 2. If $J \subseteq N$ does not contain $I \subseteq N$ and $u \in U_I$, then $\pi_J u = 0$.

Proof. Since $U_I \subseteq V_I$ the marginal $\pi_J u$ of $u \in U_I$ to $J$ depends on $x_J \in X_J$ only through $\pi^J_{I \cap J} x_J$ where $\pi^J_{I \cap J}$ denotes the coordinate projection of $X_J$ to $X_{I \cap J}$. It follows from (3), applied to $\pi_J u$ and $I \cap J$ in the roles of $v$ and $I$, that $\pi^J_{I \cap J} \pi_J u(\pi^J_{I \cap J} x_J)$ equals $|X_{J \setminus I}|\, \pi_J u(x_J)$ for $x_J \in X_J$. Therefore, it suffices to prove that the marginal $\pi^J_{I \cap J} \pi_J u = \pi_{I \cap J} u$ of $u \in U_I$ vanishes. By Lemma 1, this is equivalent to the orthogonality of $u$ and $V_{I \cap J}$, which holds by the definition of $U_I$ and $I \cap J \neq I$.

Lemma 3. The spaces $U_I$, $I \subseteq N$, are pairwise orthogonal.

Proof. If $I, J \subseteq N$ are different then, up to symmetry, $J$ does not contain $I$. By Lemma 2, if $u \in U_I$ then $\pi_J u = 0$. Lemma 1 implies that $u$ is orthogonal to $V_J$. Thus, $U_I$ and $U_J$ are orthogonal.

For a hypergraph $(N, \mathcal{A})$, let $W_{\mathcal{A}}$ denote the sum of $V_I$ over $I \in \mathcal{A}$.

Lemma 4. For a hypergraph $(N, \mathcal{A})$ the space $W_{\mathcal{A}}$ equals the orthogonal sum of $U_I$ over all $I \subseteq N$ that are covered by some hyperedge from $\mathcal{A}$.

Proof. By Lemma 3, the assertion follows from its special instance disregarding orthogonality. On account of the definition of $W_{\mathcal{A}}$, it suffices to restrict to the hypergraphs with $\mathcal{A} = \{I\}$, $I \subseteq N$, in which case $W_{\mathcal{A}} = V_I$. Induction on the cardinality of $I$ is employed to prove that $V_I$ is the sum of $U_J$ over $J \subseteq I$. If $I = \emptyset$ then $V_\emptyset$ and $U_\emptyset$ coincide with the space of constant functions on $X_N$ and the assertion is trivial. Otherwise, $V_I$ decomposes orthogonally into $U_I$ and the sum of $V_J$ over $J \subsetneq I$. By the induction assumption, this sum equals the sum of $U_J$ over $J \subsetneq I$, whence the induction step is completed.

Corollary 2. The Euclidean space $V_N = \mathbb{R}^{X_N}$ decomposes orthogonally to $U_I$, $I \subseteq N$.

Lemma 5. For every hypergraph $(N, \mathcal{A})$ the family $\mathcal{F}^+_{\mathcal{A}}$ consists of the p.m.’s that are proportional to $e^w$ where $w$ belongs to the orthogonal sum of $U_I$ over all $I \subseteq N$ that are covered by some hyperedge from $\mathcal{A}$.

Proof. If $w$ belongs to the above sum of $U_I$ then it is in the sum of $V_I$ over $I \in \mathcal{A}$. Writing $w$ as the sum of functions $v_I \pi_I$ where $v_I: X_I \to \mathbb{R}$, a p.m. proportional to $e^w$ is given by $x \mapsto t \exp[\sum_{I \in \mathcal{A}} v_I(\pi_I x)]$, $x \in X_N$, with some constant $t > 0$. Such a p.m. belongs to $\mathcal{F}^+_{\mathcal{A}}$, choosing $\psi_I$ in (2) proportional to $e^{v_I}$.

If $P \in \mathcal{F}^+_{\mathcal{A}}$ then, by (2), positive functions $\psi_I$ exist such that $P(x) = e^{w(x)}$, $x \in X_N$, where $w$ is the sum over $I \in \mathcal{A}$ of the functions $x \mapsto \ln \psi_I(\pi_I x)$. Thus, $P$ is proportional to $e^w$, with the proportionality constant equal to 1. By definition, $w$ belongs to $W_{\mathcal{A}}$, equal to the above sum of $U_I$ by Lemma 4.


Corresponding to the interaction spaces $U_I$, a special base $\alpha_y$, $y \in X_N$, of the space $\mathbb{R}^{X_N}$ is constructed. To this end, it is assumed that each $X_i$ has a distinguished element denoted by $0_i$, and $0 = (0_i)_{i \in N}$. The function $\alpha_y$ is defined at $x \in X_N$ by

\[ \alpha_y(x) = \begin{cases} (-1)^{|s(y) \cap s(x)|}, & x \sim y, \\ 0, & \text{otherwise,} \end{cases} \]

where $s(y)$ denotes the support $\{i \in N: y_i \neq 0_i\}$ of $y$ and $x \sim y$ abbreviates the equality of projections of $x$ and $y$ onto $s(x) \cap s(y)$.

Lemma 6. If $I \subseteq N$ then $\{\alpha_y: s(y) \subseteq I\}$ is a base of $V_I$ and $\{\alpha_y: s(y) = I\}$ a base of $U_I$.

Proof. For $y, z \in X_N$ the scalar product of $\alpha_y$ and $\alpha_z$ equals the sum of $(-1)^{|[s(y) \triangle s(z)] \cap s(x)|}$ over $x \in X_N$ such that $x \sim y$ and $x \sim z$. For $i \in s(y) \setminus s(z)$ the range of summation is partitioned into the pairs that differ only in the $i$th coordinate, belonging to $\{0_i, y_i\}$. By the summations over the pairs, the scalar product vanishes. Therefore, $\alpha_y$ and $\alpha_z$ are orthogonal when $s(y) \neq s(z)$. If $y$ and $z$ have the same support then $\alpha_y(z) = (-1)^{|s(y)|} \delta^N_{y,z}$, and thus the functions $\alpha_y$ with $s(y) = I$ are independent. It follows that $\{\alpha_y: s(y) \subseteq I\}$ is an independent set. This set is a base of $V_I$ because $\alpha_y \in V_{s(y)} \subseteq V_I$ and $\dim V_I = |X_I|$. Then, $\{\alpha_y: s(y) = I\}$ is a base of $U_I$ by Lemma 4.

In particular, the space $U_I$ has a positive dimension if and only if $|X_i| > 1$ for all $i \in I$. If $X_i = \{0_i, 1_i\}$, $i \in N$, then each $U_I$ is spanned by the single function $\alpha_y$ where $y = (y_i)_{i \in N}$ has $y_i$ equal to $1_i$ for $i \in I$ and $0_i$ otherwise. These functions form an orthogonal base of $\mathbb{R}^{X_N}$ by Corollary 2.
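The base $\alpha_y$ can be tabulated for a small product space to observe the properties used in the proof of Lemma 6. The following sketch assumes that each state space is $\{0, 1, \ldots, k-1\}$ with 0 as the distinguished element; the sizes `(2, 3)` and the function names are arbitrary choices for illustration.

```python
import itertools
import numpy as np

def support(y):
    """s(y): the coordinates where y differs from the distinguished element 0."""
    return frozenset(i for i, yi in enumerate(y) if yi != 0)

def alpha(y, x):
    """alpha_y(x): signed indicator of x ~ y, as in the definition above."""
    common = support(y) & support(x)
    if any(x[i] != y[i] for i in common):
        return 0
    return (-1) ** len(common)

sizes = (2, 3)  # |X_0| = 2, |X_1| = 3
points = list(itertools.product(*(range(k) for k in sizes)))
A = np.array([[alpha(y, x) for x in points] for y in points])
gram = A @ A.T

# The functions alpha_y form a base: the matrix of their values has full rank.
print(np.linalg.matrix_rank(A) == len(points))

# Functions with different supports are orthogonal.
print(all(gram[a, b] == 0
          for a, y in enumerate(points) for b, z in enumerate(points)
          if support(y) != support(z)))
```

Functions with the same support need not be orthogonal; for instance, with these sizes, $\alpha_y$ and $\alpha_z$ for $y = (1,1)$ and $z = (1,2)$ have a nonzero scalar product.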

The following technical lemma is prepared for the discussion in Section 5.2.

Lemma 7. For any hypergraph $(N, \mathcal{A})$ the family $W_{\mathcal{A}}$ is the direct sum of the spaces

\[ V_{0,I} = \{ v \in V_I: v(x) = 0 \text{ once } x_i = 0_i \text{ for some } i \in I \} \]

over $I \subseteq N$ that can be covered by some $J \in \mathcal{A}$.

The space $V_{0,I}$ consists of the functions $v_I \pi_I$ where $v_I: X_I \to \mathbb{R}$ vanishes on each $x_I \in X_I$ having a coordinate $x_i = 0_i$. Such a function $v_I$ is said to be adapted to $0$.

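Proof of Lemma 7. For $I \subseteq N$ let $\rho_I$ map $x \in X_N$ to $y \in X_N$ given by $\pi_I y = \pi_I x$ and $\pi_{N \setminus I} y = \pi_{N \setminus I} 0$. Let $w \in \mathbb{R}^{X_N}$ be decomposed as the sum of $v_I \pi_I$ over $I \subseteq N$ where every $v_I$ is adapted to $0$. If $x \in X_N$ and $I = s(x)$ then

\[ \sum_{L \subseteq I} (-1)^{|I \setminus L|} w(\rho_L x) = \sum_{L \subseteq I} (-1)^{|I \setminus L|} \sum_{K \subseteq L} v_K(\pi_K \rho_L x) = \sum_{K \subseteq I} v_K(\pi_K x) \sum_{K \subseteq L \subseteq I} (-1)^{|I \setminus L|} = v_I(\pi_I x). \tag{4} \]

Therefore, the functions $v_I$ are unique and, in turn, the sum of all $V_{0,I}$, $I \subseteq N$, is direct.

If $v$ is a function on $X_N$ then

\[ \sum_{I \subseteq N} (-1)^{|N \setminus I|}\, v\rho_I \in V_{0,N}. \]

In fact, if $x \in X_N$ has a coordinate $x_i = 0_i$ then $\rho_I x = \rho_{I \setminus i} x$ for $i \in I \subseteq N$, and grouping the summands into the pairs $I$, $I \setminus i$ the sum vanishes. Since $v = v\rho_N$ and $v\rho_I \in V_I$ it follows that $V_N = \mathbb{R}^{X_N}$ equals $V_{0,N}$ plus the sum of $V_I$ over $I \subsetneq N$. By induction on the cardinality of $N$, $V_N$ is the sum of $V_{0,I}$ over $I \subseteq N$, which implies the assertion.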


3. Conditional independence and interactions

In this section, the CI-constraints are related to interaction spaces. This is done via the relation (ij|K)L between $(ij|K) \in \mathcal{R}$ and $L \subseteq N$ defined by the statement ‘$i \notin L$ or $j \notin L$ or $L \not\subseteq ijK$.’ In other words, (ij|K) /L if and only if $ij \subseteq L \subseteq ijK$, see Section 5.3. From now on, $m_I$ denotes $|X_I|^{-1}$, $I \subseteq N$.

Lemma 8. If (ij|K)L and $u \in U_L$ then the measure given by $P(x) = m_N[1 + u(x)]$, $x \in X_N$, satisfies the CI-constraint given by $(ij|K)$.

Proof. By Lemma 2, if $ijK$ does not contain $L$ then the marginal of $P$ to $ijK$ is uniform. Thus, the CI-constraint (1) reduces to $m_{ijK} m_K = m_{iK} m_{jK}$. Otherwise, $L \subseteq ijK$, and it follows from (ij|K)L that $i \notin L$ or $j \notin L$. By the symmetry between $i$ and $j$, it suffices to restrict to the case $i \notin L$. Then, $L \subseteq jK$ which implies that $u$ belongs to $V_{jK} \subseteq V_{ijK}$. By the double use of (3) with $ijK$ and $jK$ in the role of $I$, the constraint (1) rewrites to

\[ m_{ijK}[1 + u(x)] \cdot \pi_K P(\pi_K x) = \pi_{iK} P(\pi_{iK} x) \cdot m_{jK}[1 + u(x)], \qquad x \in X_N. \tag{5} \]

If $j \in L$ then $K$ and $iK$ do not contain $L$. By Lemma 2, the marginals of $P$ to $K$ and $iK$ are uniform whence (5) holds. If $j \notin L$ then $L \subseteq K$, and thus $u \in V_K \subseteq V_{iK}$. By the double use of (3) with $K$ and $iK$ in the role of $I$, $\pi_K P(\pi_K x) = m_K[1 + u(x)]$ and $\pi_{iK} P(\pi_{iK} x) = m_{iK}[1 + u(x)]$. Hence, (5) is always satisfied.

ForL⊆RletLdenote the family ofL⊆N such that(ij|K)Lfor all(ij|K)∈L.

Corollary 3. IfL⊆RandL∈Lthen any p.m.onXNthat differs from the uniform one by a vector fromUL,as in Lemma8,belongs toP_L.

An example of the construction L→L arises from a graph $G = (N, E)$ and $\mathcal{K}_G = \{(ij|N \setminus ij): ij \in \binom{N}{2} \setminus E\}$, arriving at the family K_G of cliques of $G$.

For a later reference the following simple assertion is needed.

Lemma 9. For anyL⊆R,ifM⊆N and every differenti, j∈Mcan be covered by someL_ij ∈L contained inM thenM∈L.

Proof. If a couple(ij|K)∈Lhasi, j∈Mthen, by the assumption,ij⊆L_ij⊆Mfor someL_ij∈L. The definition ofLimplies that(ij|K)Lij, and thusLij⊆ij Kbecauseij⊆Lij. Hence,M⊆ij K. It follows that(ij|K)M.

In turn,M∈L.

In the remaining part of this section, the CI-constraints (1) are analyzed via the mappings $\Psi_{ij|K}$ given by

\[ \Psi_{ij|K}(w) = \big( \pi_{ijK} w(\pi_{ijK} x) \cdot \pi_K w(\pi_K x) - \pi_{iK} w(\pi_{iK} x) \cdot \pi_{jK} w(\pi_{jK} x) \big)_{x \in X_N}. \]

Here, $(ij|K) \in \mathcal{R}$ and $w$ is a function on $X_N$. A p.m. can play the role of $w$ as well.

Lemma 10. The Jacobian of $\Psi_{ij|K}$ at the uniform p.m. on $X_N$ is equal to

\[ \big( m_K \delta^{ijK}_{x,y} + m_{ijK} \delta^K_{x,y} - m_{jK} \delta^{iK}_{x,y} - m_{iK} \delta^{jK}_{x,y} \big)_{x,y \in X_N}. \]

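Proof. The coordinate function of $\Psi_{ij|K}$ indexed by $x \in X_N$ equals

\[ \sum_{z, z' \in X_N} w(z) w(z') \big[ \delta^{ijK}_{x,z} \delta^K_{x,z'} - \delta^{iK}_{x,z} \delta^{jK}_{x,z'} \big]. \]

Differentiating w.r.t. $w(y)$, $y \in X_N$, gives

\[ \sum_{z, z' \in X_N} \big[ \delta^N_{y,z} w(z') + w(z) \delta^N_{z',y} \big] \big[ \delta^{ijK}_{x,z} \delta^K_{x,z'} - \delta^{iK}_{x,z} \delta^{jK}_{x,z'} \big]. \]

When $w(z) = m_N$, $z \in X_N$, this equals

\[ m_N \sum_{z \in X_N} \big[ \delta^{ijK}_{x,y} \delta^K_{x,z} - \delta^{iK}_{x,y} \delta^{jK}_{x,z} \big] + m_N \sum_{z \in X_N} \big[ \delta^{ijK}_{x,z} \delta^K_{x,y} - \delta^{iK}_{x,z} \delta^{jK}_{x,y} \big], \]

and the assertion follows because $m_N \sum_{z \in X_N} \delta^I_{x,z} = m_N |X_{N \setminus I}| = m_I$ for $I \subseteq N$.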

Let $\mathrm{Ker}(ij|K)$ denote the kernel of the Jacobian of $\Psi_{ij|K}$ at the uniform p.m.

Lemma 11. ForL⊆Rthe intersection ofKer(ij|K)over(ij|K)∈Lis equal to the sum ofULoverL∈L. Proof. IfL /∈Lthenij⊆L⊆ij Kfor some(ij|K)∈L. Foru∈ULandv∈Ker(ij|K)

\[ \sum_{x \in X_N} u(x) \sum_{y \in X_N} \big[ m_K \delta^{ijK}_{x,y} + m_{ijK} \delta^K_{x,y} - m_{jK} \delta^{iK}_{x,y} - m_{iK} \delta^{jK}_{x,y} \big] v(y) = 0 \]

because the inner sums equal zero due to Lemma 10. Since none of the sets $K$, $iK$ and $jK$ contains $L$, the marginals of $u$ to these sets vanish by Lemma 2 and the above equation rewrites to

\[ \sum_{x \in X_N} \sum_{y \in X_N} u(x) v(y) \delta^{ijK}_{x,y} = 0. \]

Since $L \subseteq ijK$ the function $u$ belongs to $V_{ijK}$, and thus $u(x) \delta^{ijK}_{x,y} = u(y) \delta^{ijK}_{x,y}$. Therefore, $u$ and $v$ are orthogonal. In turn, $\mathrm{Ker}(ij|K)$ is orthogonal to $U_L$.

By Corollary2, the intersection of Ker(ij|K) over(ij|K)∈L is contained in the sum ofUL over L∈L. The

opposite inclusion is a consequence of Corollary3.

4. Log-convexity and conditional independence

A nonempty family of positive p.m.’s on $X_N$ is log-linear if it contains together with p.m.’s $P$ and $Q$ also the p.m. proportional to $x \mapsto P^t(x) Q^s(x)$, $x \in X_N$, for all real $t$ and $s$. The family is log-affine if this is required only with $s = 1 - t$. The log-convexity assumes the additional restriction $0 < t < 1$. The log-affine families correspond to the full exponential families [20].

In this section,L⊆R.

Lemma 12. If $\mathcal{P}^+_{\mathcal{L}}$ is log-convex then it is log-linear.

Proof. Let $P, Q \in \mathcal{P}^+_{\mathcal{L}}$, $0 < t < 1$ and $R_t(x) = P^t(x) Q^{1-t}(x)$, $x \in X_N$. By the log-convexity, for $(ij|K) \in \mathcal{L}$ Eqs. (1) hold with $P$ replaced by $R_t$. Thus, the coordinate functions of $t \mapsto \Psi_{ij|K}(R_t)$ vanish when $t$ ranges between 0 and 1. Since they are holomorphic in $t$ they vanish identically. It follows that $\mathcal{P}^+_{\mathcal{L}}$ is closed under the log-affine combinations. The assertion follows because it is not difficult to see that a log-affine family that contains the uniform p.m. on $X_N$ is log-linear.

Lemma 13. IfP_L⁺ is log-convex and a p.m.P onX_N is proportional toe^w for somew∈R^X^N thenP ∈P_L⁺if and only ifwbelongs to the sum ofULoverL∈L.


Proof. The log-convex combinations of $P$ with the uniform p.m.,

\[ P_t(x) = \frac{e^{t w(x)}}{\sum_{y \in X_N} e^{t w(y)}}, \qquad x \in X_N, \]

are viewed as a curve parameterized by $t$. Its tangent vector at the uniform p.m. $P_0$ equals

\[ m_N w - m_N^2 \sum_{y \in X_N} w(y). \]

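If $P \in \mathcal{P}^+_{\mathcal{L}}$ then the log-convexity of $\mathcal{P}^+_{\mathcal{L}}$ implies that the curve ranges in $\mathcal{P}^+_{\mathcal{L}}$. Hence, the tangent belongs to $\mathrm{Ker}(ij|K)$ for $(ij|K) \in \mathcal{L}$. By Lemma 11, the tangent is in the sum of $U_L$ over L∈L. Since ∅∈L and $U_\emptyset$ consists of the constant functions the sum contains $w$.

Let $w$ belong to the sum of $U_L$ over L∈L. By Lemma 12, $\mathcal{P}^+_{\mathcal{L}}$ is log-linear. Therefore, to prove that $P \in \mathcal{P}^+_{\mathcal{L}}$ it suffices to confine to the case $w \in U_L$ for some L∈L. The assertion is trivial if $L = \emptyset$, having $w$ constant and $P$ uniform. Otherwise, $L \neq \emptyset$ and $w$ is orthogonal to $U_\emptyset$ by Lemma 3. Then, $x \mapsto m_N[1 + \varepsilon w(x)]$ defines a p.m. for $\varepsilon$ sufficiently close to 0. By Corollary 3, this p.m. belongs to $\mathcal{P}_{\mathcal{L}}$. It is proportional to $e^{\ln[1 + \varepsilon w]}$. The log-affinity of $\mathcal{P}^+_{\mathcal{L}}$ implies that the p.m. $P_\varepsilon$ proportional to $e^{w_\varepsilon}$ with $w_\varepsilon = \frac{1}{\varepsilon} \ln[1 + \varepsilon w]$ belongs to $\mathcal{P}^+_{\mathcal{L}}$. Limiting with $\varepsilon$ to 0 the functions $w_\varepsilon$ converge to $w$ and $P_\varepsilon$ converges to $P \in \mathcal{P}^+$. Hence, $P \in \mathcal{P}^+_{\mathcal{L}}$.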

The CI-constraint (1) given by $(ij|K)$ can be equivalently expressed as

\[ Q(x_i x_j x_K) \cdot Q(y_i y_j x_K) = Q(x_i y_j x_K) \cdot Q(y_i x_j x_K), \qquad x_i, y_i \in X_i,\ x_j, y_j \in X_j,\ x_K \in X_K, \tag{6} \]

where $Q$ denotes the marginal of $P$ to $ijK$, $x_i x_j x_K$ is the element of $X_{ijK}$ that projects to $x_i$, $x_j$ and $x_K$, and $y_i y_j x_K$, $x_i y_j x_K$ and $y_i x_j x_K$ have analogous meaning. If $K = N \setminus ij$ then $Q = P$ and it is easy to see that if two p.m.’s satisfy the equations in (6) then the equations also hold for the log-linear combinations of the p.m.’s.

Under the additional natural assumption $|X_i| > 1$ for all $i \in N$, the following lemma and the proof of Theorem 1 can be slightly simplified, but when not excluding $|X_i| = 1$ only additional minor technicalities are needed.

Lemma 14. IfP_L⁺is log-convex,L∈Land|X|>1for each∈Lthen all subsets ofLbelong toL.

Proof. It suffices to prove thatL\∈Lfor all∈L. This is accomplished when the violation of(ij|K)(L\)for some(ij|K)implies that the couple is not inL. Thus, assumeij⊆L\⊆ij K. By the assumption on cardinalities, there exists y ∈X_N with s(y)=L. Let z∈X_N haves(z)=and the same th coordinate asy,y=0. Since L, ∈LLemma6impliesαy∈UL andαz∈U. By the log-convexity and Lemma13, the p.m.P proportional to e^α^y⁺^α^zis inP_L⁺.

There existst >0 such thatt·π_N_\P (x_N_\)equals e^α^y⁽⁰^x^N^\⁾⁺^α^z⁽⁰^x^N^\⁾+e^α^y^(y^x^N^\⁾⁺^α^z^(y^x^N^\⁾+

x∈X\{0,y}

e^α^y^(x^x^N^\⁾⁺^α^z^(x^x^N^\⁾

=e^α^y⁽⁰^x^N\⁾⁺¹+e^α^y^(y^x^N\⁾⁻¹+ |X| −2, xN\∈XN\,

whereα_yandα_zvanish atxx_N_\. Combiningij⊆L\⊆ij KandL∈L, the setij Kcannot contain. Therefore, πij KP =Qis a marginal ofπN\P. SinceπN\P depends onxN\ only throughπ_{ij K}^N^\xN\it follows from (3) that Q(π_{ij K}^N^\xN\)is equal to|XN\ij K|πN\P (xN\). Therefore,

Q(0i0j0K)Q(y_iy_j0K)−Q(0iy_j0K)Q(y_i0j0K)

is a positive multiple of[e²+e⁻²+ |X| −2]²− |X|². Using (6), this implies(ij|K) /∈L.

Proof of Theorem1. LetRN,Xconsist of the couples(ij|N\ij )with|X_i| =1 or|X_j| =1. By (1),P_L=P_L∪RN,X

for allL⊆R. LetG_L=(N, E)be the graph withij /∈Eif and only if (ij|K)∈Lfor someK⊆N\ij. By (2),


a p.m. factorizes w.r.t.G_Lif and only if it does w.r.t.G_L∪R_N,X. It follows that it suffices to prove the assertion of Theorem1under the additional assumptionRN,X⊆L. Hence, anyL∈L with at least two elements has|X|>1 for all∈L. By Lemma14, all subsets of anyL∈L belong toL; thusL is hereditary. Using Lemma9, a set L⊆N with at least two elements belongs toL if and only ifij∈L for everyi, j ∈L; thusL is conformal. It follows thatLis the family of cliques ofG_L. Hence, the assertion of Theorem1obtains from Lemmas5and13.

5. Discussion and remarks

5.1. The main result and its corollary

For an undirected graph $G = (N, E)$ let $\mathcal{K}_G$ consist of the couples $(ij|N \setminus ij)$ having $ij \notin E$. The p.m.’s from $\mathcal{P}_{\mathcal{K}_G}$ are called pairwise Markov w.r.t. $G$. The families $\mathcal{P}^+_{\mathcal{K}_G}$ parameterized by undirected graphs $G$ have been known in the statistical literature as the Markov models over undirected graphs [9,20], over the contingency tables. They are log-convex because each family $\mathcal{P}^+_{\{(ij|N \setminus ij)\}}$ is log-linear by the argument following (6) and intersections of log-linear families are log-linear. Hence, Theorem 1 directly implies the assertion of Corollary 1, $\mathcal{P}^+_{\mathcal{K}_G} = \mathcal{F}^+_G$. Rephrased verbally, given an undirected graph, a positive p.m. is pairwise Markov if and only if it factorizes. The assumption of positivity matters, see [26], p. 22, and [20], Example 3.10.

Alternatively, $\mathcal{P}^+_{\mathcal{K}_G} = \mathcal{F}^+_G$ is a simple consequence of the lemmas on the interaction spaces from Section 2. In fact, it is rather straightforward that $P \in \mathcal{P}^+$ satisfies the CI-constraint $(ij|N \setminus ij)$ if and only if $\ln P: x \mapsto \ln P(x)$ belongs to the sum of $V_{N \setminus i}$ and $V_{N \setminus j}$, see, e.g., [20], (3.6). This is the sum of $U_I$ where $I$ does not contain $ij$, on account of Lemma 4. Therefore, by Corollary 2, $P \in \mathcal{P}^+$ is pairwise Markov w.r.t. $G$ if and only if $\ln P$ belongs to the sum of $U_I$ where $I$ contains no $ij \notin E$, thus $I$ is a clique of $G$. To conclude $\mathcal{P}^+_{\mathcal{K}_G} = \mathcal{F}^+_G$ it suffices to invoke Lemma 5.

It seems that this short geometric proof of Corollary 1 via the intersections of sums of the interaction spaces is new. The presented argumentation is coordinate-free; no projectors, Möbius transform, algebras, or a special choice of $0 \in X_N$ are employed. The only comparable proof of Corollary 1 is the inductive one by Brook [5], see also [16], Theorem 7.1, which however has no geometric content.

For $\mathcal{L} \subseteq \mathcal{R}$, Theorem 1 and log-convexity of the Markov models imply that $\mathcal{P}^+_{\mathcal{L}}$ is log-convex if and only if it is Markov over an undirected graph. Thus, the class of these Markov models can be equivalently defined by means of the CI-constraints and log-convexity, without any reference to graphs. Here, both the conditional independence [11] and log-convexity [6], including the geometrical viewpoint of [1], have been recognized for decades as basic building blocks in statistics. Theorem 1 seems to be a new bridge between them.

5.2. Gibbs probability measures

Given a distinguished element $0 = (0_i)_{i \in N}$ of $X_N$ and a hereditary hypergraph $(N, \mathcal{A})$, a p.m. $P \in \mathcal{P}^+$ is Gibbsian if there exist real-valued functions $v_I$ on $X_I$ such that

\[ \ln P(x) = \sum_{I \in \mathcal{A},\ I \subseteq s(x)} v_I(\pi_I x), \qquad x \in X_N. \tag{7} \]

The family of such probability measures is denoted by $\mathcal{G}^+_{0,\mathcal{A}}$. If $x_I \in X_I$ and for some $i \in I$ the $i$th coordinate of $x_I$ equals $0_i$ then the value $v_I(x_I)$ does not occur in (7). Therefore, it can be equivalently assumed in the above definition that all such values equal zero, thus the $v_I$ are adapted to $0$. In this situation, the summation in (7) can run equivalently over $I \in \mathcal{A}$. Thus, the Gibbs p.m.’s factorize in a special way, $\mathcal{G}^+_{0,\mathcal{A}} \subseteq \mathcal{F}^+_{\mathcal{A}}$. By Lemmas 5 and 7, if $P \in \mathcal{F}^+_{\mathcal{A}}$ then $P = e^w$ for $w \in W_{\mathcal{A}}$ in the form $\sum_{I \in \mathcal{A}} v_I \pi_I$ with all $v_I$ adapted to $0$. Therefore, $\mathcal{F}^+_{\mathcal{A}} \subseteq \mathcal{G}^+_{0,\mathcal{A}}$. Thus, over a hypergraph, the notion of Gibbs p.m.’s does not depend on the choice of $0$ and coincides with the factorizability of positive p.m.’s. In literature, e.g., in [7], Theorem 2, [26], Theorem 2, typically (7) is stated to be equivalent to

\[ v_K(\pi_K x) = \sum_{I \subseteq K} (-1)^{|K \setminus I|} \ln P(\rho_I x), \qquad K \in \mathcal{A},\ x \in X_N,\ s(x) = K, \]

which is a reinterpretation of the computation in (4), based on the Möbius transform.
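The Möbius formula for the potentials can be tried out on a small positive table. The sketch below makes several assumptions that are not in the text: the hypergraph consists of all subsets of $N$ (so every positive p.m. is Gibbsian), states are numbered from 0 with 0 as the distinguished element, and the table is a randomly generated example.

```python
import itertools
import numpy as np

def support(x):
    """s(x): the coordinates where x differs from the distinguished element 0."""
    return frozenset(i for i, xi in enumerate(x) if xi != 0)

def rho(x, I):
    """rho_I x: keep the coordinates of x indexed by I, reset the others to 0."""
    return tuple(xi if i in I else 0 for i, xi in enumerate(x))

def subsets(S):
    S = sorted(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in itertools.combinations(S, r)]

def potential(logp, x):
    """v_K(pi_K x) for K = s(x), computed by the Moebius formula."""
    K = support(x)
    return sum((-1) ** len(K - I) * logp[rho(x, I)] for I in subsets(K))

rng = np.random.default_rng(0)
p = rng.random((2, 3, 2)) + 0.1   # hypothetical positive table on X_N
p /= p.sum()
logp = np.log(p)

# Check (7): ln P(x) is the sum of v_I(pi_I x) over I contained in s(x).
max_err = max(
    abs(logp[x] - sum(potential(logp, rho(x, I)) for I in subsets(support(x))))
    for x in itertools.product(*(range(k) for k in p.shape))
)
print(max_err < 1e-12)
```

Since $\rho_I x$ has support $I$ whenever $I \subseteq s(x)$, the value `potential(logp, rho(x, I))` plays the role of $v_I(\pi_I x)$, and the potentials obtained this way are adapted to $0$ by construction.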


The identity $\mathcal{P}^+_{\mathcal{K}_G} = \mathcal{F}^+_G$, disguised in Gibbs p.m.’s and potentials, has been revealed and proved independently several times. Over point lattices it goes back to [2,30]. Two early unpublished manuscripts [14,17]² have been frequently cited. Other proofs are in [3], p. 198, [7], p. 22, [15,27,28], and three proofs in [26], Theorem 2. For personal remarks see also Hammersley’s discussion in [3], p. 230.

Under topological assumptions on the state spaces, Hammersley–Clifford theorem from [12,20] asserts that over a graph the pairwise Markovness is equivalent to the factorization, for the p.m.’s with continuous and positive densities w.r.t. a product measure.

5.3. Miscellany

When $P$ and $Q$ are p.m.’s on a measurable space that are not mutually singular, $p$ and $q$ are their densities with respect to a dominating measure $R$ and $0 < t < 1$, the log-convex combination of $P$ and $Q$ is defined as the p.m. with the $R$-density proportional to $p^t q^{1-t}$. The definition does not depend on the choice of $R$. A family of p.m.’s is called log-convex if it is closed under the log-convex combinations of pairs of not mutually singular p.m.’s. Log-convex families of mutually absolutely continuous p.m.’s are called ‘geodesically convex’ in [6]. Examples of log-convex sets comprise the exponential families with convex sets of canonical parameters and their extensions [8].

A crucial role in the proof of Theorem1is played by the binary relationbetweenRand the power set ofN. It appeared previously prior to [23], Theorem 4. In addition to the mappingL→L, it gives rise to

A→A=

(ij|K)∈R: (ij|K)Lfor allL∈A ,

whereAis a family of subsets ofN. The pair of mappings forms a Galois connection [4], Chapter V, Sections 7 and 8.

By the remark preceding [23], Theorem 4, the connection gives rise to an antiisomorphism betweenDCI-relations and C^w-families, similarly to [23], Theorem 1.

The implication of Lemma9is valid for the family of connected sets in any topological space in the role ofL; it goes back at least to [18].

General conditional independence statements can be reduced to families of the CI-constraints (1) by [22], Lemma 3.

The families $\mathcal{P}_{\mathcal{L}}$ and $\mathcal{P}^+_{\mathcal{L}}$ have been, in spite of considerable effort, far from being understood in general [12,24,31].

6. Gaussian probability measures

In this section $N = \{1, 2, \ldots, n\}$. A p.m. on $\mathbb{R}^n$ is regular Gaussian (rG) if its density with respect to the Lebesgue measure has the form

\[ x \mapsto (2\pi)^{-n/2} (\det A)^{-1/2} \exp\big[ -\tfrac{1}{2} (x - \mu)^T A^{-1} (x - \mu) \big], \qquad x \in \mathbb{R}^n, \]

where $\mu$ is a (column) vector from $\mathbb{R}^n$ and $A = (a_{i,j})_{i,j \in N}$ a real positive definite matrix. The standard notation $\det A$ is used for the determinant of $A$ and $^T$ for the transposition.

An rG p.m. satisfies the CI-constraint given by $(ij|K) \in \mathcal{R}$ if and only if $\det A_{iK,jK}$ vanishes where $A_{iK,jK}$ is the submatrix of $A$ whose rows are indexed by $iK$ and columns by $jK$, see [21]. Hence, the p.m. is pairwise Markov w.r.t. an undirected graph $G = (N, E)$ if and only if $\det A_{N \setminus j, N \setminus i} = 0$ whenever $ij \notin E$; this equality is equivalent to vanishing of the element of $A^{-1}$ indexed by $i, j$. It follows that the p.m. factorizes w.r.t. the graph [20], Section 5.1.4, which implies Markovness [20], Proposition 3.8. Thus, as is well known, the pairwise Markovness of any rG p.m. is equivalent to a factorization, over an undirected graph. For the general CI-constraints on Gaussian variables see [12,13,21,25,29].
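The determinant criterion is easy to observe on a three-dimensional example. In the sketch below the matrix `precision`, standing for $A^{-1}$, is a hypothetical choice with a zero at the positions indexed by 0 and 2 (indices start from 0 here), and `ci_minor` is an illustrative helper name.

```python
import numpy as np

# Hypothetical inverse covariance A^{-1} with zero entries at (0, 2) and (2, 0).
precision = np.array([[2.0, 1.0, 0.0],
                      [1.0, 2.0, 1.0],
                      [0.0, 1.0, 2.0]])
A = np.linalg.inv(precision)  # covariance matrix A

def ci_minor(A, i, j, K):
    """Determinant of the submatrix of A with rows iK and columns jK."""
    rows = [i] + list(K)
    cols = [j] + list(K)
    return float(np.linalg.det(A[np.ix_(rows, cols)]))

print(abs(ci_minor(A, 0, 2, [1])) < 1e-12)  # couple (02|1): True, the minor vanishes
print(abs(ci_minor(A, 0, 1, [2])) < 1e-12)  # couple (01|2): False
```

With $K = N \setminus ij$ the minor is, up to sign and the factor $\det A$, the entry of $A^{-1}$ indexed by $i, j$ (the cofactor formula for the inverse), which is why a zero of the inverse corresponds to a vanishing determinant.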

Log-convex combinations of two rG p.m.’s parameterized by $\mu, A$ and $\nu, B$ are the rG p.m.’s parameterized by some $\lambda_{\mu,\nu,t} \in \mathbb{R}^n$ and $[t A^{-1} + (1 - t) B^{-1}]^{-1}$, $0 < t < 1$.
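This parameterization can be verified numerically by comparing log-densities. The sketch below is an illustration under stated assumptions: the explicit mean $\lambda = C\,[t A^{-1}\mu + (1-t) B^{-1}\nu]$ with $C = [t A^{-1} + (1-t) B^{-1}]^{-1}$ is obtained here by completing the square and is not spelled out in the text, and the parameter values and function names are arbitrary.

```python
import numpy as np

def log_density(x, mean, cov):
    """Logarithm of the rG density with the given mean vector and covariance matrix."""
    d = x - mean
    n = len(mean)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(cov))
                   + d @ np.linalg.solve(cov, d))

def log_convex_params(mu, A, nu, B, t):
    """Mean and covariance of the p.m. with density proportional to p**t * q**(1 - t)."""
    precision = t * np.linalg.inv(A) + (1 - t) * np.linalg.inv(B)
    C = np.linalg.inv(precision)
    lam = C @ (t * np.linalg.solve(A, mu) + (1 - t) * np.linalg.solve(B, nu))
    return lam, C

mu, A = np.array([0.0, 1.0]), np.array([[2.0, 1.0], [1.0, 2.0]])
nu, B = np.array([1.0, -1.0]), np.array([[1.0, 0.0], [0.0, 3.0]])
t = 0.3
lam, C = log_convex_params(mu, A, nu, B, t)

# t*ln p + (1-t)*ln q differs from the log-density with parameters (lam, C)
# only by a constant, which is the logarithm of the normalizing factor.
xs = np.random.default_rng(1).normal(size=(5, 2))
diffs = [t * log_density(x, mu, A) + (1 - t) * log_density(x, nu, B)
         - log_density(x, lam, C) for x in xs]
print(max(diffs) - min(diffs) < 1e-10)
```

Because the inverse covariance of the combination is a convex combination of $A^{-1}$ and $B^{-1}$, an entry that is zero in both inverses stays zero, in agreement with the log-convexity of the pairwise Markov rG families.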

Theorem 2. If the set of rG p.m.’s on $\mathbb{R}^n$ that satisfy all CI-constraints from $\mathcal{L} \subseteq \mathcal{R}$ is log-convex then this set coincides with the set of all rG p.m.’s that factorize w.r.t. the graph $(N, E)$ that has $ij \in E$ if and only if $(ij|K) \notin \mathcal{L}$ for all $K \subseteq N \setminus ij$.

² The former work is not available to the author.
