www.imstat.org/aihp 2012, Vol. 48, No. 4, 1137–1147
DOI:10.1214/11-AIHP431
© Association des Publications de l’Institut Henri Poincaré, 2012
On conditional independence and log-convexity 1
František Matúš
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod vodárenskou vˇeží 4, 182 08 Prague, Czech Republic. E-mail:[email protected]
Received 20 November 2010; revised 21 April 2011; accepted 26 April 2011
Abstract. If conditional independence constraints define a family of positive distributions that is log-convex then this family turns out to be a Markov model over an undirected graph. This is proved for the distributions on products of finite sets and for the regular Gaussian ones. As a consequence, the assertion known as Brook factorization theorem, Hammersley–Clifford theorem or Gibbs–Markov equivalence is obtained.
Résumé. Si des contraintes d’indépendance conditionnelle définissent une famille de distributions positives qui est log-convexe, alors cette famille doit être un modèle de Markov sur un graphe non-dirigé. Ceci est démontré pour les distributions sur le pro- duits d’ensembles finis et pour les distributions gaussiennes régulières. Par conséquent, l’assertion connue comme le théorème de factorisation de Brook, le théorème de Hammersley–Clifford ou l’équivalence de Gibbs–Markov est obtenue.
MSC:Primary 62H05; 62M40; secondary 62H17; 62J10; 05C50; 11C20; 15A15
Keywords:Conditional independence; Markov properties; Factorizable distributions; Graphical Markov models; Log-convexity; Gibbs–Markov equivalence; Markov fields; Hammersley–Clifford theorem; Contingency tables; Gibbs potentials; Multivariate Gaussian distributions; Positive definite matrices; Covariance selection model
1. Main result
LetN be a finite set,Xi,i∈N, finite nonempty state spaces,XI the product ofXi overi∈I, andπI the coordinate projection ofXN toXI,I ⊆N. The marginal of a probability measure (p.m.)P onXN toI is the p.m.πIP onXI given by
πIP (πIx)=
y∈X
P (y)δx,yI , x∈XN,
whereδx,yI equals one ifπIx=πIy and zero otherwise.
A p.m.P onXNsatisfies aconditional independence(CI-) constraint if
πij KP (πij Kx)·πKP (πKx)=πiKP (πiKx)·πj KP (πj Kx), x∈XN, (1) wherei, j∈N are different andK⊆N\ij. By convention, an elementi∈N is not distinguished from the singleton {i}and the sign∪for unions of subsets ofN is omitted.
1Supported in part by Grant Agency of Academy of Sciences of the Czech Republic, Grant IAA 100750603, and by Grant Agency of the Czech Republic, Grant 201/08/0539.
The family of all ordered couples(ij|K)withi,j andKas above is denoted byR. ForL⊆RletPLdenote the family of p.m.’s onXNthat satisfy theCI-constraint given by each(ij|K)∈LandPL+=PL∩P+. Here,P+is the family of p.m.’s onXN that are positive in the senseP (x) >0 for allx∈XN. Given a p.m.P, the projectionsπi, i∈N, can be interpreted as the random variables jointly distributed according toP. Their distribution belongs toPL if and only ifπi andπjare stochastically independent given(πk)k∈K for all(ij|K)∈L.
A nonempty subfamily ofP+islog-convexif it contains together with p.m.’sP andQalso the p.m. proportional tox→Pt(x)Q1−t(x),x∈XN, for all 0< t <1. The general definition of log-convexity is recalled and discussed in Section5.3.
A hypergraph(N,A)consists of the vertex setN and a nonempty familyAof subsets ofN, called hyperedges.
A p.m.P onXN isfactorizablew.r.t. the hypergraph if for eachI∈Athere exists a real-valued functionψI onXI such that
P (x)=
I∈A
ψI(πIx), x∈XN. (2)
The family of all p.m.’s on XN that factorize w.r.t. the hypergraph is denoted by FA and FA+=FA∩P+. An undirected graphG=(N, E)has a vertex setN and an edge setE, contained in the familyN
2
of all two-element subsets ofN. A setLof vertices is a clique ofGifL
2
⊆E. IfAis the family of cliques ofG, the notationFG+is preferred toFA+and one speaks about the factorization w.r.t.G.
Theorem 1. IfL⊆RandPL+is log-convex then this family coincides withFG+whereG=(N, E)is the graph with ij∈Eif and only if(ij|K) /∈Lfor allK⊆N\ij.
In other words, given a familyLthe graphGis constructed onN by removing fromN
2
allijsuch that(ij|K)∈L for someK⊆N\ij; Theorem1asserts that the log-convexity ofPL+impliesPL+=FG+.
IfLconsists exclusively of the couples(ij|K)havingK=N\ij then it is log-convex. This follows easily from a well-known equivalent definition ofCI-constraints, see (6) below. In this special case Theorem 1implies what is called by statisticians Brook factorization theorem [5,16] or Hammersley–Clifford theorem [3,7] and by physicists Gibbs–Markov equivalence [19,26].
Corollary 1. IfG=(N, E)is an undirected graph andLconsists of all couples(ij|N\ij )such thatij is not an edge ofGthenPL+=FG+.
A proof of Theorem 1 is deferred to Section4. It is based on properties of interaction spaces, summarized in Section2, and new observations on the behaviour of the familyPLaround the uniform p.m., presented in Section3.
Discussion and remarks are collected in Section 5 that contains also a short independent proof of Corollary 1 of geometric flavor. Section6is devoted to a version of Theorem1involving regular multidimensional Gaussian p.m.’s, formulated as Theorem2.
2. Interaction spaces and factorizability
The factorization (2) of positive p.m.’s w.r.t. a hypergraph can be equivalently described on the exponential scale by a linear space. The space decomposes orthogonally to interaction spaces. The material of this section is standard, see [10] and [20], Appendix B.2. The aim is to prepare arguments used in Sections3and4, and supply self-contained proofs.
ForI ⊆NletVI denote the linear space of those functionsvofx∈XNthat depend onxonly throughπIx, thus v(x)=v(y)onceπIx=πIy,x, y∈XN. Forv∈VI
πIv(πIx)= |XN\I|v(x), x∈XN, (3)
where the marginalization of functions onXN is defined analogously to that of p.m.’s, thusπIv(πIx)is the sum of v(y)overy∈XN satisfyingπIx=πIy.
Lemma 1. A functionuonXNis orthogonal toVJ,J⊆N,if and only ifπJu=0.
Proof. A functionvbelongs toVJ if and only if it can be written as the compositionv=vJπJ wherevJ:XJ→R. Since
u, v =
x∈XN
u(x)·v(x)=
xJ∈XJ
πJu(xJ)·vJ(xJ)= πJu, vJ
andvJ is arbitrary the assertion follows.
ForI ⊆N letUI denote the orthogonal complement inVI to the sum ofVJ overJI, and be referred to as an interaction space.
Lemma 2. IfJ⊆N does not containI⊆Nandu∈UI,thenπJu=0.
Proof. Since UI ⊆VI the marginalπJu of u∈UI to J depends onxJ ∈XJ only throughπIJ∩JxJ whereπIJ∩J denotes the coordinate projection ofXJ toXI∩J. It follows from (3), applied toπJuandI ∩J in the roles ofv andI, thatπIJ∩JπJu(πIJ∩JxJ)equals|XJ\I|πJu(xJ)for xJ ∈XJ. Therefore, it suffices to prove that the marginal πIJ∩JπJu=πI∩Juofu∈UIvanishes. By Lemma1, this is equivalent to the orthogonality ofuandVI∩J which holds
by the definition ofUIandI∩J=I.
Lemma 3. The spacesUI,I⊆N,are pairwise orthogonal.
Proof. IfI, J ⊆N are different then, up to symmetry,J does not containI. By Lemma2, ifu∈UI thenπJu=0.
Lemma1implies thatuis orthogonal toVJ. Thus,UIandUJ are orthogonal.
For a hypergraph(N,A), letWAdenote the sum ofVI overI∈A.
Lemma 4. For a hypergraph(N,A)the spaceWAequals the orthogonal sum ofUIover allI⊆N that are covered by some hyperedge fromA.
Proof. By Lemma3, the assertion follows from its special instance disregarding orthogonality. On account of the definition ofWA, it suffices to restrict to the hypergraphs withA= {I},I⊆N, in which caseWA=VI. Induction on the cardinality ofI is employed to prove thatVIis the sum ofUJ overJ⊆I. IfI=∅thenV∅andU∅coincide with the space of constant functions onXNand the assertion is trivial. Otherwise,VIdecomposes orthogonally intoUI
and the sum ofVJ overJI. By the induction assumption, this sum equals the sum ofUJ overJI whence the
induction step is completed.
Corollary 2. The Euclidean spaceVN=RXN decomposes orthogonally toUI,I⊆N.
Lemma 5. For every hypergraph(N,A)the familyFA+ consists of the p.m.’s that are proportional toew wherew belongs to the orthogonal sum ofUIover allI⊆Nthat are covered by some hyperedge fromA.
Proof. Ifwbelongs to the above sum ofUI then it is in the sum ofVIoverI∈A. Writingwas the sum of functions vIπIwherevI:XI→R, a p.m. proportional to ewis given byx→texp[I∈AvI(πIx)],x∈XN, with some constant t >0. Such a p.m. belongs toFA+, choosingψIin (2) proportional to evI.
If P ∈FA+ then, by (2), positive functionsψI exist such thatP (x)=ew(x),x∈XN, wherew is the sum over I ∈Aof the functionsx→lnψI(πIx). Thus,P is proportional to ew, with the proportionality constant equal to 1.
By definition,wbelongs toWA, equal to the above sum ofUIby Lemma4.
Corresponding to the interaction spacesUI, a special baseαy,y∈XN, of the spaceRXN is constructed. To this end, it is assumed that eachXihas a distinguished element denoted by0i and0=(0i)i∈N. The functionαyis defined atx∈XNby
αy(x)=
(−1)|s(y)∩s(x)|, x∼y,
0, otherwise,
wheres(y)denotes the support{i∈N: yi =0i}ofy andx∼y abbreviates the equality of projections ofx andy ontos(x)∩s(y).
Lemma 6. IfI⊆N then{αy: s(y)⊆I}is a base ofVI and{αy: s(y)=I}a base ofUI.
Proof. Fory, z∈XNthe scalar product ofαyandαz equals the sum of(−1)|[s(y)s(z)]∩s(x)|overx∈XNsuch that x∼y andx∼z. Fori∈s(y)\s(z)the range of summation is partitioned into the pairs that differ only in theith coordinate, belonging to{0i, yi}. By the summations over the pairs, the scalar product vanishes. Therefore,αyandαz are orthogonal whens(y)=s(z). Ifyandzhave the same support thenαy(z)=(−1)|s(y)|δNy,z, and thus the functions αywiths(y)=I are independent. It follows that{αy: s(y)⊆I}is an independent set. This set is a base ofVIbecause αy∈Vs(y)⊆VIand dimVI= |XI|. Then,{αy: s(y)=I}is a base ofUIby Lemma4.
In particular, the spaceUI has a positive dimension if and only if|Xi|>1 for alli∈I. IfXi= {0i,1i},i∈N, then eachUI is spanned by the single functionαy wherey=(yi)i∈Nhasyi equal to1i fori∈I and0i otherwise. These functions form an orthogonal base ofRXN by Corollary2.
The following technical lemma is prepared for the discussion in Section5.2.
Lemma 7. For any hypergraph(N,A)the familyWAis the direct sum of the spaces V0,I =
v∈VI: v(x)=0oncexi=0ifor somei∈I overI⊆Nthat can be covered by someJ∈A.
The spaceV0,I consists of the functionsvIπI wherevI:XI→Rvanishes on eachxI∈XI having a coordinate xi=0i. Such a functionvI is said to beadaptedto0.
Proof of Lemma7. ForI⊆NletρImapx∈XNtoy∈XNgiven byπIy=πIxandπN\Iy=πN\I0. Letw∈RXN be decomposed as the sum ofvIπI overI⊆Nwhere everyvIis adapted to0. Ifx∈XNandI=s(x)then
L⊆I
(−1)|I\L|w(ρLx)=
L⊆I
(−1)|I\L|
K⊆L
vK(πKρLx)
=
K⊆I
vK(πKx)
K⊆L⊆I
(−1)|I\L|=vI(πIx). (4)
Therefore, the functionsvI are unique and, in turn, the sum of allV0,I,I⊆N, is direct.
Ifvis a function onXNthen
I⊆N
(−1)|N\I|vρI∈V0,N.
In fact, ifx∈XN has a coordinatexi=0ithenρIx=ρI\ixfori∈I⊆N, and grouping the summands into the pairs I,I\ithe sum vanishes. Sincev=vρNandvρI∈VI it follows thatVN=RXNequalsV0,N plus the sum ofVIover IN. By induction on the cardinality ofN,VNis the sum ofV0,I overI⊆N, which implies the assertion.
3. Conditional independence and interactions
In this section, theCI-constraints are related to interaction spaces. This is done via the relation(ij|K)Lbetween (ij|K)∈RandL⊆Ndefined by the statement ‘i /∈Lorj /∈LorL⊆ij K.’ In other words,(ij|K) /Lif and only ifij ⊆L⊆ij K, see Section5.3. From now on,mI denotes|XI|−1,I⊆N.
Lemma 8. If (ij|K)Land u∈UL then the measure given byP (x)=mN[1+u(x)],x∈XN,satisfies the CI- constraint given by(ij|K).
Proof. By Lemma2, ifij Kdoes not containLthen the marginal ofP toij Kis uniform. Thus, theCI-constraint (1) reduces to mij KmK=miKmj K. Otherwise,L⊆ij K, and it follows from(ij|K)Lthat i /∈Lor j /∈L. By the symmetry between iandj, it suffices to restrict to the casei /∈L. Then,L⊆j K which implies thatubelongs to Vj K⊆Vij K. By the double use of (3) withij Kandj Kin the role ofI, the constraint (1) rewrites to
mij K 1+u(x)
·πKP (πKx)=πiKP (πiKx)·mj K 1+u(x)
, x∈XN. (5)
Ifj ∈LthenKandiK do not containL. By Lemma2, the marginals ofP toK andiK are uniform whence (5) holds. If j /∈L then L⊆K, and thus u∈VK ⊆ViK. By the double use of (3) withK andiK in the role of I, πKP (πKx)=mK[1+u(x)]andπiKP (πiKx)=miK[1+u(x)]. Hence, (5) is always satisfied.
ForL⊆RletLdenote the family ofL⊆N such that(ij|K)Lfor all(ij|K)∈L.
Corollary 3. IfL⊆RandL∈Lthen any p.m.onXNthat differs from the uniform one by a vector fromUL,as in Lemma8,belongs toPL.
An example of the constructionL→L arises from a graphG=(N, E)andKG= {(ij|N\ij ): ij ∈N
2
\E}, arriving at the familyKGof cliques ofG.
For a later reference the following simple assertion is needed.
Lemma 9. For anyL⊆R,ifM⊆N and every differenti, j∈Mcan be covered by someLij ∈L contained inM thenM∈L.
Proof. If a couple(ij|K)∈Lhasi, j∈Mthen, by the assumption,ij⊆Lij⊆Mfor someLij∈L. The definition ofLimplies that(ij|K)Lij, and thusLij⊆ij Kbecauseij⊆Lij. Hence,M⊆ij K. It follows that(ij|K)M.
In turn,M∈L.
In the remaining part of this section, theCI-constraints (1) are analyzed via the mappingsΨij|Kgiven by Ψij|K(w)=
πij Kw(πij Kx)·πKw(πKx)−πiKw(πiKx)·πj Kw(πj Kx)
x∈XN.
Here,(ij|K)∈Randwis a function onXN. A p.m. can play the role ofwas well.
Lemma 10. The Jacobian ofΨij|Kat the uniform p.m.onXNis equal to mKδx,yij K+mij KδKx,y−mj Kδx,yiK −miKδj Kx,y
x,y∈XN.
Proof. The coordinate function ofΨij|K indexed byx∈XN equals
z∈XN
z∈XN
w(z)w(z) δx,zij KδKx,z−δiKx,zδx,zj K .
Differentiating w.r.t.w(y),y∈XN,
z∈XN
z∈XN
δNy,zw(z)+w(z)δz,yN δij Kx,zδx,zK −δx,ziKδj Kx,z .
Whenw(z)=mN,z∈XN, this equals mN
z∈XN
δx,yij Kδx,zK −δx,yiKδj Kx,z +mN
z∈XN
δx,zij Kδx,yK −δx,ziKδj Kx,y
and the assertion follows.
Let Ker(ij|K)denote the kernel of the Jacobian ofΨij|Kat the uniform p.m.
Lemma 11. ForL⊆Rthe intersection ofKer(ij|K)over(ij|K)∈Lis equal to the sum ofULoverL∈L. Proof. IfL /∈Lthenij⊆L⊆ij Kfor some(ij|K)∈L. Foru∈ULandv∈Ker(ij|K)
x∈XN
u(x)
y∈XN
mKδx,yij K+mij KδKx,y−mj Kδx,yiK −miKδj Kx,y
v(y)=0
because the inner sums equal zero due to Lemma10. Since none of the setsK,iK andj KcontainsLthe marginals ofuto these sets vanish by Lemma2and the above equation rewrites to
x∈XN
y∈XN
u(x)v(y)δx,yij K=0.
SinceL⊆ij Kthe functionubelongs toVij K, and thusu(x)δij Kx,y =u(y)δx,yij K. Therefore,uandvare orthogonal. In turn, Ker(ij|K)is orthogonal toUL.
By Corollary2, the intersection of Ker(ij|K) over(ij|K)∈L is contained in the sum ofUL over L∈L. The
opposite inclusion is a consequence of Corollary3.
4. Log-convexity and conditional independence
A nonempty family of positive p.m.’s onXN islog-linearif it contains together with p.m.’sP andQalso the p.m.
proportional tox→Pt(x)Qs(x),x∈XN, for all realt ands. The family islog-affineif this is required only with s=1−t. The log-convexity assumes the additional restriction 0< t <1. The log-affine families correspond to the full exponential families [20].
In this section,L⊆R.
Lemma 12. IfPL+is log-convex then it is log-linear.
Proof. LetP , Q∈PL+, 0< t <1 andRt(x)=Pt(x)Q1−t(x),x∈X. By the log-convexity, for(ij|K)∈LEqs (1) hold withP replaced byRt. Thus, the coordinate functions oft→Ψij|K(Rt)vanish whent ranges between 0 and 1.
Since they are holomorphic they vanish identically. It follows thatPL+is closed to the log-affine combinations. The assertion follows because it is not difficult to see that a log-affine family that contains the uniform p.m. onXN is
log-linear.
Lemma 13. IfPL+ is log-convex and a p.m.P onXN is proportional toew for somew∈RXN thenP ∈PL+if and only ifwbelongs to the sum ofULoverL∈L.
Proof. The log-convex combinations ofP with the uniform p.m.
Pt(x)=et w(x)
y∈XN
et w(y), x∈XN,
are viewed as a curve parameterized byt. Its tangent vector at the uniform p.m.P0equals mNw−m2N
y∈XN
w(y).
If P ∈PL+ then the log-convexity of PL+ implies that the curve ranges in PL+. Hence, the tangent belongs to Ker(ij|K)for(ij|K)∈L. By Lemma11, the tangent is in the sum ofULoverL∈L. Since∅∈LandU∅consists of the constant functions the sum containsw.
Letwbelong to the sum ofULoverL∈L. By Lemma12,PL+is log-linear. Therefore, to prove thatP ∈PL+ it suffices to confine to the case w∈ULfor someL∈L. The assertion is trivial ifL=∅, havingwconstant and P uniform. Otherwise,L=∅andw is orthogonal toU∅ by Lemma3. Then,x→mN[1+εw(x)]defines a p.m.
forεsufficiently close to 0. By Corollary3, this p.m. belongs toPL. It is proportional to eln[1+εw]. The log-affinity ofPL+implies that the p.m.Pεproportional to ewε withwε=1εln[1+εw]belongs toPL+. Limiting withεto 0 the functionswεconverge towandPεconverges toP ∈P+. Hence,P ∈PL+.
TheCI-constraint (1) given by(ij|K)can be equivalently expressed as
Q(xixjxK)·Q(yiyjxK)=Q(xiyjxK)·Q(yixjxK), xi, yi∈Xi, xj, yj∈Xj, xK∈XK, (6) whereQdenotes the marginal ofP toij K,xixjxKis the element ofXij Kthat projects toxi,xj andxK, andyiyjxK, xiyjxKandyixjxK have analogous meaning. IfK=N\ij thenQ=P and it is easy to see that if two p.m.’s satisfy the equations in (6) then the equations also hold for the log-linear combinations of the p.m.’s.
Under the additional natural assumption|Xi|>1 for alli∈N, the following lemma and the proof of Theorem1 can be slightly simplified but when not excluding|Xi| =1 only additional minor technicalities are needed.
Lemma 14. IfPL+is log-convex,L∈Land|X|>1for each∈Lthen all subsets ofLbelong toL.
Proof. It suffices to prove thatL\∈Lfor all∈L. This is accomplished when the violation of(ij|K)(L\)for some(ij|K)implies that the couple is not inL. Thus, assumeij⊆L\⊆ij K. By the assumption on cardinalities, there exists y ∈XN with s(y)=L. Let z∈XN haves(z)=and the same th coordinate asy,y=0. Since L, ∈LLemma6impliesαy∈UL andαz∈U. By the log-convexity and Lemma13, the p.m.P proportional to eαy+αzis inPL+.
There existst >0 such thatt·πN\P (xN\)equals eαy(0xN\)+αz(0xN\)+eαy(yxN\)+αz(yxN\)+
x∈X\{0,y}
eαy(xxN\)+αz(xxN\)
=eαy(0xN\)+1+eαy(yxN\)−1+ |X| −2, xN\∈XN\,
whereαyandαzvanish atxxN\. Combiningij⊆L\⊆ij KandL∈L, the setij Kcannot contain. Therefore, πij KP =Qis a marginal ofπN\P. SinceπN\P depends onxN\ only throughπij KN\xN\it follows from (3) that Q(πij KN\xN\)is equal to|XN\ij K|πN\P (xN\). Therefore,
Q(0i0j0K)Q(yiyj0K)−Q(0iyj0K)Q(yi0j0K)
is a positive multiple of[e2+e−2+ |X| −2]2− |X|2. Using (6), this implies(ij|K) /∈L.
Proof of Theorem1. LetRN,Xconsist of the couples(ij|N\ij )with|Xi| =1 or|Xj| =1. By (1),PL=PL∪RN,X
for allL⊆R. LetGL=(N, E)be the graph withij /∈Eif and only if (ij|K)∈Lfor someK⊆N\ij. By (2),
a p.m. factorizes w.r.t.GLif and only if it does w.r.t.GL∪RN,X. It follows that it suffices to prove the assertion of Theorem1under the additional assumptionRN,X⊆L. Hence, anyL∈L with at least two elements has|X|>1 for all∈L. By Lemma14, all subsets of anyL∈L belong toL; thusL is hereditary. Using Lemma9, a set L⊆N with at least two elements belongs toL if and only ifij∈L for everyi, j ∈L; thusL is conformal. It follows thatLis the family of cliques ofGL. Hence, the assertion of Theorem1obtains from Lemmas5and13.
5. Discussion and remarks
5.1. The main result and its corollary
For an undirected graphG=(N, E)letKGconsist of the couples(ij|N\ij )havingij /∈E. The p.m.’s fromPKG are calledpairwise Markov w.r.t. G. The families PK+G parameterized by undirected graphs Ghave been known in the statistical literature as theMarkov models over undirected graphs[9,20], over the contingency tables. They are log-convex because each familyP{+(ij|N\ij )}is log-linear by the argument following (6) and intersections of log- linear families are log-linear. Hence, Theorem1directly implies the assertion of Corollary1,PK+G=FG+. Rephrased verbally, given an undirected graph, a positive p.m. is pairwise Markov if and only if it factorizes. The assumption of positivity matters, see [26], p. 22, and [20], Example 3.10.
Alternatively,PK+G=FG+is a simple consequence of the lemmas on the interaction spaces from Section2. In fact, it is rather straightforward thatP ∈P+satisfies theCI-constraint(ij|N\ij )if and only if lnP:x→lnP (x)belongs to the sum ofVN\i andVN\j, see, e.g., [20], (3.6). This is the sum ofUI whereI does not containij, on account of Lemma4. Therefore, by Corollary2,P ∈P+is pairwise Markov w.r.t.Gif and only if lnP belongs to the sum ofUI whereI contains noij /∈E, thusI is a clique ofG. To concludePK+G=FG+it suffices to evoke Lemma5.
It seems that this short geometric proof of Corollary1via the intersections of sums of the interaction spaces is new.
The presented argumentation is coordinate-free; no projectors, Moebius transform, algebras, and a special choice of 0 ∈XN are employed. The only comparable proof of Corollary1 is the inductive one by Brook [5], see also [16], Theorem 7.1, which however has no geometric content.
ForL⊆R, Theorem1and log-convexity of the Markov models imply thatPL+is log-convex if and only if it is Markov over an undirected graph. Thus, the class of these Markov models can be equivalently defined by means of theCI-constraints and log-convexity, without any reference to graphs. Here, both the conditional independence [11]
and log-convexity [6], including the geometrical viewpoint of [1], have been recognized for decades as basic building stones in statistics. Theorem1seems to be a new bridge between them.
5.2. Gibbs probability measures
Given a distinguished element0 =(0i)i∈N ofXN and a hereditary hypergraph(N,A), a p.m.P ∈P+isGibbsianif there exist real-valued functionsvIonXI such that
lnP (x)=
I∈A,I⊆s(x)
vI(πIx), x∈X. (7)
The family of such probability measures is denoted byG0+,A. IfxI∈XI and for somei∈I theith coordinate ofxI
equals0ithen the valuevI(xI)does not occur in (7). Therefore, it can be equivalently assumed in the above definition that all such values equal zero, thusvIare adapted to0. In this situation, the summation in (7) can run equivalently over I∈A. Thus, the Gibbs p.m.’s factorize in a special way,G0+,A⊆FA+. By Lemmas5and7, ifP ∈FA+thenP =ew forw∈WAin the form
I∈AvIπI with allvI adapted to0. Therefore,FA+⊆G0,+A. Thus, over a hypergraph, the notion of Gibbs p.m.’s does not depend on the choice of0 and coincides with the factorizability of positive p.m.’s. In literature, e.g., in [7], Theorem 2, [26], Theorem 2, typically (7) is stated to be equivalent to
vK(πKx)=
I⊆K
(−1)|K\I|lnP (ρIx), K∈A, x∈XN,s(x)=K,
which is a reinterpretation of the computation in (4), based on Moebius transform.
The identityPK+G =FG+, disguised in Gibbs p.m.’s and potentials, has been revealed and proved independently several times. Over point lattices it goes back to [2,30]. Two early unpublished manuscripts [14,17]2have been fre- quently cited. Other proofs are in [3], p. 198, [7], p. 22, [15,27,28], and three proofs in [26], Theorem 2. For personal remarks see also Hammersley’s discussion in [3], p. 230.
Under topological assumptions on the state spaces, Hammersley–Clifford theorem from [12,20] asserts that over a graph the pairwise Markovness is equivalent to the factorization, for the p.m.’s with continuous and positive densities w.r.t. a product measure.
5.3. Miscellany
WhenP andQare p.m.’s on a measurable space that are not mutually singular,p andq are their densities with respect to a dominating measureR and 0< t <1, the log-convex combination ofP andQis defined as the p.m.
with theR-density proportional toptq1−t. The definition is not dependent on the choice ofR. A family of p.m.’s is called log-convex if it is closed to the log-convex combinations of pairs of not mutually singular p.m.’s. Log-convex families of mutually absolutely continuous p.m.’s are called ‘geodesically convex’ in [6]. Examples of log-convex sets comprise the exponential families with convex sets of canonical parameters and their extensions [8].
A crucial role in the proof of Theorem1is played by the binary relationbetweenRand the power set ofN. It appeared previously prior to [23], Theorem 4. In addition to the mappingL→L, it gives rise to
A→A=
(ij|K)∈R: (ij|K)Lfor allL∈A ,
whereAis a family of subsets ofN. The pair of mappings forms a Galois connection [4], Chapter V, Sections 7 and 8.
By the remark preceding [23], Theorem 4, the connection gives rise to an antiisomorphism betweenDCI-relations and Cw-families, similarly to [23], Theorem 1.
The implication of Lemma9is valid for the family of connected sets in any topological space in the role ofL; it goes back at least to [18].
General conditional independence statements can be reduced to families of theCI-constraints (1) by [22], Lemma 3.
The familiesPLandPL+have been, in spite of considerable effort, far from being understood in general [12,24,31].
6. Gaussian probability measures
In this sectionN= {1,2, . . . , n}. A p.m. onRnis regular Gaussian (rG) if its density with respect to the Lebesgue measure has the form
x→(2π)−n/2(detA)−1/2exp −12(x−μ)TA−1(x−μ)
, x∈Rn,
whereμis a (column) vector fromRnandA=(ai,j)i,j∈Na real positive definite matrix. The standard notation detA is used for the determinant ofAandT for the transposition.
AnrGp.m. satisfies theCI-constraint given by(ij|K)∈Rif and only if detAiK,j K vanishes whereAiK,j K is the submatrix ofAwhose rows are indexed byiK and columns byj K, see [21]. Hence, the p.m. is pairwise Markov w.r.t. an undirected graphG=(N, E)if and only if detAN\j,N\i=0 wheneverij /∈E; this equality is equivalent to vanishing of the element ofA−1indexed byi, j. It follows that the p.m. factorizes w.r.t. the graph [20], Section 5.1.4, which implies Markovness [20], Proposition 3.8. Thus, as well-known, the pairwise Markovness of anyrGp.m. is equivalent to a factorization, over an undirected graph. For the generalCI-constraints on Gaussian variables see [12, 13,21,25,29].
Log-convex combinations of tworGp.m.’s parameterized byμ, Aandν, B are therGp.m.’s parameterized by someλμ,ν,t∈Rnand[t A−1+(1−t )B−1]−1, 0< t <1.
Theorem 2. If the set of rG p.m.’s on Rn that satisfy all CI-constraints from L⊆R is log-convex then this set coincides with the set of all rG p.m.’s that factorize w.r.t.the graph(N, E)that hasij∈Eif and only if(ij|K) /∈L for allK⊆N\ij.
2The former work is not available to the author.