HAL Id: hal-00894251
https://hal.archives-ouvertes.fr/hal-00894251
Submitted on 1 Jan 1999
Blocking Gibbs sampling in the mixed inheritance model
using graph theory
Mogens Sandø Lund, Claus Skaanning Jensen
Original article

(a) DIAS, Department of Breeding and Genetics, Research Centre Foulum, P.O. Box 50, 8830 Tjele, Denmark
(b) AUC, Department of Computer Science, Fredrik Bajers Vej 7E, 9220 Aalborg Ø, Denmark

(Received 10 February 1998; accepted 18 November 1998)
Abstract - For the mixed inheritance model (MIM), including both a single locus and a polygenic effect, we present a Markov chain Monte Carlo (MCMC) algorithm in which discrete genotypes of the single locus are sampled in large blocks from their joint conditional distribution. This requires exact calculation of the joint distribution of a given block, which can be very complicated. Calculations of the joint distributions were obtained using graph theoretic methods for Bayesian networks. An example of a simulated pedigree suggests that this algorithm is more efficient than algorithms with univariate updating or algorithms using blocking of sires with their final offspring. The algorithm can be extended to models utilising genetic marker information, in which case it holds the potential to solve the critical reducibility problem of MCMC methods often associated with such models. © Inra/Elsevier, Paris

blocking / Gibbs sampling / mixed inheritance model / graph theory / Bayesian network
Résumé - Blocking Gibbs sampling in the mixed inheritance model using graph theory. For the case of mixed inheritance (a single locus with a polygenic background), we present a Markov chain Monte Carlo (MCMC) algorithm in which the genotypes at the single locus are sampled in large blocks from their joint conditional distribution. This requires the exact calculation of the joint distribution of a given block, which can be very complicated. The calculation of the joint distributions is obtained using graph theoretic methods for Bayesian networks. A simulated pedigree example suggests that this algorithm is more efficient than algorithms with univariate updating or with updating by sire progeny groups. This algorithm can be extended to models using genetic marker information, which makes it possible to eliminate the risk of reducibility often associated with such models when MCMC methods are applied. © Inra/Elsevier, Paris

blocking / Gibbs sampling / mixed inheritance model / graph theory / Bayesian network

1. INTRODUCTION
In mixed inheritance models (MIM), it is assumed that phenotypes are influenced by the genotypes at a single locus and a polygenic component [19]. Unfortunately, it is not feasible to maximise the likelihood function associated with such models using analytical techniques. Even in the case of single gene models without polygenic effects, the need to marginalise over the distribution of the unknown single genotypes results in computations which are not feasible. For this reason, Sheehan [20] used the local independence structure of genotypes to derive a Gibbs sampling algorithm for a one-locus model. This technique circumvented the need for exact calculations in complex joint genotypic distributions, as the Gibbs sampler only requires knowledge of the full conditional distributions. Algorithms for the more complex MIMs were later implemented using either a Monte Carlo EM algorithm [8] or a fully Bayesian approach [9] with the Gibbs sampler. However, Janss et al. [9] found that the Gibbs sampler had very poor mixing properties owing to a strong dependency between genotypes of related individuals. They also noticed that the sample space was effectively partitioned into subspaces between which movement occurred with low probability. This occurred because some discrete genotypes rarely changed states. This is known as practical reducibility. Both the mixing and reducibility properties are vastly improved by sampling genotypes jointly. Consequently, Janss et al. [9] applied a blocking strategy with the Gibbs sampler, in which genotypes of sires and their final offspring (non-parents) were sampled simultaneously from their joint distribution (sire blocking). This blocking strategy made it simple to obtain exact calculations of the joint distribution and improved the mixing properties in data structures with many final offspring. However, the blocking strategy of Janss and co-workers is not a general solution to the problem, because final offspring may constitute only a small fraction of all individuals in a pedigree.

An extension of another blocking Gibbs sampler developed by Jensen et al. [13] could provide a general solution to MIMs. Their sampler was for one-locus models, and sampled genotypes of many individuals jointly, even when the pedigree was complex. The method relied on a graphical model representation and treated genotypes as variables in a Bayesian network. This results in a graphical representation of the joint probability distribution for which efficient algorithms to perform exact inference exist (e.g. [16]). However, a constraint of the blocking Gibbs sampler developed by Jensen and co-workers is that it only handles discrete variables, and in turn cannot be used in MIMs. The objective of this study is to extend the blocking Gibbs sampler of Jensen et al. [13] such that it can be used in MIMs. A simulated example is presented to illustrate the practicality of the proposed method, and the data from the example are analysed using both blocking strategies.
2. MATERIALS AND METHODS
2.1. Mixed inheritance model
In the MIM, phenotypes are assumed to be influenced by the genotype at a single major locus and a polygenic effect. The polygenic effect is the combined effect of many additive and unlinked loci, each with a small effect. Classification effects (e.g. herd, year or other covariates) can easily be included in the model. The statistical model for a MIM is defined as:

y = Xb + Zu + ZWm + e
where y is a (n × 1) vector of n observations, b is a (p × 1) vector of p classification effects, u is a (q × 1) vector of q random polygenic effects, m is a (3 × 1) vector of genotype effects and e is a (n × 1) vector of n random residuals. X is a (n × p) design matrix associating data with the 'fixed' effects, and Z a (n × q) design matrix associating data with polygenic and single gene effects. W is an unknown (q × 3) random design matrix of genotypes at the single locus.

Given location and scale parameters, the data are assumed to be normally distributed as

y | b, u, m, W, σ_e² ~ N(Xb + Zu + ZWm, I σ_e²)

where σ_e² is the residual variance. For polygenic effects, we invoke the infinitesimal additive genetic model [1], resulting in normally distributed polygenic effects, such that

u | σ_u² ~ N(0, A σ_u²)

where A is the known additive relationship matrix describing the family relations between individuals, and σ_u² is the additive variance of polygenic effects. The single locus was assumed to have two alleles (A₁ and A₂), such that each individual had one of the three possible genotypes: A₁A₁, A₁A₂ and A₂A₂.
For each individual in the pedigree, these genotypes were represented as a random vector, w_i, taking values (1 0 0), (0 1 0) or (0 0 1). The vectors w_i form the rows of W and will for notational convenience be referred to as ω_1, ω_2 and ω_3. For individuals which do not have known parents (i.e. founder individuals), the probability distribution of genotype w_i was assumed to be P(w_i | f). The distribution for genotype frequency of the base population (f) was assumed to follow Hardy-Weinberg proportions. For individuals with known parents, the genotype distribution is denoted as p(w_i | w_sire(i), w_dam(i)). This distribution describes the probability of the alleles constituting genotype w_i being transmitted from parents with genotypes w_sire(i) and w_dam(i) when segregation of alleles follows Mendelian transmission probabilities. For individuals with only one known parent, a dummy individual is inserted for the missing parent. The joint distribution of all genotypes is then

p(W | f) = ∏_{i∈F} P(w_i | f) ∏_{i∈NF} p(w_i | w_sire(i), w_dam(i))

where W = (w_1, ..., w_q), F is the set of founders, and NF is the set of non-founders.
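For illustration, the founder and transmission distributions just described can be tabulated directly. The sketch below is an illustrative aid rather than code from the study; it encodes the genotypes A₁A₁, A₁A₂ and A₂A₂ as indices 0, 1 and 2 (the index counts A₂ alleles):

```python
import itertools

def founder_probs(f):
    """Hardy-Weinberg genotype probabilities for a founder, given frequency f of A1."""
    return [f * f, 2 * f * (1 - f), (1 - f) * (1 - f)]

def allele_probs(genotype):
    """Probability that a parent of the given genotype transmits (A1, A2)."""
    return {0: (1.0, 0.0), 1: (0.5, 0.5), 2: (0.0, 1.0)}[genotype]

def transmission_probs(sire, dam):
    """Mendelian probabilities of each offspring genotype given parental genotypes."""
    probs = [0.0, 0.0, 0.0]
    s_probs = allele_probs(sire)
    d_probs = allele_probs(dam)
    for s_allele, d_allele in itertools.product((0, 1), repeat=2):
        offspring = s_allele + d_allele  # number of A2 alleles = genotype index
        probs[offspring] += s_probs[s_allele] * d_probs[d_allele]
    return probs

# Two heterozygous parents give the familiar 1:2:1 ratio
print(transmission_probs(1, 1))  # → [0.25, 0.5, 0.25]
```

Tables like these are exactly the conditional distributions attached to non-founder variables later, when the pedigree is viewed as a Bayesian network.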
To fully specify the Bayesian model, improper uniform priors were used for the fixed and genotypic effects [i.e. p(b) ∝ constant, p(m) ∝ constant]. Variance components (i.e. σ_e² and σ_u²) were assumed a priori to be independent and to follow the conjugate inverted gamma distribution (i.e. 1/σ² has the prior distribution of a gamma random variable with parameters α_i and β_i). The parameters α_i and β_i can be chosen so that the prior distribution has any desired mean and variance. The conjugate Beta prior was used for allele frequency (p(f) ~ Beta(α_f, β_f)).
The joint posterior density of all model parameters is proportional to the product of the prior distributions and the conditional distribution of the data, given the parameters:

p(b, u, m, W, f, σ_u², σ_e² | y) ∝ p(b) p(m) p(f) p(σ_u²) p(σ_e²) p(W | f) p(u | σ_u²) p(y | b, u, m, W, σ_e²)    (1)
2.2. Gibbs sampling

For Bayesian inference, the marginal posterior distribution for the parameters of the model is of interest. With MIMs this requires high dimensional integration and summation of the joint posterior distribution (1), which cannot be expressed in closed form. To perform the integration numerically using the Gibbs sampler requires the construction of a Markov chain which has (1) (normalised) as its stationary distribution. This can be accomplished by defining the transition probabilities of the Markov chain as the full conditional distributions of each model parameter. Samples are then taken from these distributions in an iterative scheme. Each time a full conditional distribution is visited, it is used to sample the corresponding variable, and the realised value is substituted into the conditional distribution of all other variables (see, e.g. [5]). Instead of updating all variables univariately, it is also possible to sample several variables from their joint conditional posterior distribution. Variables that are sampled jointly will be referred to as a 'block'. As long as all variables are sampled, the new Markov chain will still have equation (1) as its stationary distribution.
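The updating scheme can be illustrated on a deliberately simple target, a standard bivariate normal with correlation ρ, whose full conditionals are the known normals N(ρ·other, 1 − ρ²). This toy sketch is not the MIM sampler itself; it only shows the iterative substitution of realised values:

```python
import random

def gibbs_bivariate_normal(rho, n_iter=20000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each coordinate is drawn from its full conditional N(rho * other, 1 - rho^2),
    with the freshly realised value substituted into the next conditional."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = (1 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)  # sample x | y
        y = rng.gauss(rho * x, sd)  # sample y | x, using the new x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(0.9)
mean_x = sum(s[0] for s in samples) / len(samples)
```

The strong correlation (ρ = 0.9) makes successive draws highly dependent, a small-scale analogue of the slow mixing that motivates blocking in the MIM.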
2.2.1. Full conditional posterior distributions

Full conditional distributions were derived from the joint posterior distribution (1). The resulting distributions are presented in the following sections.

2.2.2. Location parameters
Hereafter, the restricted additive major gene model will be assumed, such that m' = (-a, 0, a) or m = 1a, where 1' = (-1, 0, 1) and a is the additive effect of the major locus gene. Allowing for genotypic means to vary independently or including a dominance effect entails no difficulty. The gene effect (a) is considered a classification effect when conditioning on major genotypes (W) and the genetic model at the locus. Consequently, the location parameters in the model are θ' = [b', a, u']. Let

H = [X : ZW1 : Z],  Ω = [0 0; 0 A⁻¹k],  k = σ_e²/σ_u²,  C = H'H + Ω,

and m = 1a. The posterior distribution of location effects (θ), given the variance components, major genotypes (W) and data (y), is (following [17]):

θ | W, σ_u², σ_e², y ~ N(C⁻¹H'y, C⁻¹σ_e²)
Then, using standard results from multivariate normal theory (e.g. [18] or [22]), the full conditional distributions of the parameters in θ can be written as:

θ_i | θ_-i, W, σ_u², σ_e², y ~ N((H_i'y - C_{i,-i}θ_{-i})/C_{ii}, σ_e²/C_{ii})    (2)

where C_{ii} is the ith diagonal element of C, C_{i,-i} is the ith row of C excluding C_{ii}, and H_i is the ith column of H.

2.2.3. Major genotypes
The full conditional distribution of a given genotype, w_i, is found by extracting from equation (1) the terms in which w_i is present. The probabilities are here given up to a constant of proportionality and must be normalised to ensure that Σ_{j=1}^{3} P(w_i = ω_j) = 1. The full conditional distribution of genotype w_i is:

P(w_i = ω_j | ·) ∝ [P(w_i = ω_j | f) 1_{i∈F} + p(w_i = ω_j | w_sire(i), w_dam(i)) 1_{i∈NF}] ∏_{k∈Off(i)} p(w_i(k) | w_i = ω_j, w_mate(i(k))) p(ỹ_i | w_i = ω_j)    (3)

where 1_{i∈F} and 1_{i∈NF} are indicator functions, which are 1 if individual i is contained in the set of founders (F) or non-founders (NF), respectively, and 0 otherwise. Off(i) is the set of offspring of individual i, such that i(k) is the kth offspring of i resulting from a mating with mate(i(k)). The terms P(w_i = ω_j | f) and p(w_i = ω_j | w_sire(i), w_dam(i)) represent the probability of individual i receiving alleles corresponding to genotypes ω_1, ω_2 or ω_3, and the product over offspring represents the probability of individual i transmitting alleles in the genotypes of the offspring, which are conditioned upon. If individual i has a phenotypic record, the adjusted record ỹ_i = y_i - X_i b - Z_i u contributes the penetrance function:

p(ỹ_i | w_i = ω_j) ∝ exp(-(ỹ_i - m_j)²/(2σ_e²))

where X_i and Z_i are the ith rows of the matrices X and Z, and m_j is the jth element of m.

2.2.4. Allele frequency
Conditioning on the sampled genotypes of founder individuals results in contributions of f for each A₁ sampled and (1 - f) for each A₂ sampled. This is because the sampled genotypes are realisations of the 2n independent Bernoulli(f) random variables used as priors for base population alleles. Multiplying these contributions by the prior Beta(α_f, β_f) gives

p(f | W) ∝ f^(α_f + n_A1 - 1) (1 - f)^(β_f + n_A2 - 1)    (4)

where n_A1 and n_A2 are the numbers of A₁ and A₂ alleles in the base population. The specified distribution is proportional to a Beta(α_f + n_A1, β_f + n_A2) distribution. Taking α_f = β_f = 1, the prior on this parameter is a proper uniform distribution.
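This conjugate update is straightforward to implement. The sketch below is illustrative (the allele counts in the usage example are hypothetical) and draws f from its Beta full conditional:

```python
import random

def sample_allele_frequency(n_a1, n_a2, alpha_f=1.0, beta_f=1.0, rng=random):
    """Draw f from its full conditional Beta(alpha_f + n_A1, beta_f + n_A2),
    given the counts of A1 and A2 alleles among sampled base-population genotypes."""
    return rng.betavariate(alpha_f + n_a1, beta_f + n_a2)

# Hypothetical counts: 30 A1 alleles and 70 A2 alleles among founders
rng = random.Random(0)
draws = [sample_allele_frequency(30, 70, rng=rng) for _ in range(4000)]
posterior_mean = sum(draws) / len(draws)  # close to (1 + 30) / (2 + 100)
```

With the uniform prior (α_f = β_f = 1), the posterior mean (α_f + n_A1)/(α_f + β_f + n_A1 + n_A2) is pulled only slightly away from the raw allele proportion.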
2.2.5. Variance components

The full conditional distribution of the variance component σ_u² is

p(σ_u² | b, a, u, W, f, σ_e², y) ∝ (σ_u²)^-(q/2 + α_u + 1) exp(-(u'A⁻¹u/2 + β_u)/σ_u²)    (5)

which is proportional to the inverted gamma distribution. Similarly, the full conditional distribution of the variance component σ_e² is

p(σ_e² | b, a, u, W, f, σ_u², y) ∝ (σ_e²)^-(n/2 + α_e + 1) exp(-(e'e/2 + β_e)/σ_e²)    (6)

where e = y - Xb - Zu - ZWm. The algorithm based on univariate updating can be summarised as follows:

I. initiate θ, W, f, σ_u², σ_e² with legal starting values;
II. sample major genotypes w_i from equation (3) for i = {1, ..., q};
III. sample allele frequency from equation (4);
IV. sample location parameters θ_i (classification effects and polygenic effects) univariately from equation (2), for i = {1, ..., dimension of θ};
V. sample σ_u² from equation (5);
VI. sample σ_e² from equation (6);
VII. repeat II-VI.

Steps II-VI constitute one iteration. The system is initially monitored until sufficient evidence for convergence is observed. Subsequently, iterations are continued, and the sampled values saved, until the desired precision of features of the posterior distribution has been achieved. The mixing diagnostic used is described in a later section.
2.3. Blocking strategies

A more efficient alternative to the univariate updating of variables is to update a set of variables multivariately. Variables updated jointly will be referred to as a 'block'. In this implementation, variables must be sampled from the full conditional distribution of the block. In the present model, blocking major genotypes of several individuals alleviates the problems of poor convergence and mixing properties caused by the covariance structure between these variables. Janss et al. [9] constructed a block for each sire, containing genotypes of the sire and its final offspring. All other individuals were sampled from their full conditional distributions. Janss and co-workers showed that exact calculations needed for these blocks are simple, and this is the first approach we apply in the analysis of the simulated data. However, this blocking strategy only improves the algorithm in pedigree structures with several final offspring. In many applications only a few final offspring exist (e.g. dairy cattle pedigrees), and the blocking calculations become more complicated. Therefore, the second approach applied to the simulated data was to extend the blocking Gibbs sampling algorithm of Jensen et al. [13], using a graphical model representation of genotypes. Here, the conditional distributions of all parameters, other than the major genotypes, are the same regardless of whether blocking is used or not.
2.3.1. Sire blocking

In the sire blocking approach, a block is constructed for each sire having final offspring. The blocks contain genotypes of the sire and its final offspring. This requires an exact calculation of the joint conditional genotypic distribution, p(w_i, w_i(1), ..., w_i(n_i) | W_-(i, i(·)), θ, y), where i is the index of a sire, n_i denotes the number of final offspring of sire i, and the final offspring are indexed by i(1), i(2), ..., i(n_i), or simply i(·). By definition, this distribution is proportional to

p(w_i | W_-(i, i(·)), θ, y) p(w_i(1), ..., w_i(n_i) | w_i, W_-(i, i(·)), θ, y).

Here, the first term is the distribution of the sire's genotype, conditional on all genotypes except those of the final offspring. In calculating the distribution of the sire's genotype, the three possible genotypes of each offspring are summed over, after weighting each genotype by its relative probability. In this expression, we condition on the mates, and the final offspring do not have offspring themselves. Therefore, the neighbourhood individuals that contribute to the genotype distribution of the sire are still the same as those in the full conditional distribution. Consequently, the amount of exact calculation needed is linear in the size of the block. The second term is the joint distribution of final offspring genotypes conditional on the sire's genotype. This is equivalent to a product of full conditional distributions of final offspring genotypes, because these are conditionally independent given genotypes of parents.

Even though the final offspring with a common sire are sampled jointly with this sire, the previous discussion shows that this is equivalent to sampling final offspring from their full conditional distributions. Dams and sires with no final offspring are also sampled from their full conditional distributions. This leads to the algorithm proposed by Janss and colleagues, which will be referred to as 'sire blocking'. Sires are sampled according to the probabilities:

P(w_i = ω_j | ·) ∝ [P(w_i = ω_j | f) 1_{i∈F} + p(w_i = ω_j | w_sire(i), w_dam(i)) 1_{i∈NF}] p(ỹ_i | w_i = ω_j) ∏_{k∈NonFinal(i)} p(w_i(k) | w_i = ω_j, w_mate(i(k))) ∏_{k∈Final(i)} Σ_{l=1}^{3} p(w_i(k) = ω_l | w_i = ω_j, w_mate(i(k))) p(ỹ_i(k) | w_i(k) = ω_l)    (7)

where Final(i) is the set of final offspring of sire i, and NonFinal(i) is the set of non-final offspring. Dams are sampled according to equation (3), and final offspring according to:

P(w_i(k) = ω_l | ·) ∝ p(w_i(k) = ω_l | w_i, w_mate(i(k))) p(ỹ_i(k) | w_i(k) = ω_l)    (8)

Again, the probabilities must be normalised. The sire blocking strategy is then constructed as in the previous algorithm, except that step II is replaced by the following: if individual i is a sire, sample its genotype from equation (7), followed by sampling of the final offspring i(·) from equation (8). If individual i is a dam, sample its genotype from equation (3).
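The summation over final offspring genotypes in the sire update can be sketched as follows. The function name and arguments are illustrative assumptions, not the paper's code; Mendelian transmission probabilities and per-offspring penetrance values are supplied by the caller:

```python
def sire_block_probs(prior, mate_genotypes, offspring_penetrance, transmission):
    """Normalised sire genotype probabilities: for each candidate sire genotype,
    each final offspring contributes the sum over its three possible genotypes,
    weighted by Mendelian transmission and by the offspring's penetrance.
    prior: length-3 unnormalised prior for the sire's genotype
    mate_genotypes: genotype index (0..2) of the mate for each final offspring
    offspring_penetrance: per offspring, length-3 values p(y_k | w_k)
    transmission: function (sire, dam) -> length-3 offspring genotype probs
    """
    probs = []
    for g_sire in range(3):
        p = prior[g_sire]
        for g_mate, pen in zip(mate_genotypes, offspring_penetrance):
            trans = transmission(g_sire, g_mate)
            p *= sum(trans[g] * pen[g] for g in range(3))
        probs.append(p)
    total = sum(probs)
    return [p / total for p in probs]
```

When every offspring penetrance is flat, the offspring terms cancel and the sire's blocked distribution collapses to its (normalised) prior term, which is a convenient sanity check.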
2.3.2. General blocking using graph theory

This approach involves a more general blocking strategy by representing major genotypes in a graphical model. This representation enables the formation of optimal blocks, each containing the majority of genotypes. The blocks are formed so that exact calculations in each block are possible. These exact calculations can be used to obtain a random sample from the full conditional distribution of the block.
In this framework, V denotes the variables of the Bayesian network, and e is called 'evidence'. The evidence can contain both the data (y), on which V has a causal effect, and other known parameters. In turn, the posterior distribution is written as the joint prior of V multiplied by the conditional distribution of evidence [p(V | e) ∝ p(V)p(e | V)].
Jensen et al. [13] used the Bayesian network representation as the basis of their blocking Gibbs sampling algorithm for a single locus model. In their model, V contained the discrete genotypes and e the data, which were assumed to be completely determined by the genotypes. However, MIMs are more complex, as they contain several variables in addition to the major genotypes (e.g. systematic and random environmental effects as well as correlated polygenic effects affect phenotypes). Consequently, the representation of Jensen et al. [13] cannot be used directly for MIMs.

To incorporate the extra parameters of the model, a Gibbs sampling algorithm is constructed in which the continuous variables pertaining to the MIM are sampled from their full conditional densities. In each round the sampled realisations can then be inserted as evidence in the Bayesian network. This algorithm requires the Bayesian network representation of major genotypes (V = W), with data and continuous variables as evidence (e = b, u, m, f, σ_e², σ_u², y).
However, because an exact calculation of the joint distribution of all genotypes is not possible, a small number of blocks (e.g. B_1, B_2, ..., B_s) are constructed, and for each block a Bayesian network BN_i is defined. For each BN_i, let the variables be the genotypes in the block, V = B_i. Further, let the evidence be genotypes in the complementary set (B_i^c = W\B_i), realised values of other variables, and the data [i.e. e = (B_i^c, b, u, m, σ_e², σ_u², f, y)]. These Bayesian networks are a graphical representation of the joint conditional distribution of all major genotypes within a block, given the complementary set, all other continuous variables, and the data (p(B_i | B_i^c, b, u, m, f, σ_e², σ_u², y)). This is equivalent to a Bayesian network where data corrected for the current values of all continuous variables are inserted as evidence [i.e. p(B_i | B_i^c, b, u, m, f, σ_e², σ_u², y) ∝ p(B_i) p(y | W, b, u, m, f, σ_e², σ_u²) = p(B_i) p(ỹ | B_i, f)]. The last term is described as the penetrance function underneath equation (3).
In the following sections, some details of the graphical model representation are described. This is not intended to be a complete description of graphical models, which is a very comprehensive area of which more details can be found in, e.g. [14-16]. The following is rather meant to focus on operations used in the current work.
2.3.3. Bayesian networks

A Bayesian network is a graphical representation of a set of random variables, V, which can be organised in a directed acyclic graph (e.g. [14]) (figure 1a). A graph is directed when, for each pair of neighbouring variables, one variable is causally dependent on the other, but not vice versa. These causal dependencies between variables are represented by directed links which connect them. The graph is acyclic if, following the direction of the directed links, it is not possible to return to the same variable. Variables with causal links pointing to v_i are denoted as parents of v_i [pa(v_i)]. Should v_i have parents, the conditional distribution p(v_i | pa(v_i)) is associated with it; should v_i have no parents, this reduces to the unconditional prior distribution p(v_i). The joint distribution is written

p(V) = ∏_i p(v_i | pa(v_i)).
In this study the variables in the network represent a major genotype, w_i. The links pointing from parents to offspring represent probabilities of alleles being transmitted from parents to offspring. Therefore, the conditional distributions associated with variables are the Mendelian segregation probabilities (p(w_i | w_s, w_d)). A simple pedigree is depicted in figure 1a as a Bayesian network. From this, it is apparent that a pedigree of genotypes is a special case of a Bayesian network.

In general, exact computations among the genotypes are required. For example, in figure 1a, should it be required to calculate p(w_1, w_2, w_5), this can be carried out as:

p(w_1, w_2, w_5) = Σ_{w_3, w_4, w_6, w_7, w_8} p(w_1, ..., w_8).
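Marginals such as p(w_1, w_2, w_5) can be checked by brute-force enumeration of the network's joint distribution. The sketch below implements the figure 1a pedigree structure; the Hardy-Weinberg founder prior with f = 0.5 is an illustrative assumption, not a value from the study:

```python
import itertools

# Pedigree of figure 1a: founders w1-w4; w5 has parents (w1, w2),
# w6 has (w2, w3), w7 has (w4, w5), w8 has (w6, w7). Genotypes are 0..2.

def allele(g):
    """Probability that a parent of genotype g transmits (A1, A2)."""
    return {0: (1.0, 0.0), 1: (0.5, 0.5), 2: (0.0, 1.0)}[g]

def trans(child, sire, dam):
    """Mendelian probability of the child's genotype given parental genotypes."""
    s, d = allele(sire), allele(dam)
    return sum(s[a] * d[b] for a in (0, 1) for b in (0, 1) if a + b == child)

FOUNDER = [0.25, 0.5, 0.25]  # Hardy-Weinberg prior, assumed f = 0.5

def joint(w):
    """p(w1,...,w8) via the recursive factorisation of the Bayesian network."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    return (FOUNDER[w1] * FOUNDER[w2] * FOUNDER[w3] * FOUNDER[w4]
            * trans(w5, w1, w2) * trans(w6, w2, w3)
            * trans(w7, w4, w5) * trans(w8, w6, w7))

def marginal_w1_w2_w5(w1, w2, w5):
    """Brute-force sum over w3, w4, w6, w7, w8 of the joint."""
    return sum(joint((w1, w2, w3, w4, w5, w6, w7, w8))
               for w3, w4, w6, w7, w8 in itertools.product(range(3), repeat=5))
```

Because each summed-out variable is a leaf of the remaining graph, the brute-force sum collapses analytically to p(w_1)p(w_2)p(w_5 | w_1, w_2), which the code reproduces numerically.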
The size of the probability table increases exponentially with the number of genotypes. Therefore, it rapidly increases to sizes that are not manageable. However, by using the local independence structure, recursive factorisation allows us to write the desired distribution as:

p(w_1, w_2, w_5) = p(w_1)p(w_2)p(w_5 | w_1, w_2) Σ_{w_3} p(w_3) Σ_{w_6} p(w_6 | w_2, w_3) Σ_{w_4} p(w_4) Σ_{w_7} p(w_7 | w_4, w_5) Σ_{w_8} p(w_8 | w_6, w_7)

This is much more efficient in terms of storage requirements and describes the general idea underlying methods for exact computations of posterior distributions in Bayesian networks. When the Bayesian network contains loops, however, it is not obvious how to order the summations so that table sizes are minimised. Therefore, an algorithm is required. The method of 'peeling' by Elston and Stewart [4], generalised by Cannings et al. [2], provides algorithms for performing such calculations with genetic applications. However, for other operations needed in the blocked Gibbs sampling algorithm, peeling cannot be used. Instead, we use the algorithm of Lauritzen and Spiegelhalter [16], which also is based on the above ideas. This algorithm transforms the Bayesian network into a so-called junction tree.

2.3.4. The junction tree

The junction tree is a secondary structure of the Bayesian network. This structure generates a posterior distribution that is mathematically identical to the posterior distribution in the Bayesian network. However, properties of the junction tree greatly reduce the required computations. The desired properties are fulfilled by any structure that satisfies the following definition.

Definition 1 (junction tree). A junction tree is a graph of clusters. The clusters, also called cliques (C_i, i = 1, ..., n_c), are subsets of V, and the union of all cliques is V: (C_1 ∪ C_2 ∪ ... ∪ C_n_c = V). The cliques are organised into a graph with no loops (cycles), i.e. by following the path between neighbouring cliques it is not possible to return to the same clique. Between each pair of neighbouring cliques is a separator, S, which contains the intersection of the two cliques (S_12 = C_1 ∩ C_2). Finally, the intersection of any two cliques, C_i and C_j, is present in all cliques and separators on the unique path between C_i and C_j.
2.3.5. Transformation of a Bayesian network into a junction tree

In general, there is no unique junction tree for a given Bayesian network. However, the algorithm of Lauritzen and Spiegelhalter [16] generates a junction tree for any Bayesian network with the property that the cliques generally become as small as possible. This is important, as small cliques make calculations more efficient. In the following section, we introduce some basic operations of that algorithm, transforming the Bayesian network shown in figure 1a into a junction tree.

The network is first turned into an undirected graph by removing the directions of the links. Links are then added between parents. The added links (seen in figure 1b as the dashed links) are denoted 'moral links', and the resulting graph is called the 'moral graph'. The next step is to 'triangulate' the graph. If cycles of length greater than three exist, and no other links connect variables in that cycle, extra 'fill-in links' must be added until no such cycles exist. After links are added between parents, as shown in figure 1, there is a cycle of length four which contains the variables w_2, w_5, w_7 and w_6. An extra fill-in link must be added either between w_2 and w_7, or, as shown with the thick link, between w_5 and w_6.
Finally, from the triangulated graph, the junction tree is established by identifying all 'cliques'. These are defined as maximal sets of variables that are all pairwise linked. In other words, a set of variables that are all pairwise connected by links must be in the same clique. These cliques must be arranged into a graph with no loops, in such a way that, for each pair of cliques C_i and C_j, all cliques and separators on the path between them contain the intersection C_i ∩ C_j. This requirement ensures that variables in C_i and C_j are conditionally independent, given variables on the path between them.
2.3.6. Exact computations in a junction tree

To perform exact calculations, the junction tree is initialised by constructing belief tables for all cliques (B(C_1), ..., B(C_n_c)) and separators (B(S_1), ..., B(S_n_s)). Each belief table conforms to the joint probability distribution of variables in that clique or separator, and contains the current belief of the joint posterior distribution of these variables. This is also called the belief potential of these cliques/separators. For example, B(C_1) represents p(C_1 | y) in figure 2a. In the following we assume that individual 8 in figure 1a has a phenotypic record. Then, the belief tables are initialised by first setting all entries in each belief table to one. Prior probabilities of variables without parents are then multiplied onto exactly one arbitrarily chosen clique in which the variable is present. Finally, the conditional probabilities of variables with parents are multiplied onto the unique clique which contains that variable and its parents. Following this procedure, the junction tree in figure 1c could be initialised as follows:
C_1 = (w_1, w_2, w_5) is initialised with B(C_1) = p(w_1)p(w_2)p(w_5 | w_1, w_2); C_2 = (w_2, w_5, w_6) is initialised with all ones for B(C_2); C_3 = (w_2, w_3, w_6) is initialised with B(C_3) = p(w_3)p(w_6 | w_2, w_3); C_4 = (w_4, w_5, w_7) is initialised with B(C_4) = p(w_4)p(w_7 | w_4, w_5); C_5 = (w_5, w_6, w_7) is initialised with all ones for B(C_5); C_6 = (w_6, w_7, w_8) is initialised with B(C_6) = p(w_8 | w_6, w_7)p(y_8 | w_8); and separators are initialised with all ones. After having initialised the junction tree in this way, we note that the product of the belief potentials for all cliques is equal to the joint posterior distribution of all variables:

p(W | y_8) ∝ ∏_{i=1}^{6} B(C_i)

The general rule of this property is given by:

p(V | e) ∝ ∏_i B(C_i) / ∏_j B(S_j)    (11)
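Belief tables can be represented with small dictionaries keyed by variable assignments. The toy sketch below (binary variables and illustrative numbers, not the paper's implementation) implements marginalisation of a clique table onto a separator and the absorption update B*(C) = B(C)·B*(S)/B(S):

```python
def marginalise(table, keep):
    """Sum a belief table down to the variables at the positions in `keep`."""
    out = {}
    for key, val in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + val
    return out

def absorb(b_c_receiver, b_sep, b_c_sender, sep_pos_receiver, sep_pos_sender):
    """Receiver clique absorbs from sender clique through their separator:
    the new separator table is the sender marginalised onto the separator,
    and each receiver entry is scaled by the ratio of new to old separator belief."""
    b_sep_new = marginalise(b_c_sender, sep_pos_sender)
    updated = {}
    for key, val in b_c_receiver.items():
        sep = tuple(key[i] for i in sep_pos_receiver)
        ratio = b_sep_new[sep] / b_sep[sep] if b_sep[sep] else 0.0
        updated[key] = val * ratio
    return updated, b_sep_new
```

Because the receiver is multiplied by B*(S)/B(S) while the separator itself is replaced by B*(S), the clique/separator ratio in the invariance rule is unchanged by the operation.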
2.3.7. Junction tree
propagation
The initialisation described in the
previous
section isarbitrary
in the sensethat
p(w
2
)
could have beenmultiplied
ontoB(C
2
)
instead ofB (Ci ) .
Therefore,
the belief tables do not at this
point
reflect theknowledge
of variables in thewith the information on all other variables.
Propagation
injunction
trees is ameans of
updating
the belief with such information.This
updating
isperformed by
means of anoperation
called’absorption’.
This has the effect of
propagating
information betweenneighbouring cliques.
For
example,
if information ispropagated
fromB(C
6
)
toB(C
5
)
as infigure
2,
B(C
5
)
is said to absorb fromB(C
6
),
or,equivalently, C
6
is said to send amessage to
C
5
.
Theabsorption operation
consists of thefollowing
calculation:B
*
(C
5
)
=B(CS) B*((! ))’
whereB
*
(S
5
) =
C6BS5
B(C
6
).
Theabsorption
canB(55 )
c!js!
06 BS5be
regarded
asupdating
the beliefpotential
of,
W5
p(
W6,W7) (B (W5,
W6,W7
))
with information on the belief
potential
of p(w
6
,
w7,W6
,
w
)(B(
S
w7,w8 ) ) .
Thisis
accomplished by
firstfinding
the conditional belief of variables inC
5
given
variables in
C
6
by
B(
W5I
w
6’
W7
)
w6
) .
w7
’
=
’
w5
(
B
Thejoint
belief of variablesj3(we,W7)
in
C
5
is thenupdated
with new information from6
C
by
B
*
(w
s
,W
e
,W
7
) =
B*(we,W7)B(w5!W6,W7),
whereB
*
(we,W
7
,Wg).
The
junction
tree is invariant to theabsorptions.
This means that after anabsorption, equation
(11)
is still true.The
object
is now toperform
a sequence ofabsorptions.
In thisstudy,
junction
tree towards asingle
clique.
Consequently,
calling
collect evidencefrom any
clique
results in the belief tablebeing
equivalent
to thejoint
pos-terior distribution of the variables it contains. As anexample,
figure
2 showsthat
calling
collect evidence fromC
r
results in:B (C
I
)
ex:1
,
P
W
(
W2,w
5I
ys),
and the order ofabsorptions
is established as follows.First,
c
I
requests
to absorbinformation from its
neighbours
(C
2
).
However,
thisoperation
isonly
allowed ifC
Z
hasalready
absorbed from all its otherneighbours
(C
3
andC
5
).
Since this is not the case,C
2
willrecursively
request
forabsorption
from thesecliques.
This isgranted
forC
3
,
butC
5
still has not absorbed fromC
4
andC
6
,
whichit
requested.
This isfinally granted,
and theabsorptions
can beperformed
inthe order illustrated in
figure
2a.Distribute evidence from
C
r
infigure
2b isperformed by allowing
c
I
to senda message to all its
neighbours.
When aclique
has received a message it willsend a new message to all of its
neighbours,
except
to theclique
it hasjust
received a message from. In ourexample
the order of messages(absorptions)
is illustrated in
figure
2b.If ’collect evidence’ is followed
by
the routine ’distribute evidence’ from thesame
clique,
then for anyclique
B(C;)
ccp(c¡Jy) [15].
This is a very attractiveproperty
because it is thenpossible
to find themarginal posterior density
of anyvariable, by summing
other variables out of anyclique
in which it ispresent,
rather than
summing
all other variables out of thejoint
distribution.2.3.8.
Example
of exact calculationsAn
example
isprovided
in this section to illustrate therelationship
betweenexact calculations with or without the use of the
junction
treerepresentation.
Should we want to compute the marginal posterior probability distribution p(w1 | y), this can be carried out directly using standard methods of probability:

However, the independence structure between genotypes allows for recursive factorisation:

and we can write equation (9) as:

The junction tree algorithm is then used as follows. First, the junction tree is initialised; collect evidence is then called from C1 to calculate p(C1 | y). As already described, this call consists of the series of absorptions ordered as illustrated in figure 2a. The corresponding calculations are as follows. First, the absorptions indicated by 1 in figure 2a:

B*(C5) = B(C5) · (B*(S4)/B(S4)) · (B*(S5)/B(S5)), where B*(S4) = Σ_{w4} B(C4) and B*(S5) = Σ_{w8} B(C6),

and

B*(C2) = B(C2) · (B*(S2)/B(S2)), where B*(S2) = Σ_{w3} B(C3).

Second, the absorption indicated by 2 in figure 2a:

B**(C2) = B*(C2) · (B*(S3)/B(S3)), where B*(S3) = Σ B*(C5).

Finally, the absorption indicated by 3 in figure 2a:

B*(C1) = B(C1) · (B**(S1)/B(S1)), where B**(S1) = Σ_{w7} B**(C2).
After collect evidence has been completed, p(w1 | y) can be found by p(w1 | y) ∝ Σ_{w2,w5} B*(C1). Writing these calculations together, and substituting the initial probabilities (without the tables of all ones), we obtain:

This is exactly the same calculation as equation (10), which illustrates that junction tree propagation is basically a method to separate calculations into smaller steps, and to arrange the order of these such that the correct result is obtained.
2.3.9. Random propagation

Another propagation algorithm, which relies on the junction tree structure, is 'random propagation', developed by Dawid [3]. This method provides a random sample from the joint posterior distribution of all variables in the Bayesian network, p(V | e). Random propagation is initialised by calling collect evidence from an arbitrarily chosen clique C0.
As mentioned previously, this results in B(C0) being equal to the joint posterior distribution of the variables contained in C0 (B(C0) ∝ p(C0 | e)). B(C0) is then used to sample these variables, after which messages are passed to the neighbouring cliques (Cn) by absorption from C0 to Cn. The belief tables of Cn will then be proportional to the joint posterior distribution of the variables contained in the given cliques, conditional on the variables already sampled. That is, B(Cn) ∝ p(Cn | C0, e) = p(Cn \ {C0 ∩ Cn} | C0, e). After normalisation, the variables of Cn \ {C0 ∩ Cn} are sampled, and absorptions are performed to their neighbouring cliques. Sampling and sending of messages continues in this manner until the entire network has been sampled. The order in which sampling is performed follows the order of messages in distribute evidence (figure 2b).
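The sampling scheme just described can be sketched for a hypothetical two-clique junction tree (this is an illustrative reading of Dawid's method, not the authors' implementation): the root clique is sampled jointly from its belief table, and the remaining variable of the neighbouring clique is then sampled conditional on the sampled separator value.

```python
# Sketch of random propagation on a two-clique junction tree:
# C0 = {w1, w2} (root after collect evidence) and Cn = {w2, w3}.
# Belief tables are arbitrary positive numbers (hypothetical example).
import random

random.seed(2)
K = 3
B_C0 = [[random.random() for _ in range(K)] for _ in range(K)]  # prop. to p(w1,w2|e)
B_Cn = [[random.random() for _ in range(K)] for _ in range(K)]  # B(Cn)[w2][w3]

def sample_categorical(weights):
    """Draw an index with probability proportional to the given weights."""
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

# Sample (w1, w2) jointly from the root clique belief.
flat = [B_C0[a][b] for a in range(K) for b in range(K)]
w1, w2 = divmod(sample_categorical(flat), K)

# Absorb the sampled w2 into Cn; after normalisation, sample w3 | w2.
w3 = sample_categorical(B_Cn[w2])

print((w1, w2, w3))
```

In a larger tree the same two steps are repeated clique by clique, following the message order of distribute evidence.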
In our genetic example, we can first collect evidence to C1. Performing the random propagation algorithm then involves sampling from the following distributions:

2.3.10. Creating blocks by conditioning
The method of random propagation of Dawid [3] can be used to obtain a random sample of all variables from their joint posterior distribution, p(V | e), or equivalently p(w | b, u, m, f, σ_e^2, σ_u^2, y). However, if the Bayesian network is large and complex, the cliques of the junction tree may contain many variables. This is a problematic scenario, as the dimensions of the corresponding belief tables are exponential in the number of variables the cliques contain. The storage requirements of junction trees may therefore become prohibitive, preventing the operations described earlier from being performed. If this were the case, it would not be possible to obtain a random sample from the joint distribution of all variables.

Conditioning
on a variable allows a new Bayesian network to be drawn, in which the variable conditioned on is separated into several clones. This will often break loops in the network, as illustrated in figure 3. When loops are broken, fewer fill-in links are needed to render the graph triangulated and, consequently, fewer and smaller cliques are created. It follows that the storage requirements of the corresponding junction tree are smaller, and random propagation can be performed.
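A toy calculation illustrates the storage argument. The clique sets below are hypothetical (they are not the cliques of figure 3), each genotype variable has three states, and table size is counted as exponential in the number of unknown variables per clique; a conditioned-on variable counts as known.

```python
# Toy illustration: conditioning on a variable shrinks every belief table
# in which that variable occurs, because the variable becomes known.
K = 3  # states per genotype variable

# Hypothetical cliques over genotypes w1..w8 (not the paper's actual cliques).
cliques = [{"w1", "w2", "w5"}, {"w2", "w3", "w5"}, {"w3", "w5", "w7"},
           {"w4", "w5", "w7"}, {"w5", "w6", "w7"}, {"w6", "w7", "w8"}]

def storage(cliques, known=frozenset()):
    """Total number of belief-table entries, counting only unknown variables."""
    return sum(K ** len(c - known) for c in cliques)

print(storage(cliques))                      # 162: all variables unknown
print(storage(cliques, known={"w5"}))        # 72: after conditioning on w5
print(storage(cliques, known={"w5", "w7"}))  # 36: after conditioning on w5, w7
```

Each additional conditioned variable divides the size of every table containing it by K, which is why conditioning on a modest number of well-chosen variables can make an otherwise prohibitive junction tree storable.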
The concept of conditioning is illustrated in figure 3, where two different variables, w5 and w7, are conditioned on. The resulting Bayesian networks are illustrated in figures 3a and c, and the reduced junction trees in figures 3b and d. This corresponds to the creation of two blocks, B1 = {w1, w2, w3, w4, w6, w7, w8} and B2 = {w1, w2, w3, w4, w5, w6, w8}. The reduced junction trees demonstrate that the storage requirements of the junction trees are reduced because loops in the original Bayesian network are broken: the junction trees contain fewer cliques, and some of the cliques contain fewer unknown variables. In figures 3b and d, variables with letter subscripts are assumed known. It is easy to see that a very large junction tree can be reduced sufficiently with respect to storage constraints if many variables are conditioned on.

Blocks were created
by choosing variables through the following iterative steps. First, the variable yielding the highest reduction in storage requirements when conditioned on was identified. Second, this variable was conditioned on, and the storage requirements of the resulting junction tree were calculated. If the storage requirements were still larger than what was available, these steps were repeated to find the next variable to condition on. This was continued until all blocks satisfied the storage constraints of the system. All variables were, of course, required to be in at least one block. Jensen [10] provides more information on the block selection algorithm.
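The iterative steps above suggest a greedy procedure. The sketch below is one reading of that idea, not Jensen's exact algorithm; the cliques, the storage measure, and the storage limit are all hypothetical.

```python
# Greedy sketch of block creation by conditioning: repeatedly condition on
# the variable whose removal most reduces junction tree storage, until the
# remaining tables fit within the available memory.
K = 3  # states per genotype variable

def storage(cliques, known):
    """Total belief-table entries, counting only unknown variables."""
    return sum(K ** len(c - known) for c in cliques)

def build_block(cliques, variables, limit):
    """Return (block, conditioned) such that storage(cliques, conditioned) <= limit."""
    conditioned = set()
    while storage(cliques, conditioned) > limit:
        # Pick the variable yielding the highest storage reduction.
        best = max(variables - conditioned,
                   key=lambda v: storage(cliques, conditioned)
                               - storage(cliques, conditioned | {v}))
        conditioned.add(best)
    return variables - conditioned, conditioned

# Hypothetical cliques over genotypes w1..w8 and an arbitrary storage limit.
cliques = [frozenset(c) for c in
           [{"w1", "w2", "w5"}, {"w2", "w3", "w5"}, {"w3", "w5", "w7"},
            {"w4", "w5", "w7"}, {"w5", "w6", "w7"}, {"w6", "w7", "w8"}]]
variables = set().union(*cliques)
block, conditioned = build_block(cliques, variables, limit=80)
print(sorted(conditioned), sorted(block))
```

Repeating the procedure with different starting variables yields the complementary blocks, so that every variable belongs to at least one block.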
Blocks B1, ..., Bn are constructed such that each contains most of the variables of the network. The complementary sets (i.e. the variables that are conditioned on) are called B1^c, ..., Bn^c. In this way, each set Bi ∪ Bi^c contains all the major genotypes in the pedigree. As the junction tree of each block can now be stored in the computer, exact inference can be performed, and a joint sample of all variables in the block can be obtained using the random propagation method. Therefore, using the described form of blocking, we can obtain random samples from the joint distribution of a block, conditional on the complementary set of other variables in the MIM and the data, p(Bi | Bi^c, b, u, m, σ_e^2, σ_u^2, f, y).
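Schematically, one sweep of such a blocked Gibbs sampler visits each block in turn and redraws it jointly given its complement. In the sketch below, random_propagation is a hypothetical placeholder (a uniform draw) standing in for the actual joint draw from p(Bi | Bi^c, ...); the blocks mirror the two-block example above.

```python
# Schematic blocked Gibbs sweep: each block of genotypes is redrawn jointly,
# conditional on the conditioned-on variables in its complement.
import random

random.seed(3)

# Toy stand-ins: eight genotype variables and two overlapping blocks,
# each defined by conditioning on one variable (cf. the example above).
variables = [f"w{i}" for i in range(1, 9)]
blocks = [set(variables) - {"w5"}, set(variables) - {"w7"}]

def random_propagation(block, state):
    """Placeholder for the joint draw of a block given its complement.
    A real implementation would sample from the block's junction tree;
    here each genotype is simply drawn uniformly from its 3 states."""
    return {v: random.randrange(3) for v in block}

state = {v: 0 for v in variables}
for it in range(100):            # Gibbs iterations
    for block in blocks:         # update each block in turn
        state.update(random_propagation(block, state))

print(state)
```

Because every variable lies in at least one block, each sweep updates the whole genotype configuration, with large groups of genotypes moving together.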
In the MIM, the Gibbs sampling algorithm using general blocking can thus be summarised as follows: