
HAL Id: hal-00894251

https://hal.archives-ouvertes.fr/hal-00894251

Submitted on 1 Jan 1999



Original article

Blocking Gibbs sampling in the mixed inheritance model using graph theory

Mogens Sandø Lund a, Claus Skaanning Jensen b

a DIAS, Department of Breeding and Genetics, Research Centre Foulum, P.O. Box 50, 8830 Tjele, Denmark
b AUC, Department of Computer Science, Fredrik Bajers Vej 7E., 9220 Aalborg Ø, Denmark

(Received 10 February 1998; accepted 18 November 1998)

Abstract - For the mixed inheritance model (MIM), including both a single locus and a polygenic effect, we present a Markov chain Monte Carlo (MCMC) algorithm in which discrete genotypes of the single locus are sampled in large blocks from their joint conditional distribution. This requires exact calculation of the joint distribution of a given block, which can be very complicated. Calculations of the joint distributions were obtained using graph theoretic methods for Bayesian networks. An example of a simulated pedigree suggests that this algorithm is more efficient than algorithms with univariate updating or algorithms using blocking of sires with their final offspring. The algorithm can be extended to models utilising genetic marker information, in which case it holds the potential to solve the critical reducibility problem of MCMC methods often associated with such models. © Inra/Elsevier, Paris

blocking / Gibbs sampling / mixed inheritance model / graph theory / Bayesian network

Résumé - Blocking Gibbs sampling in the mixed inheritance model using graph theory. For the case of mixed inheritance (a single locus with a polygenic background), we present a Markov chain Monte Carlo (MCMC) algorithm in which the genotypes at the single locus are sampled in large blocks from their joint conditional distribution. This requires the exact calculation of the joint distribution of a given block, which can be very complicated. The calculation of the joint distributions is obtained using graph theoretic methods for Bayesian networks. An example of a simulated pedigree suggests that this algorithm is more efficient than algorithms with univariate updating or with updating by groups of offspring of a common sire. This algorithm can be extended to models using genetic marker information, which makes it possible to eliminate the risk of reducibility often associated with such models when MCMC methods are applied. © Inra/Elsevier, Paris

* Correspondence and reprints

1. INTRODUCTION

In mixed inheritance models (MIM), it is assumed that phenotypes are influenced by the genotypes at a single locus and a polygenic component [19]. Unfortunately, it is not feasible to maximise the likelihood function associated with such models using analytical techniques. Even in the case of single gene models without polygenic effects, the need to marginalise over the distribution of the unknown single genotypes results in computations which are not feasible. For this reason, Sheehan [20] used the local independence structure of genotypes to derive a Gibbs sampling algorithm for a one-locus model. This technique circumvented the need for exact calculations in complex joint genotypic distributions, as the Gibbs sampler only requires knowledge of the full conditional distributions. Algorithms for the more complex MIMs were later implemented using either a Monte Carlo EM algorithm [8], or a fully Bayesian approach [9] with the Gibbs sampler. However, Janss et al. [9] found that the Gibbs sampler had very poor mixing properties owing to a strong dependency between genotypes of related individuals. They also noticed that the sample space was effectively partitioned into subspaces between which movement occurred with low probability. This occurred because some discrete genotypes rarely changed states. This is known as practical reducibility.

Both the mixing and reducibility properties are vastly improved by sampling genotypes jointly. Consequently, Janss et al. [9] applied a blocking strategy with the Gibbs sampler, in which genotypes of sires and their final offspring (non-parents) were sampled simultaneously from their joint distribution (sire blocking). This blocking strategy made it simple to obtain exact calculations of the joint distribution and improved the mixing properties in data structures with many final offspring. However, the blocking strategy of Janss and co-workers is not a general solution to the problem because final offspring may constitute only a small fraction of all individuals in a pedigree.

An extension of another blocking Gibbs sampler developed by Jensen et al. [13] could provide a general solution to MIMs. Their sampler was for one-locus models, and sampled genotypes of many individuals jointly, even when the pedigree was complex. The method relied on a graphical model representation and treated genotypes as variables in a Bayesian network. This results in a graphical representation of the joint probability distribution for which efficient algorithms to perform exact inference exist (e.g. [16]). However, a constraint of the blocking Gibbs sampler developed by Jensen and co-workers is that it only handles discrete variables, and in turn cannot be used in MIMs.

The objective of this study is to extend the blocking Gibbs sampler of Jensen et al. [13] such that it can be used in MIMs. A simulated example is presented to illustrate the practicality of the proposed method. The data from the example ...

2. MATERIALS AND METHODS

2.1. Mixed inheritance model

In the MIM, phenotypes are assumed to be influenced by the genotype at a single major locus and a polygenic effect. The polygenic effect is the combined effect of many additive and unlinked loci, each with a small effect. Classification effects (e.g. herd, year or other covariates) can easily be included in the model. The statistical model for a MIM is defined as:

y = Xb + ZWm + Zu + e

where y is a (n x 1) vector of n observations, b is a (p x 1) vector of p classification effects, u is a (q x 1) vector of q random polygenic effects, m is a (3 x 1) vector of genotype effects and e is a (n x 1) vector of n random residuals. X is a (n x r) design matrix associating data with the 'fixed' effects, and Z a (n x q) design matrix associating data with polygenic and single gene effects. W is an unknown (q x 3) random design matrix of genotypes at the single locus.

Given location and scale parameters, the data are assumed to be normally distributed as

y | b, u, m, W, σ²_e ~ N(Xb + ZWm + Zu, I σ²_e)

where σ²_e is the residual variance. For polygenic effects, we invoke the infinitesimal additive genetic model [1], resulting in normally distributed polygenic effects, such that

u | σ²_u ~ N(0, A σ²_u)

where A is the known additive relationship matrix describing the family relations between individuals, and σ²_u is the additive variance of polygenic effects.

The single locus was assumed to have two alleles (A1 and A2), such that each individual had one of the three possible genotypes: A1A1, A1A2 and A2A2. For each individual in the pedigree, these genotypes were represented as a random vector, w_i, taking values (1 0 0), (0 1 0) or (0 0 1). The vectors w_i form the rows of W and will for notational convenience be referred to as ω1, ω2 and ω3. For individuals which do not have known parents (i.e. founder individuals) the probability distribution of genotype w_i was assumed to be p(w_i | f). The distribution for the genotype frequency of the base population (f) was assumed to follow Hardy-Weinberg proportions. For individuals with known parents, the genotype distribution is denoted p(w_i | w_sire(i), w_dam(i)). This distribution describes the probability of the alleles constituting genotype w_i being transmitted from parents with genotypes w_sire(i) and w_dam(i) when segregation of alleles follows Mendelian transmission probabilities. For individuals with only one known parent, a dummy individual is inserted for the missing parent.

The joint distribution of the genotypes is then

p(W | f) = ∏_{i∈F} p(w_i | f) ∏_{i∈NF} p(w_i | w_sire(i), w_dam(i))

where W = (w_1, ..., w_n), F is the set of founders, and NF is the set of non-founders.

To fully specify the Bayesian model, improper uniform priors were used for the fixed and genotypic effects [i.e. p(b) ∝ constant, p(m) ∝ constant]. Variance components (i.e. σ²_e and σ²_u) were assumed a priori to be independent and to follow the conjugate inverted gamma distribution (i.e. 1/σ²_i has the prior distribution of a gamma random variable with parameters α_i and β_i). The parameters α_i and β_i can be chosen so that the prior distribution has any desired mean and variance. The conjugate Beta prior was used for the allele frequency (p(f) ~ Beta(α_f, β_f)). The joint posterior density of all model parameters is proportional to the product of the prior distributions and the conditional distribution of the data, given the parameters:

p(b, m, u, W, f, σ²_u, σ²_e | y) ∝ p(y | b, m, u, W, σ²_e) p(b) p(m) p(u | σ²_u) p(W | f) p(f) p(σ²_u) p(σ²_e)    (1)
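As a point of reference only (not part of the paper), the following minimal Python sketch simulates records from the model above for a handful of unrelated individuals; every name and number in it is an illustrative assumption.

```python
# Minimal sketch (not from the paper): simulating data from the MIM
# y = Xb + ZWm + Zu + e for a toy pedigree, to make the notation concrete.
import numpy as np

rng = np.random.default_rng(1)

n = q = 5                                  # 5 individuals, all with records
b = np.array([10.0])                       # one 'fixed' classification effect (overall mean)
a = 2.0                                    # additive effect of the major locus
m = np.array([-a, 0.0, a])                 # genotype means for A1A1, A1A2, A2A2
sigma2_u, sigma2_e = 1.0, 1.0

X = np.ones((n, 1))                        # design matrix for the fixed effect
Z = np.eye(n, q)                           # records map one-to-one to individuals

# Genotypes coded 0, 1, 2 copies of allele A2; W holds the indicator vectors
genotype = rng.integers(0, 3, size=q)
W = np.zeros((q, 3))
W[np.arange(q), genotype] = 1.0            # row i is omega_1, omega_2 or omega_3

A = np.eye(q)                              # additive relationship matrix (here: unrelated founders)
u = rng.multivariate_normal(np.zeros(q), sigma2_u * A)
e = rng.normal(0.0, np.sqrt(sigma2_e), size=n)

y = X @ b + Z @ W @ m + Z @ u + e
print("simulated records:", np.round(y, 2))
```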

2.2. Gibbs sampling

For Bayesian inference, the marginal posterior distribution for the parameters of the model is of interest. With MIMs this requires high dimensional integration and summation of the joint posterior distribution (1), which cannot be expressed in closed form. To perform the integration numerically using the Gibbs sampler requires the construction of a Markov chain which has (1) (normalised) as its stationary distribution. This can be accomplished by defining the transition probabilities of the Markov chain as the full conditional distributions of each model parameter. Samples are then taken from these distributions in an iterative scheme. Each time a full conditional distribution is visited, it is used to sample the corresponding variable, and the realised value is substituted into the conditional distribution of all other variables (see, e.g. [5]). Instead of updating all variables univariately it is also possible to sample several variables from their joint conditional posterior distribution. Variables that are sampled jointly will be referred to as a 'block'. As long as all variables are sampled, the new Markov chain will still have equation (1) as its stationary distribution.

2.2.1. Full conditional posterior distributions

Full conditional distributions were derived from the joint posterior distribution (1). The resulting distributions are presented later. These distributions ...

2.2.2. Location parameters

Hereafter, the restricted additive major gene model will be assumed, such that m' = (-a, 0, a) or m = 1a, where 1' = (-1, 0, 1) and a is the additive effect of the major locus gene. Allowing for genotypic means to vary independently or including a dominance effect entails no difficulty. The gene effect (a) is considered a classification effect when conditioning on the major genotypes (W) and the genetic model at the locus. Consequently, the location parameters in the model are θ' = [b', a, u']. Let H = [X : ZW1 : Z], let Ω be the block-diagonal matrix with zeros corresponding to b and a, and A⁻¹k corresponding to u, with k = σ²_e/σ²_u, let C = [H'H + Ω], and m = 1a. The posterior distribution of the location effects (θ), given the variance components, major genotypes (W) and data (y), is (following [17]):

θ | W, σ²_e, σ²_u, y ~ N(C⁻¹H'y, C⁻¹σ²_e)

Then, using standard results from multivariate normal theory (e.g. [18] or [22]), the full conditional distributions of the parameters in θ can be written as:

θ_i | θ_{-i}, W, σ²_e, σ²_u, y ~ N((H_i'y - C_{i,-i}θ_{-i})/C_ii, σ²_e/C_ii)    (2)

where C_ii is the ith diagonal element of C, C_{i,-i} is the ith row of C excluding C_ii, and H_i is the ith column of H.
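As a hedged illustration of how such a single-site update can be coded (the normal form given above for equation (2) is itself a reconstruction of the standard result, and all argument names below are assumptions):

```python
# Sketch of the univariate (Gauss-Seidel style) Gibbs update for the location
# parameters theta = [b', a, u']', drawing each theta_i from a normal with mean
# (H_i'y - C_{i,-i} theta_{-i}) / C_ii and variance sigma2_e / C_ii.
import numpy as np

def update_locations(theta, H, y, Omega, sigma2_e, rng):
    """One pass of single-site updates over all location parameters."""
    C = H.T @ H + Omega              # coefficient matrix of the mixed-model equations
    r = H.T @ y                      # right-hand side, r_i = H_i'y
    for i in range(theta.size):
        c_i = C[i].copy()
        c_ii = c_i[i]
        c_i[i] = 0.0                 # so that c_i @ theta equals C_{i,-i} theta_{-i}
        mean = (r[i] - c_i @ theta) / c_ii
        theta[i] = rng.normal(mean, np.sqrt(sigma2_e / c_ii))
    return theta
```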

2.2.3. Major genotypes

The full conditional distribution of a given genotype, w_i, is found by extracting from equation (1) the terms in which w_i is present. The probabilities are here given up to a constant of proportionality and must be normalised to ensure that Σ_{j=1}^{3} P(w_i = ω_j) = 1. The full conditional distribution of genotype w_i is:

P(w_i = ω_j | ·) ∝ [P(w_i = ω_j | f) I_{i∈F} + P(w_i = ω_j | w_sire(i), w_dam(i)) I_{i∈NF}] × ∏_{k∈Off(i)} P(w_{i(k)} | w_i = ω_j, w_mate(i(k))) × p(ỹ_i | w_i = ω_j)    (3)

where I_{i∈F} and I_{i∈NF} are indicator functions, which are 1 if individual i is contained in the set of founders (F) or non-founders (NF), respectively, and 0 otherwise. Off(i) is the set of offspring of individual i, such that i(k) is the kth offspring of i resulting from a mating with mate(i(k)). The terms P(w_i = ω_j | f) I_{i∈F} + P(w_i = ω_j | w_sire(i), w_dam(i)) I_{i∈NF} represent the probability of individual i receiving alleles corresponding to genotypes ω1, ω2 or ω3, and the product over offspring represents the probability of individual i transmitting alleles in the genotypes of the offspring, which are conditioned upon. If individual i has a phenotypic record, the adjusted record ỹ_i = y_i - X_i b - Z_i u contributes the penetrance function:

p(ỹ_i | w_i = ω_j) ∝ exp(-(ỹ_i - m_j)² / (2σ²_e))

where X_i and Z_i are the ith rows of the matrices X and Z.
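A hedged sketch of this genotype update for one individual is given below; the genotype coding (0 = A1A1, 1 = A1A2, 2 = A2A2), the pedigree containers and all names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def transmission_table():
    """t[gs, gd, go] = P(offspring | sire, dam); genotypes coded as number of A2 alleles."""
    t = np.zeros((3, 3, 3))
    for gs in range(3):
        for gd in range(3):
            ps, pd = gs / 2.0, gd / 2.0      # probability each parent transmits A2
            t[gs, gd] = [(1 - ps) * (1 - pd), ps * (1 - pd) + (1 - ps) * pd, ps * pd]
    return t

TRANS = transmission_table()

def genotype_full_conditional(i, geno, sire, dam, offspring, mate_of, f,
                              y_adj, m, sigma2_e):
    """Normalised probabilities of w_i = omega_1, omega_2, omega_3 (role of equation (3))."""
    hw = np.array([f ** 2, 2 * f * (1 - f), (1 - f) ** 2])   # Hardy-Weinberg founder prior
    prob = np.ones(3)
    for g in range(3):
        # founder prior, or Mendelian transmission from the parents
        prob[g] *= hw[g] if sire[i] is None else TRANS[geno[sire[i]], geno[dam[i]], g]
        # one term per offspring, conditional on w_i = g and the mate's genotype
        for k in offspring.get(i, []):
            prob[g] *= TRANS[g, geno[mate_of[k]], geno[k]]
        # penetrance of the adjusted record, if individual i has a record
        if y_adj.get(i) is not None:
            prob[g] *= np.exp(-(y_adj[i] - m[g]) ** 2 / (2 * sigma2_e))
    return prob / prob.sum()
```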

2.2.4. Allele frequency

Conditioning on the sampled genotypes of founder individuals results in contributions of f for each A1 sampled and (1 - f) for each A2 sampled. This is because the sampled genotypes are realisations of the 2n independent Bernoulli(f) random variables used as priors for base population alleles. Multiplying these contributions by the prior Beta(α_f, β_f) gives

p(f | W) ∝ f^(α_f + n_A1 - 1) (1 - f)^(β_f + n_A2 - 1)    (4)

where n_A1 and n_A2 are the numbers of A1 and A2 alleles in the base population. The specified distribution is proportional to a Beta(α_f + n_A1, β_f + n_A2) distribution. Taking α_f = β_f = 1, the prior on this parameter is a proper uniform distribution.
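In code, this update is a one-line Beta draw once the founder alleles have been counted; the sketch below assumes genotypes coded as the number of A2 alleles carried.

```python
import numpy as np

def sample_allele_frequency(founder_genotypes, alpha_f=1.0, beta_f=1.0,
                            rng=np.random.default_rng()):
    """Draw f from Beta(alpha_f + n_A1, beta_f + n_A2), the role of equation (4)."""
    g = np.asarray(founder_genotypes)     # number of A2 alleles per founder (0, 1 or 2)
    n_A1 = (2 - g).sum()                  # A1 alleles in the base population
    n_A2 = g.sum()                        # A2 alleles in the base population
    return rng.beta(alpha_f + n_A1, beta_f + n_A2)
```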

2.2.5. Variance components

The full conditional distribution of the variance component σ²_u is proportional to the inverted gamma distribution:

σ²_u | u ~ IG(α_u + q/2, β_u + u'A⁻¹u/2)    (5)

Similarly, the full conditional distribution of the variance component σ²_e is

σ²_e | b, u, m, W, y ~ IG(α_e + n/2, β_e + e'e/2), with e = y - Xb - ZWm - Zu    (6)
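The forms given above for (5) and (6) are reconstructions of the standard conjugate updates; under that assumption they can be drawn as below (the shape/rate parameterisation is also part of the assumption).

```python
import numpy as np

def sample_sigma2_u(u, A_inv, alpha_u, beta_u, rng):
    """Scaled inverted-gamma draw for the polygenic variance (role of equation (5))."""
    shape = alpha_u + 0.5 * u.size
    rate = beta_u + 0.5 * u @ A_inv @ u
    return 1.0 / rng.gamma(shape, 1.0 / rate)      # numpy's gamma takes a scale parameter

def sample_sigma2_e(y, X, Z, W, b, u, m, alpha_e, beta_e, rng):
    """Scaled inverted-gamma draw for the residual variance (role of equation (6))."""
    e = y - X @ b - Z @ W @ m - Z @ u              # residuals at the current parameter values
    shape = alpha_e + 0.5 * e.size
    rate = beta_e + 0.5 * e @ e
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```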

The algorithm based on univariate updating can be summarised as follows:

I. initiate θ, W, f, σ²_u, σ²_e with legal starting values;
II. sample major genotypes w_i from equation (3) for i = {1, ..., q};
III. sample the allele frequency from equation (4);
IV. sample the location parameters θ_i (classification effects and polygenic effects) univariately from equation (2), for i = {1, ..., dimension of θ};
V. sample σ²_u from equation (5);
VI. sample σ²_e from equation (6);
VII. repeat II-VI.

Steps II-VI constitute one iteration. The system is initially monitored until sufficient evidence for convergence is observed. Subsequently, iterations are continued, and the sampled values saved, until the desired precision of features of the posterior distribution has been achieved. The mixing diagnostic used is described in a later section.
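The burn-in/saving structure of steps I-VII might be organised as in the sketch below; the `toy_step` stand-in only exists so the skeleton runs on its own, and in the real sampler it would be replaced by draws from equations (3), (4), (2), (5) and (6).

```python
import numpy as np

def run_chain(step, state, n_burn_in, n_samples):
    """Run the sampler: discard a burn-in period, then keep the sampled values."""
    for _ in range(n_burn_in):            # monitored until convergence is judged adequate
        state = step(state)
    kept = []
    for _ in range(n_samples):            # iterations continued and sampled values saved
        state = step(state)
        kept.append(dict(state))
    return kept

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def toy_step(state):                  # stand-in for one full pass of steps II-VI
        state["mu"] = rng.normal(0.0, 1.0)
        return state

    draws = run_chain(toy_step, {"mu": 0.0}, n_burn_in=100, n_samples=500)
    print("kept", len(draws), "samples")
```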

2.3. Blocking strategies

A more efficient alternative to the univariate updating of variables is to update a set of variables multivariately. Variables updated jointly will be referred to as a 'block'. In this implementation, variables must be sampled from the full conditional distribution of the block. In the present model, blocking the major genotypes of several individuals alleviates the problems of poor convergence and mixing properties caused by the covariance structure between these variables.

Janss et al. [9] constructed a block for each sire, containing genotypes of the sire and its final offspring. All other individuals were sampled from their full conditional distributions. Janss and co-workers showed that the exact calculations needed for these blocks are simple, and this is the first approach we apply in the analysis of the simulated data. However, this blocking strategy only improves the algorithm in pedigree structures with several final offspring. In many applications only a few final offspring exist (e.g. dairy cattle pedigrees), and the blocking calculations become more complicated. Therefore, the second approach applied to the simulated data was to extend the blocking Gibbs sampling algorithm of Jensen et al. [13], using a graphical model representation of genotypes. Here, the conditional distributions of all parameters, other than the major genotypes, are the same regardless of whether blocking is used or not.

2.3.1. Sire blocking

In the sire blocking approach, a block is constructed for each sire having final offspring. The blocks contain the genotypes of the sire and its final offspring. This requires an exact calculation of the joint conditional genotypic distribution, p(w_i, w_i(1), ..., w_i(n_i) | W_{-(i, i(1), ..., i(n_i))}, θ, y), where i is the index of a sire, n_i denotes the number of final offspring of sire i, and the final offspring are indexed by i(1), i(2), ..., i(n_i), or simply i(l). By definition, this distribution is proportional to p(w_i | W_{-(i, i(l))}, θ, y) × p(w_i(1), ..., w_i(n_i) | w_i, W_{-(i, i(l))}, θ, y). Here, the first term is the distribution of the sire's genotype, conditional on all individuals other than the sire and its final offspring; it does not condition on the genotypes of the final offspring. In calculating the distribution of the sire's genotype, the three possible genotypes of each offspring are summed over, after weighting each genotype by its relative probability. In this expression, we condition on the mates, and the final offspring do not have offspring themselves. Therefore, the neighbourhood individuals that contribute to the genotype distribution of the sire are still the same as those in the full conditional distribution. Consequently, the amount of exact calculation needed is linear in the size of the block. The second term is the joint distribution of the final offspring genotypes conditional on the sire's genotype. This is equivalent to a product of full conditional distributions of the final offspring genotypes because these are conditionally independent, given the genotypes of the parents. Even though the final offspring with a common sire are sampled jointly with this sire, the previous discussion shows that this is equivalent to sampling the final offspring from their full conditional distributions. Dams and sires with no final offspring are also sampled from their full conditional distributions. This leads to the algorithm proposed by Janss and colleagues, which will be referred to as 'sire blocking'. Sires are sampled according to probabilities:

P(w_i = ω_j | ·) ∝ [P(w_i = ω_j | f) I_{i∈F} + P(w_i = ω_j | w_sire(i), w_dam(i)) I_{i∈NF}] × p(ỹ_i | w_i = ω_j) × ∏_{k∈NonFinal(i)} P(w_{i(k)} | w_i = ω_j, w_mate(i(k))) × ∏_{k∈Final(i)} Σ_{j'=1}^{3} P(w_{i(k)} = ω_{j'} | w_i = ω_j, w_mate(i(k))) p(ỹ_{i(k)} | w_{i(k)} = ω_{j'})    (7)

where Final(i) is the set of final offspring of sire i, and NonFinal(i) is the set of non-final offspring. Dams are sampled according to equation (3), and final offspring according to:

P(w_{i(l)} = ω_j | w_i, ·) ∝ P(w_{i(l)} = ω_j | w_i, w_mate(i(l))) p(ỹ_{i(l)} | w_{i(l)} = ω_j)    (8)

Again, the probabilities must be normalised. The sire blocking strategy is then constructed as in the previous algorithm, except that step II is replaced by the following: if individual i is a sire, sample the genotype from equation (7), followed by sampling of the final offspring i(l) from equation (8). If individual i is a dam, sample the genotype from equation (3).
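A hedged sketch of one sire-block update under the reconstruction above: the sire is drawn after summing its final offspring out, and the final offspring are then drawn given the sampled sire genotype. The transmission table, penetrance and argument names are illustrative assumptions.

```python
import numpy as np

TRANS = np.zeros((3, 3, 3))                          # Mendelian transmission probabilities
for gs in range(3):
    for gd in range(3):
        ps, pd = gs / 2.0, gd / 2.0                  # probability of transmitting allele A2
        TRANS[gs, gd] = [(1 - ps) * (1 - pd), ps * (1 - pd) + (1 - ps) * pd, ps * pd]

def penetrance(y_adj, m, sigma2_e):
    """Likelihood of an adjusted record under each of the three genotypes."""
    if y_adj is None:
        return np.ones(3)
    return np.exp(-(y_adj - m) ** 2 / (2 * sigma2_e))

def sample_sire_block(p_sire_other, dam_genos, y_adj_final, m, sigma2_e, rng):
    """p_sire_other collects all terms of the sire's conditional that do not involve
    its final offspring (prior/transmission, non-final offspring, own record)."""
    p_sire = np.array(p_sire_other, dtype=float)
    for gd, y_k in zip(dam_genos, y_adj_final):
        # sum the three genotypes of each final offspring out, weighted by penetrance
        p_sire *= np.array([TRANS[gs, gd] @ penetrance(y_k, m, sigma2_e) for gs in range(3)])
    p_sire /= p_sire.sum()
    g_sire = rng.choice(3, p=p_sire)                 # the role of equation (7)
    g_final = []
    for gd, y_k in zip(dam_genos, y_adj_final):      # equation (8): conditionally independent
        p_k = TRANS[g_sire, gd] * penetrance(y_k, m, sigma2_e)
        g_final.append(rng.choice(3, p=p_k / p_k.sum()))
    return g_sire, g_final
```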

2.3.2. General blocking using graph theory

This approach involves a more general blocking strategy by representing the major genotypes in a graphical model. This representation enables the formation of optimal blocks, each containing the majority of genotypes. The blocks are formed so that exact calculations in each block are possible. These exact calculations can be used to obtain a random sample from the full conditional distribution of the block.

In the following, V denotes the variables of the Bayesian network, and e is called 'evidence'. The evidence can contain both the data (y), on which V has a causal effect, and other known parameters. In turn, the posterior distribution is written as the joint prior of V multiplied by the conditional distribution of the evidence [p(V | e) ∝ p(V) p(e | V)]. Jensen et al. [13] used the Bayesian network representation as the basis of their blocking Gibbs sampling algorithm for a single locus model. In their model, V contained the discrete genotypes and e the data, which were assumed to be completely determined by the genotypes. However, MIMs are more complex, as they contain several variables in addition to the major genotypes (e.g. systematic and random environmental effects as well as correlated polygenic effects affect phenotypes). Consequently, the representation of Jensen et al. [13] cannot be used directly for MIMs.

To incorporate the extra parameters of the model, a Gibbs sampling algorithm is constructed in which the continuous variables pertaining to the MIM are sampled from their full conditional densities. In each round the sampled realisations can then be inserted as evidence in the Bayesian network. This algorithm requires the Bayesian network representation of the major genotypes (V = W), with the data and continuous variables as evidence (e = b, u, m, f, σ²_e, σ²_u, y). However, because an exact calculation of the joint distribution of all genotypes is not possible, a small number of blocks (e.g. B_1, B_2, ..., B_5) are constructed, and for each block a Bayesian network BN_i is defined. For each BN_i, let the variables be the genotypes in the block (V = B_i). Further, let the evidence be the genotypes in the complementary set (B_i^c = W\B_i), the realised values of the other variables, and the data [i.e. e = (B_i^c, b, u, m, σ²_e, σ²_u, f, y)]. These Bayesian networks are a graphical representation of the joint conditional distribution of all major genotypes within a block, given the complementary set, all other continuous variables, and the data (p(B_i | B_i^c, b, u, m, f, σ²_e, σ²_u, y)). This is equivalent to a Bayesian network where data corrected for the current values of all continuous variables are inserted as evidence [i.e. p(B_i | B_i^c, b, u, m, f, σ²_e, σ²_u, y) ∝ p(B_i) p(y | W, b, u, m, f, σ²_e, σ²_u) = p(B_i) p(ỹ | B_i, f)]. The last term is described as the penetrance function underneath equation (3).

In the following sections, some details of the graphical model representation are described. This is not intended to be a complete description of graphical models, which is a very comprehensive area of which more details can be found in, e.g., [14-16]. The following is rather meant to focus on the operations used in the current work.

2.3.3. Bayesian networks

A Bayesian network is a graphical representation of a set of random variables, V, which can be organised in a directed acyclic graph (e.g. [14]) (figure 1a). A graph is directed when, for each pair of neighbouring variables, one variable is causally dependent on the other, but not vice versa. These causal dependencies between variables are represented by directed links which connect them. The graph is acyclic if, following the direction of the directed links, it is not possible to return to the same variable. Variables with causal links pointing to v_i are denoted as parents of v_i [pa(v_i)]. Should v_i have parents, the conditional distribution p(v_i | pa(v_i)) is associated with it; should v_i have no parents, this reduces to the unconditional prior distribution p(v_i). The joint distribution is written

p(V) = ∏_i p(v_i | pa(v_i)).

In this study the variables in the network represent a major genotype, w_i. The links pointing from parents to offspring represent the probabilities of alleles being transmitted from parents to offspring. Therefore, the conditional distributions associated with the variables are the Mendelian segregation probabilities (P(w_i | w_s, w_d)). A simple pedigree is depicted in figure 1a as a Bayesian network. From this, it is apparent that a pedigree of genotypes is a special case of a Bayesian network.
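The factorisation p(V) = ∏ p(v_i | pa(v_i)) is easy to make explicit in code. The sketch below does so for an illustrative eight-member pedigree patterned on the description of figure 1 (the figure itself is not in this extract), with founders 1-4 and offspring 5 (of 1 x 2), 6 (of 2 x 3), 7 (of 4 x 5) and 8 (of 6 x 7); all numbers are assumptions.

```python
import itertools
import numpy as np

f = 0.5                                           # allele frequency of A1
prior = np.array([f ** 2, 2 * f * (1 - f), (1 - f) ** 2])   # Hardy-Weinberg founder prior
TRANS = np.zeros((3, 3, 3))                       # Mendelian transmission table
for gs, gd in itertools.product(range(3), range(3)):
    ps, pd = gs / 2.0, gd / 2.0                   # prob. each parent transmits A2
    TRANS[gs, gd] = [(1 - ps) * (1 - pd), ps * (1 - pd) + (1 - ps) * pd, ps * pd]

parents = {1: None, 2: None, 3: None, 4: None,    # pa(v_i) for every node
           5: (1, 2), 6: (2, 3), 7: (4, 5), 8: (6, 7)}

def joint(w):
    """p(V) = prod_i p(v_i | pa(v_i)) for one genotype configuration w[1..8]."""
    p = 1.0
    for i, pa in parents.items():
        p *= prior[w[i]] if pa is None else TRANS[w[pa[0]], w[pa[1]], w[i]]
    return p

# the local factorisation sums to one over all 3**8 configurations, as it must
total = sum(joint(dict(zip(parents, cfg)))
            for cfg in itertools.product(range(3), repeat=8))
print(round(total, 10))   # 1.0
```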

In general, exact computations among the genotypes are required. For example, in figure 1a, should it be required to calculate p(w_1, w_2, w_5), this can be carried out as:

p(w_1, w_2, w_5) = Σ_{w_3, w_4, w_6, w_7, w_8} p(w_1, w_2, ..., w_8).

The size of the probability table increases exponentially with the number of genotypes. Therefore, it rapidly increases to sizes that are not manageable. However, by using the local independence structure, recursive factorisation allows us to write the desired distribution as:

p(w_1, w_2, w_5) = p(w_1) p(w_2) p(w_5 | w_1, w_2) Σ_{w_3} p(w_3) Σ_{w_4} p(w_4) Σ_{w_6} p(w_6 | w_2, w_3) Σ_{w_7} p(w_7 | w_4, w_5) Σ_{w_8} p(w_8 | w_6, w_7).

This is much more efficient in terms of storage requirements and describes the general idea underlying methods for exact computations of posterior distributions in Bayesian networks. When the Bayesian network contains loops, however, it is not obvious in which order the variables should be summed out so that the probability tables are minimised. Therefore, an algorithm is required. The method of 'peeling' by Elston and Stewart [4], generalised by Cannings et al. [2], provides algorithms for performing such calculations with genetic applications. However, for other operations needed in the blocked Gibbs sampling algorithm, peeling cannot be used. Instead, we use the algorithm of Lauritzen and Spiegelhalter [16], which is also based on the above ideas. This algorithm transforms the Bayesian network into a so-called junction tree.

2.3.4. The junction tree

The junction tree is a secondary structure of the Bayesian network. This structure generates a posterior distribution that is mathematically identical to the posterior distribution in the Bayesian network. However, properties of the junction tree greatly reduce the required computations. The desired properties are fulfilled by any structure that satisfies the following definition.

Definition 1 (junction tree). A junction tree is a graph of clusters. The clusters, also called cliques, (C_i, i = 1, ..., n_c) are subsets of V, and the union of all cliques is V (C_1 ∪ C_2 ∪ ... ∪ C_{n_c} = V). The cliques are organised into a graph with no loops (cycles): by following the path between neighbouring cliques it is not possible to return to the same clique. Between each pair of neighbouring cliques is a separator, S, which contains the intersection of the two cliques (S_12 = C_1 ∩ C_2). Finally, the intersection of any two cliques, C_i and C_j, is present in all cliques and separators on the unique path between C_i and C_j.

2.3.5. Transformation of a Bayesian network into a junction tree

In general, there is no unique junction tree for a given Bayesian network. However, the algorithm of Lauritzen and Spiegelhalter [16] generates a junction tree for any Bayesian network with the property that the cliques generally become as small as possible. This is important as small cliques make calculations more efficient. In the following section, we introduce some basic operations of that algorithm, transforming the Bayesian network shown in figure 1a into a junction tree.

The network is first turned into an undirected graph by removing the directions of the links. Links are then added between parents. The added links (seen in figure 1b as the dashed links) are denoted 'moral links', and the resulting graph is called the 'moral graph'. The next step is to 'triangulate' the graph. If cycles of length greater than three exist, and no other links connect variables in that cycle, extra 'fill-in links' must be added until no such cycles exist. After links are added between parents, as shown in figure 1, there is a cycle of length four which contains the variables w_2, w_5, w_7 and w_6. An extra fill-in link must be added either between w_2 and w_7 or, as shown with the thick link, between w_5 and w_6. Finally, from the triangulated graph, the junction tree is established by identifying all 'cliques'. These are defined as maximal sets of variables that are all pairwise linked. In other words, a set of variables that are all pairwise connected by links must be in the same clique. These cliques must be arranged into a graph with no loops, in such a way that for each pair of cliques, C_i and C_j, all cliques and separators on the path between C_i and C_j contain the intersection C_i ∩ C_j. This requirement ensures that variables in C_i and C_j are conditionally independent, given the variables on the path between them.
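The moralisation step, and triangulation by adding fill-in links while nodes are eliminated, can be sketched directly on the same assumed toy pedigree; the elimination order used here is an arbitrary choice, whereas the algorithm in [16] is designed to keep the resulting cliques small.

```python
parents = {1: (), 2: (), 3: (), 4: (),
           5: (1, 2), 6: (2, 3), 7: (4, 5), 8: (6, 7)}

# moral graph: drop directions, keep child-parent links and "marry" the parents
edges = set()
for child, pa in parents.items():
    edges |= {frozenset((child, p)) for p in pa}
    if len(pa) == 2:
        edges.add(frozenset(pa))                     # moral link between the two parents

adj = {v: {u for e in edges if v in e for u in e if u != v} for v in parents}

# triangulate: when a node is eliminated, its remaining neighbours are joined
fill_ins = []
remaining = set(parents)
for v in (8, 1, 3, 4, 2, 5, 6, 7):                   # an arbitrary elimination order
    nb = (adj[v] & remaining) - {v}
    for a in nb:
        for b in nb:
            if a < b and b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill_ins.append((a, b))
    remaining.discard(v)

print("fill-in links added:", fill_ins)              # [(5, 6)] with this order
```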

2.3.6. Exact computations in a junction tree

To perform exact calculations, the junction tree is initialised by constructing belief tables for all cliques (B(C_1), ..., B(C_n(c))) and separators (B(S_1), ..., B(S_n(s))). Each belief table conforms to the joint probability distribution of the variables in that clique or separator, and contains the current belief of the joint posterior distribution of these variables. This is also called the belief potential of these cliques/separators. For example, B(C_1) represents p(C_1 | y) in figure 2a. In the following we assume that individual 8 in figure 1a has a phenotypic record. Then, the belief tables are initialised by first setting all entries in each belief table to one. Prior probabilities of variables without parents are then multiplied onto exactly one arbitrarily chosen clique in which the variable is present. Finally, the conditional probabilities of variables with parents are multiplied onto the unique clique which contains that variable and its parents. Following this procedure, the junction tree in figure 1c could be initialised as follows: C_1 = (w_1, w_2, w_5) is initialised with B(C_1) = p(w_1)p(w_2)p(w_5 | w_1, w_2); C_2 = (w_2, w_5, w_6) is initialised with all ones for B(C_2); C_3 = (w_2, w_3, w_6) is initialised with B(C_3) = p(w_3)p(w_6 | w_2, w_3); C_4 = (w_4, w_5, w_7) is initialised with B(C_4) = p(w_4)p(w_7 | w_4, w_5); C_5 = (w_5, w_6, w_7) is initialised with all ones for B(C_5); C_6 = (w_6, w_7, w_8) is initialised with B(C_6) = p(w_8 | w_6, w_7)p(y_8 | w_8); and the separators are initialised with all ones. After having initialised the junction tree in this way, we note that the product of the belief potentials for all cliques is equal to the joint posterior distribution of all variables:

∏_i B(C_i) ∝ p(V | e).

The general rule of this property is given by:

p(V | e) ∝ ∏_cliques B(C_i) / ∏_separators B(S_j)    (11)

2.3.7. Junction tree propagation

The initialisation described in the previous section is arbitrary in the sense that p(w_2) could have been multiplied onto B(C_2) instead of B(C_1). Therefore, the belief tables do not at this point reflect the knowledge of the variables in the rest of the network; each table must still be combined with the information on all other variables. Propagation in junction trees is a means of updating the belief with such information. This updating is performed by means of an operation called 'absorption', which has the effect of propagating information between neighbouring cliques. For example, if information is propagated from B(C_6) to B(C_5) as in figure 2, B(C_5) is said to absorb from B(C_6), or, equivalently, C_6 is said to send a message to C_5. The absorption operation consists of the following calculation:

B*(C_5) = B(C_5) B*(S_5)/B(S_5), where B*(S_5) = Σ_{C_6\S_5} B(C_6).

The absorption can be regarded as updating the belief potential of p(w_5, w_6, w_7) (B(w_5, w_6, w_7)) with information on the belief potential of p(w_6, w_7, w_8) (B(w_6, w_7, w_8)). This is accomplished by first finding the conditional belief of the variables in C_5 given the variables in C_6 by B(w_5 | w_6, w_7) = B(w_5, w_6, w_7) / Σ_{w_5} B(w_5, w_6, w_7). The joint belief of the variables (w_6, w_7) in C_5 is then updated with the new information from C_6 by B*(w_5, w_6, w_7) = B*(w_6, w_7) B(w_5 | w_6, w_7), where B*(w_6, w_7) = Σ_{w_8} B*(w_6, w_7, w_8). The junction tree is invariant to the absorptions. This means that after an absorption, equation (11) is still true.
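Numerically, an absorption is a sum over the sending clique followed by a broadcast multiplication, as in the hedged numpy sketch below (the axis order of each table is a local convention chosen here); it also shows that the ratio entering equation (11) is untouched by the operation.

```python
import numpy as np

rng = np.random.default_rng(0)
B_C5 = np.ones((3, 3, 3))                   # axes: w5, w6, w7 (initialised with ones)
B_C6 = rng.random((3, 3, 3))                # axes: w6, w7, w8 (some current belief)
B_S5 = np.ones((3, 3))                      # axes: w6, w7

# B*(S5) = sum over C6 \ S5 (here: w8) of B(C6)
B_S5_new = B_C6.sum(axis=2)

# B*(C5) = B(C5) * B*(S5) / B(S5), broadcasting the separator over the w5 axis
B_C5_new = B_C5 * (B_S5_new / B_S5)[np.newaxis, :, :]

# the message leaves the quantity in equation (11) unchanged: the ratio
# B(C5)/B(S5) is the same before and after the absorption
old_ratio = B_C5 / B_S5[np.newaxis, :, :]
new_ratio = B_C5_new / B_S5_new[np.newaxis, :, :]
print(np.allclose(old_ratio, new_ratio))    # True
```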

The object is now to perform a sequence of absorptions. In this study, the operation 'collect evidence' is used, which performs absorptions from the outer cliques of the junction tree towards a single clique. Consequently, calling collect evidence from any clique results in the belief table being equivalent to the joint posterior distribution of the variables it contains. As an example, figure 2 shows that calling collect evidence from C_1 results in B(C_1) ∝ p(w_1, w_2, w_5 | y_8), and the order of absorptions is established as follows. First, C_1 requests to absorb information from its neighbours (C_2). However, this operation is only allowed if C_2 has already absorbed from all its other neighbours (C_3 and C_5). Since this is not the case, C_2 will recursively request absorption from these cliques. This is granted for C_3, but C_5 has still not absorbed from C_4 and C_6, from which it requests absorption. This is finally granted, and the absorptions can be performed in the order illustrated in figure 2a. Distribute evidence from C_1 in figure 2b is performed by allowing C_1 to send a message to all its neighbours. When a clique has received a message it will send a new message to all of its neighbours, except to the clique it has just received a message from. In our example the order of messages (absorptions) is illustrated in figure 2b.

If 'collect evidence' is followed by the routine 'distribute evidence' from the same clique, then for any clique B(C_i) ∝ p(C_i | y) [15]. This is a very attractive property because it is then possible to find the marginal posterior density of any variable by summing the other variables out of any clique in which it is present, rather than summing all other variables out of the joint distribution.

2.3.8. Example of exact calculations

An example is provided in this section to illustrate the relationship between exact calculations with and without the use of the junction tree representation. Should we want to compute the marginal posterior probability distribution P(w_1 | y_8), this can be carried out directly using standard methods of probability:

P(w_1 | y_8) ∝ Σ_{w_2, ..., w_8} p(w_1, w_2, ..., w_8) p(y_8 | w_8).    (9)

However, the independence structure between the genotypes allows for recursive factorisation:

p(w_1, ..., w_8) = p(w_1) p(w_2) p(w_3) p(w_4) p(w_5 | w_1, w_2) p(w_6 | w_2, w_3) p(w_7 | w_4, w_5) p(w_8 | w_6, w_7),

and we can write equation (9) as:

P(w_1 | y_8) ∝ p(w_1) Σ_{w_2} p(w_2) Σ_{w_5} p(w_5 | w_1, w_2) Σ_{w_6} [Σ_{w_3} p(w_3) p(w_6 | w_2, w_3)] Σ_{w_7} [Σ_{w_4} p(w_4) p(w_7 | w_4, w_5)] [Σ_{w_8} p(w_8 | w_6, w_7) p(y_8 | w_8)].    (10)

The junction tree algorithm is then used as follows. First, the junction tree is initialised as described in the previous section. Collect evidence is then called from C_1 to calculate p(C_1 | y_8). As already described, this call consists of the series of absorptions ordered as illustrated in figure 2a.

The corresponding calculations are as follows. First, the absorptions indicated by 1 in figure 2a:

B*(C_5) = B(C_5) [B*(S_4)/B(S_4)] [B*(S_5)/B(S_5)], where B*(S_4) = Σ_{w_4} B(C_4) and B*(S_5) = Σ_{w_8} B(C_6),

and

B*(C_2) = B(C_2) B*(S_2)/B(S_2), where B*(S_2) = Σ_{w_3} B(C_3).

Second, the absorption indicated by 2 in figure 2a:

B**(C_2) = B*(C_2) B*(S_3)/B(S_3), where B*(S_3) = Σ_{w_7} B*(C_5).

Finally, the absorption indicated by 3 in figure 2a:

B*(C_1) = B(C_1) B**(S_1)/B(S_1), where B**(S_1) = Σ_{w_6} B**(C_2).

After collect evidence has been completed, p(w_1 | y) can be found by p(w_1 | y) ∝ Σ_{w_2, w_5} B*(C_1). Writing these calculations together, and substituting the initial probabilities (without the tables of all ones), we obtain exactly the same calculation as equation (10), which illustrates that junction tree propagation is basically a method to separate calculations into smaller steps, and to arrange the order of these, such that the correct result is obtained.
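Under the same assumed toy pedigree (and an arbitrary stand-in for the penetrance p(y_8 | w_8)), the sketch below checks numerically that the brute-force summation and the ordered, factor-by-factor summation return the same p(w_1 | y_8).

```python
import itertools
import numpy as np

f = 0.7
prior = np.array([f ** 2, 2 * f * (1 - f), (1 - f) ** 2])      # founder genotype prior
TRANS = np.zeros((3, 3, 3))                                    # Mendelian transmission
for gs, gd in itertools.product(range(3), range(3)):
    ps, pd = gs / 2.0, gd / 2.0
    TRANS[gs, gd] = [(1 - ps) * (1 - pd), ps * (1 - pd) + (1 - ps) * pd, ps * pd]
pen8 = np.array([0.2, 1.0, 0.5])                               # stand-in for p(y8 | w8)

# direct summation over w2..w8 (the brute-force calculation of equation (9))
direct = np.zeros(3)
for w1, w2, w3, w4, w5, w6, w7, w8 in itertools.product(range(3), repeat=8):
    direct[w1] += (prior[w1] * prior[w2] * prior[w3] * prior[w4]
                   * TRANS[w1, w2, w5] * TRANS[w2, w3, w6]
                   * TRANS[w4, w5, w7] * TRANS[w6, w7, w8] * pen8[w8])
direct /= direct.sum()

# ordered summation over local factors (the ordering used by collect evidence)
G = TRANS @ pen8                               # sum over w8 -> table over (w6, w7)
F47 = np.einsum('a,asb->sb', prior, TRANS)     # sum over w4 -> table over (w5, w7)
F36 = np.einsum('c,pcs->ps', prior, TRANS)     # sum over w3 -> table over (w2, w6)
H = np.einsum('sb,db->sd', F47, G)             # sum over w7 -> table over (w5, w6)
K = np.einsum('pe,se->ps', F36, H)             # sum over w6 -> table over (w2, w5)
ordered = prior * np.einsum('p,aps,ps->a', prior, TRANS, K)
ordered /= ordered.sum()

print(np.allclose(direct, ordered))            # True: both give p(w1 | y8)
```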

2.3.9. Random propagation

Another propagation algorithm, which relies on the junction tree structure, is 'random propagation', developed by Dawid [3]. This method provides a random sample from the joint posterior distribution of all variables in the Bayesian network, p(V | e). Random propagation is initialised by calling collect evidence from an arbitrarily chosen clique C_0. As mentioned previously, this results in B(C_0) being equal to the joint posterior distribution of the variables contained in C_0 (B(C_0) ∝ p(C_0 | e)). B(C_0) is then used to sample the variables it contains, after which messages are sent to the neighbouring cliques (C_n) by absorption from C_0 to C_n. The belief tables of C_n will then be proportional to the joint posterior distribution of the variables contained in the given cliques, conditional on the variables already sampled. That is, B(C_n) ∝ p(C_n | C_0, e) = p(C_n\{C_0 ∩ C_n} | C_0, e). After normalisation, the variables of C_n\{C_0 ∩ C_n} are sampled, and absorptions are performed to their neighbouring cliques. Sampling and sending messages is continued in this manner until the entire network is sampled. The order in which sampling is performed follows the order of messages in distribute evidence (figure 2b). In our genetic example, we can first collect evidence to C_1. Performing the random propagation algorithm then involves sampling from the following distributions: p(w_1, w_2, w_5 | y_8), then p(w_6 | w_2, w_5, y_8), p(w_3 | w_2, w_6, y_8), p(w_7 | w_5, w_6, y_8), p(w_4 | w_5, w_7, y_8) and p(w_8 | w_6, w_7, y_8).
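As a hedged illustration of this sampling scheme on the same assumed pedigree, the sketch below draws (w_1, w_2, w_5) jointly and then each remaining genotype conditional on the values already sampled, in the distribute-evidence order. For simplicity the conditionals are read off the full joint table, which is feasible only for a toy example; random propagation obtains the same conditionals from the clique beliefs without ever forming that table.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
f = 0.7
prior = np.array([f ** 2, 2 * f * (1 - f), (1 - f) ** 2])
TRANS = np.zeros((3, 3, 3))
for gs, gd in itertools.product(range(3), range(3)):
    ps, pd = gs / 2.0, gd / 2.0
    TRANS[gs, gd] = [(1 - ps) * (1 - pd), ps * (1 - pd) + (1 - ps) * pd, ps * pd]
pen8 = np.array([0.2, 1.0, 0.5])                       # stand-in for p(y8 | w8)

# p(w1..w8 | y8) as an explicit 8-way table (possible only because the example is tiny)
post = np.zeros((3,) * 8)
for w in itertools.product(range(3), repeat=8):
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    post[w] = (prior[w1] * prior[w2] * prior[w3] * prior[w4]
               * TRANS[w1, w2, w5] * TRANS[w2, w3, w6]
               * TRANS[w4, w5, w7] * TRANS[w6, w7, w8] * pen8[w8])
post /= post.sum()

sample = {}
# starting clique C1: draw (w1, w2, w5) jointly from their marginal posterior
marg = post.sum(axis=(2, 3, 5, 6, 7))                  # keep axes for w1, w2, w5
idx = rng.choice(27, p=marg.ravel())
sample[1], sample[2], sample[5] = np.unravel_index(idx, (3, 3, 3))

# remaining genotypes in the distribute-evidence order w6, w3, w7, w4, w8
for var in (6, 3, 7, 4, 8):
    keep = sorted(list(sample) + [var])
    table = post.sum(axis=tuple(i for i in range(8) if (i + 1) not in keep))
    cond = table[tuple(sample[v] if v != var else slice(None) for v in keep)]
    sample[var] = rng.choice(3, p=cond / cond.sum())

print(sample)                                           # one joint draw from p(W | y8)
```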

2.3.10. Creating blocks by conditioning

The method of random propagation of Dawid [3] can be used to obtain a random sample of all variables from their joint posterior distribution, p(V | e), or equivalently p(W | b, u, m, f, σ²_e, σ²_u, y). However, if the Bayesian network is large and complex, the cliques of the junction tree may contain many variables. This is a problematic scenario, as the dimensions of the corresponding belief tables are exponential in the number of variables the cliques contain. Therefore, the storage requirements of junction trees may become prohibitive, preventing the performance of the operations described earlier. If this were the case, it would not be possible to obtain a random sample from the joint distribution of all variables.

Conditioning on a variable allows a new Bayesian network to be drawn, where the variable conditioned on is separated into several clones. This will often break loops in the network, as illustrated in figure 3. When loops are broken, fewer fill-in links are needed to render the graph triangulated, and consequently, fewer and smaller cliques are created. It follows that the storage requirements of the corresponding junction tree are smaller and random propagation can be performed. The concept of conditioning is illustrated in figure 3, where two different variables, w_5 and w_7, are conditioned on. The resulting Bayesian networks are illustrated in figure 3a and c, and the reduced junction trees are illustrated in figure 3b and d. This corresponds to the creation of two blocks, B_1 = {w_1, w_2, w_3, w_4, w_6, w_7, w_8} and B_2 = {w_1, w_2, w_3, w_4, w_5, w_6, w_8}. The reduced junction trees demonstrate that the storage requirements of the junction trees are reduced, because loops in the original Bayesian network are broken. This occurs as the junction trees contain fewer cliques and some of the cliques contain fewer unknown variables. In figure 3b and d, variables with letter subscripts are assumed known. It is easy to see that a very large junction tree can be reduced sufficiently with respect to storage constraints if many variables are conditioned on.

Blocks were created by choosing variables through the following iterative steps. First, the variable yielding the highest reduction in storage requirements when conditioned on was identified. Second, this variable was conditioned on, and the storage requirements of the resulting junction tree were calculated. If the storage requirements were larger than what was available, the steps were repeated, finding the next variable to condition on. This was continued until all blocks satisfied the storage constraints of the system. All variables were, of course, required to be in at least one block. Jensen [10] provides more information on the block selection algorithm.

Blocks B_1, ..., B_n are constructed such that each contains most of the variables of the network. The complementary sets (i.e. the variables that are conditioned on) are called B_1^c, ..., B_n^c. In this way, each set, B_i ∪ B_i^c, contains all the major genotypes in the pedigree. As the junction tree of each block can now be stored in the computer, exact inference can be performed, and a joint sample of all variables in the block can be obtained using the random propagation method. Therefore, using the described form of blocking, we can obtain random samples from the joint distribution of a block, conditional on the complementary set of other variables in the MIM and the data, p(B_i | B_i^c, b, u, m, σ²_e, σ²_u, f, y). In the MIM, the Gibbs sampling algorithm using general blocking can thus be summarised as follows:

1) form optimal blocks with respect to storage constraints by conditioning;
2) construct the junction tree for each block;
