Computing approximate monogenic model likelihoods in large pedigrees with loops

(1)

HAL Id: hal-00894117

https://hal.archives-ouvertes.fr/hal-00894117

Submitted on 1 Jan 1995

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Computing approximate monogenic model likelihoods in

large pedigrees with loops

Llg Janss, Jam van Arendonk, Jhj van der Werf

To cite this version:

(2)

Original

article

Computing approximate monogenic

model likelihoods in

_large

_pedigrees

with

_loops

LLG Janss

JAM

Van

_Arendonk,

JHJ Van der Werf

Wageningen Agricultural University, Department of

Animal

_Breeding,

PO Box

_338,

6700 AH

_Wageningen,

The Netherlands

(Received

30

January

1995;

accepted

11

September

1995)

Summary - In this

_study

’iterative

_peeling’

is

_introduced,

a method

_equivalent

to

the traditional recursive

_peeling

method for

computing

exact likelihoods in

_nonlooped

pedigrees,

but which can also be used to obtain

approximate

likelihoods in

_looped

pedigrees.

Iterative

peeling

is an

_interesting

tool for animal

_breeding,

where exact recursive

peeling

is

_generally

unfeasible due to the abundant number of

loops

in animal

pedigrees.

In

_simulations,

_{hypothesis testing}

and _parameterestimation were

compared

based on

approximate

likelihoods in

_{looped pedigrees}

and exact likelihoods in

_nonlooped

_pedigrees,

showing

no biases introduced

_by

the

_{approximation}

in

_{looped pedigrees.}

likelihood

_/

pedigree peeling

/

major gene

/

looped pedigree

Résumé - Calcul approximatif de vraisemblance pour un modèle monogénique dans de grands pedigrees à boucles. Dans cette étude on introduit une

_procédure

itérative

de condensation de

_{l’information}

contenue dans un

pedigree, appelée « épluchage »,

qui

est

équivalente

à

_{l’épluchage récursif pour}

le calcul des vraisemblances exactes dans des

pedigrees

sans

_boucles,

mais qui est

_également

utilisable pour le calcul de vraisemblances

ap-proximatives dans les

_pedigrees

à boucles.

L’épluchage itératif

est une méthode intéressante en

_génétique

animale où la méthode récursive exacte est

généralement inapplicable

à cause du

grand

nombre de boucles dans les

pedigrees

animaux.

À

l’aide de

simulations,

on a

comparé

des tests

_{d’hypothèse}

et l’estimation de

paramètres

basés sur des vraisemblances approximatives dans des

_pedigrees

à boucles et des vraisemblances exactes dans des

pedi-grees sans

_boucles,

montrant

qu’il n’y

a _pasde biais introduit _{par le}calcul

approximatif

dans des

pedigrees

à boucles.

(3)

INTRODUCTION

Research into the use of

_major

_genemodels in animal

_breeding

has been aimed

mainly

at

_{approximations}

to a mixed inheritance

_model,

_{including polygenes,}

in

one

_generation

half-sib structures

_(Hoeschele,

_1988;

Le

_Roy

et

_{al, 1989;}

Knott et

_al,

1992).

Because of the

_{pedigree loops}

that arise in animal

_{breeding situations,}

ex-tension to

_{multigeneration}

_pedigrees

is difficult. A

_{pedigree loop}

arises when 2 individuals are connected

_by

more than one

_path

of descent or

_marriage

relation-ships.

Lange

and Elston

₍₁₉₇₅₎

described various

types

of

loops,

among which are

inbreeding

loops, marriage rings

and

_{marriage loops.}

In animal

_breeding

_pedigrees

these kinds of

_loops

are _verycommon. In

_{particular, multiple matings}

which are

generally

applied

to males and often to

_females,

result _{in many}

_marriage

_loops

and

marriage rings.

For

_genotype

_probability

and likelihood

_{computation, loops}

can

only

be dealt

with in an exact manner in

_pedigrees

with a few

_{simple non-overlapping loops using}

the traditional recursive

_peeling

method

_(Elston

and

_{Stewart, 1971;}

_Cannings

et

_al,

1976;

Cannings

et

_al,

_1978).

_However,

in

_{highly looped}

_pedigrees,

common in animal

breeding,

exact recursive

_peeling

is too

_demanding

_{computationally}

and recursive

peeling

is not flexible

_enough

to allow for

_approximate

_{computations.}

In this

_study

we introduce ’iterative

_peeling’.

Iterative

_peeling

is

_developed

as

an exact method for

application

in

_{nonlooped pedigrees, equivalent}

to recursive

peeling,

but

_which,

unlike the

_original

recursive

_variant,

can be used without

modifications in

_looped

_pedigrees

to obtain

_approximate

likelihoods. The main

objective

of this paper is to introduce iterative

_peeling

for such

_{approximations}

in

_looped

_pedigrees,

_allowing

for a more

_{general application}

of

_major

gene models in animal

_breeding.

_{Using simulations,}

the usefulness of the

_{approximation}

for likelihood-based

_{hypothesis testing}

and

_parameter

estimation in

_looped

_pedigrees

is

_{investigated.}

A

_monogenic

model will be

_considered,

which can be extended to a

mixed inheritance

model,

as will be discussed.

RECURSIVE AND ITERATIVE PEELING

In the first

_section,

recursive

_peeling

is described for

_{obtaining monogenic}

model likelihoods in

_{nonlooped pedigrees.}

In the second

_section,

’iterative

_peeling’

is introduced as an

_equivalent

method for exact

_computations

in

_nonlooped

_pedigrees.

The

_equivalent

exact method in

_nonlooped

_pedigrees

can be used as an

_approximate

method in

_{looped pedigrees.}

Recursive

_peeling

Probability

and likelihood

_computations

in

_nonlooped

_pedigrees

can be done

_by

recursive

_peeling

_(Elston

and

_{Stewart, 1971;}

_Cannings

et

_al,

_1976;

_Cannings

et

_al,

1978)

using

2 basic

_{peeling operations}

of

_{’peeling up’}

and

_’peeling

down’.

Roughly,

considering

a

_{single family,}

a

_peel-up

_operation

_represents

the information in a

family

in

_{probabilities}

for the

_genotype

G

i

of a

_parent

_i,

and a

_peel-down

_operation

(4)

of the

peel-up operation

is denoted

_by

_prog(G

_i

₎

and the result of the

_peel-down

operation

is denoted

_by

_prior(G

_k

_).

The

_{corresponding}

notation in

_Cannings

et al

(1976, 1978)

is the

_R

_*

_(..;G

_i

₎

function for

_peeling

up and the

R+(..;G!)

function for

peeling

down.

Peeling operations

are used

_recursively,

_eg,the

_computation

of a _progterm for

a

_parent

based on _progenydata _mayinclude

_{previously computed}

_{prog terms}of

those progeny,

representing

information from

_{grand-progeny.}

The aim of

_peeling

is

to condense all information from a

_pedigree

into a

_prior

and _progterm for a

single

individual

1, obtaining

the likelihood L for all data in the

_pedigree

as:

where

_{f (y, I G}

_i

₎

is the

_penetrance

_function,

which is the

_probability

for the observed data _y_l on

_{individual l, given}

that it has

_{genotype G}

_i

_.

The individual

_may

be an

individual from the base

_population,

in which case the

_{base-population}

_genotype

frequency

P(G!)

is used in

_place

of

_prior(G

_i

_).

Individual

_{l may}

also have no own

data or no _{progeny, in}which case the

_{corresponding}

_penetrance

term or _progterm

is removed.

_{Computationally}

this is

_{implemented using}

a

_penetrance

or _progterm

containing

l’s.

Peeling equations

A

_peeling

_equation

for an individual is obtained

_by

_considering

the collection of

possible base-population

genotype

frequencies,

genotype

transmission

_{probabilities,}

penetrance

probabilities

and other

_peeling

terms

_pertaining

to the individuals in its

family

and

_summing

over all

_possible

_genotypes

of the

_family

members. The terms

thus

_entering

a

_{peeling equation}

are difficult to

_give

in

_{general. Here, equations}

will be

_given

to use

peeling

in a

_pedigree

structure with dams nested within sires. In this structure a

_family

is a half-sib

_family

of one sire with several

_mates,

_containing

groups of full sibs which are, across _groups,

_paternal

half-subs. Three different

peeling equations

are considered: 2 for

_peeling

_up,

_dependent

on whether this is

done for a sire or a

_dam,

and 1 for

_peeling

down. In the

_peeling

_equations,

_prior,

prog and

penetrance

functions on

family

members are

specified

in all

_places

where

they

can enter. When these are not

_relevant,

_eg,when a _progenydoes not have

progeny of its own, these are removed _or,

computationally,

terms

_containing

l’s are used. Prior terms for individuals in the base

_populations

are substituted with

base-population

genotype

frequencies.

To condense all information in a _progterm for a sire i the

_following

_expression

is used:

prog(Gi) = r

l

jrgj

pr2or(Gj) fly7lC-T7)Hk!Gk

Pl!’klGi,

Gj)

f(

y

kl

-G

k

)

!r!9(CTk) [2]

where j

=

1 to ni are mates of

_i,

each mate

_having

k = 1 to

n

ij progeny, and

P(Gk !Gi, G! )

is the

_genotype

transmission

_probability

of sire i and a

_{dam j}

_to

offspring

k. To condense all the information from a half-sib

family

into a _{prog term}

(5)

where i is the sire of the

_family,

_prog-_j_.

₎

_i

_(G

is like in

_equation

_[2],

but

_excluding

dam

_j

_*

and k =

1,

n2!* are _progenyof dam

j

*

.

To condense all the information in

a

_prior

term for 1

_particular

_progeny with dam *k

_j

_*

_,

the

_following

_expression

is used:

where i is the sire of the

_family,

_phs(G

_i

₎

is a term that includes information on

the

_paternal

half-sibs of

_k

_*

_,

which is a function of the

genotype

of its sire i and is

computed

as:

Iterative

_peeling

Iterative

_peeling

is

_equivalent

to recursive

_peeling

used in

_{nonlooped pedigrees.}

Iterative

_peeling

is based on

_{algebraic partitioning}

of the likelihood and on

_repeated

computation

of

_peeling

_equations,

based on the idea of iterative

_computation

of

genotype

probabilities

(Van

Arendonk et

_al,

_1989).

Partitioning

of likelihood

The aim of

_obtaining

the likelihood of all data

_{using equation}

_[1]

_requires

families

to be handled in a certain order and

_requires

_peeling,

within each

_family,

to be

in a certain direction.

_{Peeling operations}

can be used to

_partition

the likelihood

pertaining

to

_parts

of the

_pedigree.

This

_partitioning

is continued until

_parts

are

obtained

_pertaining

to

_single

families. This allows a

_family-wise

evaluation of the

likelihood,

and the

_requirement

of

_peeling

to have a direction within each

_family

becomes obsolete.

Consider the

_pedigree

with 5 individuals in

_figure

1. In this

_pedigree

2 families

are

_present,

one

_family

with individuals

_1,

2 and

_3,

and a second with individuals

_3,

4 and 5.

_Here,

one

_partitioning

above and below individual 3 divides the

_pedigree

in

2

_families,

with individual 3

_being

in both families. Individual 3 is called a

_linking

individual. The likelihood for a

_{monogenic model, assuming}

data is available on all

5

_individuals,

is

_computed

as:

Now,

L is

_multiplied

and divided

by Li =!1!2!03

P(Gi) P(G

2 )

P(G

3 ) Gi , G

2 )

!(

Y1

IG

1

(6)

where the

part !01!02

P(G

i

)

P(G

2 )

P(G

3 ) Gi ,

)

G

2 !(

Y1

IG

d

f(y

2 ) G2 )

has been

iso-lated. This

_part

is

_prior(G

₃

_).

The term defined as

_L

₁

can be rewritten as

E

G3

I;

G1

I;c

2 P(G

1 ) P(G

2 ) P(G

3I

G

1 , G

2 ) !(

Y1I

G

1 ) !(

Y2I

G

2 ),

which is

_{I;c3 prior(G3).}

This

_simplifies

L to:

where

_prio&dquo;

_sC

_{( G}

₃

₎

stands for a

scaled,

or

normalised, prior

term. Now the likelihood can be written as L =

L

1 L

2 ,

or

ln(L)

=

ln(L

i

)

₊

ln(L

2 ),

with one likelihood term

per

family.

This is a

partitioning using

a

_prior

term for the

_linking

individual. It shows that for this

_type

_partitioning

_(i)

in the

_family

where the

_linking

individual is

a _progeny,after the

partitioning,

information on the

linking individual,

ie own data and progeny

data,

is

ignored;

and

(ii)

in the

family

where the

linking

individual is

a

_parent,

a scaled

_prior

term is used for the

_linking

individual. This term is used

in a manner like a

_{base-population}

_genotype

_frequency

for base individuals. The

scaled

_prior

term for a

linking

individual

1,

is

computed

in

general

as:

Although

the

_partioning

is shown

only

for 1

_example,

the

_partitioning

is _very

general.

The term

_L

₁

above is in

_general

the sum of the

_prior

term for a

_linking

individual

1,

which is the collection of all

_probability

terms

_pertaining

to anterior individuals of and the transmission

_probability

to

_l,

summed over all

_possible

genotypes

of l and of its anterior individuals. At the same time this term

_represents

the likelihood of the entire anterior

part

of the

_pedigree

and

l, excluding

data on

l. The

_remaining

_part

after the

partitioning,

L

2

in the

_example,

is the likelihood of the

posterior

part

of the

_pedigree

of

l, including

l with a scaled

prior

term. In

larger pedigrees

this

partitioning

is

_repeated

to

_yield

_parts

_{corresponding}

to

_single

families. When

_repeating

the

_{partitionings,}

results of earlier

partitionings

must be taken into

_account,

_eg,the result that after a

_partitioning

information on a

_linking

(7)

The likelihood of a

pedigree

can be

_{partitioned entirely using prior}

terms.

However,

the iterative

_computation,

as will be introduced

_hereafter,

can be

_speeded

up

by

also

using

a

_partitioning

of the likelihood

using

a _{prog term.}

_Showing

this based on the

example,

the likelihood L is

_multiplied

and divided

_by

a term

representing

the likelihood of

_family

_2,

_ignoring

data on individual

_3,

_L2

=

E

G3

E

G4

E

G5

P

(Ga) P(G

5I

G

3 ,

Ga) !

(

Y3I

G

3 )

/(!!G4),

which leads to:

2

Here a term

_E

_G4

_E

_G5

_P(G

₄

₎

_P(G

_5l

_G

₃

_,

₎

_G

₄

_!(

_Y4I

_G

₄

₎

_!(

_Y5I

_G

₅

₎

has been

_isolated,

which is

_prog(G

₃

_).

The division

_by

_L2

_scales

this

_term,

_L2

_being

_E

_G3

_!rog(G3).

Hence,

L is written as:

where

_prot

_C

_{( G}

₃

₎

denotes the scaled or normalised _progterm. For a

_partitioning

using

a _progterm it is seen that

(i)

in the

_family

where the

_linking

individual is

a _progeny,a

_prog

_s

_’

term is added as information for the

_individual;

and

_(ii)

in the

family

where the

linking

individual is a

_parent,

all information from observations

and from

_prior

terms is

_ignored.

The scaled _progterm for a

_linking

individual l is

generally

computed

as:

Partitioning

in a nested

_design

In a nested

_{design, partitionings}

are carried

_through

until

_parts

are obtained

corresponding

to sire families. In such

_families,

several female

parents

can be

present.

The

_linking

individuals are all the sires and dams of the

families,

_except

when

_they

are in the base

_population.

In this

_design

we consider a

_partitioning

using

a _progterm for each male and a

_prior

term for each female that is a

_linking

individual. When all

_parents

of a

family

are in the base

_population,

the

_part

of the

likelihood

_pertaining

to such a

_family

is

_computed

as:

where i indicates the sire of

_family

_{s, j}

sums over the dams of the

_family,

k indicates

male _progenythat are

_linking

_individuals,

1 indicates female _progenythat are

_linking

individuals and m indicates all other _progeny.When the sire of the

_family

is not in

the base

_population,

the term

_P(G

_i

₎

_{f (y2!Gi)}

on the first line of

[5]

is removed and

(8)

of

_[5]

is

_replaced

with

_{priof’c (G j)}

_’

The considered

_{partitionings using}

_progterms

for all male

_linking

individuals lead to this removal of information from sires on the

first line of

_[5]

when sires are not in the base

population

and lead to the inclusion of the

prog

sc

for males on the third line of

_equation

!5!.

The considered

_{partitionings}

using prior

terms for all female

linking individuals,

lead to the inclusion of a

_priors!

term on the second line of

[5]

when dams are not in the base

_population

and the removal of all information of females on the fourth line of

_equation

!5!.

Based on

the results from the

_previous

_paragraph,

the likelihood of the entire

_pedigree

after the

_{partitionings}

is:

Repeated computation

of

_peeling

_equations

Iterative

peeling

uses

_repeated

_computation

of

_{peeling equations.}

The

_repeated

computation

is a method to establish the order in which

_equations

should be handled.

Therefore,

iterative

_peeling

does not

_{require knowledge}

of such an order

beforehand,

as is

required

for recursive

peeling.

For each individual a

_prior

and a _progterm is

_computed

and remains stored because results of

_peeling

terms can be

_required

as

_input

for the

_computation

of

other

peeling

terms. Iterative

_peeling

_computes

a series of solutions

_{priorlo, rtorlil,}

etc,

for these terms.

_Starting

values are taken for individual i as

pr!or!(Gt) =

P(G

i

),

the

genotype

frequencies

in the base

population

and

_I

_l

_°

_prog

_(G

_i

₎

_equals

1 for all

_G

_i

_.

Iterative

_computation

starts

_{by computing}

prior

[l

] (G

i

)

for each individual

i,

in order of

descending

age. Evaluation of these

prior

terms is based on

prior!l!

terms

of

_parents,

which are available because older individuals are

_updated

before _younger

individuals,

and on

prog

lol

terms of sibs.

_{Subsequently,}

_{proglo (Gi)}

is

_computed

for each individual

_i,

in order of

_ascending

_age.Evaluation of these prog terms is

based on

prior!l!

terms of

_mates,

on

prog[l]

terms of _progeny,which are available

because now _youngerindividuals are

updated

before older

individuals,

and for

female

parents,

on a

prog

iol

or

prog

[l]

term of their male mate. Whether this last

term is

_{already updated}

as

prog

[l]

depends

on the order in which _progterms are

computed.

After

_computation

of all

_prior!l!

and

_prog

_[l]

terms is

_completed,

a new

iteration starts

_computing

prior

[2

]

and

_prog!2!,

etc.

Starting

values are such that

prior

lol

terms are correct for all individuals in the base

_populations,

and

_prog

_lol

terms are correct for all individuals without progeny.

Terms that can be correct after the first

_cycle

of

_computations

are, for

instance,

prior

l

1]

terms of individuals

_descending

from 2 base individuals and

_prog

_[l]

terms

of

_parents

without

_{grandprogeny.}

Correct

_computation

of a term is shown when in

the next

_{cycle recomputed}

terms are

_equal

to old terms. Once it is found that a

term is

_{correctly computed, recomputation}

can be omitted in

_following

iterations

of the

_algorithm.

The order in which terms are found correct

_gives

information on

the order in which recursive

_peeling

could be used.

_Generally,

in each

_iteration,

reasonably large

groups of terms _appearcorrect,

keeping

the number of

_cycles

required

to

_compute

all terms

_correctly

_{reasonably small, typically}

about the number of

_generations

in the data set. When all terms are found

_correctly

_computed,

(9)

Application

in

_looped

_pedigrees

The series of solutions

_prior

_f

_°

_]

_,

_prio!ll,

_etc,

obtained with iterative

_peeling

can be

considered as

_temporary

solutions for the

_required

terms,

corresponding

to solutions based on a not

_yet

_fully

determined

_peeling

order.

_{’Temporary’}

likelihoods can

also be

_{computed using}

_[5]

and

_[6]

based on a not

_yet

_fully

determined order. In

_nonlooped

_pedigrees,

a

_peeling

order can

_eventually

be found and

_temporary

solutions become exact. In

_looped

_pedigrees,

a

_peeling

order for recursive

_peeling

cannot be determined. In the iterative

_{peeling algorithm}

the

_{impossibility}

of

_finding

a

_peeling

order in

_{looped pedigrees}

is shown

_by

_continuing

_changes

in

_peeling

terms. In

_{looped pedigrees,}

these

_changes

were found to decrease in size

quickly

and

temporary

likelihoods were found to

_{stabilise, supplying}

an

_{approximation.}

Because

in iterative

_peeling

_every

_following

_update

of terms includes information from

50%

less related

_individuals,

a

_geometric

rate of _convergenceis

_plausible.

As a

_stopping

rule to use the

_{approximation}

in

_looped

_pedigrees,

we used the _averageabsolute

difference between

subsequent

normalised

_{heterozygote probabilities,}

based on

computed

peeling

terms. For

_convenience,

_only

the

_heterozygote

_probability,

which

changed

the

_most,

was monitored.

SIMULATION STUDY

Application

of iterative

_peeling

to obtain

_approximate

likelihoods in

_looped

pedi-grees was the aim of this

_study.

Simulations were therefore

_performed

to

_investigate

the usefulness of this

_{approximation.}

Because exact

_computations

are unfeasible in

large looped

pedigrees,

approximate

likelihoods could not be

_compared

with exact

ones.

_Hence,

an indirect way to

study

the

approximation

was found

_{by studying}

the distribution of test statistics and of

_parameter

estimates over a number of

replicated

analyses

in

_looped

as well as in

_{nonlooped pedigrees.}

In

_{nonlooped pedigrees}

exact

likelihoods could be

_{computed, serving}

as a reference. Simulations and

_analysis

are

based on a biallelic autosomal locus and a normal

_penetrance

function. Simulated data

Data sets had a nested structure each

_generation,

with full-sibs nested within

paternal

half-sibs. Three different data structures were used

_{(table I),}

1 structure

without

_loops

and 2 structures with

_loops.

The data structures were

_designed

to

contain

_{approximately}

the same number of

_{observations,}

the same number of base

individuals

_(structure

1 vs

₂₎

and the same

family

sizes

(1

vs

3).

In structures 2 and

3,

the third

_generation

was

_{produced by taking}

1 son from each sire and 1

_daughter

from each

_{dam, maintaining}

the same

_breeding

structure across

_generations.

No directional selection was

_practised,

and

_breeding

females for a male were each

taken from a different

_sire-family.

Half- and full-sib

_matings

were

_avoided,

so that

inbreeding

was absent within the 3

_generations

considered. The additional third

generation

in structures 2 and 3 caused _many

_{pedigree loops}

in the form of

_marriage

loops.

All individuals used for

_breeding

the last

_generation,

ie 120 for structure 2

and 60 individuals for structure

_3,

were involved in 1 or more such

_loops,

often

(10)

Genotype

G

i

of an individual

_equals

_1,

2 or 3

_{corresponding}

to

_genotypes

_A

₁

_A

_1’

A

l

A

2

and

A

2 A

2

at an autosomal locus.

_Genotypes

for individuals in the base

population

were

_{randomly sampled}

_using

_genotype

_{frequencies according}

to

Hardy-Weinberg

proportions,

after which

_genotypes

of other individuals were

_randomly

sampled

based on realised

parental

_genotypes

_assuming

Mendelian transmission

probabilities.

For each individual a random

normally

distributed environmental

component

was

_sampled

and added to a

_{pre-determined}

effect of each

_genotype

to

obtain a

phenotypic

observation. Random numbers were

_generated

_using

GGUBFS

and

_GGNQF

_{(IMSL, 1984).}

Details on the

_parameters

used for these simulations

are

_given

in the

_following

sections. Model and model

_fitting

The statistical model can be

_{specified by}

the

_probability

terms in

_!2!,

_[3]

and

_[4]

which are

P(G

i

),

the

_genotype

_frequency

in the base

population

for individual

_i,

P(GiIG

s

,

G

d

),

the transmission

_probability

for individual i

_given

the

_genotypes

of its sire s and dam

d,

and the

_penetrance

function

/(<

/

t!G,),

the

_probability

for

the data _y₂ on individual i

_given

the

_genotype

_G

_i

of individual i. From these 3

terms,

transmission

_{probabilities}

are assumed known to be Mendelian.

_Genotype

frequencies

in the base

_{population depend}

on the unknown

_{frequency f}

of the

A

1 allele, assuming Hardy-Weinberg proportions

of

_genotypes.

The

_penetrance

function for an individual i is taken as:

This

_penetrance

function is a normal

_{probability density}

function with variance

a-2

around

the mean _JiGi for

_{genotype G}

_i

_.

No dominance is assumed. For

_analysis,

means attributed to the

genotypes

are

expressed

as _Ji₁ =

p -

1/2t,

!2 = tc and A’3 =

J

i +

1/2t,

where t is the difference between

homozygotes,

referred to as the

gene effect. The unknown

parameters

in the model are then

_f,

_p.,

_t,

and

Q2

.

Likelihoods were

computed using

iterative

peeling.

For structure

_1,

without

loops,

computations

were done

_exactly

_by

_repeating

the

_computations

until no

further

_changes

_occurred,

_having

found the order for recursive

_computation.

For the

looped

pedigrees

of structures 2 and

_3,

iterative

_peeling

was used to obtain

approximate

likelihoods. The

_stopping

rule was a

change

less than

10-

8

for the average absolute

heterozygote probabilities

of all individuals. The maximum of the

likelihood was

_{sought using}

the downhill

simplex algorithm

(Nelder

and

Mead,

1965),

using

as _{convergence criteria}the variance of likelihood values of

_points

in

(11)

Comparisons

Looped

and

nonlooped

pedigrees

were

compared

in

hypothesis

tests and

_parameter

estimation. In

_hypothesis

_testing,

a null

_{hypothesis postulating}

the absence of a

major

gene is

used,

described

by

a model with

parameters it

and

_a

₂

_,

and an

alternative

_{hypothesis postulating}

the presence of a

_major

_{gene is}

used,

described

by

a model with

_parameters

_f,

_/1,

t and

a2.

Tests are based on the likelihood ratio

(LR)

test

_statistic,

which is twice the natural

_logarithm

of the ratio of maximum likelihoods under each

_{hypothesis. Type}

I error and _power,the

complement

of

_type

II _error,were

_investigated

at their nominal

_level,

ie

_assuming

the

_expected

classical

asymptotic

X2

distribution

for the LR test statistic under the null

hypothesis

(Wilks, 1938).

Using

the classical

_{rules, rejection}

thresholds were obtained from a

X2

2

distribution with 2

_degrees

of

_freedom,

ie the difference in number of

_parameters

between the null and alternative

_hypothesis.

It should be noted that for

testing

mixtures,

these classical rules do not lead

_exactly

to the nominal

type

I errors

(Titterington

et

_al,

_1985),

but this is not of

_importance

for the

_comparisons

between

looped

and

_nonlooped

_pedigrees

to be made here. The likelihood

_L

_o

for the null

hypothesis

is

_computed

as:

where y

i

are observations with i =

1, ... , N,

the total number of

observations,

assumed

_normally

and

_{independently}

distributed. Under the null

hypothesis,

the maximum likelihood estimate for the mean is

_íi

=

&dquo;

BYi/

N

and for the variance is

(;

2 =

_&dquo;B(

_Yi

_-

_íiO)

₂

_/N.

Type

I error of the test for a

_major

_genewas

_investigated

_by

_simulating

1 000

data sets of each structure

_{(table I),}

_generating

for each individual

_only

a

_randomly

distributed error term with

U

2

= 100 as

phenotype.

Likelihoods for the null

hypothesis

and the alternative

_hypothesis

were

_computed

in each of these

_replicated

data

_sets,

and the likelihood ratio test statistic was obtained. The number of

significant

tests in these 1 000 data sets was counted

_{using rejection}

thresholds

of 4.605 and

_5.991,

_{corresponding}

to nominal

_type

I errors of 10 and

5%.

Power to

detect a

_major

_genewas

_{investigated by}

_simulating

100 data sets of each structure

(table I)

for 3 different gene effects t =

_5,

t = 7.5 and t = 10 and

_using

allele

frequency f

= 0.5 and residual variance

U

2

= 100.

Hence,

relative _geneeffects

t

l

a

were

_0.5,

0.75 and 1. Power was based on a nominal

_type

I error of

5%,

_using

a

rejection

threshold of 5.991. Parameter estimates were

compared using

the 100 data

sets of each structure

_(table

_I)

used to

_investigate

_powerwith t = 10.

RESULTS

Type

I errors were

significantly

lower than their

_nominal,

ie

_{asymptotically}

ex-pected, level,

but

_comparison

of

_type

I errors between

_looped

and

_nonlooped

struc-tures did not show

_significant

differences

_(table

_II).

This indicates that absolute values of

_approximate

likelihoods obtained are on _averageclose to

_expected

and that the distribution of the test statistic over a number of

_replicates

is not

(12)

estimates for _geneeffect under the alternative

_hypothesis

are biased in

_general,

but estimates for gene effects as well as allele

frequency

do not differ between

_looped

and

_nonlooped

structures

_{(table IV).}

This indicates that location of the maximum

is,

on _averageover

_replicates,

not altered for

_approximate

likelihoods.

(13)

DISCUSSION AND CONCLUSIONS

An alternative

_peeling

_algorithm,

called iterative

_peeling,

has been

_presented.

The iterative

_{peeling algorithm}

includes an

_algorithm

to find an order for

_evaluating

peeling

equations.

When an order cannot be

found,

as in

_looped

_pedigrees,

an

approximate

likelihood is

supplied.

In this _case,use of a

_{partitioned computation}

of the likelihood is also crucial. Traditional recursive

_peeling

does not involve such

approximations,

because this method

_only

_computes

the exact likelihood once a

peeling

order is found and

computes

the likelihood

_{by representing}

all

_pedigree

information in terms for a

_single

individual. Usefulness of iterative

_peeling

as

an

_approximate

method in

_looped

_pedigrees

was

_{investigated by}

simulations. At an

_aggregate

level,

ie

_compared

on _averageover a number of

_replicated

data

sets,

no differences were found between

_looped

and

_{nonlooped pedigrees.}

Exact

computations

were unfeasible due to the

_large

number of

_loops

in the

_typical

animal

breeding pedigrees

we

considered,

and

_properties

of iterative

_peeling

could not be studied

_comparing

exact and

_approximated

likelihoods in individual data sets.

The iterative

_peeling

method _maybe of interest for

_application

in animal

breeding.

In human

_populations,

_pedigrees

are

_generally

small and

_loops

are not

abundant so that exact

_computations

can be considered

_using

more

_complicated

forms of

_peeling

_(see

_Cannings

et

al,

1978).

These more

_complicated

forms of

peeling

consider

_genotypes

on sets of individuals

_jointly.

_{Larger pedigrees}

and

more abundant

_looping

in animal

_{breeding, however,}

makes the sets of

_genotypes

considered

_jointly

too

_large

for exact

_computations

to be feasible.

_Therefore,

approximate

methods are

_required

for

_application

in animal

_breeding.

Iterative

peeling

seems _very

suited,

_being

exact without

_loops,

and

_{automatically supplying}

approximate

likelihoods when

_loops

are

_present.

Note

_that,

due to the

_partitioned

computation

of

_likelihood,

iterative

_peeling

also

_{automatically}

handles

_pedigrees

consisting

of

_{independent families,}

ie data

_{traditionally}

handled with sire or

sire-and-dam models. The

_equations

and

_{partitionings given}

here could be extended to

allow for more

general pedigrees.

In

particular,

allowance could be made for females

being

mated with several males. In this _case,

_{partitionings}

should accommodate for

’linking

individuals’

_being

_parents

in several

_families,

rather than

_just

one. The

monogenic

model used could also be extended to a mixed inheritance

model,

the

model

_{usually required}

for

_analysis

of animal

_breeding

data. In iterative

_{peeling only}

uni- and bivariate functions of

_genotypes

are considered on

_single

families. This can

be combined with for instance a Hermitian

_integration

_(Le

_Roy

et

al, 1989;

Knott

et

_al,

₁₉₉₂₎

to include a

_polygenic

_component.

ACKNOWLEDGMENTS

This research was

supported financially by

the Dutch Product Board for Livestock and

Meat, the Dutch

Pig

Herdbook

Society,

Bovar

_BV,

VOC Nieuw-Dalland BV, Euribrid BV

(14)

REFERENCES

Cannings C, Thompson EA,

Skolnick MH

₍₁₉₇₆₎

The recursive derivation of likelihoods

on

_{complex pedigrees.}

Adv

Appl

Prob _8,622-625

Cannings C, Thompson EA,

Skolnick MH

₍₁₉₇₈₎

_Probability

functions on

_complex

pedigrees.

Adv

_Appl

Prob

_10,

26-61

Elston

RC,

Stewart J

₍₁₉₇₁₎

A

_general

model for the

genetic analysis

of

_pedigree

data. Hum Hered 21, 523-542

Hoeschele I

₍₁₉₈₈₎

Genetic evaluation with data

_presenting

evidence of mixed

_major

gene

and

polygenic

inheritance. Theor

_Appl

Genet

_76,

81-92

IMSL

₍₁₉₈₄₎

_{Library Reference Manual,}

Edition 9.2, International and Statistical

Li-braries, Houston,

TX

Knott

_{SA, Haley CS, Thompson}

R

₍₁₉₉₂₎

Methods of

_{segregation analysis}

for animal

breeding

data: a

_comparison

of _power.

_Heredity

_68,299-311 1

Lange K,

Elston RC

₍₁₉₇₅₎

Extensions to

_{pedigree analysis}

I. Likelihood calculations for

simple

and

_{complex pedigrees.}

Hum Hered

_25,

95-105

Le

Roy P,

Elsen JM, Knott SA

₍₁₉₈₉₎

Comparison

of four statistical methods for detection

of a

_major

_{gene in}a _progenytest

_design.

Genet Sel Evol 21, 341-357

Nelder

JA,

Mead R

₍₁₉₆₅₎

A

simplex

method for function minimization.

_Comp

J 7,

147-151

Titterington

DM, Smith

AFM,

Makov EU

₍₁₉₈₅₎

Statistical

_{Analysis of}

Finite Mixture Distributions.

Wiley

and

Sons,

New

_York,

NY

Van Arendonk

_JAM,

Smith

_{C, Kennedy}

BW

₍₁₉₈₉₎

Method to estimate _genotype

probabilities

at individual loci in farm livestock. Theor

_Appl

Genet

_78,

735-740 Wilks SS

₍₁₉₃₈₎

The

_{large sample}

distribution of the likelihood ratio for