Genetic variation of traits measured in several environments. II. Inference on between-environment homogeneity of intra-class correlations

(1)

HAL Id: hal-00894084

https://hal.archives-ouvertes.fr/hal-00894084

Submitted on 1 Jan 1995

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Genetic variation of traits measured in several

environments. II. Inference on between-environment

homogeneity of intra-class correlations

C Robert, Jl Foulley, V Ducrocq

To cite this version:

(2)

Original

article

Genetic

variation

of traits measured

in

several

environments.

II. Inference

on

between-environment

homogeneity

of

intra-class

correlations

C Robert

JL

_Foulley

V

_Ducrocq

Institut national de la recherche agronomique, station de

g6n6tique quantitative

et

_appliquee,

centre de recherche de

_{Jouy-en-Josas,}

78352

_{Jouy-en-Josas cedex,}

Rance

(Received

28

_{April 1994; accepted}

26

_September

₁₉₉₄₎

Summary - This paper describes a further contribution to the

problem

of

_testing

homo-geneity

of intra-class correlations among environments in the case of univariate linear

models,

without

_making

any

assumption

about the

genetic

correlation between

environ-ments. An iterative

_{generalized expectation-maximization}

_(EM)

_algorithm,

as described in

_Foulley

and

_Quaas

_(1994),

is

_presented

for

computing

restricted maximum likelihood

(REML)

estimates of the residual and

_{between-family}

_componentsof variance and

co-variance. Three different

_{parameterizations}

_(cartesian,

_polar

and

_spherical

_coordinates)

are

_proposed

to _computeEM-REML estimators under the reduced

_(constant

intra-class correlation between

_{environments)}

model. This

_procedure

is illustrated with the

_analysis

of simulated data.

heteroskedasticity

/

parameterization

/

intra-class correlation

_/

expectation-maximization

_/

restricted maximum likelihood

Résumé - Variation _génétiquede caractères mesurés dans _plusieursmilieux. II. Infé-rence relative à des corrélations intra-classe constantes entre milieux. Cet article décrit

une

_approche

_permettantd’estimer les composantes de variance-covariance entre milieux dans le cas de corrélation intra-classe

_homogènes

entre

_{milieux, sans faire d’hypothèse}

sur

les corrélations

_génétiques

entre milieux pris 2 à 2. Un

_{algorithme itératif}

d’espérance-maximisation

_(EM),

_comparable

à celui décrit _par

_Foulley

et

_Quaas

_(1994),

est

_proposé

pour calculer les estimations du maximum de vraisemblance restreinte

_(REML)

des com-posantes résiduelles

_{et familiales}

de variance covariance. Trois

_{paramétrisations différentes}

(coordonnées

cartésiennes, polaires

et

_sphériques)

sont

_proposées

pour calculer les esti-mateurs EM-REML sous le modèle réduit

(les

corrélations intra-classe sont

_supposées

toutes

_égales

à une même

constante).

Cette

_procédure

est illustrée par

l’analyse

de données simulées.

hétéroscédasticité

_/

_{paramétrisation}

_/

corrélation intra-classe

_/

(3)

INTRODUCTION

Statistical

_procedures

based on the

_theory

of the

_generalized

likelihood

_ratio,

previously proposed by Foulley

et al

_(1994),

Shaw

₍₁₉₉₁₎

and Visscher

_(1992),

have been

_applied

to test the

_homogeneity

of

_genetic

and

_phenotypic

_parameters

against

Falconer’s

₍₁₉₅₂₎

saturated model. In

_particular,

Robert et al

₍₁₉₉₅₎

have described a

procedure

for

_estimating

_components

of variance and covariance

between environments and for

_testing

the

_homogeneity

of the

_following

_parameters:

(a)

a constant

_genetic

correlation between

_{environments;}

and

_(b)

constant

_genetic

and intra-class correlations between environments.

The

_objective

of this article is to

_present

a

procedure

for

dealing

with

homo-geneous intra-class correlations among environments without

making

any

as-sumption

about the

_genetic

correlations between environments. The method is based on restricted maximum likelihood estimators

(REML)

and on a

general-ized

_{expectation-maximization}

_(EM)

_algorithms

as

_proposed

initially by Foulley

and

_Quaas

₍₁₉₉₄₎

for heteroskedastic univariate linear models. Three

parameteri-zations of variance-covariance

_components

are

_suggested

for

_solving

this

_problem.

A simulated

_example

is

_presented

to illustrate this

_procedure.

THEORY

A model often used to deal with

_genotypic

variation in different environments is the

_2-way

crossed

genotype

(random)

x environment

_(fixed)

linear model with interaction. In

_particular,

this model has been

_proposed

as an alternative to a

multiple-trait

approach

when variance and covariance

_components

are

_homogeneous

and

_genetic

correlations between environments are

_positive

(Foulley

and

Henderson,

1989).

It has also been

_{employed by}

Visscher

₍₁₉₉₂₎

to

_study

the _powerof likelihood ratio tests for

_{heterogeneity}

of intra-class correlations between environments when

genetic

correlations among them are assumed

_equal

to

_unity.

The aim of this paper

is _{to go}one

_step

further in

_addressing

the same

_problem

with the same model but with a

_{heterogeneous}

structure of variance-covariance

components.

The full model

Let us assume that records are

_generated

from a cross-classified

_layout.

The model is defined as follows:

where It

is the

_{mean, h}

_i

is the fixed effect of the ith environment:

_{a Si sj is}

the random

_{family j}

contribution such that

_{s! !}

_NID(0,1)

and

_Q

_s

_v

is the

_family

variance for records in the ith

_environment;

_0’!;!!,

is the random

_family

x

(4)

Model

_[1]

can be written more

generally

_using

matrix notation as:

where _Yiis a

(n

2

x

₁₎

vector of observations in environment

_i;

₁₃

is a

_(p

x

₁₎

vector of fixed effects with incidence matrix

_X

_i

_;

_ui

=

(s) )

and

u2

=

{h,s !

} are

2

_independent

random normal

_components

of the model with incidence matrices for standardized effects

_Zit

and

Z

2i

respectively;

_{cr! !}

and

_Q

_u

₂

_,.

are the

corresponding

_components

of

_{variance, pertaining}

to stratum i and ei is

the

vector of residuals for stratum i assumed

_{N( 0 , a}

_f

_{l, In, ) .}

The reduced model

The null

_hypothesis

_(H

_o

₎

consists of

_assuming

_homogeneous

intra-class correlations between environments

_{(ie, d i, ti}

=

(a;i +a!8i) / (!9!+!hsi+!e!)

= t). The

variance-covariance structure of the residual is assumed to be

_diagonal

and heteroskedastic. Under model

_[I],

this

_hypothesis

is tantamount to

_assuming

a constant ratio of variances between environments: V

_i,

_afl

_/

_(as.

+

a!8i)

=

8

2 ,

where 8 is a constant. Under this

_hypothesis,

3 different

parameterizations

will be considered to solve this

_problem.

Cartesian coordinates

where 6 is a

_positive

real number. Polar coordinates

where pi and 6 are

_positive

real numbers.

Spherical

coordinates

where

!2

is a

_positive

real number. Under this

_{parameterization}

6’

=

tan’

a.

An EM-REML

_algorithm

A

_{generalized expectation-maximization}

_(EM)

_algorithm

to

_compute

REML

(5)

where y is the set of estimable

parameters

for each of the 3 models

_(under

each

parameterization

considered).

Ei

l

[.]

represents

the conditional

_expectation

taken with

_respect

to the distribution of fixed and random effects

_given

the data vector and

y =

y[

t

].

Ei

l

(.!

can be

_expressed

as a function of bilinear forms and a trace of

parts

of the inverse coefficient matrix of the mixed-model

_equations

_(as

described in

_Foulley

and

_Quaas,

_1994).

So,

for each

_{parameterization,}

we derive function

_[3]

with

respect

to each

parameter

of y

and we solve the

resulting

_system

8Q(Yly[t])

/

9 y

= 0. After

some

_algebra

and

_using

the method of

’cyclic

ascent’

(Zangwill, 1969),

we obtain

the 3

_{following algorithms.}

For model

_[2]

and

_using

cartesian

_coordinates,

the

_algorithm

at iteration

_[t,

I

_+1]

can be summarized as follows. Let

8 2ft

,

l]

,

0 ,[t,l]

and

Q!t2!!.

be the values at iteration

[t, 1].

The next iterates are obtained as:

0

![tlc+i1

is the

only

_positive

root of the

following

cubic

_equation:

with

(6)

with

For model

_[2]

and

_{polar coordinates,}

the

_algorithm

at iteration

_!t,

I +

_1]

can be

summarized as follows. Let

8

2 [

t

,

1 ],

ll

p

,

ft

and

0&dquo; !

be the values at iteration

_{[t, I].}

The next iterates are obtained as:

v

.

p!t,l+11

_{is the}

_{only positive}

_root_{of the}

_{following quadratic}

_equation:

with:

.

0i’!!U

is the solution of _the

_equation

7-!!

=

tan(!’!!/2)

where

Z

f

t

,

t+11

is

the

_only

_positive

root of the

_{quartic equation:}

(7)

For model

_[2]

and

_spherical

coordinates,

the

_algorithm

at iteration

_[t,

l +

_1]

can

be summarized as follows. Let

1/1l

t

,

l]

,

o

pi

’

and

al!,4

the values at iteration

[t, l!.

The

next iterates are obtained as:

9

1/

1l

t

,

l+1]

is the

only

positive

root of the

following

quadratic

equation:

with:

.

a!t,!+1!

_{is the}solution of the

_equation

,!!t’t+1!

=

tan!(a!-’+!/2)

where

xi’!!U

is the

only positive

root of the cubic

_equation:

(8)

The convergence of the EM-REML

procedure

is measured as the norm of the

vector of

_changes

in variance-covariance

_components

between iterations. In our

simulation and for the 3

_{parameterizations,}

_{convergence is assumed when the}norm

is less than

10-

6 .

In

_practice,

the number of inner iterations is reduced to

_only

one in the method of

_’cyclic

ascent’. The

_algebraic

solution of

_quadratic,

cubic or

quartic equations, using

the discriminant

_method,

demonstrates that each time

_only

one root is

_possible

in the

_parameter

_space.In the simulated

_example,

the

_polar

parameterization

converged

the fastest.

Testing

procedure

Let

_L(y;

_y)

be the

_{log-restricted}

_likelihood,

F be the

_complete

_parameter

_space and

_r

_o

a subset of it

_pertaining

to the null

_{hypothesis H}

_o

_.

_H

_o

is

_rejected

at the level a if the statistic

((y) =

_2Max

_r

_L(y;

_{y) - 2Maxr}

_o

_L(y;

_y)

exceeds

_(o

where

₍

₀

corresponds

to

_{Pr[X2 r , > (}

_o]

= _a

(

X2

is

the

_chi-square

distribution with r

_degrees

of freedom

_given

_by

difference between the number of

_parameters

estimated under the full and the reduced

_models).

Formulae to evaluate

_{-2MaxL(y; y)}

can

_easily

be made

_explicit:

where B is the coefficient matrix of the mixed-model

_equations.

NUMERICAL EXAMPLE

This

_procedure

is illustrated from a

_hypothetical

data set

_{corresponding}

to a

balanced,

crossed

_design

with 3

_{environments,}

20 families per environment and

50

_replicates

_per

_family

_(p

=

_3,

_s= 20 and n =

50).

The 20 families were

randomized within each environment. Basic ANOVA statistics for the

between-family

and

_{within-family}

sums of _squaresand

cross-products

are

_given

in table I.

Table II

_presents

the estimation of

_genetic

and residual

parameters

under the full and reduced

_(hypothesis

of a constant intra-class correlation between

_{environments)}

models

_{respectively,}

and the likelihood ratio test of the reduced model

_against

the full model. The P values in table II indicate that there are no

_significant

differences

(9)

*

1,2,3

3 = the 3 environments.

8

Sums of

cross-products

between

_{families: n !(y2 j. -}

_{!/t..)(yt’?. !}

_Yi

_’

_..)

8 n j=1 8 n

Sums of _squareswithin

families: L L(Yijk -

Yijf

2

j=1 k=1

DISCUSSION AND CONCLUSION

In this _paper,estimation and

_testing

of

_homogeneity

of intra-class correlations among environments have been studied with heteroskedastic univariate linear models. Another

_{possible approach}

to account for

_’genotype

x environment’ effects

would be to consider the

_{multiple-trait}

linear

_approach,

defined

_by

Falconer

_(1952).

As described

_hereafter,

these 2

_approaches

_{may or}may not be

equivalent.

In this

discussion,

the conditions

required

to have

_equivalence

between the

_{multiple-trait}

and the univariate linear models will be established.

In Falconer’s

_approach,

_expressions

of the trait in different environments

_{(i, i’)}

are those of 2

_genetically

correlated

_traits,

with a coefficient of correlation

d(i,

i’),

Pii

’ =

!s!!,

/

aBaB.,.

The model is defined as follows:

where _lJ2!k

_is

the

_performance

of the kth individual

_(k

=

_{1, 2, ... ,}

n)

of the

jth family

(j

=

1,2,...,

s)

evaluated in the ith environment

(i = 1, 2, ... , p);

b

ij

is the random effect of the

_{jth family}

in the ith

_environment,

assumed

_normally

distributed such that

_Var(b

_ij

₎

=

a

1 i,

Cov(b

ij

,

bi!!)

= a

Biil for i

7! i’

and

Cov(bi!, bi.!!)

= 0

for j #

j’

and _{any i}and

_i’;

_ljk is a residual effect

_pertaining

to the kth individual in the subclass

_ij,

assumed

_normally

and

_{independently}

distributed with mean zero and

variance

_{o,2 wi}

Under the

_hypothesis

of

_homogeneity

of intra-class correlations between

(10)

a

Likelihood ratio _test;

_b

_degrees

of freedom =

_2;

* same EM-REML estimates under the

multiple

trait

_approach.

number of

parameters.

Model

_[1]

has

_[2p

+

_1]

genetic

and residual

_parameters

and model

_[4]

has

_[(p(p

+

_1)/2)

+

_1]

parameters.

For p =

_3,

whatever the

hypotheses considered,

even

_though

these 2 models have

the same number of estimable

parameters,

the

parameter

spaces are not

_exactly

the same. Two conditions must be added to

_satisfy

the

equivalence

between the

multiple-trait

and the univariate linear models. The univariate linear model does

not allow the estimation of a

_{negative genetic}

correlation between

_{environments,}

since it is a ratio of variances.

_Thus,

we have the

_following

condition:

(11)

Then we have:

and

By definition,

_or

₂

_Si

and

_a!8i

are

_positive

_parameters,

so the

_following

relation must

be satisfied:

&dquo; &dquo;

It is worth

_noticing

that the condition in

_[6]

means that the

_partial

_genetic

correlation between any

pair

( j, k)

of environments for environments i fixed is also

positive.

The

problem

of

_testing

_homogeneity

of intra-class correlations between

environ-ments was

_finally

solved under 3 different

_assumptions

about the

_genetic

correla-tions between environments:

_equal

to one

_{(Visscher, 1992);}

constant and

_positive

(Robert

et

_al,

_1995);

and

_{just positive}

_{(this work).}

For more than 3

_traits,

model

[1]

is no

_{longer equivalent}

to the

_multiple

trait

approach

of Falconer. As a matter of

fact,

it

_generates

fewer

parameters

than

!4!,

2p

vs

p(p

+

_1)!2

for

_[1]

and

_[4]

_{respectively.}

This

parsimony

might

be an

_{interesting feature,}

because the difference in numbers of

_parameters

increases with the number of traits considered

_(eg,

10 vs

15

_parameters

for 5

_traits).

_Comparison

of

_approaches

on real

genetic

evaluation

problems

such as sire evaluation of

dairy

cattle in several countries would be of

great

interest.

REFERENCES

Falconer DS

₍₁₉₅₂₎

The

problem

of environment and selection. Am Nat

_86,

293-298

Foulley JL,

Henderson CR

₍₁₉₈₉₎

A

simple

model to deal with sire

_by

treatment

interactions when sires are related. J

Dairy

Sci _72,167-172

Foulley JL, Quaas

RL

₍₁₉₉₄₎

Statistical

analysis

of

heterogeneous

variances in Gaussian linear mixed models. Proc 5th World

_Congress

Genet

_Appl

Livest

Prod,

Univ

_Guelph,

Guelph, ON, Canada,

18, 341-348

Foulley JL,

Hébert

D, Quaas

RL

₍₁₉₉₄₎

Inference on

_homogeneity

of

_{between-family}

components of variance and covariance among environments in balanced cross-classified

designs.

Genet Sel

_{Evol 26,}

117-136

Lawley DN,

Maxwell AE

₍₁₉₆₃₎

Factor

_Analysis

as a Statistical Method. Butterworths Mathematical

Texts, London,

UK

Robert

_{C, Foulley JL, Ducrocq}

V

₍₁₉₉₅₎

Genetic variation of traits measured in several environments. I. Estimation and

testing

of

_homogeneous

and intra-class correlations between environments. Genet Sel

Evol 27,

111-123

Shaw RG

₍₁₉₉₁₎

The

comparison

of

_{quantitative genetic}

_parametersbetween

_populations.

Evolution

_45,

143-151

Visscher PM

₍₁₉₉₂₎

On the power of likelihood ratio tests for

detecting heterogeneity

of intra-class correlations and variances in balanced half-sib

_designs.

J

_Dairy

Sci

_73,

1320-1330

Zangwill

(1969)

Non-Linear

Programming:

A