Statistical analysis of ordered categorical data via a structural heteroskedastic threshold model


HAL Id: hal-00894134

https://hal.archives-ouvertes.fr/hal-00894134

Submitted on 1 Jan 1996



Jl Foulley, D Gianola


Original article

Statistical analysis of ordered categorical data via a structural heteroskedastic threshold model

JL Foulley (1), D Gianola (2)

(1) Station de génétique quantitative et appliquée, Institut national de la recherche agronomique, centre de recherches de Jouy, 78352 Jouy-en-Josas cedex, France;
(2) Department of Meat and Animal Science, University of Wisconsin-Madison, Madison, WI 53706, USA

(Received 2 October 1995; accepted 25 March 1996)

Summary - In the standard threshold model, differences among statistical subpopulations in the distribution of ordered polychotomous responses are modeled via differences in location parameters of an underlying normal scale. A new model is proposed whereby subpopulations can also differ in dispersion (scaling) parameters. Heterogeneity in such parameters is described using a structural linear model and a loglink function involving continuous or discrete covariates. Inference (estimation, testing procedures, goodness of fit) about parameters in fixed-effects models is based on likelihood procedures. Bayesian techniques are also described to deal with mixed-effects model structures. An application to calving ease scores in the US Simmental breed is presented; the heteroskedastic threshold model had a better goodness of fit than the standard one.

threshold character / heteroskedasticity / maximum likelihood / mixed linear model / calving difficulty

Résumé - Statistical analysis of ordered discrete variables with a heteroskedastic threshold model. In the classical threshold model, differences in response between subpopulations across ordered discrete categories are modeled by differences between location parameters measured on an underlying normal variable. The approach presented here assumes that these subpopulations also differ in their dispersion (or scale) parameters. Heterogeneity of these parameters is described by a structural linear model and a logarithmic link function involving discrete or continuous covariates. Inference (estimation, goodness of fit, hypothesis testing) about parameters in fixed-effects models is based on maximum likelihood methods. Bayesian techniques are also proposed for handling mixed linear models. An application to calving difficulty scores in the American Simmental breed is presented. In this case, the heteroskedastic threshold model improves the goodness of fit to the data relative to the standard model.

threshold characters / heteroskedasticity / maximum likelihood / linear model

INTRODUCTION

An appealing model for the analysis of ordered categorical data is the so-called threshold model. Although introduced in population and quantitative genetics by Wright (1934a,b) and discussed later by Dempster and Lerner (1950) and Robertson (1950), it dates back to Pearson (1900), Galton (1889) and Fechner (1860). This model has received attention in various areas such as human genetics and susceptibility to disease (Falconer, 1965; Curnow and Smith, 1975), population biology (Bulmer and Bull, 1982; Roff, 1994), neurophysiology (Brillinger, 1985), animal breeding (Gianola, 1982), survey analysis (Grosbas, 1987), psychological and social sciences (Edwards and Thurstone, 1952; McKelvey and Zavoina, 1975), and econometrics (Kaplan and Urwitz, 1979; Levy, 1980; Bryant and Gerner, 1982; Maddala, 1983).

The threshold model postulates an underlying (liability) normal distribution rendered discrete via threshold values. The probability of response in a given category is expressed as the difference between normal cumulative distribution functions having as arguments the upper and lower thresholds minus the mean liability for the subpopulation divided by the corresponding standard deviation. Usually the location parameter ($\eta_i$) for a subpopulation is expressed as a linear function $\eta_i = t_i'\theta$ of some explanatory variables (row incidence vector $t_i'$) (see theory of generalized linear models, McCullagh and Nelder, 1989; Fahrmeir and Tutz, 1994). The vector of unknowns ($\theta$) may include both fixed and random effects and statistical procedures have been developed to make inferences about such a mixed-model structure (Gianola and Foulley, 1983; Harville and Mee, 1984; Gilmour et al, 1987). In all these studies, the standard deviations (also called the scaling parameters) are assumed to be known and equal, or proportional to known quantities (Foulley, 1987; Misztal et al, 1988).

The purpose of this paper is to extend the standard threshold model (S-TM) to a model allowing for heteroskedasticity (H-TM) with modeling of the unknown scaling parameters. For simplicity, the theory will be presented using a fixed-effects model and likelihood procedures for inference. Mixed-model extensions based on Bayesian techniques will also be outlined. The theory will be illustrated with an example on calving difficulty scores in Simmental cattle from the USA.

THEORY

Statistical model

The overall population is assumed to be stratified into several subpopulations (eg, subclasses of sex, parity, age, genotypes, etc) indexed by $i = 1, 2, \ldots, I$ representing potential sources of variation. Let $J$ be the number of ordered response categories indexed by $j$, and $y_{i+} = \{y_{ij+}\}$ be the ($J \times 1$) vector whose element $y_{ij+}$ is the total number of responses in category $j$ for subpopulation $i$. The vector $y_{i+}$ can be written as the sum $y_{i+} = \sum_{r=1}^{n_i} y_{ir}$

and (3) $I - 1$ contrasts among log-scaling parameters (eg, $\ln(\sigma_i) - \ln(\sigma_1)$ for $i = 2, \ldots, I$) or, equivalently, $I - 1$ standard deviation ratios (eg, $\sigma_i/\sigma_1$), with one of these arbitrarily set to a fixed value (eg, $\sigma_1 = 1$). This makes a total of $2I + J - 3$ identifiable parameters, so that the full H-TM reduces to the saturated model for $J = 3$ categories, see examples in Falconer (1960), chapter 18.

More parsimonious models can also be envisioned. For instance, in a two-way crossclassified layout with $A$ rows and $B$ columns ($I = AB$), there are 16 additive models that can be used to describe the location ($\eta_i$) and the scaling ($\sigma_i$) parameters. The simplest one would have a common mean and standard deviation for the $I = AB$ populations. The most complete one would include the main effects of $A$ and $B$ factors for both the location and dispersion parameters. Here there are $2(A + B) + J - 5$ estimable parameters, ie, $J - 1$ thresholds plus twice $(A - 1) + (B - 1)$.

Under an additive model for the location parameters $\eta_i$, it is possible to fit the H-TM to binary data. For the crossclassified layout with $A$ rows and $B$ columns, there are $AB - 2(A + B) + 3$ degrees of freedom available which means that we need $A$ (or $B$) $\geq 4$ to fit an additive model using all the levels of $A$ and $B$ at both the location and dispersion levels. Finally, it must be noted that in this particular case, dispersion parameters act as substitutes of interaction effects for location parameters.
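The parameter counts quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the split of the full H-TM into $(J-1) + (I-1) + (I-1)$ terms is inferred from the stated total $2I + J - 3$.

```python
def saturated_params(I, J):
    """Identifiable parameters of the saturated model: I(J - 1)."""
    return I * (J - 1)


def full_htm_params(I, J):
    """Full H-TM: thresholds + location contrasts + log-scale contrasts."""
    return (J - 1) + (I - 1) + (I - 1)  # = 2I + J - 3


def additive_htm_params(A, B, J):
    """Two-way additive H-TM with main effects on location and on scale."""
    return (J - 1) + 2 * ((A - 1) + (B - 1))  # = 2(A + B) + J - 5


def additive_htm_df(A, B, J):
    """Residual degrees of freedom of the additive H-TM in an A x B layout."""
    return saturated_params(A * B, J) - additive_htm_params(A, B, J)


# With J = 3 the full H-TM has as many parameters as the saturated model.
print(full_htm_params(5, 3), saturated_params(5, 3))
# Binary data (J = 2): df = AB - 2(A + B) + 3 for several layouts.
for A, B in [(3, 3), (4, 2), (4, 3)]:
    print(A, B, additive_htm_df(A, B, 2))
```

For binary data the count is negative whenever one factor has only two levels, zero for a 3 x 3 layout, and becomes positive once one factor has at least four levels and the other at least three.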

Estimation

Let $\tau = \{\tau_j\}$ for $j = 1, 2, \ldots, J - 1$ and $\alpha = (\tau', \beta', \delta')'$. In fixed-effects models with multinomial data, inferences about $\alpha$ can be based on likelihood procedures. Here, the log likelihood $L(\alpha; y)$ can be expressed, apart from an additive constant, as:

$$L(\alpha; y) = \sum_{i=1}^{I} \sum_{j=1}^{J} y_{ij+} \ln \Pi_{ij}$$ [7]

with, given [4], [5] and [6],

$$\Pi_{ij} = \Phi\left[(\tau_j - x_i'\beta)\exp(-p_i'\delta)\right] - \Phi\left[(\tau_{j-1} - x_i'\beta)\exp(-p_i'\delta)\right]$$

The maximum likelihood (ML) estimator of $\alpha$ can be computed using a second-order algorithm. A convenient choice for multinomial data is the scoring algorithm, because Fisher's information measure is simple here. The system of equations to solve iteratively can be written as:

$$J(\alpha^{[k]}) \, \Delta\alpha^{[k+1]} = U(\alpha^{[k]}; y)$$ [8]

where $U(\alpha; y) = \partial L(\alpha; y)/\partial\alpha$ and $J(\alpha) = -E[\partial^2 L(\alpha; y)/\partial\alpha\,\partial\alpha']$ are the score function and Fisher's information matrix respectively; $k$ is iterate number. Analytical expressions for the elements of $U(\alpha; y)$ and $J(\alpha)$ are given in Appendix 1. These are generalizations of formulae given by Gianola and Foulley (1983) and Misztal et al (1988).

In some instances, one may consider a backtracking procedure (Denis and Schnabel, 1983) to reach convergence, ie, at the beginning of the iterative process, compute $\alpha^{[k+1]}$ as $\alpha^{[k+1]} = \alpha^{[k]} + \omega^{[k+1]} \Delta\alpha^{[k+1]}$ with $0 < \omega^{[k+1]} \leq 1$. A constant value of $\omega = 0.8$ has been satisfactory in all the examples run so far.
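To make the iteration concrete, here is a minimal sketch of a damped scoring loop for a small fixed-effects H-TM. It is not the authors' implementation: the analytical derivatives of Appendix 1 are replaced by central finite differences, the linear system is solved by a small hand-written Gaussian elimination, and the 2 x 2 layout, the parameter values and the noise-free "data" are all hypothetical.

```python
from math import exp
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def probs(alpha, X, P, J):
    """Category probabilities Pi[i][j] for alpha = (tau, beta, delta)."""
    p = len(X[0])
    tau, beta, delta = alpha[:J - 1], alpha[J - 1:J - 1 + p], alpha[J - 1 + p:]
    out = []
    for x, pr in zip(X, P):
        eta = sum(a * b for a, b in zip(x, beta))           # eta_i = x_i' beta
        sigma = exp(sum(a * b for a, b in zip(pr, delta)))  # ln sigma_i = p_i' delta
        cum = [0.0] + [PHI((t - eta) / sigma) for t in tau] + [1.0]
        out.append([max(cum[j + 1] - cum[j], 1e-12) for j in range(J)])
    return out


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(c + 1, n):
            f = M[k][c] / M[c][c]
            for m in range(c, n + 1):
                M[k][m] -= f * M[c][m]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][m] * x[m] for m in range(c + 1, n))) / M[c][c]
    return x


def scoring(y, X, P, alpha, w=0.8, iters=60, eps=1e-6):
    """Damped scoring: solve J(alpha) step = U(alpha), then alpha += w * step."""
    J, K, I = len(y[0]), len(alpha), len(y)
    n = [sum(row) for row in y]
    cells = [(i, j) for i in range(I) for j in range(J)]
    for _ in range(iters):
        pi = probs(alpha, X, P, J)
        D = []  # D[k][i][j] approximates d Pi_ij / d alpha_k
        for k in range(K):
            up, dn = alpha[:], alpha[:]
            up[k] += eps
            dn[k] -= eps
            pu, pd = probs(up, X, P, J), probs(dn, X, P, J)
            D.append([[(a - b) / (2 * eps) for a, b in zip(ru, rd)]
                      for ru, rd in zip(pu, pd)])
        # Score and Fisher information of the product-multinomial likelihood
        U = [sum(y[i][j] / pi[i][j] * D[k][i][j] for i, j in cells) for k in range(K)]
        info = [[sum(n[i] / pi[i][j] * D[k][i][j] * D[l][i][j] for i, j in cells)
                 for l in range(K)] for k in range(K)]
        alpha = [a + w * s for a, s in zip(alpha, solve(info, U))]
    return alpha


# Hypothetical 2 x 2 layout: location = sex + age contrasts, scale = age contrast.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
P = [[0], [0], [1], [1]]
truth = [-0.5, 0.8, 0.4, -0.3, 0.3]  # (tau_1, tau_2, beta_sex, beta_age, delta_age)
y = [[1000 * q for q in row] for row in probs(truth, X, P, 3)]  # noise-free counts
estimate = scoring(y, X, P, [-0.3, 0.6, 0.0, 0.0, 0.0])
print([round(v, 4) for v in estimate])
```

Because the counts are generated without noise from known parameters, the iteration should return to those parameters, which makes the sketch easy to check. The design matrices carry no intercept column, since the thresholds and the baseline constraint $\sigma = 1$ already fix origin and scale.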

(over the $n_i$ observations made in subpopulation $i$) of indicator vectors $y_{ir} = (y_{i1r}, y_{i2r}, \ldots, y_{ijr}, \ldots, y_{iJr})'$ such that $y_{ijr} = 1$ or 0 depending on whether a response for observation ($r$) in population ($i$) is in category ($j$) or not. Given $n_i$ independent repetitions of $y_{ir}$, the sum $y_{i+}$ is multinomially distributed with parameters $n_i = \sum_{j=1}^{J} y_{ij+}$ and probability vector $\Pi_i = \{\Pi_{ij}\}$.

In the threshold model, the probabilities $\Pi_{ij}$ are connected to the underlying normal variables $X_{ir}$ with threshold values $\tau_j$ via the statement

$$\Pi_{ij} = \Pr(\tau_{j-1} < X_{ir} \leq \tau_j)$$ [3]

with $\tau_0 = -\infty$ and $\tau_J = +\infty$, so that there are $J - 1$ finite thresholds. With $X_{ir} \sim N(\eta_i, \sigma_i^2)$, this becomes:

$$\Pi_{ij} = \Phi\left(\frac{\tau_j - \eta_i}{\sigma_i}\right) - \Phi\left(\frac{\tau_{j-1} - \eta_i}{\sigma_i}\right)$$ [4]

where $\Phi(.)$ is the CDF of the standardized normal distribution.
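As a small numerical illustration of the category probabilities just defined, the sketch below evaluates them for one subpopulation from its thresholds, mean liability and standard deviation. The function name and the numerical values are hypothetical.

```python
from statistics import NormalDist


def category_probs(tau, eta, sigma):
    """Return [Pi_i1, ..., Pi_iJ] for one subpopulation.

    tau   : increasing list of the J - 1 finite thresholds
    eta   : mean liability of the subpopulation
    sigma : standard deviation (scaling parameter) of the subpopulation
    """
    phi = NormalDist().cdf
    # Cumulative probabilities at tau_0 = -inf, tau_1 .. tau_{J-1}, tau_J = +inf
    cum = [0.0] + [phi((t - eta) / sigma) for t in tau] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]


homo = category_probs([0.0, 1.0], eta=0.5, sigma=1.0)
hetero = category_probs([0.0, 1.0], eta=0.5, sigma=1.5)
print([round(p, 4) for p in homo])
print([round(p, 4) for p in hetero])
```

With the same thresholds and mean, a larger standard deviation moves probability mass from the middle category to the two extreme ones, which is the kind of pattern a location shift alone cannot reproduce.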

The mean liability ($\eta_i$) for the $i$th subpopulation is modeled as in Gianola and Foulley (1983) and Harville and Mee (1984), and as in generalized linear models (McCullagh and Nelder, 1989) in terms of the linear predictor

$$\eta_i = x_i'\beta$$ [5]

Here, the ($p \times 1$) vector of unknowns ($\beta$) involves fixed effects only and $x_i$ is the corresponding ($p \times 1$) vector of qualitative or quantitative covariates.

In the H-TM, a structure is imposed on the scaling parameters. As in Foulley et al (1990, 1992) and Foulley and Quaas (1995), the natural logarithm of $\sigma_i$ is written as a linear combination of some unknown ($r \times 1$) real-valued vector of parameters ($\delta$),

$$\ln \sigma_i = p_i'\delta$$ [6]

$p_i'$ being the corresponding row incidence vector of qualitative or continuous covariates.
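The two linear predictors can be sketched as follows for a hypothetical 2 x 2 layout (sex by age class). The incidence rows and coefficient values are made up for illustration; the point is only that the log link keeps every $\sigma_i$ positive and sets the baseline to 1 when its incidence row is zero.

```python
from math import exp


def linear_predictor(row, coef):
    """Inner product of an incidence row with a coefficient vector."""
    return sum(r * c for r, c in zip(row, coef))


# Hypothetical incidence rows, one per subpopulation (no intercept column).
X = [[0, 0], [1, 0], [0, 1], [1, 1]]  # location: sex contrast, age contrast
P = [[0], [0], [1], [1]]              # scale: age contrast only
beta = [0.4, -0.3]                    # hypothetical location effects
delta = [0.3]                         # hypothetical log-scale effect

eta = [linear_predictor(x, beta) for x in X]          # eta_i = x_i' beta
sigma = [exp(linear_predictor(p, delta)) for p in P]  # ln sigma_i = p_i' delta
print(eta)
print(sigma)
```

A contrast of 0.3 on the log scale corresponds to a standard deviation ratio of exp(0.3), about 1.35, relative to the baseline subpopulations.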

Identifiability of parameters

In the case of $I$ subpopulations and $J$ categories, there is a maximum of $I(J - 1)$ identifiable (or estimable) parameters if the margins $n_i$ are fixed by sampling. These are the parameters of the so-called saturated model. What is the most complete H-TM (or 'full' model) that can be fitted to the data using the approach described here? One can estimate: (1) $J - 1$ finite threshold

Goodness of fit

The two usual statistics, Pearson's $X^2$ and the (scaled) deviance $D^*$ can be used to check the overall adequacy of a model. These are

$$X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(y_{ij+} - n_i \hat{\Pi}_{ij})^2}{n_i \hat{\Pi}_{ij}}$$ [9]

where $\hat{\Pi}_{ij} = \Pi_{ij}(\hat{\alpha})$ is the ML estimate of $\Pi_{ij}$, and

$$D^* = 2 \sum_{i=1}^{I} \sum_{j=1}^{J} y_{ij+} \ln \frac{y_{ij+}}{n_i \hat{\Pi}_{ij}}$$ [10]

Above, $D^*$ is based on the likelihood ratio statistic for fitting the entertained model against a saturated model having as many parameters as there are algebraically independent variables in the data vector, ie, $I(J - 1)$ here. Data should be grouped as much as possible for the asymptotic chi-square distribution to hold in [9] and [10] (McCullagh and Nelder, 1989; Fahrmeir and Tutz, 1994). The degrees of freedom to consider here are $I(J - 1)$ (saturated model) minus $[(J - 1) + \mathrm{rank}(X) + \mathrm{rank}(P)]$ (model under study), where $X$ and $P$ are the incidence matrices for $\beta$ and $\delta$ respectively. It should be noted that [9] and [10] can be computed as particular cases of the power divergent statistics introduced by Read and Cressie (1984).
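The two fit statistics and the associated degrees of freedom are simple to compute once fitted probabilities are available. The sketch below assumes the usual multinomial forms of Pearson's statistic and of the deviance; the function names and the toy counts are hypothetical.

```python
from math import log


def pearson_x2(y, fitted):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    x2 = 0.0
    for counts, probs in zip(y, fitted):
        n = sum(counts)
        x2 += sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    return x2


def deviance(y, fitted):
    """Scaled deviance: 2 * sum of observed * ln(observed / expected)."""
    d = 0.0
    for counts, probs in zip(y, fitted):
        n = sum(counts)
        # Empty cells contribute 0 (limit of y * ln y as y -> 0)
        d += sum(c * log(c / (n * p)) for c, p in zip(counts, probs) if c > 0)
    return 2.0 * d


def residual_df(I, J, rank_X, rank_P=0):
    """I(J - 1) minus [(J - 1) + rank(X) + rank(P)]."""
    return I * (J - 1) - ((J - 1) + rank_X + rank_P)


# Toy check: one subpopulation, two categories, fitted probabilities of 0.5 each.
print(pearson_x2([[30, 70]], [[0.5, 0.5]]), deviance([[30, 70]], [[0.5, 0.5]]))
# Degrees of freedom for 18 subpopulations and 3 categories, as in the application.
print(residual_df(18, 3, rank_X=17), residual_df(18, 3, rank_X=17, rank_P=6))
```

With 18 sex by age-of-dam subclasses and three categories, the count gives 17 degrees of freedom for the location-only model and 11 once six dispersion contrasts are added, matching the figures quoted in the numerical application.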

Hypothesis testing

Tests of hypotheses about $\gamma = (\beta', \delta')'$ can be carried out via either Wald's test or the likelihood ratio (or deviance) test. The first procedure relies on the properties of consistency and asymptotic normality of the ML estimator.

For linear hypotheses of the form $H_0: K'\gamma = m$ against its alternative $H_1 = \bar{H}_0$, the Wald statistic is:

$$W = (K'\hat{\gamma} - m)' \left[K' \Gamma(\hat{\gamma}) K\right]^{-1} (K'\hat{\gamma} - m)$$

which under $H_0$ has an asymptotic chi-square distribution with $\mathrm{rank}(K)$ degrees of freedom. Above, $\Gamma(\hat{\gamma})$ is an appropriate block of the inverse of Fisher's information matrix evaluated at $\gamma = \hat{\gamma}$, where $\hat{\gamma}$ is the ML estimator.
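For a hypothesis involving a single contrast, the Wald statistic reduces to a squared estimate divided by its asymptotic variance, and the chi-square tail on one degree of freedom has a closed form. The sketch below covers only that special case; the function name, the estimates and the covariance block are hypothetical.

```python
from math import erfc, sqrt


def wald_single_contrast(k, gamma_hat, Gamma, m=0.0):
    """Wald test of H0: k'gamma = m for one contrast vector k.

    Gamma is the block of the inverse information matrix for gamma.
    Returns the statistic W and its asymptotic P-value on 1 df.
    """
    est = sum(ki * gi for ki, gi in zip(k, gamma_hat)) - m
    var = sum(k[a] * Gamma[a][b] * k[b]
              for a in range(len(k)) for b in range(len(k)))
    W = est ** 2 / var
    # P(chi2_1 > W) = P(|Z| > sqrt(W)) = erfc(sqrt(W / 2))
    return W, erfc(sqrt(W / 2.0))


# Hypothetical estimates and covariance block; tests gamma_1 - gamma_2 = 0.
W, p_value = wald_single_contrast(
    k=[1, -1],
    gamma_hat=[0.5, 0.1],
    Gamma=[[0.02, 0.005], [0.005, 0.03]],
)
print(round(W, 3), round(p_value, 4))
```

Hypotheses with several contrasts need the full quadratic form and a chi-square tail on rank(K) degrees of freedom, which this sketch does not attempt.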

The likelihood ratio statistic (LRS) allows testing nested hypotheses of the form $H_0: \gamma \in \Omega_0$ against $H_1: \gamma \in \Omega - \Omega_0$ where $\Omega_0$ and $\Omega$ are the restricted and unrestricted parameter spaces respectively, pertaining to $H_0$ and $H_0 \cup H_1$. The LRS is:

$$\lambda = -2L(\tilde{\gamma}; y) + 2L(\hat{\gamma}; y)$$

where $\tilde{\gamma}$ and $\hat{\gamma}$ are the ML estimators of $\gamma$ under the restricted and unrestricted models respectively. The criterion $\lambda$ also can be computed as the difference in deviances between the restricted and unrestricted models, $\lambda = D^*(\tilde{\gamma}) - D^*(\hat{\gamma})$. This is equivalent to what is usually done in ANOVA except that residual sums of squares are replaced by deviances. Under $H_0$, $\lambda$ has an asymptotic chi-square distribution with $r = \dim(\Omega) - \dim(\Omega_0)$ degrees of freedom. Under the same null hypothesis, the Wald and LR statistics have the same asymptotic distribution. However, Wald's statistic is based on a quadratic approximation of the loglikelihood around its maximum.
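The deviance-difference form of the test can be sketched with the standard library alone. The chi-square tail below uses the standard recurrence for the regularized incomplete gamma function, valid for integer degrees of freedom; the function names and the two deviance values are hypothetical and are not taken from the tables of the paper.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper tail P(chi2_df > x) for integer df >= 1."""
    half = x / 2.0
    if df % 2:                      # odd df: start from the df = 1 tail
        q, a = erfc(sqrt(half)), 0.5
    else:                           # even df: start from the df = 2 tail
        q, a = exp(-half), 1.0
    while a < df / 2.0:             # Q(a + 1, x) = Q(a, x) + x^a e^-x / Gamma(a + 1)
        q += half ** a * exp(-half) / gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(dev_restricted, dev_full, df):
    """LRS as a difference of deviances, with its asymptotic P-value."""
    lam = dev_restricted - dev_full
    return lam, chi2_sf(lam, df)


# Hypothetical deviances of two nested models differing by 8 parameters.
lam, p_value = likelihood_ratio_test(30.0, 16.2, df=8)
print(round(lam, 2), round(p_value, 4))
```

The same helper can be reused for the goodness-of-fit statistics, since Pearson's statistic and the deviance are referred to the same chi-square family.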

Including random effects

In many applications, the $y_{ir}$'s cannot be assumed to be independent repetitions due to some cluster structure in the data. This is the case in quantitative genetics and animal breeding with genetically related animals, common environmental effects and repeated measurements on the same individuals. Correlations can be accounted for conveniently via a mixed model structure on the $\eta_i$'s, written now as

$$\eta_i = x_i'\beta + z_i'u$$

where the fixed component $x_i'\beta$ is as before, and $u$ is a ($q \times 1$) vector of Gaussian random effects with corresponding incidence row vector $z_i'$.

For simplicity, we will consider a one-way random model, ie, $u \sim N(0, A\sigma_u^2)$ ($A$ is a positive definite matrix of known elements such as kinship coefficients), but the extension to several $u$-components is straightforward. The random part of the location is rewritten as in Foulley and Quaas (1995) as $z_i'\sigma_{u_i}u^*$ where $u^*$ is a vector of standard normal deviations, and $\sigma_{u_i}$ is the square root of the $u$-component of variance, the value of which may be specific to subpopulation $i$. For instance, the sire variance may vary according to the environment in which the progeny of the sires is raised. Furthermore, it will be assumed that the ratio $\sigma_{u_i}/\sigma_i$, where $\sigma_i^2$ is now the residual variance, is constant across subpopulations. In a sire by environment layout, this is tantamount to assuming homogeneous intraclass correlations (or heritability) across environments, which seems to be a reasonable assumption in practice (Visscher, 1992). Thus, the argument $h_{ij}$ of the normal CDF in [4] and [7] becomes

$$h_{ij} = \frac{\tau_j - x_i'\beta}{\sigma_i} - \rho\, z_i'u^*$$

where $\rho = \sigma_{u_i}/\sigma_i$.

In the fixed model, parameters $\tau$, $\beta$ and $\delta$ were estimated by maximum likelihood. Given $\rho$, a natural extension would be to estimate these and $u^*$ by the mode of their joint posterior distribution (MAP). To mimic a mixed-model structure, one can take flat priors on $\tau$, $\beta$ and $\delta$. The only informative prior is then on $u^*$, ie, $u^* \sim N(0, A)$. Thus MAP solutions can be computed with minor modifications from [8]. The only changes to implement are to replace: (i) $\alpha = (\tau', \beta', \delta')'$ by $\theta = (\tau', \beta', \delta', u^{\#\prime})'$ with $u^{\#} = \rho u^*$; (ii) $X = (x_1, x_2, \ldots, x_i, \ldots, x_I)'$ by $S = (s_1, \ldots, s_i, \ldots, s_I)'$ with $s_i' = (x_i', \sigma_i z_i')$; and (iii) add $\rho^{-2}A^{-1}$ to the block pertaining to the random effects on the left hand side and $-\rho^{-2}A^{-1}u^{\#[k]}$ to the $u^{\#}$-part of the right hand side (see Appendix 1). A test example is shown in Appendix 2.
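The reparameterization used for the mixed model can be checked numerically. The sketch below compares two ways of writing the argument of the normal CDF: one with the constant ratio $\rho$ and the standardized effects $u^*$, the other with $u^{\#} = \rho u^*$ and a covariate row of the form $(x_i', \sigma_i z_i')$. That second form is my reading of the substitution and should be treated as an assumption; function names and values are hypothetical.

```python
def h_rho_form(tau_j, xb, zu_star, sigma_i, rho):
    """h_ij = (tau_j - x_i'beta) / sigma_i - rho * z_i'u*."""
    return (tau_j - xb) / sigma_i - rho * zu_star


def h_design_form(tau_j, xb, zu_star, sigma_i, rho):
    """Same argument written with u# = rho * u* and covariate sigma_i * z_i."""
    zu_sharp = rho * zu_star         # z_i'u#
    eta_i = xb + sigma_i * zu_sharp  # location built from (x_i', sigma_i z_i')
    return (tau_j - eta_i) / sigma_i


# Hypothetical values: threshold, fixed part x_i'beta, z_i'u*, sigma_i, rho.
args = (1.0, 0.4, -0.7, 1.3, 0.25)
print(h_rho_form(*args), h_design_form(*args))
```

Both expressions agree for any positive $\sigma_i$, which is why a constant $\rho$ lets the random effects enter the scoring equations through an augmented location design.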

A further step would be to estimate $\rho$ using an EM marginal maximum likelihood procedure based on the conditional expectation of the quadratic in $u^{\#}$ given the data. This may involve either an approximate calculation of this conditional expectation as in Harville and Mee (1984), Hoeschele et al (1987) and Foulley et al (1990), or a Monte-Carlo calculation of this conditional expectation using, for example, a Gibbs sampling scheme (Natarajan, 1995). Alternative procedures for estimating $\rho$ might be also envisioned, such as the iterated re-weighted REML of Engel et al (1995).

NUMERICAL APPLICATION

Material

The data set analyzed was a contingency table of calving difficulty scores (from 1 to 4) recorded on purebred US-Simmental cows distributed according to sex of calf (males, females) and age of dam at calving in years. Scores 3 and 4 were pooled on account of the low frequency of score 4. Nine levels were considered for age of dam: < 2 years, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, 4.0-4.5, 4.5-5.0, 5.0-8.0, and > 8.0 years. In the analysis of the scaling parameters, six levels were considered for this factor: < 2 years, 2.0-2.5, 2.5-3.0, 3.0-4.0, 4.0-8.0, and > 8.0 years. The distribution of the 363 859 records by sex-age of dam combinations is displayed in table I, as well as the frequencies of the three categories of calving scores. The raw data reveal the usual pattern of highest calving difficulty in male calves out of younger dams. However, more can be said about the phenomenon.

Method

Data were analyzed with standard (S-TM) and heteroskedastic (H-TM) threshold models. Location and scaling parameters were described using fixed models involving sex (S) and age of dam (A) as factors of variation. In both cases, inference was based on maximum likelihood procedures. A log-link function was used for scaling parameters.

With $J = 3$ categories, the most highly parameterized S-TM that can be fitted for the location structure includes $J - 1 = 2$ threshold values (or, equivalently, the difference between thresholds ($\tau_2 - \tau_1$) and a baseline population effect $\mu$), plus sex (one contrast), age of dam (eight contrasts) and their interaction (eight contrasts) as elements of $\beta$; this gives $r(X) = 17$ which yields 19 as the total number of parameters.

The H-TM to start with was as in the S-TM for location parameters $\beta$. With respect to dispersion parameters $\delta$, the model was an additive one, with sex (S*: one contrast) and age of dam (A*: five contrasts) so that $r(P) = 6$ ($\sigma = 1$ in male calves and < 2.0 year old dams). Thus, the total number of parameters was 19 + 6 = 25 and the df were equal to 36 - 25 = 11.

RESULTS

All factors considered in the S-TM were significant (P < 0.01), especially the sex by age of dam interactions (except the first one, as shown in table II). Hence, the model cannot be simplified further. This means that differences between sexes were not constant across age of dam subclasses, contrary to results of a previous study in Simmental also obtained with a fixed S-TM (Quaas et al, 1988). Differences in liability between male and female calves decreased with age of dam. However the S-TM did not fit well to the data, as the Pearson statistic (or deviance) was $X^2 = 419$ on 17 degrees of freedom, resulting in a nil P-value. An examination of the Pearson residuals indicated that the S-TM leads to an underestimation of the

As shown in table II, fitting the H-TM decreased the $X^2$ and deviance by a factor of 20 and led to a satisfactory fit. The significance of many interactions vanished, and this was reflected in the LRS (P < 0.088) for the hypothesis of no S x A interaction in the most parameterized model. Several models were tried and tested as shown in table III. The scaling parameters depended on the age of the dam, the effect of sex being not significant (P < 0.163). Relative to the baseline population, the standard deviation increased by a factor of about 1.05, 1.15, 1.25, 1.40 and 1.50 for cows of 2.0-2.5, 2.5-3.0, 3.0-4.0, 4.0-8.0, and > 8.0 years of age at calving respectively (table IV).
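Under a log link, the standard deviation ratios quoted above are simply exponentials of the age-of-dam contrasts on the log scale. The short sketch below converts the approximate ratios from the text into the implied contrasts and back; the dictionary names are hypothetical and the values are the rounded figures quoted in the paragraph, not the estimates of table IV.

```python
from math import exp, log

# Approximate SD ratios relative to the baseline (< 2 year old dams), from the text.
sd_ratio = {"2.0-2.5": 1.05, "2.5-3.0": 1.15, "3.0-4.0": 1.25,
            "4.0-8.0": 1.40, "> 8.0": 1.50}

# With ln(sigma_i) = p_i' delta and sigma = 1 at the baseline,
# each ratio equals exp(contrast), so each contrast equals ln(ratio).
delta_contrast = {age: log(r) for age, r in sd_ratio.items()}
back = {age: exp(d) for age, d in delta_contrast.items()}

for age, d in delta_contrast.items():
    print(age, round(d, 3))
```

The contrasts are additive on the log scale, so a model with both sex and age effects on dispersion multiplies the corresponding ratios.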

The H-TM made differences between sex liabilities across ages of dam practically constant as the interaction effects were negligible relative to the main effects. The difference between male and female calves was about 0.5. Eventually, a model

Logistic heteroskedastic models have been considered by McCullagh (1980) and Derquenne (1995). Formulae are given in Appendix 1 to deal with this distribution. When the Simmental data are analyzed with the logistic (table VI), the

data. Wald's and deviance statistics were in very good agreement in that respect, with P values of 0.08 and 0.16 respectively, for the SA interaction. It should be observed that this heteroskedastic model has even fewer parameters (16) than the two-way S-TM considered initially (19 parameters). In spite of this, the Pearson's chi-square (and also the deviance) was reduced from about 419 (table II) to 32 (table V) with a P-value of 0.04. This fit is remarkable for this large data set (N = 363 859), where one would expect many models to be rejected.

Although the H-TM may have captured some extra hidden variation due to ignoring random effects, it is unlikely that the poor fit of the S-TM can be attributed solely to the overdispersion phenomenon resulting from ignoring genetic and other clustering effects. The large value of the ratio of the observed $X^2$ to its expected value (419/17 ≈ 25) suggests that the dependency of the probabilities $\Pi_{ij}$ with respect to sex of calf and age of dam is not described properly by a model with constant variance. Whether the poor fit of the S-TM is the result of ignoring random effects, heterogeneous variance, or both, requires further study, perhaps by simulation.

These results suggest that in beef cattle breeding the goodness of fit of a constant variance threshold model for calving ease can be improved by incorporating scale effects for age of dam either as discrete classes, as in this study, or alternatively as a polynomial regression of $\ln \sigma_i$ on age.

DISCUSSION

Other distributional assumptions

The underlying distribution was supposed to be normal which is a standard assumption of threshold models in a genetic context (Gianola, 1982). However, other distributions might have been considered for modeling $\Pi_{ij}$ in [3]. A classical choice, especially in epidemiology, would be the logistic distribution with mean $\eta_i$

probit. Interestingly, there is not much difference between the complete (S + A + SA) and the additive model (S + A), the interaction (SA) being non-significant (P = 0.30). Taking into account the variation in variance in addition to that explained by the additive model on location parameters does not improve the fit greatly. In that respect, the main source of variation turned out to be sex rather than age of dam.

Other options include the t-distribution (Albert and Chib, 1993), the Edgeworth series distribution (Prentice, 1976) and other non-normal classes of distribution functions (Singh, 1987). In fact, the t distribution $t_\nu(\eta_i, s^2)$ with spread parameter $s^2$ and $\nu$ degrees of freedom is the marginal distribution of a mixture of a normal distribution $N(\eta_i, \sigma_i^2)$, with $\sigma_i^2$ randomly varying according to a scaled inverted $\chi^2(\nu, s^2)$ distribution (Zellner, 1976). Therefore, a threshold model based on such a t distribution is embedded in our procedure by taking a one-way random model for $\ln \sigma_i^2$, ie, $\ln \sigma_i^2 = \ln s^2 + a_i$ with the density function $p(a_i \mid \nu)$ of the random variable $a_i$ as presented in Foulley and Quaas (1995, formulae 21 and 22); (1996) for a specific study of the threshold model based on the t-distribution in animal breeding.

Relationships with variable thresholds

Conceptually, heterogeneity of the $\sigma_i^2$'s is viewed here in the same way as in Gaussian linear models since it applies to an underlying random variable that is normally distributed. However, the underlying variables are not observable, and the corresponding real line includes cutoff points, the thresholds, that make the outcomes discrete. It is of interest to address the question of how changes in dispersion can be interpreted with respect to the threshold concept.

Let us illustrate this by a simple example involving $J = 3$ categories, and a one-way classification model ($i = 1, 2, \ldots, I$) as, for example, sex of calf in the Simmental breed. We will assume that the origin is at the first threshold, and that the unit of measurement is the standard deviation within males ($\sigma_M = 1$). The difference between thresholds is then

$$\tau_2 - \tau_1 = \Phi^{-1}(\Pi_{M1} + \Pi_{M2}) - \Phi^{-1}(\Pi_{M1})$$ [19]

where $\Pi_{M1}$, $\Pi_{M2}$ are the probabilities of response in the first and second categories, respectively, for male calves. A similar expression is obtained for female calves (F), so

$$\sigma_F = \frac{\Phi^{-1}(\Pi_{M1} + \Pi_{M2}) - \Phi^{-1}(\Pi_{M1})}{\Phi^{-1}(\Pi_{F1} + \Pi_{F2}) - \Phi^{-1}(\Pi_{F1})}$$ [20]

This is precisely the ratio of the difference between thresholds 1 and 2 that would be obtained when evaluated separately in each sex.

Formulae [19] and [20] are analogous to expressions given by Wright (1934b) (the reciprocal of the distance between the thresholds on this scale gives the standard deviation on the postulated scale on which the thresholds are separated by a unit distance, p 545) and Falconer (1989, formula 18.5, p 307) except that these authors set to unity the difference between thresholds in the baseline population, rather than the standard deviation, which we find more appealing conceptually. In the case of the Simmental data shown in table I, applying formulae [19] and [20] using observed frequencies of responses gives:

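If more than three categories are observed, this formula also holds for the differences $\tau_3 - \tau_2$, $\tau_4 - \tau_3$, ..., $\tau_{J-1} - \tau_{J-2}$, so that the ratio between standard deviations in subpopulation ($i$) and a reference population ($R$) can be expressed as:

$$\frac{\sigma_i}{\sigma_R} = \frac{\Phi^{-1}\left(\sum_{k=1}^{j} \Pi_{Rk}\right) - \Phi^{-1}\left(\sum_{k=1}^{j-1} \Pi_{Rk}\right)}{\Phi^{-1}\left(\sum_{k=1}^{j} \Pi_{ik}\right) - \Phi^{-1}\left(\sum_{k=1}^{j-1} \Pi_{ik}\right)}, \quad j = 2, \ldots, J - 1$$ [21]

which involves ($J - 2$) algebraically independent equalities.

In the case of three categories and a single classification, the saturated model has $2I$ parameters ($\tau_2 - \tau_1$, $\mu_1$, $\mu_2 - \mu_1$, ..., $\mu_I - \mu_1$, $\sigma_2/\sigma_1$, ..., $\sigma_I/\sigma_1$). Numerical values of $\sigma_i/\sigma_1$ computed from [20] are also ML estimates (eg, $\hat{\sigma}_2/\hat{\sigma}_1 = 0.973$).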
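With three categories, the standard deviation ratio between two groups can be recovered directly from category frequencies, by comparing the distance between the two thresholds as measured in each group's own standard deviation units. The sketch below assumes that reading of the ratio formula; the function names and the frequencies (generated from known, made-up parameters rather than taken from table I) are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def threshold_gap(p1, p2):
    """tau_2 - tau_1 in within-group SD units, from Pi_1 and Pi_2."""
    return STD.inv_cdf(p1 + p2) - STD.inv_cdf(p1)


def sd_ratio(ref, other):
    """sigma_other / sigma_ref from the (Pi_1, Pi_2) pairs of the two groups."""
    return threshold_gap(*ref) / threshold_gap(*other)


def freqs(tau1, tau2, eta, sigma):
    """(Pi_1, Pi_2) implied by thresholds, mean liability and SD."""
    c1 = STD.cdf((tau1 - eta) / sigma)
    c2 = STD.cdf((tau2 - eta) / sigma)
    return c1, c2 - c1


# Made-up parameters: origin at the first threshold, male SD set to 1.
male = freqs(0.0, 1.0, eta=0.5, sigma=1.0)
female = freqs(0.0, 1.0, eta=0.2, sigma=0.9)
print(round(sd_ratio(male, female), 4))
```

Since the frequencies are generated from a female standard deviation of 0.9 on the male scale, the ratio recovered from them should be 0.9 up to rounding, whatever the difference in mean liability between the two groups.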

Formula [21] indicates that there is a link between H-TM and models with variable thresholds (Terza, 1985). As compared to these, the main features of the H-TM are: i) a multiplicative model on ratios of standard deviations or differences between thresholds, rather than a linear model on such differences; ii) a lower dimensional parameterization due to the proportionality assumption made in [18] rather than a category-specific parameterization, ie:

where $\delta_j$ is the vector of unknowns pertaining to the difference ($\tau_j - \tau_1$). For $J = 3$, the two models generate the same number of parameters but they are

Further extensions

The H-TM opens new perspectives for the analysis of ordinal responses. Interesting extensions may include: i) implementing other inference approaches for mixed models such as Gilmour's procedure based on quasi-likelihood, or a fully Bayesian analysis of parameters using Monte-Carlo Markov-Chain methods along the lines of Sorensen et al (1995); ii) assessing the potential increase in response to selection by selecting on estimated breeding values calculated from an H-TM versus an S-TM; iii) incorporating a mixed linear model on log-variances as described in San Cristobal et al (1993) and Foulley and Quaas (1995) for Gaussian observations; iv) carrying out a joint analysis of continuous and ordered polychotomous traits as already proposed for the S-TM by Foulley et al (1983), Janss and Foulley (1993) and Hoeschele et al (1995).

Further research is also needed at the theoretical level to look at the sampling properties of estimators based on mis-specified models. For instance, one may be interested in the asymptotic properties of the ML estimator of $(\tau', \beta', \delta')'$ derived under the assumption of independence of the $y_{ir}$'s when this hypothesis does not hold. This problem has been discussed in general by White (1982), and it may be conjectured from the results of Liang et al (1992) that the ML estimators of these parameters remain consistent. It might also be worthwhile to assess the effect of departures from independence on testing procedures. The generalized chi-square procedure for goodness of fit derived by McLaren et al (1994) might be useful in that respect for analyses based on large samples.

ACKNOWLEDGMENTS

Part of this research was conducted while JL Foulley was a Visiting Professor at the University of Wisconsin-Madison. He gratefully acknowledges the support of this institution and of the Direction des productions animales and Direction des relations internationales, INRA. The manuscript is largely drawn from invited lectures given by the senior author at the Midwestern Animal Science Meeting (Des Moines, Iowa, April 12 1995), the University of Wisconsin-Madison (12 April 1995), Cornell University (15 May 1995) and the University of Illinois (28 August 1995). Special thanks are expressed to all those who contributed substantially to the discussions generated thereby, and also to B Klei and B Cunningham from the US Simmental Association for providing the data set on calving ease scores analyzed in this study.

The H-TM has been applied to calving performance of other cattle breeds (US Holstein, French Blonde d'Aquitaine, Charolais, Limousin, Maine Anjou). It has also been applied to prolificacy records in sheep. We are grateful to J Berger (Iowa State University, Ames), F Ménissier, J Sapa (INRA Genetics, Jouy) and JP Poivey (INRA Genetics, Toulouse) for providing the corresponding data sets.

REFERENCES

Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88, 669-679

Brillinger DR (1985) What do seismology and neurophysiology have in common?

Bryant WK, Gerner (1982) The demand for service contracts. J Business 55, 345-366

Bulmer MG, Bull JJ (1982) Models of polygenic sex determination and sex ratio control. Evolution 36, 13-26

Collet (1991) Modelling Binary Data. Chapman and Hall, London

Curnow R, Smith C (1975) Multifactorial models for familial diseases in man. J R Stat Soc A 138, 131-169

Dempster ER, Lerner IM (1950) Heritability of threshold characters. Genetics 35, 212-236

Denis JE Jr, Schnabel RB (1983) Numerical Methods for Unconstrained Optimization and Non-Linear Equations. Prentice-Hall Inc, Englewood Cliffs

Derquenne (1995) Heteroskedastic logit models. 50th Session of the International Statistical Institute (ISI), Beijing, China, 21-29 August

Edwards AL, Thurstone LL (1952) An internal consistency check for scale values determined by the method of successive intervals. Psychometrika 17, 169-180

Engel B, Buist W, Visscher A (1995) Inference for the threshold models with variance components from the generalized linear mixed model perspective. Genet Sel Evol 27, 15-22

Falconer DS (1960) An Introduction to Quantitative Genetics. 1st edition, Oliver and Boyd, London

Falconer DS (1965) The inheritance of liability to certain diseases estimated from the incidence among relatives. Ann Hum Genet 29, 51-76

Falconer DS (1989) An Introduction to Quantitative Genetics. 3rd edition, Longman Scientific and Technical, Harlow

Fahrmeir L, Tutz G (1994) Multivariate Statistical Modelling Based on Generalized Linear Models. Springer-Verlag, New York

Fechner GT (1860) Elemente der Psychophysik, 2 vols; Vol 1 translated (1966). In: Elements of Psychophysics (DH Howes, EG Boring, eds), Holt, Rinehart and Winston, New York

Foulley JL (1987) Méthodes d'évaluation des reproducteurs pour des caractères discrets à déterminisme polygénique en sélection animale. PhD thesis, University of Paris-Sud-Orsay

Foulley JL, Gianola D, Thompson R (1983) Prediction of genetic merit from data on categorical and quantitative variates with an application to calving difficulty, birth weight and pelvic opening. Genet Sel Evol 15, 401-424

Foulley JL, Gianola D, San Cristobal M, Im S (1990) A method for assessing extent and sources of heterogeneity of residual variances in mixed linear models. J Dairy Sci 73, 1612-1624

Foulley JL, San Cristobal M, Gianola D, Im S (1992) Marginal likelihood and Bayesian approaches to the analysis of heterogeneous residual variances in mixed linear Gaussian models. Comput Stat Data Anal 13, 291-305

Foulley JL, Quaas RL (1995) Heterogeneous variances in Gaussian linear mixed models. Genet Sel Evol 27, 211-228

Galton F (1889) Natural Inheritance. Macmillan, London

Gianola D (1982) Theory and analysis of threshold characters. J Anim Sci 54, 1079-1096

Gianola D, Foulley JL (1983) Sire evaluation for ordered categorical data with a threshold model. Genet Sel Evol 15, 201-224

Gianola D, Sorensen D (1996) Abstract to the EAAP, Lillehammer, Norway, 25-29 August

Gilmour A, Anderson RD, Rae A (1987) Variance components on an underlying scale for ordered multiple threshold categorical data using a generalized linear model. J Anim Breed Genet 104, 149-155

Grosbras JM (1987) Les données manquantes. In: Les sondages (JJ Droesbeke, B Fichet,

Harville DA, Mee RW (1984) A mixed model procedure for analyzing ordered categorical data. Biometrics 40, 393-408

Höschele I, Gianola D, Foulley JL (1987) Estimation of variance components with quasi-continuous data using Bayesian methods. J Anim Breed Genet 104, 334-349

Höschele I, Tier B, Graser HU (1995) Multiple-trait genetic evaluation for one polychotomous trait and several continuous traits with missing data and unequal models. J Anim Sci 73, 1609-1627

Janss L, Foulley JL (1993) Bivariate analysis for one continuous and one threshold dichotomous trait with unequal design matrices and an application to birth weight and calving difficulty. Livest Prod Sci 33, 183-198

Kaplan RS, Urwitz G (1979) Statistical models of bond ratings: a methodological inquiry. J Business 52, 231-261

Levy F (1980) Changes in employment prospects for black males. Brooking Papers 2, 513-538

Liang KY, Zeger SL, Qaqish B (1992) Multivariate regression analysis for categorical data. J R Stat Soc B 54, 3-40

Maddala GS (1983) Limited Dependent and Qualitative Variables in Econometrics. Cambridge University Press, New York

McCullagh P (1980) Regression models for ordinal data. J R Stat Soc B 42, 109-142

McCullagh P, Nelder J (1989) Generalized Linear Models. 2nd ed, Chapman and Hall, London

McKelvey R, Zavoina W (1975) A statistical model for the analysis of ordinal level dependent variables. J Math Soc 4, 103-120

McLaren CE, Lecler JM, Brittenham GM (1994) The generalized chi square goodness-of-fit test. Statistician 43, 247-258

Misztal I, Gianola D, Höschele I (1988) Threshold model with heterogeneous residual variance due to missing information. Genet Sel Evol 20, 511-516

Natarajan R (1995) Estimation in Models for Multinomial Response Data: Bayesian and Frequentist Approaches. PhD thesis, Cornell University, Ithaca

Pearson K (1900) Mathematical contributions to the theory of evolution. VIII. On the inheritance of characters not capable of exact quantitative measurement. Phil Trans Roy Soc London A 195, 79

Prentice RL (1976) Generalisation of probit and logit models. Biometrika 32, 761-768

Quaas RL, Zhao Y, Pollak EJ (1988) Describing interactions in dystocia scores with a threshold model. J Anim Sci 66, 396-399

Read I, Cressie N (1988) Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer-Verlag, New York

Robertson A (1950) Proof that the additive heritability on the p scale is given by the expression $\bar{z}^2 h_x^2/\bar{p}\bar{q}$. Genetics 35, 234-236

Roff DA (1994) The evolution of dimorphic traits: predicting the genetic correlation between environments. Genetics 136, 395-401

San Cristobal M, Foulley JL, Manfredi E (1993) Inference about multiplicative heteroskedastic components of variance in a mixed linear Gaussian model with an application to beef cattle breeding. Genet Sel Evol 25, 3-30

Singh M (1987) A non-normal class of distribution for dose binary response curve. J Appl Stat 14, 91-97

Sorensen DA, Andersen S, Gianola D, Korsgaard I (1995) Bayesian inference in threshold models using Gibbs sampling. Genet Sel Evol 27, 229-249

Terza JV (1985) Ordinal probit: a generalization. Commun Stat Theor Meth 14, 1-11

White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50, 1-25

Wright S (1934a) An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19, 506-536

Wright S (1934b) The results of crosses between inbred strains of guinea pigs differing in number of digits. Genetics 19, 537-551

Zellner A (1976) Bayesian and non-Bayesian analysis of the regression model with multivariate Student-t error terms. J Am Stat Assoc 71, 400-405

APPENDIX 1

Expressions for the score function U and the information matrix J

Concerning derivatives with respect to the vector $\beta$ of location parameters, one has:

Finally, U can be expressed as:

with expressions for $v_\tau$, $v_\beta$ and $v_\delta$ given in [A.1], [A.2] and [A.3].
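As a rough numerical companion to the analytical score, the sketch below evaluates a heteroskedastic threshold log-likelihood and approximates U by central differences. It rests on several assumptions not taken from this appendix: category probabilities are written as $\Phi((\tau_j - x_i'\beta)/\sigma_i) - \Phi((\tau_{j-1} - x_i'\beta)/\sigma_i)$ with $\ln \sigma_i = p_i'\delta$, the data and parameter values are hypothetical, and the function names are introduced here for illustration only. Such a finite-difference score is mainly useful for checking hand-derived expressions.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(tau, eta, sigma):
    """Probabilities of the J ordered categories for one subpopulation.

    tau   : increasing list of J-1 thresholds
    eta   : location x_i' beta on the underlying scale
    sigma : dispersion, assumed here to be exp(p_i' delta) (log link)
    """
    cum = [0.0] + [norm_cdf((t - eta) / sigma) for t in tau] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]


def log_lik(params, counts, X, P, n_thresh):
    """Multinomial log-likelihood, up to an additive constant."""
    n_beta = len(X[0])
    tau = params[:n_thresh]
    beta = params[n_thresh:n_thresh + n_beta]
    delta = params[n_thresh + n_beta:]
    total = 0.0
    for n_i, x_i, p_i in zip(counts, X, P):
        eta = sum(b * x for b, x in zip(beta, x_i))
        sigma = math.exp(sum(d * p for d, p in zip(delta, p_i)))
        probs = category_probs(tau, eta, sigma)
        total += sum(n * math.log(pr) for n, pr in zip(n_i, probs))
    return total


def numerical_score(params, *args, h=1e-6):
    """Central-difference approximation to U = dL/d(params)."""
    score = []
    for k in range(len(params)):
        up = list(params)
        dn = list(params)
        up[k] += h
        dn[k] -= h
        score.append((log_lik(up, *args) - log_lik(dn, *args)) / (2.0 * h))
    return score


# Hypothetical example: 2 subpopulations, 3 ordered categories.
counts = [[30, 50, 20], [20, 40, 40]]   # responses per category
X = [[0.0], [1.0]]                      # location covariates (no intercept)
P = [[0.0], [1.0]]                      # dispersion covariates
params = [-0.5, 0.8, 0.3, 0.1]          # (tau_1, tau_2, beta, delta)
print(numerical_score(params, counts, X, P, 2))
```

The first subpopulation serves as the reference for both location and log-dispersion, which is one common way of keeping the thresholds identifiable. Other constraints would work equally well.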

Elements of the information matrix $J(\alpha)$ include the expectations of minus the second derivatives. The following derivatives will be considered: threshold-threshold; $\beta$-threshold; $\delta$-threshold; $\beta$-$\beta$; $\beta$-$\delta$; and $\delta$-$\delta$.

$\beta$-threshold derivatives

The expectation of the first term vanishes because $E(y_{ij}) = n_{i+}\Pi_{ij}$.

where $T = \{t_{jk}\}$ is a $(J-1) \times (J-1)$ symmetric band matrix having as elements $t_{jj} = E(-\partial^2 L/\partial\tau_j^2)$ and $t_{j,j+1} = E(-\partial^2 L/\partial\tau_j\,\partial\tau_{j+1})$, given in [A.4] and [A.5].
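Since only the diagonal and first off-diagonal elements are listed, T can be read as a tridiagonal matrix (bandwidth one is an assumption drawn from that list). A minimal sketch of assembling such a matrix from its two sets of elements follows. The function name `tridiagonal` and the numerical values are hypothetical placeholders, not quantities from [A.4] or [A.5].

```python
def tridiagonal(diag, off):
    """Assemble a symmetric tridiagonal matrix as a list of rows.

    diag : the J-1 diagonal elements t_jj
    off  : the J-2 elements t_{j,j+1} on the first off-diagonal
    """
    m = len(diag)
    if len(off) != m - 1:
        raise ValueError("off must have one element fewer than diag")
    T = [[0.0] * m for _ in range(m)]
    for j in range(m):
        T[j][j] = diag[j]
    for j in range(m - 1):
        # Symmetry: the same value fills positions (j, j+1) and (j+1, j).
        T[j][j + 1] = off[j]
        T[j + 1][j] = off[j]
    return T


# Placeholder values for J = 4 categories, hence a 3 x 3 block.
T = tridiagonal([2.0, 3.0, 4.0], [-1.0, -0.5])
for row in T:
    print(row)
```

Storing only the two bands rather than the full block is a possible refinement when the number of categories is large. For the few thresholds typical of calving ease scores the dense form is adequate.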

These expressions can be extended to obtain the MAP of parameters in a mixed-model structure by replacing (i) $\beta$ by $\theta = (\beta', u^{\#\prime})'$ with $u^{\#} \sim N(0, \rho^2 A)$ ($\rho^2 = \sigma_{u_i}^2/\sigma_i^2 = \text{constant}$); (ii) $X$ by $S = (s_1, s_2, \ldots, s_i, \ldots, s_I)'$ with $s_i' = (x_i', \sigma_i z_i')$; and making the appropriate adjustments for prior information as shown below:

APPENDIX 2
