Thesis Reference

High breakdown inference for mixed linear models

COPT, Samuel


COPT, Samuel. High breakdown inference for mixed linear models. Doctoral thesis: Univ. Genève, 2004, no. SES 563

URN : urn:nbn:ch:unige-120533

DOI : 10.13097/archive-ouverte/unige:12053

Available at:

http://archive-ouverte.unige.ch/unige:12053

Disclaimer: layout of this document may differ from the published version.


HIGH BREAKDOWN INFERENCE FOR MIXED LINEAR MODELS

Samuel Copt

Submitted for the degree of Ph.D. in Econometrics and Statistics
Faculty of Economics and Social Sciences
University of Geneva, Switzerland

Accepted on the recommendation of:
Prof. Gérard Antille, University of Geneva,
Prof. Chris Field, Dalhousie University, Canada,
Prof. Elvezio Ronchetti, University of Geneva,
Prof. Maria-Pia Victoria-Feser, University of Geneva, supervisor.

Thesis number 563
Geneva, 2004

The Faculty of Economics and Social Sciences, on the recommendation of the jury, has authorized the printing of this thesis without thereby expressing any opinion on the propositions stated therein, which remain the sole responsibility of their author.

Geneva, 2 July 2004

The Dean: Pierre ALLAN

Abstract

Mixed linear models are used to analyze data in many settings. These models have in most cases a multivariate normal formulation. The maximum likelihood estimator (MLE) or the residual MLE (REML) is usually chosen to estimate the parameters. However, these estimators are based on the strong assumption of exact multivariate normality. Welsh and Richardson (1997) have shown that they are not robust to small deviations from multivariate normality. This means, in practice, that a small proportion of the data (even a single observation) can drive the values of the estimates on its own.

We present some of the most used models in the analysis of variance. We introduce the mixed linear model formulation and see that in most cases it is possible to extract independent subvectors of observations. The structure of the covariance matrix is derived for a great variety of models. Since the model is multivariate, we propose in this thesis a high breakdown multivariate robust estimator for very general mixed linear models that include, for example, covariates.

This robust estimator belongs to the class of S-estimators (Rousseeuw and Yohai 1984), from which we can derive the asymptotic properties for inference. We also use it as a diagnostic tool to detect outlying subjects. We derive the estimating equations defining the high breakdown estimator and describe how it can be computed via a simple iterative algorithm. We study the behavior of the robust estimator through an extensive simulation study, in which it is compared to the maximum likelihood estimator under a great variety of configurations involving different models, different contamination patterns and different sample sizes. We also discuss the advantages of this estimator and illustrate its performance with the analysis of four datasets.

We also consider robust inference for multivariate hypotheses as an alternative to the classical F-test by using a robust score type test statistic proposed by Heritier and Ronchetti (1994), and study its properties by means of simulations and real data analysis.

Résumé

Mixed linear models are used to analyze data from research in many areas of the human and social sciences. To estimate these models and to test hypotheses, we propose in this work a so-called robust approach, which protects the analyses from potential biases due to the presence in the samples of a minority of atypical observations.

Starting from a multivariate formulation of these models, we propose a robust estimator belonging to the class of S-estimators (Rousseeuw and Yohai 1984), for which we derive the score functions as well as the asymptotic properties for inference. We can also use it as a diagnostic tool to detect potential extreme values. The advantages of this estimator, as well as its behavior, are illustrated through a simulation study and through the analysis of four real datasets.

Based on this estimator, robust inference is also developed. We propose a robust alternative to the classical Fisher test: a robust score test proposed by Heritier and Ronchetti (1994). The behavior of this test is studied by means of simulations and real data analyses.

Acknowledgments

I would like to express my gratitude to Prof. Maria-Pia Victoria-Feser for her valuable suggestions and her generous guidance throughout the course of this research. Her confidence and her encouragement have made this work possible.

I am also grateful to Prof. Gérard Antille, Prof. Chris Field and Prof. Elvezio Ronchetti for their helpful comments and for taking the time and effort to ensure my work was of quality. Their valuable suggestions improved the clarity of the presentation.

My thanks also go to my friends and colleagues of the Faculty of Economics and Social Sciences of the University of Geneva for their stimulating discussions and their moral support.

Finally, my thanks go to my wife Sonia, who deserves an award for her patience and understanding during my PhD study and the writing of this thesis.


To my parents,


Contents

1 Introduction
2 Models formulation
2.1 Mixed linear models
2.2 One factor within-subject ANOVA model
2.2.1 Model I
2.2.2 Model II
2.3 Within-subject ANOVA models with more than one factor
2.4 Multilevel models
2.5 Hierarchical models
2.6 ANCOVA Models
3 Estimation of mixed linear models
3.1 Classical estimators
3.1.1 Maximum likelihood estimator
3.1.2 REML in the general mixed linear models
3.2 Robust estimators
3.2.1 Estimation by maximizing a robustified likelihood
3.2.2 Estimating equations approach
3.2.3 B-optimal estimator
3.2.4 Robust versions of the restricted likelihood estimator
3.3 Some remarks
4 High breakdown multivariate estimator of location and constrained scale
4.1 Introduction
4.1.1 Constrained S-estimator
4.1.2 The maximum likelihood estimator for constrained covariance matrix
4.1.3 Starting point for the constrained robust estimator
4.1.4 An algorithm procedure for the constrained S-estimator
4.1.5 Concluding remarks
5 Simulation study for the CTBS estimator
5.1 One factor ANOVA model with repeated measures
5.2 Two factors within-subject ANOVA model
5.3 Hierarchical models
5.4 Breakdown analysis
5.5 Small sample behavior study
6 Robust Inference
6.1 Robust inference for the univariate case
6.2 Robust multivariate tests
7 Simulation study for the robust score type test
7.1 One factor within-subject ANOVA model
7.2 Two factors within-subject ANOVA model
7.3 Hierarchical model
8 Data analysis
8.1 Electrode resistance data
8.2 Semantic priming data
8.3 Metallic oxide data
9 Extensions
9.1 Incomplete Design
10 Conclusion


Chapter 1

Introduction

Statistics is concerned with the variability that is inevitably present in any set of data. The traditional method for isolating the sources of variability in a set of measurements is known as the analysis of variance. It has long been used in statistics and provides a simple and optimal description of a complete data set by means of models characterized by a certain number of parameters. Its purpose is to determine the extent to which the effect of an independent variable, often called a factor, is a major component of the variability. The partition of the variability is summarized in what is known as an ANOVA table.

Historically, the original work of Fisher (1925) stands at the beginning of the development of the analysis of variance. Fisher's description of the analysis of variance methodology was based on sums of squares of differences among observed means. In recent decades, the trend has been to present many of the ideas behind the analysis of variance in terms of what are nowadays called linear models, and more particularly the class of linear models known as fixed effects models. Variations among data can be studied through different classes of linear models: those known as random effects models and also those called mixed effects models, which combine features of fixed and random models. In general, data analysis using mixed linear models is closely related to the classical analysis of variance, but this is not always true for fixed and random effects models. For a thorough history of the development of the analysis of variance, or for a well documented bibliography on the subject, see the excellent survey of Sahai, Khuri, and Kapadia (1985).


Considering a more standard terminology, an ANOVA model is also defined through an experimental design. An experimental design refers to a plan for assigning subjects to experimental conditions. Such designs involve a certain number of specifications, such as the determination of the experimental conditions (independent variables) to be used, the measurements (dependent variables) to be recorded and the nuisance variables that must be controlled. Of course, these designs are also defined through the statistical hypotheses that are made about the parameters and the population from which the subjects are drawn. Given the choices made about the design and the statistical hypotheses, the model will be referred to as a fixed, random or mixed effects model. In this work, the terms model and design will be used interchangeably. For a review of the various designs used in the statistical literature see e.g. Kirk (1982).

In this work we consider more particularly designs that involve repeated measurement of the same group of individuals. Such a design is called a repeated measures design. More specifically, when separate groups of individuals are studied, the number of individuals required is some multiple of the combined levels of each factor that is introduced. With repeated measures, rather than randomly assigning the individuals to the various conditions, the same individuals can be measured at all levels. The advantage of such studies is that differences in measures cannot be attributed to individual characteristics (for example, motivation or intelligence). Another advantage is that fewer individuals are needed, the same individual being measured at all levels. In either case, the design is termed repeated measures, because the same individuals are measured on a number of occasions corresponding to each treatment level. It is also referred to as a randomized block design, with each individual designated as a "block".

Repeated measures studies have been introduced in virtually all behavioral and social sciences: psychology, medicine, education, sociology, political science, economics, business and industry. There are many research hypotheses that can be tested using repeated measures designs, such as hypotheses that compare the same subjects under different treatments, or those that follow performance over time. Repeated measures designs are quite versatile and are called by many different names. For example, a one-way repeated measures ANOVA model may be known as a one factor within-subjects ANOVA model, a treatments-by-subjects ANOVA model, or a randomized blocks ANOVA model. A two-way repeated measures ANOVA model may be referred to as a two factors within-subjects ANOVA model, a two-way ANOVA model with repeated measures on both factors, or a multiple treatments-by-subjects ANOVA model. In fact, repeated measures designs are special cases of the randomized complete block design wherein each subject is considered to be a block and is observed under all treatment levels.

Actually, the first aim of this work was to develop a robust estimator for designs involving repeated measurements only. We will see that the results of this research are applicable not only to repeated measures models but also to other kinds of models. As long as it is possible to write the corresponding models as a multivariate normal model with constrained covariance matrix, we can apply the robust method we propose. These models belong to the class of mixed linear models and include, for example, hierarchical or multilevel models (random nested models), longitudinal data (repeated measures) and others. The aim of this research is thus to investigate robust procedures for ANOVA models which can be described as multivariate normal with constrained covariance matrix, and to propose new robust inference tools. In terms of robustness, most of the proposals in the field of ANOVA (or mixed linear models) are based on a weighted version of the corresponding log-likelihood function. Welsh and Richardson (1997) present a survey of the various robust methods developed so far.

Here we propose a different approach that we hope will improve the performance of the robust estimators, especially in terms of breakdown point (see later). It starts from a reformulation of the models as multivariate normal distributions. For example, consider the one factor within-subject ANOVA model given by the following structural equation

$$y_{ij} = \mu + \alpha_j + s_i + \varepsilon_{ij}, \quad i = 1, \dots, n, \; j = 1, \dots, l$$

where $y_{ij}$ is an observation for subject $i$ at treatment level $j$, $\mu$ is the grand mean, $\alpha_j$ is a fixed effect for the $j$th treatment level with $\sum_{j=1}^{l} \alpha_j = 0$, $s_i$ is a random variable that explains the random effect of the $i$th subject on the response variable $y_{ij}$, and $\varepsilon_{ij}$ is a residual error term. We suppose that the unobservable random variables $s_i$ and $\varepsilon_{ij}$ are independent, with distributions $N(0, \sigma_s^2)$ and $N(0, \sigma_\varepsilon^2)$ respectively. An equivalent multivariate formulation of the model is given by

$$y_i = \mu + e_l s_i + \varepsilon_i, \quad i = 1, \dots, n$$

where $y_i = [y_{i1}, \dots, y_{il}]^T$, $\mu = [\mu + \alpha_1, \mu + \alpha_2, \dots, \mu + \alpha_l]^T$, $e_l$ is a vector of ones of length $l$ and $\varepsilon_i = [\varepsilon_{i1}, \dots, \varepsilon_{il}]^T$. Given the assumptions on $s_i$ and $\varepsilon_{ij}$, the $y_i$ are independent multivariate normal $N(\mu, \Sigma)$ random variables with

$$\Sigma = \sigma_\varepsilon^2 I_l + \sigma_s^2 J_l$$

with $J_l$ being the $l \times l$ matrix of ones. For other models, the covariance matrix always has a particular structure (see Chapter 2). The problem is then reduced to the estimation of $\mu$ and $\Sigma$, and our aim is to propose a robust estimator for a structured or constrained $\Sigma$. We can say that the covariance matrix is constrained to have a particular structure given the assumptions on the structural model, and we will use this structure in the definition of our robust estimator.
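As a quick numerical sanity check of this structure, the sketch below builds $\Sigma = \sigma_\varepsilon^2 I_l + \sigma_s^2 J_l$ and compares it with the empirical covariance of simulated vectors $e_l s_i + \varepsilon_i$. The variance values and the number of levels are arbitrary choices for illustration, not values used in the thesis:

```python
import numpy as np

# Illustrative (assumed) values for the one factor within-subject model
l = 4
sigma2_s, sigma2_eps = 2.0, 1.0

# Sigma = sigma2_eps * I_l + sigma2_s * J_l, with J_l the l x l matrix of ones
Sigma = sigma2_eps * np.eye(l) + sigma2_s * np.ones((l, l))

# The entries reproduce the scalar covariance rules:
# var(y_ij) = sigma2_eps + sigma2_s and cov(y_ij, y_ik) = sigma2_s for j != k
assert np.allclose(np.diag(Sigma), sigma2_eps + sigma2_s)
assert np.isclose(Sigma[0, 1], sigma2_s)

# Monte Carlo check: the covariance of e_l * s_i + eps_i matches Sigma
rng = np.random.default_rng(0)
n = 200_000
y = (np.sqrt(sigma2_s) * rng.standard_normal((n, 1))       # e_l s_i, constant in j
     + np.sqrt(sigma2_eps) * rng.standard_normal((n, l)))  # eps_i
assert np.allclose(np.cov(y, rowvar=False), Sigma, atol=0.05)
```

The compound-symmetric pattern (a common diagonal and a common off-diagonal value) is exactly the constraint that the robust estimator proposed later must respect.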

Estimating the parameters in mixed linear models is only a first step into inference. Therefore, in a second part of our research, we will also consider robust testing for comparing treatment levels, in other words the elements of the mean vector. Fisher (1925) developed the widely used $F$-test. Unfortunately, the $F$-test is known to be non-robust to small deviations such as extreme observations (see for example Braun and McNeil, 1981). It is true, however, that the robustness of the classical $F$-test is still a subject of research. While it can remain stable in the presence of misspecified models, it can break down completely in other cases. As we will see later, there are two main reasons that affect the robustness properties of the $F$-test. To our knowledge, no formal robustness study has been done in the repeated measures setting, and we will show that small amounts of extreme observations can considerably bias the decisions taken on the basis of the $F$-test. With this idea in mind, we adapt a robust testing procedure based on the results of Heritier and Ronchetti (1994), who propose robust versions of the classical Wald test, score type test and likelihood ratio test for general parametric models. Since these tests rely on asymptotic approximations and the sample sizes involved are often small, we will also investigate the small sample properties of the robust tests.

This work is organized as follows. In Chapter 2 we present some of the most used models in the analysis of variance. We introduce the mixed linear models formulation and see that in most cases it is possible to extract independent subvectors of observations. The structure of the covariance matrix is derived for a great variety of models. In Chapter 3 we review both the classical and robust methods of estimation that exist so far. The estimating equations defining the high breakdown estimator are derived in Chapter 4, and we describe how it can be computed via a simple iterative algorithm. In Chapter 5, we study the behavior of the robust estimator through an extensive simulation study; it is compared to the maximum likelihood estimator under a great variety of configurations involving different models, different contamination patterns and different sample sizes. Robust inference is developed in Chapter 6 and analyzed through a simulation experiment in Chapter 7. Chapter 8 analyzes four data sets using classical and robust estimators. In Chapter 9, we summarize the questions that have been left open during the course of this work and suggest different directions for further research. Finally, Chapter 10 concludes.


Chapter 2

Models formulation

In this chapter, we review some of the most used designs in analysis of variance through the mixed linear models formulation. Indeed, the mixed linear models formulation has the advantage of being very flexible for complex ANOVA designs. We present the formulation of mixed linear models as multivariate normal distributions with constrained covariance matrices. We explore different kinds of models and see that in each case it is always possible to extract random independent subvectors that are multivariate normal.

2.1 Mixed linear models

Mixed linear models can be expressed generally by the regression type equation

$$y = X\beta + Z\gamma + \varepsilon \qquad (2.1)$$

where $y$ is the $N$-vector of all measurements (observations), $X$ is an $N \times q_0$ design matrix for the fixed effects component $\beta$, a $q_0$-vector of unknown fixed effects, $Z$ is the $N \times q$ design matrix for the random effect vector $\gamma$, and $\varepsilon$ is the $N$-vector of independent residual errors. $\gamma$ can actually be partitioned into a series of $r$ sub-vectors,

$$\gamma = [\gamma_1^T, \gamma_2^T, \dots, \gamma_r^T]^T \qquad (2.2)$$


with

$$E(y) = X\beta, \qquad \operatorname{var}(\varepsilon) = \sigma_\varepsilon^2 I_N \qquad (2.3)$$

We suppose that for each $j$

$$\operatorname{var}(\gamma_j) = \sigma_j^2 I_{q_j} \qquad (2.4)$$

with $q_j$ being the number of elements in $\gamma_j$. Moreover

$$\operatorname{cov}(\gamma_j, \gamma_{j'}) = 0, \quad j \neq j' \qquad (2.5)$$

and similarly

$$\operatorname{cov}(\gamma, \varepsilon) = 0 \qquad (2.6)$$

Using (2.4)-(2.6), the covariance structure of $\gamma$ is

$$D = \operatorname{var}(\gamma) = \operatorname{diag}\left(\sigma_1^2 I_{q_1}, \sigma_2^2 I_{q_2}, \dots, \sigma_r^2 I_{q_r}\right) \qquad (2.7)$$

Then, partitioning $Z$ in submatrices as in (2.2), i.e. $Z = [Z_1, Z_2, \dots, Z_r]$ with each submatrix $Z_j$ of dimension $N \times q_j$, (2.1) becomes

$$y = X\beta + \sum_{j=1}^{r} Z_j \gamma_j + \varepsilon \qquad (2.8)$$

Hence,

$$V = \operatorname{var}(y) = Z D Z^T + \sigma_\varepsilon^2 I_N = \sum_{j=1}^{r} \sigma_j^2 Z_j Z_j^T + \sigma_\varepsilon^2 I_N$$

A useful extension of this formulation is to treat $\varepsilon$ just as another $\gamma_j$, say $\gamma_0$, and incorporate it into (2.8) by defining

$$\gamma_0 = \varepsilon, \qquad Z_0 = I_N, \qquad \sigma_0^2 = \sigma_\varepsilon^2$$

and so we have

$$y = X\beta + \sum_{j=0}^{r} Z_j \gamma_j \qquad (2.9)$$

and

$$V = \sum_{j=0}^{r} \sigma_j^2 Z_j Z_j^T \qquad (2.10)$$

We assume that all the $q_0 + r + 1$ effects are identifiable and concentrate on models for which we can write

$$V = \operatorname{diag}(\Sigma_i) \qquad (2.11)$$

with $\Sigma_i = \Sigma, \; \forall i = 1, \dots, n$. For such models, we have an equivalent multivariate formulation for (2.1), which is

$$y_i \sim N(\mu, \Sigma) \qquad (2.12)$$

with $y_i$ the $p$-vector of independent observations obtained by partitioning $y$ according to the covariance structure in (2.11), and $\mu = x\beta$ with $x$ a $p \times q_0$ matrix obtained by partitioning $X$ according to the covariance structure in (2.11). The case in which $\mu = \mu_i = x_i \beta$, i.e. with the presence in the model of covariates, will also be discussed. We can actually write

$$\Sigma = \sum_{j=0}^{r} \sigma_j^2 z_j z_j^T \qquad (2.13)$$

with $z_j$ being the design matrices that define the structure of each block of $V$.


Most of the well-known models can actually be written as in (2.12) with a covariance matrix defined by (2.13), and we now present some of them in detail.
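To make (2.10)-(2.11) concrete, here is a minimal sketch that assembles $V = \sum_j \sigma_j^2 Z_j Z_j^T$ from design matrices and verifies the block-diagonal structure. The one factor within-subject layout is used as the assumed example, with arbitrary illustrative variance values:

```python
import numpy as np

def mixed_cov(sigma2, Z):
    """V = sum_j sigma2_j * Z_j Z_j^T, as in (2.10)."""
    return sum(s2 * Zj @ Zj.T for s2, Zj in zip(sigma2, Z))

# One factor within-subject layout: n = 3 subjects, l = 2 levels,
# Z_0 = I_N for the residual, Z_1 = I_n kron e_l for the subject effect.
n, l = 3, 2
N = n * l
Z0 = np.eye(N)
Z1 = np.kron(np.eye(n), np.ones((l, 1)))
sigma2_eps, sigma2_s = 1.0, 2.0

V = mixed_cov([sigma2_eps, sigma2_s], [Z0, Z1])

# V = diag(Sigma_1, ..., Sigma_n) with identical p x p blocks, as in (2.11)
Sigma = V[:l, :l]
assert np.allclose(V, np.kron(np.eye(n), Sigma))
assert np.allclose(Sigma, sigma2_eps * np.eye(l) + sigma2_s * np.ones((l, l)))
```

Extracting the leading $p \times p$ block and checking `V == I_n kron Sigma` is exactly the partitioning into independent subvectors that the multivariate formulation (2.12) relies on.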

2.2 One factor within-subject ANOVA model

We begin with a simple repeated measures model with only one experimental factor, usually termed the one factor within-subject ANOVA model. It is not of great interest in practice but has the advantage of being simple, easy to describe and theoretically interesting. There are actually two possible models, and we will define them by means of structural equations. The first model, which is called Model I, expresses the response variable as a function of the fixed experimental factor and a random factor "subject". For the second model, Model II, the response variable is a function of the fixed experimental factor, a random factor "subject" and a random interaction factor which represents the interaction between the subject and the experimental factor. We treat here the case where the model is balanced, i.e. all $y_{ij}$ for all $i$ and $j$ are observed.

2.2.1 Model I

The one factor within-subject ANOVA model can be written as

$$y_{ij} = \mu_j + s_i + \varepsilon_{ij}, \quad i = 1, \dots, n, \; j = 1, \dots, l \qquad (2.14)$$

where $y_{ij}$ is the response of subject $i$ at treatment level $j$. There are $l$ levels for the experimental factor (in this case $p = l$), and $l$ measurements on $n$ subjects are taken on the response variable. $\mu_j$ is a fixed effect for each level $j$ of the experimental factor, and $s_i$ is a variable that represents the random effect of the $i$th subject on the response variable. The unobservable random variables $s_i$ and $\varepsilon_{ij}$ are supposed independent, with distributions $N(0, \sigma_s^2)$ and $N(0, \sigma_\varepsilon^2)$ respectively, $\forall i, j$.

Usually, in the literature, an alternative formulation of model (2.14) is proposed. This model can be written as

$$y_{ij} = \mu + \alpha_j + s_i + \varepsilon_{ij} \qquad (2.15)$$

where $\mu$ is the overall mean and $\alpha_j$ is the fixed effect deviation from the overall mean for level (or treatment) $j$. We need $\sum_{j=1}^{l} \alpha_j = 0$. The mean $\mu_j$ in the structural model (2.14) is related to $\mu$ and $\alpha_j$ through

$$\mu_j = \mu + \alpha_j$$

Here we will use formulation (2.15). Given the assumptions on $s_i$ and $\varepsilon_{ij}$, it follows that the $y_{ij}$ are jointly normally distributed with mean $\mu + \alpha_j$ and

$$\operatorname{cov}(y_{ij}, y_{lk}) = \begin{cases} \sigma_\varepsilon^2 + \sigma_s^2 & i = l, \; j = k \\ \sigma_s^2 & i = l, \; j \neq k \\ 0 & \text{otherwise} \end{cases}$$

The equivalent multivariate formulation is obtained by forming $n$ $l$-vectors of observations $y_i$, which we can write as

$$y_i = \mu + e_l s_i + \varepsilon_i, \quad i = 1, \dots, n \qquad (2.16)$$

with $\mu = \operatorname{vec}(\mu + \alpha_j)$. The $y_i$ are then independent multivariate normal $N(\mu, \Sigma)$ random variables with

$$\Sigma = \operatorname{var}(y_i) = \operatorname{var}(e_l s_i + \varepsilon_i) = e_l e_l^T \sigma_s^2 + \sigma_\varepsilon^2 I_l = \sigma_s^2 J_l + \sigma_\varepsilon^2 I_l = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_s^2 & \sigma_s^2 & \cdots & \sigma_s^2 \\ \sigma_s^2 & \ddots & & \vdots \\ \vdots & & \ddots & \sigma_s^2 \\ \sigma_s^2 & \cdots & \sigma_s^2 & \sigma_\varepsilon^2 + \sigma_s^2 \end{bmatrix} \qquad (2.17)$$

To match the notation of (2.13) we can write

$$\Sigma = \sum_{j=0}^{r} \sigma_j^2 z_j z_j^T = \sigma_\varepsilon^2 I_l + \sigma_s^2 J_l$$

with $z_1 = e_l$ and $z_0 = I_l$. Note that $\mu$ can also be expressed as $\mu = x\beta$. Using the constraint $\sum_{j=1}^{l} \alpha_j = 0$, the fixed effect vector $\beta$ is defined as

$$\beta = [\mu, \alpha_1, \alpha_2, \dots, \alpha_{l-1}]^T \qquad (2.18)$$

and hence the design matrix $x$ is

$$x = \begin{bmatrix} e_{l-1} & I_{l-1} \\ 1 & -e_{l-1}^T \end{bmatrix} \qquad (2.19)$$

Finally, in the mixed linear models formulation, we have

$$y = (e_n \otimes x)\beta + (I_n \otimes e_l)\gamma + \varepsilon = X\beta + Z\gamma + \varepsilon \qquad (2.20)$$

where $y$ is an $nl$-vector of responses, $\varepsilon$ is an $nl$-vector of residual errors, and $\otimes$ is the Kronecker product. We have $\gamma_1 = (s_1, \dots, s_n)^T$ and $Z_1 = I_n \otimes e_l$, so that $Z_1 Z_1^T = (I_n \otimes e_l)(I_n \otimes e_l)^T = I_n \otimes J_l$. It follows that $E(y) = X\beta$ and that

$$V = \operatorname{var}(y) = (I_n \otimes e_l)\, \sigma_s^2 I_n \,(I_n \otimes e_l)^T + \sigma_\varepsilon^2 I_{nl} = \sigma_s^2 (I_n \otimes J_l) + \sigma_\varepsilon^2 (I_n \otimes I_l) = I_n \otimes (\sigma_s^2 J_l + \sigma_\varepsilon^2 I_l)$$

2.2.2 Model II

Suppose now that we have the following structural model

$$y_{ij} = \mu + \alpha_j + s_i + (\alpha s)_{ij} + \varepsilon_{ij}, \quad i = 1, \dots, n, \; j = 1, \dots, l \qquad (2.21)$$

where $y_{ij}$, $\mu$, $\alpha_j$ and $s_i$ are defined as before. We suppose here that there is an interaction between the random effect of the model (i.e. the subject) and the fixed experimental factor. A new random variable $(\alpha s)_{ij}$ is thus included in the model. The unobservable random variables $s_i$, $\varepsilon_{ij}$ and $(\alpha s)_{ij}$ are supposed independent, with distributions $N(0, \sigma_s^2)$, $N(0, \sigma_\varepsilon^2)$ and $N(0, \sigma_{\alpha s}^2)$ respectively.

As before, we can use a multivariate formulation to get

$$y_i = \mu + e_l s_i + (\alpha s)_i + \varepsilon_i, \quad i = 1, \dots, n \qquad (2.22)$$

with $(\alpha s)_i = [(\alpha s)_{i1}, \dots, (\alpha s)_{il}]^T$. The $y_i$ are then independent multivariate normal $N(\mu, \Sigma)$ random variables with

$$\Sigma = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_s^2 + \sigma_{\alpha s}^2 & \sigma_s^2 & \cdots & \sigma_s^2 \\ \sigma_s^2 & \ddots & & \vdots \\ \vdots & & \ddots & \sigma_s^2 \\ \sigma_s^2 & \cdots & \sigma_s^2 & \sigma_\varepsilon^2 + \sigma_s^2 + \sigma_{\alpha s}^2 \end{bmatrix} \qquad (2.23)$$

Note that it is clear from (2.23) that $\sigma_{\alpha s}^2$ cannot be separated from $\sigma_\varepsilon^2$: only $\sigma_s^2$ and the sum $\sigma_\varepsilon^2 + \sigma_{\alpha s}^2$ enter $\Sigma$, so that Model II cannot be estimated. In other words, the interaction effect is confounded with the residual errors. This is not necessarily the case with models that include more fixed effects.
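The confounding can also be seen numerically. In the minimal sketch below (variance values are arbitrary illustrations), two different parameter sets that share $\sigma_s^2$ and the sum $\sigma_\varepsilon^2 + \sigma_{\alpha s}^2$ yield exactly the same covariance matrix:

```python
import numpy as np

def sigma_model2(s2_eps, s2_s, s2_as, l=4):
    # Sigma of (2.23): diagonal s2_eps + s2_s + s2_as, off-diagonal s2_s
    return s2_s * np.ones((l, l)) + (s2_eps + s2_as) * np.eye(l)

# Different (s2_eps, s2_as) pairs with the same sum give the same Sigma ...
A = sigma_model2(s2_eps=1.0, s2_s=2.0, s2_as=0.5)
B = sigma_model2(s2_eps=0.5, s2_s=2.0, s2_as=1.0)
assert np.allclose(A, B)   # ... so the two variances are not identifiable
```

Only $\sigma_s^2$ (the off-diagonal) and $\sigma_\varepsilon^2 + \sigma_{\alpha s}^2$ (diagonal minus off-diagonal) can be recovered from $\Sigma$.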

2.3 Within-subject ANOVA models with more than one factor

As we want to use the particular covariance structure of the different models (either Model I or Model II), we now enlarge the analysis to more complex models. Suppose we have a structural equation following Model II, but this time with two fixed effects factors $\alpha$ and $\beta$ with respectively $l$ and $g$ levels, that is

$$y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + s_i + (\alpha s)_{ij} + (\beta s)_{ik} + \varepsilon_{ijk} \qquad (2.24)$$

with $i = 1, \dots, n$, $j = 1, \dots, l$ and $k = 1, \dots, g$ (hence in this case $p = gl$). For the parameters to be identifiable, we need

$$\sum_{j=1}^{l} \alpha_j = 0, \qquad \sum_{k=1}^{g} \beta_k = 0 \qquad \text{and} \qquad \sum_{j=1}^{l} \sum_{k=1}^{g} (\alpha\beta)_{jk} = 0.$$

Note that an alternative model is given by (2.24) in which the random interactions $(\alpha s)_{ij}$ and $(\beta s)_{ik}$ are omitted. The unobservable random variables $s_i$, $\varepsilon_{ijk}$, $(\alpha s)_{ij}$ and $(\beta s)_{ik}$ are independent, with distributions $N(0, \sigma_s^2)$, $N(0, \sigma_\varepsilon^2)$, $N(0, \sigma_{\alpha s}^2)$ and $N(0, \sigma_{\beta s}^2)$ respectively. Using the formulation given in (2.12), we have that the $y_i$ are multivariate normal $N(\mu, \Sigma)$ with

$$\mu = \operatorname{vec}\left(\mu + \alpha_j + \beta_k + (\alpha\beta)_{jk}\right) \quad \forall i$$

$\mu$ can also be defined as $\mu = x\beta$; an example of the structure of $x$ and $\beta$ is given below. For the covariance matrix, define

$$\gamma_1 = (s_1, \dots, s_n)^T, \qquad Z_1 = I_n \otimes e_{gl}$$
$$\gamma_2 = ((\alpha s)_{11}, (\alpha s)_{12}, \dots, (\alpha s)_{nl})^T, \qquad Z_2 = I_n \otimes I_l \otimes e_g$$
$$\gamma_3 = ((\beta s)_{11}, (\beta s)_{12}, \dots, (\beta s)_{ng})^T, \qquad Z_3 = I_n \otimes e_l \otimes I_g$$

so that $Z_2 Z_2^T = I_n \otimes I_l \otimes J_g$ and $Z_3 Z_3^T = I_n \otimes J_l \otimes I_g$, and therefore

$$\Sigma = \sigma_s^2 J_{gl} + \sigma_{\alpha s}^2 (I_l \otimes J_g) + \sigma_{\beta s}^2 (J_l \otimes I_g) + \sigma_\varepsilon^2 I_{gl}$$

with $z_1 = e_{gl}$, $z_2 = I_l \otimes e_g$ and $z_3 = e_l \otimes I_g$.

To get an idea of the structure of the covariance matrix and of $x\beta$, suppose now that $l = 2$ and $g = 2$. Then $\Sigma$ has the form

$$\Sigma = \begin{bmatrix} [\Sigma_1] & [\Sigma_2] \\ [\Sigma_2] & [\Sigma_1] \end{bmatrix}$$

with

$$\Sigma_1 = \begin{bmatrix} \theta_1 & \theta_2 \\ \theta_2 & \theta_1 \end{bmatrix}, \qquad \Sigma_2 = \begin{bmatrix} \theta_3 & \theta_4 \\ \theta_4 & \theta_3 \end{bmatrix}$$

and

$$\theta_1 = \sigma_\varepsilon^2 + \sigma_s^2 + \sigma_{\alpha s}^2 + \sigma_{\beta s}^2, \qquad \theta_2 = \sigma_s^2 + \sigma_{\alpha s}^2, \qquad \theta_3 = \sigma_s^2 + \sigma_{\beta s}^2, \qquad \theta_4 = \sigma_s^2$$

For the fixed effect vector in $\mu = x\beta$, one would get

$$\beta = [\mu, \alpha_1, \beta_1, (\alpha\beta)_{11}]^T$$

and

$$x = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$

Note that in this case the interaction variances $\sigma_{\alpha s}^2$ and $\sigma_{\beta s}^2$ can be separated from the residual variance.
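A minimal sketch (with arbitrary illustrative variance values) builds $\Sigma$ from the Kronecker formula above and confirms the $\theta_1, \dots, \theta_4$ block pattern for $l = g = 2$:

```python
import numpy as np

l = g = 2
s2_s, s2_as, s2_bs, s2_eps = 2.0, 0.7, 0.3, 1.0
I, J = np.eye, lambda k: np.ones((k, k))

# Sigma = s2_s J_gl + s2_as (I_l kron J_g) + s2_bs (J_l kron I_g) + s2_eps I_gl
Sigma = (s2_s * J(g * l)
         + s2_as * np.kron(I(l), J(g))
         + s2_bs * np.kron(J(l), I(g))
         + s2_eps * I(g * l))

theta1 = s2_eps + s2_s + s2_as + s2_bs   # diagonal of Sigma_1
theta2 = s2_s + s2_as                    # off-diagonal of Sigma_1
theta3 = s2_s + s2_bs                    # diagonal of Sigma_2
theta4 = s2_s                            # off-diagonal of Sigma_2
S1 = np.array([[theta1, theta2], [theta2, theta1]])
S2 = np.array([[theta3, theta4], [theta4, theta3]])
assert np.allclose(Sigma, np.block([[S1, S2], [S2, S1]]))
```

Because $\theta_2 \neq \theta_3 \neq \theta_4$ in general, the three variance components $\sigma_s^2$, $\sigma_{\alpha s}^2$ and $\sigma_{\beta s}^2$ are all recoverable from $\Sigma$, unlike in Model II with a single factor.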

The general structure of the covariance matrix actually remains the same when the number of fixed effects increases. Take a model with three fixed effect factors $\alpha$, $\beta$ and $\gamma$ with respectively $l$, $g$ and $h$ levels. One would get, for example, the (rather indigestible) structural equation

$$y_{ijkm} = \mu + \alpha_j + \beta_k + \gamma_m + (\alpha\beta)_{jk} + (\alpha\gamma)_{jm} + (\beta\gamma)_{km} + (\alpha\beta\gamma)_{jkm} + s_i + (\alpha s)_{ij} + (\beta s)_{ik} + (\gamma s)_{im} + \varepsilon_{ijkm}$$

with $i = 1, \dots, n$, $j = 1, \dots, l$, $k = 1, \dots, g$ and $m = 1, \dots, h$, and we deduce that

$$\mu = \operatorname{vec}\left(\mu + \alpha_j + \beta_k + \gamma_m + (\alpha\beta)_{jk} + (\alpha\gamma)_{jm} + (\beta\gamma)_{km} + (\alpha\beta\gamma)_{jkm}\right) \quad \forall i$$

$\mu$ can also be written as $\mu = x\beta$, with the structure of $x$ and $\beta$ depending on $l$, $g$ and $h$. The covariance matrix has the form

$$\Sigma = \sigma_s^2 J_{glh} + \sigma_{\alpha s}^2 (I_l \otimes J_{gh}) + \sigma_{\beta s}^2 (J_l \otimes I_g \otimes J_h) + \sigma_{\gamma s}^2 (J_{gl} \otimes I_h) + \sigma_\varepsilon^2 I_{glh}$$

Suppose that each factor has two levels; one gets the following covariance matrix

$$\Sigma = \begin{bmatrix} [\Lambda_1] & [\Lambda_2] & [\Lambda_3] & [\Lambda_4] \\ [\Lambda_2] & [\Lambda_1] & [\Lambda_4] & [\Lambda_3] \\ [\Lambda_3] & [\Lambda_4] & [\Lambda_1] & [\Lambda_2] \\ [\Lambda_4] & [\Lambda_3] & [\Lambda_2] & [\Lambda_1] \end{bmatrix}$$

with

$$\Lambda_1 = \begin{bmatrix} \theta_1 & \theta_2 \\ \theta_2 & \theta_1 \end{bmatrix}, \qquad \Lambda_2 = \begin{bmatrix} \theta_3 & \theta_4 \\ \theta_4 & \theta_3 \end{bmatrix}, \qquad \Lambda_3 = \begin{bmatrix} \theta_5 & \theta_6 \\ \theta_6 & \theta_5 \end{bmatrix}, \qquad \Lambda_4 = \begin{bmatrix} \theta_7 & \theta_8 \\ \theta_8 & \theta_7 \end{bmatrix}$$

and

$$\theta_1 = \sigma_\varepsilon^2 + \sigma_s^2 + \sigma_{\alpha s}^2 + \sigma_{\beta s}^2 + \sigma_{\gamma s}^2 \qquad \theta_5 = \sigma_s^2 + \sigma_{\beta s}^2 + \sigma_{\gamma s}^2$$
$$\theta_2 = \sigma_s^2 + \sigma_{\alpha s}^2 + \sigma_{\beta s}^2 \qquad \theta_6 = \sigma_s^2 + \sigma_{\beta s}^2$$
$$\theta_3 = \sigma_s^2 + \sigma_{\alpha s}^2 + \sigma_{\gamma s}^2 \qquad \theta_7 = \sigma_s^2 + \sigma_{\gamma s}^2$$
$$\theta_4 = \sigma_s^2 + \sigma_{\alpha s}^2 \qquad \theta_8 = \sigma_s^2$$

We can see that for classical repeated measures models (having only crossed within-subject factors), the block structure of the covariance matrix always remains the same.


2.4 Multilevel models

In a multilevel model, or random nested model, the random factors are nested only in random ones, leading to models of the type

$$y_{ijk} = \mu + \alpha_j + s_i + \beta_{i(k)} + \varepsilon_{i(k)j}$$

with $\mu + \alpha_j$, $j = 1, \dots, l$, the fixed effect, and $s_i$, $i = 1, \dots, n$, and $\beta_{i(k)}$, $k = 1, \dots, g$, nested in subject $i$, the random effects. The distributional hypotheses on $s_i$, $\beta_{i(k)}$ and $\varepsilon_{i(k)j}$ are $N(0, \sigma_s^2)$, $N(0, \sigma_\beta^2)$ and $N(0, \sigma_\varepsilon^2)$ respectively. A simple example is the case in which $g$ measures are taken on each subject $i$ and each experimental condition $j$. With this model we have $\mu = e_g \otimes \operatorname{vec}(\mu + \alpha_j)$, $j = 1, \dots, l$, $Z_1 = I_n \otimes e_{gl}$ and $Z_2 = I_n \otimes I_g \otimes e_l$, so that

$$\Sigma = \sigma_s^2 J_{gl} + \sigma_\beta^2 (I_g \otimes J_l) + \sigma_\varepsilon^2 I_{gl}.$$

For example, suppose that $l = 2$ and $g = 2$; then $\Sigma$ has the form

$$\Sigma = \begin{bmatrix} [\Sigma_1] & [\Sigma_2] \\ [\Sigma_2] & [\Sigma_1] \end{bmatrix}$$

with

$$\Sigma_1 = \begin{bmatrix} \sigma_s^2 + \sigma_\beta^2 + \sigma_\varepsilon^2 & \sigma_s^2 + \sigma_\beta^2 \\ \sigma_s^2 + \sigma_\beta^2 & \sigma_s^2 + \sigma_\beta^2 + \sigma_\varepsilon^2 \end{bmatrix}, \qquad \Sigma_2 = \begin{bmatrix} \sigma_s^2 & \sigma_s^2 \\ \sigma_s^2 & \sigma_s^2 \end{bmatrix}$$

2.5 Hierarchical models

Until now, we have only presented models in which each level of a factor is combined with every level of another factor. Hierarchical models are models where only some levels of a factor are combined with the levels of another factor. More formally, suppose that we have two treatments $\alpha$ and $\beta$ with respectively $l$ and $g$ levels. In the language of experimental design, if each level of treatment $\beta$ appears with only one level of treatment $\alpha$, $\beta$ is said to be nested in $\alpha$. Experimental designs with one or more nested treatments are particularly well suited for research in education. Consider, for example, an experiment in which two types of instruction materials (treatment $\alpha$, levels 1 and 2) are to be evaluated using students in four classes (treatment $\beta$, levels 1, 2, 3 and 4). Two classrooms are randomly assigned to each type of programmed material. For obvious reasons, all children in a particular classroom are subject to the same type of material. We assume that each classroom contains the same number of children. Each classroom $k$ appears with only one level of instruction material; thus treatment $\beta$ is nested in $\alpha$.

One can also extend the models so as to include covariates. For example, take the typical experiment in which a measure is taken from $n_1$ samples of type $j = 1$ and $n_2$ samples of type $j = 2$, and in each sample the measure is taken on $g$ "objects". For example, the "objects" can be rats and the samples cages, $n_1$ of which are given treatment $j = 1$ and the $n_2$ others treatment $j = 2$. This type of design is called a nested design. The covariate is here a dummy variable for the type of treatment. The corresponding model can be written as

$$y_{ijk} = \mu + \beta J_{i(j)} + \gamma_{j(i)} + \varepsilon_{j(i(k))}$$

with

$$J_{i(j)} = \begin{cases} 0 & j = 1 \\ 1 & j = 2 \end{cases}$$

$\mu + \beta J_{i(j)}$ the fixed effect, and $\gamma_{j(i)}$, $i = 1, \dots, n$ ($n = n_1 + n_2$), the random effect, with $k = 1, \dots, g$. We then have

$$\mu_i = e_g (\mu + \beta J_{i(j)}) = e_g \otimes (1, J_{i(j)}) \, (\mu, \beta)^T = x_i \beta$$

$\gamma_1 = (\gamma_{1(1)}, \dots, \gamma_{1(n_1)}, \gamma_{2(n_1+1)}, \dots, \gamma_{2(n)})^T$, $Z_1 = I_n \otimes e_g$, and therefore

$$\Sigma = \sigma_\gamma^2 J_g + \sigma_\varepsilon^2 I_g$$

Note that the structure of $\Sigma$ is the same as with the one factor within-subject ANOVA model; the difference lies in the mean, which depends on the "sample", which here plays the role of the observation unit.


A more complicated example is a design reported by Fellner (1986) and also analyzed by Richardson and Welsh (1995) on the content of one type (j = 1) of metallic oxide measured in n1 = 18 lots and another type of metallic oxide (j = 2) measured in n2 = 13 other lots. Two samples were drawn from each lot and duplicate analyses were then performed by each of two chemists randomly selected for each sample. The model is

y_{ijklm} = \mu + \beta J_{i(j)} + \alpha_{j(i)} + \gamma_{j(i(k))} + \delta_{j(i(k(l)))} + \varepsilon_{j(i(k(l(m))))}   (2.25)

with $\mu + \beta J_{i(j)}$ the fixed effect, $\alpha_{j(i)}$, $i = 1, \ldots, n$ ($n = n_1 + n_2$), the random effect due to the lot, $\gamma_{j(i(k))}$, $k = 1, \ldots, 2n$, the random effect due to the sample, and $\delta_{j(i(k(l)))}$, $l = 1, \ldots, 4n$, the random effect due to the chemist. We then have

\mu_i = e_8(\mu + \beta J_{i(j)}) = \bigl[e_8 \otimes (1, J_{i(j)})\bigr](\mu, \beta)^T = x_i \theta

and $Z_1 = I_n \otimes e_8$, $Z_2 = I_n \otimes I_2 \otimes e_4$, $Z_3 = I_n \otimes I_4 \otimes e_2$, so that

\Sigma = \sigma_\alpha^2 J_8 + \sigma_\gamma^2 (I_2 \otimes J_4) + \sigma_\delta^2 (I_4 \otimes J_2) + \sigma_\varepsilon^2 I_8

Thus the parameters to be estimated are the means for each type of metallic oxide and the variances associated with lots, samples and chemists, respectively. Suppose that we have 248 observations. We can then form $n = 31$ independent sub-vectors $y_i$ of size 8. To give a better idea of the structure of each independent sub-vector $y_i = (y_{i111}, \ldots, y_{i222})^T$, the sub-vector $y_1$ can be written as


\begin{bmatrix} y_{1111} \\ y_{1112} \\ y_{1121} \\ y_{1122} \\ y_{1211} \\ y_{1212} \\ y_{1221} \\ y_{1222} \end{bmatrix}
= \bigl[e_8 \otimes (1, J_{1(j)})\bigr]
\begin{bmatrix} \mu \\ \beta \end{bmatrix}
+ e_8\,\alpha_1
+ (I_2 \otimes e_4)
\begin{bmatrix} \gamma_{11} \\ \gamma_{12} \end{bmatrix}
+ (I_4 \otimes e_2)
\begin{bmatrix} \delta_{111} \\ \delta_{112} \\ \delta_{121} \\ \delta_{122} \end{bmatrix}
+ I_8
\begin{bmatrix} \varepsilon_{1111} \\ \varepsilon_{1112} \\ \vdots \\ \varepsilon_{1222} \end{bmatrix}
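The random-effect design blocks appearing in this display can be generated directly as Kronecker products ($e_m$ denotes a column of $m$ ones); a small sketch, with array names of our own choosing:

```python
import numpy as np

# Sketch: the random-effect design blocks of the display above, written as
# Kronecker products (e_m is a column of m ones); names are ours.
e2 = np.ones((2, 1))
e4 = np.ones((4, 1))
e8 = np.ones((8, 1))

z1 = e8                          # lot effect:      8 x 1
Z2 = np.kron(np.eye(2), e4)      # sample effects:  8 x 2 = I_2 (x) e_4
Z3 = np.kron(np.eye(4), e2)      # chemist effects: 8 x 4 = I_4 (x) e_2
```

Each column of `Z2` flags the four observations sharing a sample, and each column of `Z3` flags the two duplicate analyses sharing a chemist.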

The covariance matrix is given by

\mathrm{var}(y_i) = \sum_{r=1}^{3} \sigma_r^2 z_r z_r^T + \sigma_\varepsilon^2 I_8
= \begin{bmatrix} \Sigma_1 & \Sigma_2 \\ \Sigma_2 & \Sigma_1 \end{bmatrix}

with

\Sigma_1 = \begin{bmatrix}
\tau_1^2 & \tau_2^2 & \tau_3^2 & \tau_3^2 \\
\tau_2^2 & \tau_1^2 & \tau_3^2 & \tau_3^2 \\
\tau_3^2 & \tau_3^2 & \tau_1^2 & \tau_2^2 \\
\tau_3^2 & \tau_3^2 & \tau_2^2 & \tau_1^2
\end{bmatrix}
\qquad
\Sigma_2 = \begin{bmatrix}
\tau_4^2 & \tau_4^2 & \tau_4^2 & \tau_4^2 \\
\tau_4^2 & \tau_4^2 & \tau_4^2 & \tau_4^2 \\
\tau_4^2 & \tau_4^2 & \tau_4^2 & \tau_4^2 \\
\tau_4^2 & \tau_4^2 & \tau_4^2 & \tau_4^2
\end{bmatrix}


where

\tau_1^2 = \sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_\delta^2 + \sigma_\varepsilon^2
\tau_2^2 = \sigma_\alpha^2 + \sigma_\gamma^2 + \sigma_\delta^2
\tau_3^2 = \sigma_\alpha^2 + \sigma_\gamma^2
\tau_4^2 = \sigma_\alpha^2
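As a numerical check, one can assemble $\mathrm{var}(y_i)$ from illustrative variance components and verify that its entries reduce to the four values $\tau_1^2, \ldots, \tau_4^2$ above (the component values below are made up):

```python
import numpy as np

# Sketch: assemble var(y_i) for the Fellner design from illustrative
# variance components and check its entries against tau_1^2 .. tau_4^2.
s_lot, s_sam, s_che, s_err = 4.0, 3.0, 2.0, 1.0   # made-up components

def J(m):
    """m x m matrix of ones."""
    return np.ones((m, m))

V = (s_lot * J(8)
     + s_sam * np.kron(np.eye(2), J(4))
     + s_che * np.kron(np.eye(4), J(2))
     + s_err * np.eye(8))

tau1 = s_lot + s_sam + s_che + s_err   # diagonal (same analysis)
tau2 = s_lot + s_sam + s_che           # same chemist, duplicate analyses
tau3 = s_lot + s_sam                   # same sample, different chemist
tau4 = s_lot                           # same lot, different sample
```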

2.6 ANCOVA Models

In psychology, ANCOVA models denote ANOVA models with covariates (like a pre-measurement).

As an example, consider the model in which a covariate is added to a within-subject ANOVA model as in (2.15) for each level of the within-subject factor, i.e.

y_{ij} = \mu + \beta_j x_{ij} + \alpha_j + s_i + \varepsilon_{ij}   (2.26)

with $\mu + \beta_j x_{ij} + \alpha_j$ the fixed effect and $s_i$ the random effect. The parameter $\alpha_j$ is added when the intercepts of the regression lines are supposed to differ. Similarly, one could add the constraint of equal regression slopes with $\beta_j = \beta$ for all $j$. We then have

\mu_i = \mathrm{vec}(\mu + \beta_j x_{ij} + \alpha_j) = [e_l, \mathrm{diag}(x_{ij}), I_l]\,(\mu, \beta_1, \ldots, \beta_l, \alpha_1, \ldots, \alpha_l)^T = x_i \theta,

and $Z_1 = I_n \otimes e_l$, so that

\Sigma = \sigma_s^2 J_l + \sigma_\varepsilon^2 I_l.
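The fixed-effect design matrix $x_i = [e_l, \mathrm{diag}(x_{ij}), I_l]$ can be sketched numerically as follows (the covariate and parameter values are illustrative assumptions):

```python
import numpy as np

# Sketch: the fixed-effect design matrix x_i = [e_l, diag(x_i1..x_il), I_l]
# for one subject in model (2.26).  Covariate and parameter values are
# illustrative.
l = 3
x_cov = np.array([0.5, 1.2, 2.0])                # covariate x_ij, j = 1..l

x_i = np.hstack([np.ones((l, 1)), np.diag(x_cov), np.eye(l)])

theta = np.concatenate(([10.0],                  # mu
                        [1.0, 2.0, 3.0],         # beta_1 .. beta_l
                        [0.1, 0.2, 0.3]))        # alpha_1 .. alpha_l
mu_i = x_i @ theta                               # mu + beta_j * x_ij + alpha_j
```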

Recall that the primary goal of this research is to define a robust estimator for mixed linear models. In this chapter we have seen that it is possible to extract independent multivariate normal $N(\mu_i, \Sigma)$ vectors of observations $y_i$ for some of the well-known mixed linear models, so that a robust estimator can be based on the multivariate normal model $N(\mu_i, \Sigma)$ with constrained covariance structure.


Chapter 3

Estimation of mixed linear models

3.1 Classical estimators

The purpose of analysis of variance is to test di¤erences in means for statistical signi…cance.

This is accomplished by analyzing the variance, that is, by partitioning the total variance into the component that is due to the random error and the components that are due to di¤erences between means. Another classical approach to the estimation of the variance components is the maximum likelihood approach.

Suppose that we observe a response variable $y_{ij}$ described through the structural equation (2.15). The analysis of the sources of variability between the observations and the grand mean proceeds by estimating each component of the structural equation. Hence, the grand mean is estimated by $\bar y_{..} = (ln)^{-1} \sum_{i=1}^n \sum_{j=1}^l y_{ij}$. The effect of each level of the treatment factor $\alpha_j$ is estimated by the difference between each treatment level's mean and the grand mean, $\bar y_{.j} - \bar y_{..}$ with $\bar y_{.j} = n^{-1} \sum_{i=1}^n y_{ij}$. The effect of each subject $s_i$ is estimated by the difference between each subject's mean and the grand mean, $\bar y_{i.} - \bar y_{..}$ with $\bar y_{i.} = l^{-1} \sum_{j=1}^l y_{ij}$. This component, between-subjects, reflects differences among the subjects. The random errors $\varepsilon_{ij}$ are estimated by $y_{ij} - \bar y_{i.}$, the difference between a score and the subject's mean. This component, within-subjects, reflects variability within each subject. Simple estimators of $\sigma_\varepsilon^2$ and $\sigma_s^2$ are obtained by


\hat\sigma_s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar y_{i.} - \bar y_{..})^2 - \frac{1}{l}\,\hat\sigma_\varepsilon^2,   (3.1)

\hat\sigma_\varepsilon^2 = \frac{1}{n(l-1)} \sum_{i=1}^{n} \sum_{j=1}^{l} (y_{ij} - \bar y_{i.})^2   (3.2)

These estimators are unbiased estimators of the variance components. For other types of structural equations it is always possible to proceed with the classical computation of the various sums of squares, which can become quite tedious as the complexity of the design increases. We will not pursue this issue here, and instead consider maximum likelihood to derive estimates of the various parameters.
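The estimators (3.1)–(3.2) can be checked by simulation; a minimal sketch for the within-subject model with treatment effects set to zero for simplicity (all numerical values are illustrative):

```python
import numpy as np

# Sketch: simulation check of the unbiased moment estimators (3.1)-(3.2)
# for the one-factor within-subject model, with treatment effects set to
# zero for simplicity.  All numerical values are illustrative.
rng = np.random.default_rng(0)
n, l = 200, 4
sigma_s2, sigma_e2 = 2.0, 1.0

y = (5.0                                                    # grand mean
     + rng.normal(0.0, np.sqrt(sigma_s2), (n, 1))           # subject effects
     + rng.normal(0.0, np.sqrt(sigma_e2), (n, l)))          # errors

ybar_i = y.mean(axis=1)          # subject means  \bar y_{i.}
ybar = y.mean()                  # grand mean     \bar y_{..}

# (3.2): within-subject variability estimates sigma_e^2 ...
s2_e_hat = ((y - ybar_i[:, None]) ** 2).sum() / (n * (l - 1))
# (3.1): ... and between-subject variability, corrected by s2_e_hat / l,
# estimates sigma_s^2.
s2_s_hat = ((ybar_i - ybar) ** 2).sum() / (n - 1) - s2_e_hat / l
```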

3.1.1 Maximum likelihood estimator

As we have said, when the models become more complicated, it is more convenient to use the multivariate formulation of mixed linear models. Recall that we can write a mixed linear model as

y = X\beta + Z\gamma   (3.3)

with $Z\gamma$ partitioned as

Z\gamma = [Z_0, Z_1, Z_2, \ldots, Z_r]\,(\gamma_0^T, \gamma_1^T, \ldots, \gamma_r^T)^T = \sum_{j=0}^{r} Z_j \gamma_j   (3.4)

where $\gamma_j$ is the vector of effects for random factor $j$, with $\gamma_0 = \varepsilon$ and $Z_0 = I_N$. The number of levels of random factor $j$ is denoted by $q_j$. Recall that the random effects $\gamma_j$ have the properties

E(\gamma_j) = 0 \quad \text{and} \quad \mathrm{var}(\gamma_j) = \sigma_j^2 I_{q_j} \ \text{with } \sigma_0^2 = \sigma_\varepsilon^2

and

\mathrm{cov}(\gamma_j, \gamma_{j'}^T) = 0 \ \text{for } j \neq j'.

Thus

\mathrm{var}(\gamma) = D

with $D$ defined in (2.7).

Using these assumptions we know that

E(y) = X\beta

and

V = \mathrm{var}(y) = \sum_{j=0}^{r} \sigma_j^2 Z_j Z_j^T   (3.5)

Given that $y \sim N(X\beta, V)$, the likelihood function is

L = L(\beta, V \mid y) = \frac{\exp\bigl\{-\frac{1}{2}(y - X\beta)^T V^{-1}(y - X\beta)\bigr\}}{(2\pi)^{N/2}\,|V|^{1/2}}   (3.6)

where $|V|$ stands for the determinant of $V$. The log-likelihood is

\log L = l(\beta, V \mid y) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log|V| - \frac{1}{2}(y - X\beta)^T V^{-1}(y - X\beta)   (3.7)

Recall that the unknown parameters are $\beta$ and $\sigma_\varepsilon^2, \sigma_1^2, \ldots, \sigma_r^2$, the latter entering through $V$. To maximize $l(\beta, V \mid y)$, we differentiate (3.7), first with respect to $\beta$, which yields (for rules of matrix differentiation, see Graybill, 1983)

\frac{\partial l(\beta, V \mid y)}{\partial \beta} = X^T V^{-1} y - X^T V^{-1} X \beta   (3.8)

Second, differentiating (3.7) with respect to $\sigma_j^2$, with

\frac{\partial V}{\partial \sigma_j^2} = Z_j Z_j^T,

gives, for $j = 0, \ldots, r$,

\frac{\partial l(\beta, V \mid y)}{\partial \sigma_j^2} = -\frac{1}{2}\,\mathrm{tr}(V^{-1} Z_j Z_j^T) + \frac{1}{2}(y - X\beta)^T V^{-1} Z_j Z_j^T V^{-1}(y - X\beta)   (3.9)

Equation (3.9) is a complicated function of the variance components; hence expressions for the solutions have to be obtained numerically, usually using either iteratively reweighted least squares or a Newton–Raphson procedure. In mixed linear model theory this problem has been extensively studied and numerous algorithms have been developed (see e.g. Jennrich and Schluchter, 1986, or Lindstrom and Bates, 1988).
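As a sketch of such a numerical solution, one can maximize the log-likelihood (3.7) directly for the one-factor within-subject model, using the independent sub-vector form $y_i \sim N(\mu e_l, \sigma_s^2 J_l + \sigma_\varepsilon^2 I_l)$. This uses a generic simplex search rather than the specialized algorithms cited above; treatment effects are omitted and all numerical values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: direct numerical maximisation of the log-likelihood (3.7) for the
# one-factor within-subject model in sub-vector form
# y_i ~ N(mu * e_l, sigma_s^2 * J_l + sigma_e^2 * I_l).
rng = np.random.default_rng(1)
n, l = 150, 5
true_mu, true_s2s, true_s2e = 3.0, 1.0, 0.49
y = (true_mu
     + rng.normal(0.0, np.sqrt(true_s2s), (n, 1))     # subject effects
     + rng.normal(0.0, np.sqrt(true_s2e), (n, l)))    # measurement errors

def negloglik(par):
    """Minus log-likelihood; variances parametrised on the log scale."""
    mu, ls2s, ls2e = par
    Sigma = np.exp(ls2s) * np.ones((l, l)) + np.exp(ls2e) * np.eye(l)
    _, logdet = np.linalg.slogdet(Sigma)
    r = y - mu
    quad = np.einsum('ij,jk,ik->', r, np.linalg.inv(Sigma), r)
    return 0.5 * (n * logdet + quad)                  # constants dropped

fit = minimize(negloglik, x0=[0.0, 0.0, 0.0], method='Nelder-Mead',
               options={'xatol': 1e-8, 'fatol': 1e-8, 'maxiter': 5000})
mu_hat = fit.x[0]
s2s_hat, s2e_hat = np.exp(fit.x[1]), np.exp(fit.x[2])
```

Parametrising the variances on the log scale keeps `Sigma` positive definite throughout the search without explicit constraints.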

3.1.2 REML in the general mixed linear models

Patterson and Thompson (1971) introduced restricted maximum likelihood estimation (REML) as a method of estimating variance components in the context of unbalanced incomplete block designs. REML is often preferred to maximum likelihood estimation because it takes into account the loss of degrees of freedom in estimating the mean and hence produces unbiased estimating equations for the variance parameters. Alternative and more general derivations of REML are given by Harville (1977).

As for the maximum likelihood estimator, we first define the REML estimator in the case of the general mixed linear model. Rather than using the data vector $y$ directly, REML is based on linear combinations of the elements of $y$, chosen in such a way that the resulting model does not contain any fixed effects. This arises from starting with a set of values $k^T y$, where the vectors $k^T$ of size $1 \times N$ are chosen so that $k^T y = k^T X\beta + k^T Z\gamma$ contains no term in $\beta$, i.e. so that

k^T X = 0   (3.10)
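A full set of such contrasts can be constructed numerically: the eigenvectors of $M = I - X(X^T X)^- X^T$ with unit eigenvalue give $N - v$ independent contrasts satisfying $K^T X = 0$, where $v = \mathrm{rank}(X)$. A sketch with a made-up design matrix:

```python
import numpy as np

# Sketch: constructing a full set of error contrasts numerically.  The
# eigenvectors of M = I - X (X^T X)^- X^T with unit eigenvalue give
# N - v independent columns K with K^T X = 0, so K^T y carries no
# information about beta.  The design matrix X below is made up.
rng = np.random.default_rng(2)
N, v = 12, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])      # rank v = 2

M = np.eye(N) - X @ np.linalg.pinv(X.T @ X) @ X.T          # projection
eigval, eigvec = np.linalg.eigh(M)
K = eigvec[:, eigval > 0.5]     # eigenvalues of a projection are 0 or 1
```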

Hence

k^T X = 0   (3.11)

Harville (1977) refers to $k^T y$, for $k^T$ satisfying (3.11), as an "error contrast": its expected value is zero,

E(k^T y) = k^T X\beta = 0

The maximum possible number of linearly independent error contrasts $k^T$ is $N - v$, where $v = \mathrm{rank}(X)$. For example, a particular set of $N - v$ linearly independent error contrasts is given by $K^T y$, where $K^T$ is a $(N - v) \times N$ matrix whose rows are any $N - v$ linearly independent rows $k^T$ of the matrix $M = I - X(X^T X)^- X^T$.

For $K^T X = 0$, we then have the model

K^T y \sim N(0, K^T V K)

The REML equations can be derived from the ML equations (3.9), in which the terms are transformed. Indeed, by making use of the transformation by $K^T$, we replace $y$ by $K^T y$, $X$ by $K^T X = 0$, $Z_j$ by $K^T Z_j$ and $V$ by $K^T V K$, so that setting the transformed (3.9) to zero yields

\mathrm{tr}\bigl[(K^T V K)^{-1} K^T Z_j Z_j^T K\bigr] = y^T K (K^T V K)^{-1} K^T Z_j Z_j^T K (K^T V K)^{-1} K^T y   (3.12)

Finally, let $L_R$ be the likelihood function of $K^T y$ and define the log-likelihood $l_R$ as

l_R(V \mid y) = \log L_R(V \mid y) = -\frac{1}{2}(N - v)\log(2\pi) - \frac{1}{2}\log\bigl|K^T V K\bigr| - \frac{1}{2}\,y^T K (K^T V K)^{-1} K^T y

The differentiation with respect to $\sigma_j^2$ gives, for $j = 0, \ldots, r$,

\frac{\partial l_R(V \mid y)}{\partial \sigma_j^2} = -\frac{1}{2}\,\mathrm{tr}(P Z_j Z_j^T) + \frac{1}{2}\,y^T P Z_j Z_j^T P y   (3.13)

with (see Khatri, 1966)

P = K(K^T V K)^{-1} K^T

The transformation of $y$ into $K^T y$ is not suitable for a multivariate normal formulation. Indeed, for example, in the one-factor within-subject ANOVA model, $X = e_n \otimes I_l$, so that $M = I_N - (e_n \otimes I_l)(e_n \otimes I_l)^+$, where $(\cdot)^+$ denotes the Moore–Penrose inverse. It is clear that it is not possible to recover a structure like (2.12) from the REML formulation, so we will not pursue this route here. In terms of robustness, Richardson and Welsh (1994) have shown that the REML estimator is not robust and have hence proposed a robust version (see later).

3.2 Robust estimators

Robust statistics is an extension of parametric statistics that takes into account the fact that parametric models are only approximations of reality. It is concerned with the behavior of statistical procedures (tests, estimators, ...) under small model deviations. Moreover, robust statistics must provide statistical procedures that are reliable and reasonably efficient under deviations from the assumed parametric models. We define a few basic concepts developed in robust statistics that will be used in the present work. We first define a local concept, namely the influence function. It measures the asymptotic bias caused to an estimator by an infinitesimal amount of contamination at some particular point. This local concept is complemented by a global notion, the breakdown point, which measures the maximal proportion of contamination that an estimator can tolerate without taking arbitrarily large values. For example, Maronna (1976) showed that robust estimators based on a weighting scheme that is not redescending (no weight of zero) fail to be robust in high dimension. This happens because the breakdown point of such estimators is at most $1/(p + 1)$, $p$ being the dimension of the data. When working in high dimension it is therefore crucial to consider high breakdown estimators. An estimator can be robust in the infinitesimal sense, i.e. have a bounded influence function, but not in the global one, i.e. have a low breakdown point. For more details and formulae on robustness measures, see e.g. Hampel et al. (1986).

Most of the work on robustness in variance component estimation and/or repeated measures analysis has been done through mixed linear models, and there is an extensive literature on robust analysis of variance in mixed linear models. A review of the various methods can be found in Stahel and Welsh (1997) and in Welsh and Richardson (1997). Several different robust procedures have been proposed, but most of them are based on the maximum likelihood principle. In fact, most of these methods propose the use of a robustified log-likelihood instead of the classical log-likelihood objective function. To robustify the log-likelihood, one can for example replace the quadratic function in the log-likelihood by a slower growing one; see Huggins (1993a, 1993b) and Huggins and Staudte (1994). Another solution is to modify the estimating equations rather than the likelihood itself; see Richardson (1997). Richardson and Welsh (1995) propose two robust versions of the restricted maximum likelihood estimator in the general mixed linear model. All these methods exploit the fact that variance component models are based on an additive decomposition of variability into several components. Rocke (1983) states that this decomposition is a particular property of the variance which is not shared by other measures of spread. This is natural when the estimators are derived by modifying definitions of this form, but other methods of defining estimators can be used as well. In particular, an estimator can be defined by the algorithm used to compute it, and robust estimators can be constructed by modifying the algorithmic definition of such estimators. Rocke (1983) proposes an estimator of this type.

3.2.1 Estimation by maximizing a robustified likelihood

A general method for obtaining robust estimators is to define a robustified likelihood by replacing the quadratic function in (3.7) by a slower growing one, in the sense that it has bounded derivatives. In the context of mixed linear models, Huggins (1993a) and Huggins and Staudte (1994) propose using
