• Aucun résultat trouvé

Analysis of longitudinal imaging data with OLS & sandwich estimator standard errors

N/A
N/A
Protected

Academic year: 2021

Partager "Analysis of longitudinal imaging data with OLS & sandwich estimator standard errors"

Copied!
30
0
0

Texte intégral

(1)

Analysis of longitudinal imaging data with OLS

& Sandwich Estimator standard errors

Bryan Guillaume

GSK / University of Liège / University of Warwick

FMRIB Centre - 07 Mar 2012

Supervisors: Thomas Nichols (Warwick University) and Christophe Phillips (Liège University)

(2)

Outline

1 Introduction

2 The Sandwich Estimator method

3 An adjusted Sandwich Estimator method

(3)

Example of longitudinal studies in neuroimaging

Example 1

Effect of drugs (morphine and alcohol) versus placebo over time on Resting State Networks in the brain (Khalili-Mahani et al, 2011) 12 subjects 21 scans/subject!!! Balanced design Study design:

(4)

Example of longitudinal studies in neuroimaging

Example 2

fMRI study of longitudinal changes in a population of adolescents at risk for alcohol abuse (Heitzeg et al, 2010) 86 subjects 2 groups 1, 2, 3 or 4 scans/subjects (missing data) Total of 224 scans Very unbalanced design (no common time points for scans) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 50 100 150 200 16 18 20 22 24 26 Observation index Age

(5)

Why is it challenging to model longitudinal data in

neuroimaging ?

Longitudinal modeling is a standard biostatistical problem and standard solutions exist:

Gold standard: Linear Mixed Effects (LME) model

Iterative method → generally slow and may fail to converge

E.g., 12 subjects, 8 visits, Toeplitz, LME with unstructured intra-visit correlation fails to converge 95 % of the time. E.g., 12 subjects, 8 visits, CS, LME with random int. and random slope fails to converge 2 % of the time.

LME model with a random intercept per subject May be slow (iterative method) and only valid with

Compound Symmetric (CS) intra-visit correlation structure Naive-OLS (N-OLS) model which include subject indicator variables as covariates

(6)

Why is it challenging to model longitudinal data in

neuroimaging ?

Longitudinal modeling is a standard biostatistical problem and standard solutions exist:

Gold standard: Linear Mixed Effects (LME) model

Iterative method → generally slow and may fail to converge

E.g., 12 subjects, 8 visits, Toeplitz, LME with unstructured intra-visit correlation fails to converge 95 % of the time. E.g., 12 subjects, 8 visits, CS, LME with random int. and random slope fails to converge 2 % of the time.

LME model with a random intercept per subject May be slow (iterative method) and only valid with

Compound Symmetric (CS) intra-visit correlation structure Naive-OLS (N-OLS) model which include subject indicator variables as covariates

(7)

Why is it challenging to model longitudinal data in

neuroimaging ?

Longitudinal modeling is a standard biostatistical problem and standard solutions exist:

Gold standard: Linear Mixed Effects (LME) model

Iterative method → generally slow and may fail to converge

E.g., 12 subjects, 8 visits, Toeplitz, LME with unstructured intra-visit correlation fails to converge 95 % of the time. E.g., 12 subjects, 8 visits, CS, LME with random int. and random slope fails to converge 2 % of the time.

LME model with a random intercept per subject

May be slow (iterative method) and only valid with

Compound Symmetric (CS) intra-visit correlation structure Naive-OLS (N-OLS) model which include subject indicator variables as covariates

(8)

Why is it challenging to model longitudinal data in

neuroimaging ?

Longitudinal modeling is a standard biostatistical problem and standard solutions exist:

Gold standard: Linear Mixed Effects (LME) model

Iterative method → generally slow and may fail to converge

E.g., 12 subjects, 8 visits, Toeplitz, LME with unstructured intra-visit correlation fails to converge 95 % of the time. E.g., 12 subjects, 8 visits, CS, LME with random int. and random slope fails to converge 2 % of the time.

LME model with a random intercept per subject

May be slow (iterative method) and only valid with

Compound Symmetric (CS) intra-visit correlation structure

Naive-OLS (N-OLS) model which include subject indicator variables as covariates

(9)

Outline

1 Introduction

2 The Sandwich Estimator method

3 An adjusted Sandwich Estimator method

(10)

The Sandwich Estimator (SwE) method

Use of a simple OLS model (without subject indicator variables)

The fixed effects parameters β are estimated by ˆ βOLS = M X i=1 Xi0Xi !−1 M X i=1 Xi0yi

The fixed effects parameters covariance var( ˆβOLS)are

estimated by SwE = M X i=1 Xi0Xi !−1 | {z } Bread M X i=1 Xi0VˆiXi ! | {z } Meat M X i=1 Xi0Xi !−1 | {z } Bread

(11)

Property of the Sandwich Estimator (SwE)

SwE = M X i=1 Xi0Xi !−1 M X i=1 Xi0VˆiXi ! M X i=1 Xi0Xi !−1

If Vi are consistently estimated, the SwE tendsasymptotically

(Large samples assumption) towards the true variance var( ˆβOLS). (Eicker, 1963; Eicker, 1967; Huber, 1967; White,

(12)

The Heterogeneous HC0 SwE

In practice, Vi is generally estimated from the residuals

ri =yi− Xiβˆby

ˆ Vi =rir

0

i

and the SwE becomes

Het. HC0 SwE = M X i=1 Xi0Xi !−1 M X i=1 Xi0riri0Xi ! M X i=1 Xi0Xi !−1

(13)

Simulations: setup

Monte Carlo Gaussian null simulation (10,000 realizations) For each realization,

1 Generation of longitudinal Gaussian null data (no effect)

with a CS or a Toeplitz intra-visit correlation structure: Compound Symmetric       1 0.8 0.8 0.8 0.8 0.8 1 0.8 0.8 0.8 0.8 0.8 1 0.8 0.8 0.8 0.8 0.8 1 0.8 0.8 0.8 0.8 0.8 1       Toeplitz       1 0.8 0.6 0.4 0.2 0.8 1 0.8 0.6 0.4 0.6 0.8 1 0.8 0.6 0.4 0.6 0.8 1 0.8 0.2 0.4 0.6 0.8 1      

2 Statistical test (F-test at α) on the parameters of interest

using each different methods (N-OLS, LME and SWE) and recording if the method detects a (False Positive) effect

(14)

Simulations: LME vs N-OLS vs Het. HC0 SwE

● ● ● ● ● Number of subjects Relativ e FPR (%) 12 50 100 200 0 100 200 300 400 500 600 700 ● Het. HC0 SwE N−OLS LME

Linear effect of visits Group 2 versus group 1

Compound symmetry 8 vis., F(1,N−p) at 0.05 Number of subjects Relativ e FPR (%) ● ● ● ● ● Number of subjects Relativ e FPR (%) 12 50 100 200 0 100 200 300 400 500 600 700 ● Het. HC0 SwE N−OLS LME

Linear effect of visits Group 2 versus group 1

Toeplitz 8 vis., F(1,N−p) at 0.05

Number of subjects

Relativ

(15)

Outline

1 Introduction

2 The Sandwich Estimator method

3 An adjusted Sandwich Estimator method

(16)

Bias adjustments: the Het. HC2 SwE

In an OLS model, we have

(I − H)var(y )(I − H) = var(r ) where H = X (X0X )−1X0

Under independent homoscedastic errors, (I − H)σ2=var(r ) (1 − hik)σ2=var(rik) σ2=var  rik √ 1 − hik 

This suggests to estimate Vi by

ˆ Vi =ri∗r∗ 0 i where rik∗ = rik √ 1 − hik

(17)

Bias adjustments: the Het. HC2 SwE

Using in the SwE ˆ Vi =ri∗r ∗0 i where r ∗ ik = rik √ 1 − hik We obtain Het. HC2 SwE = M X i=1 Xi0Xi !−1 M X i=1 Xi0ri∗ri∗0Xi ! M X i=1 Xi0Xi !−1

(18)

Homogeneous SwE

In the standard SwE, each Vi is normally estimated from only

the residuals of subject i. It is reasonable to assume a common

covariance matrix V0for all the subjects and then, we have

ˆ V0kk0 = 1 Nkk0 Nkk 0 X i=1 rikrik0 ˆ

V0kk0: element of ˆV0corresponding to the visits k and k0 Nkk0: number of subjects with both visits k and k0

rik: residual corresponding to subject i and visit k

rik0: residual corresponding to subject i and visit k0

ˆ

(19)

Null distribution of the test statistics with the SwE

H0:L ˆβ =0, H1:L ˆβ 6=0

L: contrast matrix of rank q

Using multivariate statistics theory and assuming a balanced design, we can derive the test statistic

M − pB− q + 1

(M − pB)q

(L ˆβ)0(LSwEL0)−1(L ˆβ) ∼F (q, M − pB− q + 1)

q=1, the test becomes

(20)

Simulations: LME vs N-OLS vs unadjusted SwE

● ● ● ● ● Number of subjects Relativ e FPR (%) 12 50 100 200 0 100 200 300 400 500 600 700 ● Het. HC0 SwE N−OLS LME

Linear effect of visits Group 2 versus group 1

Compound symmetry 8 vis., F(1,N−p) at 0.05 Number of subjects Relativ e FPR (%) ● ● ● ● ● Number of subjects Relativ e FPR (%) 12 50 100 200 0 100 200 300 400 500 600 700 ● Het. HC0 SwE N−OLS LME

Linear effect of visits Group 2 versus group 1

Toeplitz 8 vis., F(1,N−p) at 0.05

Number of subjects

Relativ

(21)

Simulations: unadjusted SwE vs adjusted SwE

● ● ● ● ● Number of subjects Relativ e FPR (%) 12 50 100 200 80 100 120 140 160 180 200

● Het. HC0 SwE with F(1,N−p)

Hom. HC2 SwE with F(1,M−2)

Linear effect of visits Group 2 versus group 1

Compound symmetry 8 vis., F−test at 0.05 Number of subjects Relativ e FPR (%) ● ● ● ● ● Number of subjects Relativ e FPR (%) 12 50 100 200 80 100 120 140 160 180 200

● Het. HC0 SwE with F(1,N−p)

Hom. HC2 SwE with F(1,M−2)

Linear effect of visits Group 2 versus group 1

Toeplitz 8 vis., F−test at 0.05

Number of subjects

Relativ

(22)

Simulation with real design

Example 2

fMRI study of longitudinal changes in a population of adolescents at risk for alcohol abuse 86 subjects 2 groups 1, 2, 3 or 4 scans/subjects (missing data) Total of 224 scans Very unbalanced design (no common time points for scans) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 50 100 150 200 16 18 20 22 24 26 Observation index Age

(23)

Simulation with real design

Example 2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 50 100 150 200 0 1 2 3 4 5 6 7 Scans index Relativ e age

(24)

Simulation with real design

Example 2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 50 100 150 200 0 1 2 3 4 5 6 7 Scans index relativ e Age Scans index relativ e Age

(25)

Real design

Gr1 Age.L Gr1 Age.Q Gr1 Gr2 vs Gr1 Age.L Gr2 vs Gr1 Age.Q Gr2 vs Gr1 Het. HC0 SwE Hom. HC2 SwE N−OLS LME Relativ e FPR (%) 1092 % 1079 % 50 100 150 200 250 300 350 Real design Compound Symmetry F(1,M−2) at 0.05 for Hom. HC2 OTW F(1,N−p) at 0.05

(26)

Real design

Gr1 Age.L Gr1 Age.Q Gr1 Gr2 vs Gr1 Age.L Gr2 vs Gr1 Age.Q Gr2 vs Gr1 Het. HC0 SwE Hom. HC2 SwE N−OLS LME Relativ e FPR (%) 1074 % 954 % 50 100 150 200 250 300 350 Real design Toeplitz F(1,M−2) at 0.05 for Hom. HC2 OTW F(1,N−p) at 0.05

(27)

Outline

1 Introduction

2 The Sandwich Estimator method

3 An adjusted Sandwich Estimator method

(28)

Remarks about the SwE method

Power of the SwE method generally lower than the power of the LME method

Power loss not significant with a high number of subject (e.g., 86 subjects)

Power loss may be significant with a low number of subject and a low significance level α

Solution: spatial regularization of the SwE

Test statistic with an unbalanced design and a low number of subject

Estimation of the effective degrees of freedom of the test needed

(29)

Summary

Longitudinal standard methods not really appropriate to neuroimaging data:

Convergence issues with LME

N-OLS & LME with random intercepts: issues when CS does not hold

The SwE method

Accurate in a large range of settings Easy to specify

No iteration needed

Quite fast

No convergence issues

Can accommodate pure between covariates But, careful in small samples:

Adjustment essential

If low significance level, spatial regul. needed for power If unbalanced design, effective dof estimation needed

(30)

Références

Documents relatifs

We have performed an extensive set of Monte Carlo experiments on the bias and variance of the OLS of the autoregressive parameters, given a data generating process that is a

Modeling Wildfire Smoke Pollution by Integrating Land Use Regression and Remote Sensing Data: Regional Multi- Temporal Estimates for Public Health and Exposure

Therefore expanding human population and human economic activities, by CNLI model using DMSP/OLS data to estimate urbanization dynamics is patterns of urbanization, particularly

The aim of this study was: first, to define the appropriate chemi- cal method for the immobilization of flavan-3-ol monomers, dimers and oligomers onto a SPR sensor chip; second,

We prove minimax lower bounds for this problem and explain how can these rates be attained, using in particular an Empirical Risk Minimizer (ERM) method based on deconvolution

|S M | on the recognition performance. 2 showed the averaged classification accuracy for the basic and ensemble versions of the IISMC method using 600-ms long SSVEP data, where the |S

In this paper we proposed the use of the Buckley-James estimator as a nonparametric method of estimating the regression parameters of an accelerated failure time model with

We start the simulation study without additional covariates to compare our estimator in terms of `2 error to a competitor, the estimator of the row-column model 1 with different