• Aucun résultat trouvé

TrajeR an R package for the clustering of longitudinal data

N/A
N/A
Protected

Academic year: 2021

Partager "TrajeR an R package for the clustering of longitudinal data"

Copied!
31
0
0

Texte intégral

(1)

An R package for the clustering of longitudinal data

C´edric NOEL1, Jang SCHILTZ2

1University of Luxembourg and IUT of Thionville-Yutz, University of Lorraine, Espace

Cormontaigne Impasse Alfred Kastler F-57970 Yutz, France

2Department of Finance, University of Luxembourg, 6, rue Richard Coudenhove-Kalergi

(2)

Model’s presentation

Starting from a collection of individual trajectories, the aim of group-based trajectory modeling (GBTM) is to divide the

population into a number of homogeneous sub-populations and to estimate, at the same time, a typical trajectory for each

sub-population

(3)

Model’s presentation

Let Yi = yi1, yi2, ..., yiT be T measures of the variable Y , taken at

times t1, ..., tT for subject number i .

Denote the probability of a given subject to belong to group number k by πk for a number of groups K .

P(Yi) = K

X

k=1

πkgk(yi; Θk). (1)

(4)

Model’s presentation

The density gk is influenced by a function of time denoted Ai that

are related to the shape of the trajectories with parameters βk and

sometimes by some covariate denoted Wi with parameters δk.

The membership probability can be influenced by some covariate denoted Xi with parameters θk.

(5)

Model’s presentation

Trajectory

X1 X2 X3 ... Xn

t1 t2 t3 ... tn W1 W2 W3 ... Wn Covariate / membership probability

Data

Cluster

(6)

Model’s presentation

Nagin proposes 3 different distributions to modelize Y : • the censored normal distribution ;

• the logistic distribution ;

• the ZIP (Zero Inflated Poisson) distribution.

To fit his model Nagin & Jones have programmed a SAS and Stata procedure : traj.

(7)

trajeR

We propose a R package to fit this model, like traj for SAS.

(8)

trajeR

Main function :

trajeR(Y, A, Risk=NULL,TCOV =NULL, ng, degre,degre.nu =0, Model,Method =”L”,ssigma= FALSE,ymax =max(Y)+ 1,ymin=min(Y)- 1,hessian=TRUE,itermax= 100,paraminit =NULL,EMIRLS =TRUE,refgr =1,fct= NULL,diffct = NULL,nbvar =NULL,nls.lmiter)

(9)

trajeR - Censored Normal Model

We suppose that the variable Yit is a censored variable.

We consider a variable Yit∗ that is normally distributed and such yit∗ = f (ait; βk, δk) + it = βkAit+ δkWt+ it (2) where it ∼ N (0; σ), Ait = (1, ait, a2it, · · · , a nβ−1 it )t, Wt = (wi 1, · · · , winδ)t, βk = (βk1, · · · , βknβ) and δk = (δk1, · · · , δknδ)

Furthermore, we can link yit∗ to the observed and censored data yit.

yit = ymin if yit∗ < ymin (3)

yit = yit∗ if ymin≤ yit∗ ≤ ymax (4)

(10)

trajeR - Censored Normal Model

we simulate 500 trajectories to fit this model

(11)

trajeR - Censored Normal Model

We suppose K = 3 and we search polynomial shapes with degree 0, 3 and 4.

(12)

trajeR - Censored Normal Model

(13)

trajeR - Censored Normal Model plot(solL) 5 10 15 20

Values and predicted trajectories for all groups

V

(14)

trajeR - Censored Normal Model

plot(solL, Y= data[,2:11], A= data[,12:21], col = vcol)

2 4 6 8 10 −10 0 10 20 30

Values and predicted trajectories for all groups

Time

V

alue

(15)

trajeR - Censored Normal Model

adding a covariate influencing group membership

(16)

trajeR - Censored Normal Model

(17)

trajeR - Censored Normal Model

Adding time dependent covariate

(18)

trajeR - Censored Normal Model

(19)

trajeR - Zero Inflated Poisson

The ZIP employ two different process : a binary distribution that generate structural zero, i.e. the excess zeros, and and poisson distribution that generate the counts.

We have

P (Yit = yit|Wi = wi , Ci = ci ) =

(

ρikt+ (1 − ρikt)e−λikt, yit = 0

(1 − ρikt)

λyitikte−λikt

yit! , yit > 0

(20)

trajeR - Zero Inflated Poisson

paraminit= c(1,1,-3.066653,0,0,-0.046659,0,0,-3,0,-3,0) solL=trajeR(Y= data[,2:6], A= data[,7:11], ng=2,

degre=c(2,2), degre.nu= c(1,1),Model=”ZIP”,Method= ”L”, hessian= TRUE,paraminit = paraminit)

(21)

trajeR - Zero Inflated Poisson

1 2 3 4 5

Values and predicted trajectories for all groups

Time

V

(22)

trajeR - Logistic

Let ρikt = P(Yit = 1|Wi = wi, Ci = k) the probability of yit= 1

given membership in group k. ρikt=

eβkAit+δkWit

1 + eβkAit+δkWit (7)

(23)

trajeR - Logit

(24)

trajeR - Logit

2 4 6 8 10

Values and predicted trajectories for all groups

(25)

trajeR - Non Linear

We suppose that the variable Yit is defined by

yit = f (ait; βk) + it (8)

where it ∼ N (0; σk).

In this case, the function fk is not linear in the parameters βk and

(26)

trajeR - Non Linear Example with f (t; βk) = βk1t βk2+ t 0 2 4 6 8 10 0 1 2 3 4 5 6

Plot of the individual's trajectories

Times

V

alues

By analysing the graph above, we fit initial parmaters as paraminit=c(0.25,0.25,0.25,0.25,2,

0.1,2.4,0.1,2.8,0.1,3,0.1,0.2,0.2,0.2,0.2)

(27)

trajeR - Non Linear

fct< − function(t,betak,TCOV){return(

(betak[1]*t)/(betak[2]+t) )

diffct< − function(t,betak,TCOV){return(c( t/(betak[2]+t),

-(betak[1]*t)/(betak[2]+t)**2 ))

(28)

trajeR - Non Linear 0 2 4 6 8 10 0 1 2 3 4 5

Values and predicted trajectories for all groups

Time

V

alue

(29)

trajeR - Some functions

(30)

trajeR - Model performance

• Average Posterior Probability – AvePP(...) • Odds of Correct Classification – OCC(...)

• Estimated group probabilities versus proportion of the sample assigned to the group – propAssign(...)

• Confidence interval – ConfIntT(...) • Summary – adequacy(...)

(31)

trajeR - Model selection

• AIC – trajeRAIC(...) • BIC – trajeRBIC(...)

Références

Documents relatifs

Algorithms from and for Nature and Life: Classification and Data Analysis, chapter Clustering Ordinal Data via Latent Variable Models, pages 127–135. Springer International

In order to investigate the effect of the sample size on clustering results in high- dimensional spaces, we try to cluster the data for different dataset sizes as this phenomenon

The vignette discuss the detailed biological/clinical analysis strategy used at Institut Curie and presents an application to a gene expression

HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data.. Laurent Bergé, Charles Bouveyron,

This flattened approach can also be used for smaller inversions, and there is no limit in terms of depth or imbrication: if there is, in a given tradition, variation in the order

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

The R package blockcluster allows to estimate the parameters of the co-clustering models [Govaert and Nadif (2003)] for binary, contingency and continuous data.. This package is

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des