A Finite Mixture Model with Trajectories Depending on Covariates

(1)

A Finite Mixture Model with Trajectories Depending on Covariates

Jang SCHILTZ (University of Luxembourg) joint work with

Jean-Daniel GUIGOU (University of Luxembourg),

& Bruno LOVAT (University of Lorraine)

June 11, 2014

(2)

Outline

1 The Basic Finite Mixture Model of Nagin

2 Generalizations of the basic model

3 Our model

(3)

Outline

3 Our model

(4)

Outline

3 Our model

(5)

Outline

3 Our model

(6)

General description of Nagin’s model

We have a collection of individual trajectories.

We try to divide the population into a number of homogenous

sub-populations and to estimate, at the same time, a typical trajectory for each sub-population.

Hence, this model can be interpreted as functional fuzzy cluster analysis. Finite mixture model Daniel S. Nagin (Carnegie Mellon University)

mixture : population composed of a mixture of unobserved groups finite : sums across a finite number of groups

(7)

General description of Nagin’s model

Hence, this model can be interpreted as functional fuzzy cluster analysis. Finite mixture model Daniel S. Nagin (Carnegie Mellon University)

(8)

General description of Nagin’s model

Hence, this model can be interpreted as functional fuzzy cluster analysis.

Finite mixture model Daniel S. Nagin (Carnegie Mellon University) mixture : population composed of a mixture of unobserved groups finite : sums across a finite number of groups

(9)

General description of Nagin’s model

Finite mixture model Daniel S. Nagin (Carnegie Mellon University)

(10)

General description of Nagin’s model

Finite mixture model Daniel S. Nagin (Carnegie Mellon University) mixture : population composed of a mixture of unobserved groups

finite : sums across a finite number of groups

(11)

General description of Nagin’s model

Finite mixture model Daniel S. Nagin (Carnegie Mellon University) mixture : population composed of a mixture of unobserved groups finite : sums across a finite number of groups

(12)

The Likelihood Function (1)

Consider a population of size N and a variable of interest Y .

Let Y_i = y_i₁, y_i₂, ..., y_i_T be T measures of the variable, taken at times t₁, ...t_T for subject number i .

πj : probability of a given subject to belong to group number j

⇒ π_j is the size of group j .

⇒ P(Y_i) =

r

X

j =1

π_jP^j(Y_i), (1)

where P^j(Yi) is probability of Yi if subject i belongs to group j .

(13)

The Likelihood Function (1)

⇒ P(Y_i) =

r

X

j =1

π_jP^j(Y_i), (1)

(14)

The Likelihood Function (1)

⇒ P(Y_i) =

r

X

j =1

π_jP^j(Y_i), (1)

(15)

The Likelihood Function (1)

⇒ P(Y_i) =

r

X

j =1

π_jP^j(Y_i), (1)

(16)

The Likelihood Function (1)

⇒ P(Y_i) =

r

X

j =1

π_jP^j(Y_i), (1)

(17)

The Likelihood Function (1)

⇒ P(Y_i) =

r

X

j =1

π_jP^j(Y_i), (1)

j

(18)

The Likelihood Function (2)

Aim of the analysis: Find r groups of trajectories of a given kind (for instance polynomials of degree 4, P(t) = β0+ β1t + β2t²+ β3t³+ β4t⁴.

We try to estimate a set of parameters Ω =n

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, π_jo which allow to maximize the probability of the measured data.

Possible data distributions:

count data ⇒ Poisson distribution

binary data ⇒ Binary logit distribution

censored data ⇒ Censored normal distribution

(19)

The Likelihood Function (2)

Aim of the analysis: Find r groups of trajectories of a given kind (for instance polynomials of degree 4, P(t) = β0+ β1t + β2t²+ β3t³+ β4t⁴. We try to estimate a set of parameters Ω =n

count data ⇒ Poisson distribution binary data ⇒ Binary logit distribution

(20)

The Likelihood Function (2)

(21)

The Likelihood Function (2)

count data ⇒ Poisson distribution

binary data ⇒ Binary logit distribution

(22)

The Likelihood Function (2)

(23)

The Likelihood Function (2)

(24)

The case of a normal distribution (1)

Notations :

β^jt_i_t = β₀^j + β₁^jAge_i_t + β₂^jAge_i²

t + β₃^jAge_i³

t + β₄^jAge_i⁴

t. φ: density of standard centered normal law.

Then,

L = 1 σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

φ y_i_t − β^jtit

σ

. (2)

It is too complicated to get closed-forms equations.

(25)

The case of a normal distribution (1)

Notations :

t + β₃^jAge_i³

t + β₄^jAge_i⁴

t.

φ: density of standard centered normal law. Then,

L = 1 σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

σ

. (2)

(26)

The case of a normal distribution (1)

Notations :

t + β₃^jAge_i³

t + β₄^jAge_i⁴

Then,

L = 1 σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

σ

. (2)

(27)

The case of a normal distribution (1)

Notations :

t + β₃^jAge_i³

t + β₄^jAge_i⁴

Then,

L = 1 σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

σ

. (2)

(28)

The case of a normal distribution (1)

Notations :

t + β₃^jAge_i³

t + β₄^jAge_i⁴

Then,

L = 1 σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

σ

. (2)

(29)

The case of a normal distribution (1)

Notations :

t + β₃^jAge_i³

t + β₄^jAge_i⁴

Then,

L = 1 σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

σ

. (2)

(30)

Available Software

SAS-based Proc Traj procedure

By Bobby L. Jones (Carnegie Mellon University).

Uses a quasi-Newton procedure maximum research routine.

Since the likelyhood is nor convex, nor a contraction, there are issues with local maxima.

R-package crimCV

By Jason D. Nielsen (Carleton University Ottawa). Just implements a zero-inflation Poission model.

(31)

Available Software

R-package crimCV

(32)

Available Software

R-package crimCV

(33)

Available Software

R-package crimCV

(34)

Available Software

R-package crimCV

By Jason D. Nielsen (Carleton University Ottawa).

Just implements a zero-inflation Poission model.

(35)

Available Software

R-package crimCV

By Jason D. Nielsen (Carleton University Ottawa).

Just implements a zero-inflation Poission model.

(36)

Model Selection (1)

Bayesian Information Criterion:

BIC = log(L) − 0, 5k log(N), (3)

where k denotes the number of parameters in the model.

Rule:

The bigger the BIC, the better the model!

(37)

Model Selection (2)

Leave-one-out Cross-Validation Apporach:

CVE = 1 N

N

X

i =1

1 T

T

X

t=1

y_i_t − ˆy_i^{[−i ]}

t

. (4)

Rule:

The smaller the CVE, the better the model!

(38)

Posterior Group-Membership Probabilities

Posterior probability of individual i ’s membership in group j : P(j /Yi).

Bayes’s theorem

⇒ P(j/Y_i) = P(Y_i/j )ˆπ_j

r

X

j =1

P(Yi/j )ˆπj

. (5)

Bigger groups have on average larger probability estimates.

To be classified into a small group, an individual really needs to be

(39)

An application example

The data : first dataset Salaries of workers in the private sector in Luxembourg from 1940 to 2006.

About 7 million salary lines corresponding to 718.054 workers. Some sociological variables:

gender (male, female)

nationality and residentship (luxemburgish residents, foreign residents, foreign non residents)

working status (white collar worker, blue collar worker) year of birth

age in the first year of professional activity

(40)

An application example

About 7 million salary lines corresponding to 718.054 workers. Some sociological variables:

(41)

An application example

About 7 million salary lines corresponding to 718.054 workers.

Some sociological variables: gender (male, female)

(42)

An application example

Some sociological variables:

(43)

An application example

(44)

An application example

working status (white collar worker, blue collar worker)

year of birth

(45)

An application example

(46)

An application example

(47)

Result for 9 groups (dataset 1)

(48)

Result for 9 groups (dataset 1)

(49)

Results for 9 groups (dataset 1)

(50)

Outline

3 Our model

(51)

Predictors of trajectory group membership

xi : vector of variables potentially associated with group membership (measured before t1).

Multinomial logit model:

π_j(x_i) = e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

, (6)

where θj denotes the effect of xi on the probability of group membership.

L = 1 σ

N

Y

i =1 r

X

j =1

e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

T

Y

t=1

φ y_i_t − β^jt_i_t σ

. (7)

(52)

Predictors of trajectory group membership

xi : vector of variables potentially associated with group membership (measured before t₁).

r

X

k=1

e^xⁱ^θ^k

, (6)

L = 1 σ

N

Y

i =1 r

X

j =1

e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

T

Y

t=1

. (7)

(53)

Predictors of trajectory group membership

r

X

k=1

e^xⁱ^θ^k

, (6)

L = 1 σ

N

Y

i =1 r

X

j =1

e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

T

Y

t=1

. (7)

(54)

Predictors of trajectory group membership

r

X

k=1

e^xⁱ^θ^k

, (6)

L = 1 σ

N

Y

r

X e^xⁱ^θ^j

r

T

Yφ y_i_t − β^jt_i_t σ

. (7)

(55)

Adding covariates to the trajectories (1)

Let z₁...z_L be covariates potentially influencing Y . We are then looking for trajectories

y_i_t = β₀^j+β^j₁Age_i_t+β₂^jAge_i²_t+β₃^jAge_i³_t+β₄^jAge_i⁴_t+α^j₁z₁+...+α^j_Lz_L+ε_i_t, (8) where εit ∼ N (0, σ), σ being a constant standard deviation and z_l are covariates that may depend or not upon time t.

Unfortunately the influence of the covariates in this model is limited to the intercept of the trajectory.

(56)

Adding covariates to the trajectories (1)

Let z₁...z_L be covariates potentially influencing Y .

We are then looking for trajectories

(57)

Adding covariates to the trajectories (1)

(58)

Adding covariates to the trajectories (1)

(59)

Adding covariates to the trajectories (2)

(60)

Adding covariates to the trajectories (2)

(61)

Outline

3 Our model

(62)

Our model

Let x1...x_L and z_i₁, ..., z_i_T be covariates potentially influencing Y . We propose the following model:

y_i_t = β₀^j +

L

X

l =1

α^j_0lx_l+ γ₀^jz_i_t

!

+ β₁^j +

L

X

l =1

α^j_1lx_l+ γ₁^jz_i_t

! Age_i_t

+ β₂^j +

L

X

l =1

α^j_2lx_l + γ₂^jz_i_t

!

Age_i²_t + β₃^j +

L

X

l =1

α^j_3lx_l+ γ₃^jz_i_t

! Age_i³_t

+ β₄^j +

L

X

l =1

α^j_4lx_l + γ₄^jz_i_t

!

Age_i⁴_t + ε_i_t,

where ε_i_t ∼ N (0, σ), σ being a constant standard deviation.

(63)

Our model

Let x1...x_L and z_i₁, ..., z_i_T be covariates potentially influencing Y .

We propose the following model:

y_i_t = β₀^j +

L

X

l =1

!

+ β₁^j +

L

X

l =1

! Age_i_t

+ β₂^j +

L

X

l =1

!

L

X

l =1

! Age_i³_t

+ β₄^j +

L

X

l =1

!

where ε_i_t ∼ N (0, σ), σ being a constant standard deviation.

(64)

Our model

Let x1...x_L and z_i₁, ..., z_i_T be covariates potentially influencing Y . We propose the following model:

y_i_t = β₀^j +

L

X

l =1

!

+ β₁^j +

L

X

l =1

! Age_i_t

+ β₂^j +

L

X

l =1

!

L

X

l =1

! Age_i³_t

+ β₄^j +

L

X

l =1

!

(65)

Men versus women

(66)

Statistical Properties

Both Nagin’s and our model can be written as L = 1

σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

φ observed data - modelled data σ

. (9)

Nagin’s model:

yit = β₀^j+β^j₁Ageit+β₂^jAge_i²_t+β₃^jAge_i³_t+β₄^jAge_i⁴_t+α^j₁z1+...+α^j_LzL+εit, (10) Our model:

yit = β₀^j +

L

X

l =1

α^j_0lx_l+ γ₀^jzit

!

+ β₁^j +

L

X

l =1

α^j_1lx_l+ γ₁^jzit

! Ageit

+ β₂^j +

L

X

l =1

α^j_2lx_l + γ₂^jzit

!

L

X

l =1

α^j_3lx_l+ γ₃^jzit

! Age_i³_t

+ β₄^j +

L

X

l =1

!

(67)

Statistical Properties

Both Nagin’s and our model can be written as L = 1

σ

N

Y

i =1 r

X

j =1

πj T

Y

t=1

φ observed data - modelled data σ

. (9)

Nagin’s model:

yit = β₀^j+β^j₁Ageit+β₂^jAge_i²_t+β₃^jAge_i³_t+β₄^jAge_i⁴_t+α^j₁z1+...+α^j_LzL+εit, (10) Our model:

yit = β₀^j +

L

X

l =1

α^j_0lx_l+ γ₀^jzit

!

+ β₁^j +

L

X

l =1

α^j_1lx_l+ γ₁^jzit

! Ageit

+ β₂^j +

L

X

l =1

!

L

X

l =1

! Age_i³_t

(68)

Parameter estimation

A way of estimating our model with the existing software (if group membership does not depend on the covariates) :

Use the latest version of proc.traj to test if the covariates have indeed an influence on the trajectories.

Apply proc.traj to the data without covariates do the clustering and obtain the number of groups and the constitution of the groups. Use your favorite regression model software to get the trajectories separately for each group.

(69)

Parameter estimation

Apply proc.traj to the data without covariates do the clustering and obtain the number of groups and the constitution of the groups. Use your favorite regression model software to get the trajectories separately for each group.

(70)

Parameter estimation

Apply proc.traj to the data without covariates do the clustering and obtain the number of groups and the constitution of the groups.

Use your favorite regression model software to get the trajectories separately for each group.

(71)

Parameter estimation

Apply proc.traj to the data without covariates do the clustering and obtain the number of groups and the constitution of the groups.

Use your favorite regression model software to get the trajectories separately for each group.

(72)

Bibliography

Nagin, D.S. 2005: Group-based Modeling of Development.

Cambridge, MA.: Harvard University Press.

Jones, B. and Nagin D.S. 2007: Advances in Group-based Trajectory Modeling and a SAS Procedure for Estimating Them. Sociological Research and Methods 35 p.542-571.

Nielsen, J.D. et al. 2012: Group-Based criminal trajectory analysis using cross-validation criteria.

http://www.probability.ca/jeff/research.html.

Muth´en, B., Shedden, K. 1999: Finite Mixture Modeling with Mixture Outcomes Using the EM Algorithm. Biometrics 55 p.463-469.

Guigou, J.D, Lovat, B. and Schiltz, J. 2012: Optimal mix of funded and unfunded pension systems: the case of Luxembourg. Pensions 17-4 p. 208-222.