A generalised finite mixture model

(1)

A generalised finite mixture model

Jang SCHILTZ (University of Luxembourg)

November 27, 2013

(2)

Outline

1 Nagin’s Finite Mixture Model

2 Generalization of the basic model

3 The Luxemburgish salary trajectories

(3)

Outline

(4)

Outline

(5)

Outline

(6)

General description of Nagin’s model

We have a collection of individual trajectories.

We try to divide the population into a number of homogenous

subpopulations and to estimate a mean trajectory for each subpopulation. This is still an inter-individual model, but unlike other classical models such as standard growth curve models, it allows the existence of subpolulations with completely different behaviors.

(7)

General description of Nagin’s model

subpopulations and to estimate a mean trajectory for each subpopulation.

This is still an inter-individual model, but unlike other classical models such as standard growth curve models, it allows the existence of subpolulations with completely different behaviors.

(8)

General description of Nagin’s model

subpopulations and to estimate a mean trajectory for each subpopulation.

This is still an inter-individual model, but unlike other classical models such as standard growth curve models, it allows the existence of subpolulations with completely different behaviors.

(9)

The Likelihood Function (1)

Consider a population of size N and a variable of interest Y.

Let Yi =yi1,yi2, ...,yiT be T measures of the variable, taken at times t1, ...t_T for subject numberi.

P(Y_i) denotes the probability of Y_i count data⇒ Poisson distribution

binary data ⇒Binary logit distribution

censored data⇒ Censored normal distribution

Aim of the analysis: Find r groups of trajectories of a given kind (for instance polynomials of degree 4, P(t) =β0+β1t+β2t²+β3t³+β4t⁴.

(10)

The Likelihood Function (1)

P(Y_i) denotes the probability of Y_i count data⇒ Poisson distribution binary data⇒ Binary logit distribution

(11)

The Likelihood Function (1)

P(Y_i) denotes the probability of Y_i

count data⇒ Poisson distribution binary data⇒ Binary logit distribution

(12)

The Likelihood Function (1)

P(Y_i) denotes the probability of Y_i count data⇒ Poisson distribution

binary data⇒ Binary logit distribution

(13)

The Likelihood Function (1)

(14)

The Likelihood Function (1)

(15)

The Likelihood Function (1)

(16)

The Likelihood Function (2)

πj : probability of a given subject to belong to group number j

⇒π_j is the size of group j. We try to estimate a set of parameters Ω =

n

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

o which allow to maximize the probability of the measured data.

P^j(Y_i) : probability ofY_i if subject i belongs to groupj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

Finite mixture model Daniel S. Nagin (Carnegie Mellon University) finite : sums across a finite number of groups

mixture : population composed of a mixture of unobserved groups

(17)

The Likelihood Function (2)

⇒π_j is the size of group j.

We try to estimate a set of parameters Ω = n

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

(18)

The Likelihood Function (2)

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

(19)

The Likelihood Function (2)

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

(20)

The Likelihood Function (2)

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

(21)

The Likelihood Function (2)

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

Finite mixture model Daniel S. Nagin (Carnegie Mellon University)

finite : sums across a finite number of groups

(22)

The Likelihood Function (2)

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

(23)

The Likelihood Function (2)

β₀^j, β₁^j, β₂^j, β₃^j, β₄^j, πj

⇒P(Y_i) =

r

X

j=1

π_jP^j(Y_i). (1)

(24)

The Likelihood Function (3)

Hypothesis: In a given group, conditional independence is assumed for the sequential realizations of the elements ofYi!!!

⇒P^j(Yi) =

T

Y

t=1

p^j(yit), (2)

where p^j(y_i_t) denotes the probability of y_i_t given membership in groupj. Likelihood of the estimator:

L=

N

Y

i=1

P(Y_i) =

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

p^j(y_i_t). (3)

(25)

The Likelihood Function (3)

⇒P^j(Yi) =

T

Y

t=1

p^j(yit), (2)

where p^j(y_i_t) denotes the probability of y_i_t given membership in groupj.

Likelihood of the estimator: L=

N

Y

i=1

P(Y_i) =

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

p^j(y_i_t). (3)

(26)

The Likelihood Function (3)

⇒P^j(Yi) =

T

Y

t=1

p^j(yit), (2)

L=

N

Y

i=1

P(Y_i) =

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

p^j(y_i_t). (3)

(27)

The Likelihood Function (3)

⇒P^j(Yi) =

T

Y

t=1

p^j(yit), (2)

L=

N

Y

i=1

P(Y_i)

=

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

p^j(y_i_t). (3)

(28)

The Likelihood Function (3)

⇒P^j(Yi) =

T

Y

t=1

p^j(yit), (2)

L=

N

Y

i=1

P(Y_i) =

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

p^j(y_i_t). (3)

(29)

The case of a normal distribution (1)

y_i_t =β₀^j +β^j₁Age_i_t +β₂^jAge_i²_t +β₃^jAge_i³_t +β₄^jAge_i⁴_t +ε_i_t, (4) where ε_i_t ∼ N(0, σ), σ being a constant standard deviation.

Notations :

β^jt_i_t =β₀^j +β₁^jAge_i_t +β₂^jAge_i²

t +β₃^jAge_i³

t +β₄^jAge_i⁴

t. φ: density of standard centered normal law.

Hence,

p^j(yit) = 1 σφ

y_i_t −β^jt_it

σ .

(5)

(30)

The case of a normal distribution (1)

Notations :

t +β₃^jAge_i³

t +β₄^jAge_i⁴

Hence,

p^j(yit) = 1 σφ

y_i_t −β^jt_it

σ .

(5)

(31)

The case of a normal distribution (1)

Notations :

t +β₃^jAge_i³

t +β₄^jAge_i⁴

t.

φ: density of standard centered normal law. Hence,

p^j(yit) = 1 σφ

y_i_t −β^jt_it

σ .

(5)

(32)

The case of a normal distribution (1)

Notations :

t +β₃^jAge_i³

t +β₄^jAge_i⁴

Hence,

p^j(yit) = 1 σφ

y_i_t −β^jt_it

σ .

(5)

(33)

The case of a normal distribution (1)

Notations :

t +β₃^jAge_i³

t +β₄^jAge_i⁴

Hence,

p^j(yit) = 1 σφ

y_i_t −β^jt_it

σ .

(5)

(34)

The case of a normal distribution (1)

Notations :

t +β₃^jAge_i³

t +β₄^jAge_i⁴

Hence,

p^j(yit) = 1 σφ

y_i_t −β^jt_it

σ .

(5)

(35)

The case of a normal distribution (1)

Notations :

t +β₃^jAge_i³

t +β₄^jAge_i⁴

Hence,

p^j(yit) = 1 σφ

yit −β^jtit

σ .

(5)

(36)

The case of a normal distribution (2)

So we get

L= 1 σ

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

φ

y_i_t −β^jt_i_t σ

. (6)

It is too complicated to get closed-forms equations

⇒ quasi-Newton procedure maximum research routine Software:

SAS-based Proc Traj procedure

by Bobby L. Jones (Carnegie Mellon University).

(37)

The case of a normal distribution (2)

So we get

L= 1 σ

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

φ

. (6)

(38)

The case of a normal distribution (2)

So we get

L= 1 σ

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

φ

. (6)

(39)

The case of a normal distribution (2)

So we get

L= 1 σ

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

φ

. (6)

⇒ quasi-Newton procedure maximum research routine

Software:

(40)

The case of a normal distribution (2)

So we get

L= 1 σ

N

Y

i=1 r

X

j=1

π_j

T

Y

t=1

φ

. (6)

(41)

A computational trick

The estimations of π_j must be in [0,1].

It is difficult to force this constraint in model estimation. Instead, we estimate the real parameters θj such that

πj = e^θ^j

r

X

j=1

e^θ^j

, (7)

Finally,

L= 1 σ

N

Y

i=1 r

X

j=1

e^θ^j

r

X

j=1

e^θ^j

T

Y

t=1

φ

. (8)

(42)

A computational trick

It is difficult to force this constraint in model estimation.

Instead, we estimate the real parameters θj such that πj = e^θ^j

r

X

j=1

e^θ^j

, (7)

Finally,

L= 1 σ

N

Y

i=1 r

X

j=1

e^θ^j

r

X

j=1

e^θ^j

T

Y

t=1

φ

. (8)

(43)

A computational trick

Instead, we estimate the real parameters θj such that

πj = e^θ^j

r

X

j=1

e^θ^j

, (7)

Finally,

L= 1 σ

N

Y

i=1 r

X

j=1

e^θ^j

r

X

j=1

e^θ^j

T

Y

t=1

φ

. (8)

(44)

A computational trick

r

X

j=1

e^θ^j

, (7)

Finally,

L= 1 σ

N

Y

i=1 r

X

j=1

e^θ^j

r

X

j=1

e^θ^j

T

Y

t=1

φ

. (8)

(45)

A computational trick

r

X

j=1

e^θ^j

, (7)

Finally,

L= 1 σ

N

Y

i=1 r

X

j=1

e^θ^j

r

X

j=1

e^θ^j

T

Y

t=1

φ

. (8)

(46)

Muth´ en’s model (1)

Muth´en and Shedden (1999): Generalized growth curve model

Elegant and technically demanding extension of the uncensored normal model.

Adds random effects to the parameters β^j that define a group’s mean trajectory.

Trajectories of individual group members can vary from the group trajectory.

Software:

Mplus package by L.K. Muth´en and B.O Muth´en.

(47)

Muth´ en’s model (1)

Software:

(48)

Muth´ en’s model (1)

Software:

(49)

Muth´ en’s model (1)

Software:

(50)

Muth´ en’s model (1)

Software:

(51)

Muth´ en’s model (2)

Advantage of GGCM

Fewer groups are required to specify a satisfactory model.

Disadvantages of GGCM

1 Difficult to extend to other types of data.

2 Group cross-over effects.

3 Can create the illusion of non-existing groups.

(52)

Muth´ en’s model (2)

Advantage of GGCM

(53)

Muth´ en’s model (2)

Advantage of GGCM

(54)

Muth´ en’s model (2)

Advantage of GGCM

(55)

Model Selection

Bayesian Information Criterion:

BIC = log(L)−0,5klog(N), (9)

where k denotes the number of parameters in the model. Rule:

The bigger the BIC, the better the model!

(56)

Model Selection

where k denotes the number of parameters in the model.

Rule:

(57)

Model Selection

where k denotes the number of parameters in the model.

Rule:

(58)

Posterior Group-Membership Probabilities

Posterior probability of individual i’s membership in groupj : P(j/Y_i). Bayes’s theorem

⇒P(j/Yi) = P(Y_i/j)ˆπ_j

r

X

j=1

P(Y_i/j)ˆπ_j

. (10)

Bigger groups have on average larger probability estimates.

To be classified into a small group, an individual really needs to be strongly consistent with it.

(59)

Posterior Group-Membership Probabilities

Posterior probability of individual i’s membership in groupj : P(j/Y_i).

Bayes’s theorem

r

X

j=1

P(Y_i/j)ˆπ_j

. (10)

(60)

Posterior Group-Membership Probabilities

Bayes’s theorem

r

X

j=1

P(Y_i/j)ˆπ_j

. (10)

(61)

Posterior Group-Membership Probabilities

Bayes’s theorem

r

X

j=1

P(Y_i/j)ˆπ_j

. (10)

(62)

Posterior Group-Membership Probabilities

Bayes’s theorem

r

X

j=1

P(Y_i/j)ˆπ_j

. (10)

(63)

Outline

(64)

Predictors of trajectory group membership

xi : vector of variables potentially associated with group membership (measured before t1).

Multinomial logit model:

π_j(x_i) = e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

, (11)

where θj denotes the effect of xi on the probability of group membership. L= 1

σ

N

Y

i=1 r

X

j=1

e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

T

Y

t=1

φ

. (12)

(65)

Predictors of trajectory group membership

xi : vector of variables potentially associated with group membership (measured before t₁).

r

X

k=1

e^xⁱ^θ^k

, (11)

where θj denotes the effect of xi on the probability of group membership. L= 1

σ

N

Y

i=1 r

X

j=1

e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

T

Y

t=1

φ

. (12)

(66)

Predictors of trajectory group membership

r

X

k=1

e^xⁱ^θ^k

, (11)

where θj denotes the effect of xi on the probability of group membership.

L= 1 σ

N

Y

i=1 r

X

j=1

e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

T

Y

t=1

φ

. (12)

(67)

Predictors of trajectory group membership

r

X

k=1

e^xⁱ^θ^k

, (11)

where θj denotes the effect of xi on the probability of group membership.

L= 1 σ

N

Y

i=1 r

X

j=1

e^xⁱ^θ^j

r

X

k=1

e^xⁱ^θ^k

T

Y

t=1

φ

. (12)

(68)

Group membership probabilities

The Wald test which indicates whether any number of ocefficients is significally different, allows the statistical testing of the predictors.

Confidence intervals for the probabilities of group membership can be computed by a parametric bootstrap technique.

(69)

Group membership probabilities

(70)

Group membership probabilities

(71)

Adding covariates to the trajectories

yit =β₀^j +β^j₁t+β₂^jt²+β^j₃t³+β₄^jt⁴+α^j₁z1t+...+α^j_LzLt +εit, (13) where εit ∼ N(0, σ), σ being a constant standard deviation andzlt are covariates that may depend or not upon time t.

Unfortunately the estimation of parameters α^j_l is not implemented in proc traj procedure; it is just possible to plot the impact of the covariates.

(72)

Adding covariates to the trajectories

yit =β₀^j +β₁^jt+β₂^jt²+β₃^jt³+β₄^jt⁴+α^j₁z1t+...+α^j_Lz_Lt +εit, (13) where εit ∼ N(0, σ), σ being a constant standard deviation andzlt are covariates that may depend or not upon time t.

(73)

Adding covariates to the trajectories

yit =β₀^j +β₁^jt+β₂^jt²+β₃^jt³+β₄^jt⁴+α^j₁z1t+...+α^j_Lz_Lt +εit, (13) where εit ∼ N(0, σ), σ being a constant standard deviation andzlt are covariates that may depend or not upon time t.

(74)

Outline

(75)

The data : first dataset

Salaries of workers in the private sector in Luxembourg from 1940 to 2006.

About 7 million salary lines corresponding to 718.054 workers. Some sociological variables:

gender (male, female)

nationality and residentship (luxemburgish residents, foreign residents, foreign non residents)

working status (white collar worker, blue collar worker) year of birth

age in the first year of professional activity

(76)

The data : first dataset

About 7 million salary lines corresponding to 718.054 workers.

Some sociological variables: gender (male, female)

(77)

The data : first dataset

Some sociological variables:

(78)

The data : first dataset

(79)

The data : first dataset

working status (white collar worker, blue collar worker)

year of birth

(80)

The data : first dataset

(81)

The data : first dataset

(82)

The data : second dataset

Salaries of all workers in Luxembourg which began to work in Luxembourg between 1980 and 1990 at an age less than 30 years.

1.303.010 salary lines corresponding to 85.049 workers. Some sociological variables:

gender (male, female) nationality and residentship sector of activity

year of birth

age in the first year of professional activity marital status

year of birth of children

(83)

The data : second dataset

1.303.010 salary lines corresponding to 85.049 workers.

Some sociological variables: gender (male, female) nationality and residentship sector of activity

year of birth

(84)

The data : second dataset

nationality and residentship sector of activity

year of birth

(85)

The data : second dataset

gender (male, female) nationality and residentship

sector of activity year of birth

(86)

The data : second dataset

year of birth

(87)

The data : second dataset

year of birth

(88)

The data : second dataset

year of birth

marital status

(89)

The data : second dataset

year of birth