Identifiability of Finite Mixture Models
Jang SCHILTZ (University of Luxembourg), joint work with
Cédric NOEL (University of Lorraine & University of Luxembourg)
SMTDA 2020, June 4, 2020
Outline
1 Introduction
2 Characterization of Identifiability
3 Identifiability of finite mixture models
Jang SCHILTZ, Identifiability of Finite Mixture Models with underlying Normal Distribution, June 4, 2020
Identifiability
Definition:
A finite mixture model is identifiable if a given dataset leads to a uniquely determined set of model parameter estimates, up to a permutation of the clusters.
Identifiability of the parameters is a necessary condition for the existence of consistent estimators for any statistical model.
Without identifiability, there might be several solutions for the parameter estimation problem.
Literature Review
Teicher (1963): The class of all mixtures of one-dimensional normal distributions is identifiable.
Yakowitz and Spragins (1968): Extension to the class of all Gaussian mixtures.
Hennig (2000): Identifiability for linear regression mixtures with Gaussian errors under certain conditions.
Finite Mixture Models
Data:
Variable of interest \(Y_i = (y_{i1}, y_{i2}, \dots, y_{iT})\)
Covariates \(x_1, \dots, x_M\) and \(z_{i1}, \dots, z_{iT}\)
\(a_{it}\): age of subject \(i\) at time \(t\)
Model:
\(K\) groups of size \(\pi_k\) with trajectories
\[
y_{it} = \sum_{j=0}^{s_k} \left( \beta_j^k + \sum_{m=1}^{M} \alpha_m^k x_m + \gamma_j^k z_{it} \right) a_{it}^j + \varepsilon_{it}^k, \quad (1)
\]
where \(\varepsilon_{it}^k \sim \mathcal{N}(0, \sigma_k)\).
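As a minimal sketch (not the authors' code), trajectories from model (1) can be simulated directly; the sketch below assumes K = 2 groups with linear age trends (s_k = 1), one x covariate and one z covariate, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: K = 2 linear groups (s_k = 1),
# one time-constant covariate x and one time-varying covariate z.
pi = np.array([0.5, 0.5])                    # group sizes pi_k
beta = np.array([[3.0, -2.0], [0.0, 2.0]])   # beta_j^k (rows: groups)
alpha = np.array([0.5, -0.5])                # alpha^k for the single x_m
gamma = np.array([[0.2, 0.0], [-0.1, 0.0]])  # gamma_j^k for the single z_it
sigma = np.array([0.1, 0.1])                 # sigma_k
ages = np.array([1.0, 2.0])                  # a_it, same for every subject

A = np.vander(ages, N=beta.shape[1], increasing=True)  # A[t, j] = a_t^j

def simulate(n):
    """Draw n trajectories (y_i1, ..., y_iT) from the mixture."""
    g = rng.choice(len(pi), size=n, p=pi)    # latent group memberships
    x = rng.normal(size=n)
    z = rng.normal(size=(n, len(ages)))
    y = np.empty((n, len(ages)))
    for i, k in enumerate(g):
        # coefficient of a_t^j: beta_j^k + alpha^k x_i + gamma_j^k z_it
        coef = beta[k] + alpha[k] * x[i] + gamma[k] * z[i][:, None]
        y[i] = (coef * A).sum(axis=1) + rng.normal(0.0, sigma[k], len(ages))
    return y, g

y, g = simulate(100)
```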
Notations
Distribution \(f\) of a finite mixture model:
\[
f(y_i; \Omega) = \sum_{k=1}^{K} \pi_k\, g_k(y_i; \theta_k).
\]
Cumulative distribution function \(F\) of a finite mixture model:
\[
F(y_i; \Omega) = \sum_{k=1}^{K} \pi_k\, G_k(y_i; \theta_k).
\]
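As an illustration (not part of the talk), the mixture density f(y_i; Ω) can be evaluated directly for Gaussian components with diagonal covariance σ_k² I; the component means and variances below are hypothetical.

```python
import numpy as np

# Hypothetical mixture: two T = 2 dimensional Gaussian components with
# means mu_k and common diagonal covariance sigma_k^2 * I.
pi = np.array([0.5, 0.5])
mu = np.array([[1.0, -1.0], [2.0, 4.0]])   # mu_k, one row per group
sigma = np.array([0.1, 0.1])

def mixture_density(y):
    """Evaluate f(y; Omega) = sum_k pi_k g_k(y; theta_k) at a point y."""
    y = np.asarray(y, dtype=float)
    T = y.size
    total = 0.0
    for k in range(len(pi)):
        quad = np.sum((y - mu[k]) ** 2) / sigma[k] ** 2
        norm = (2 * np.pi * sigma[k] ** 2) ** (T / 2)
        total += pi[k] * np.exp(-0.5 * quad) / norm
    return total

print(mixture_density([1.0, -1.0]))
```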
Mixtures and mixing distributions
Let \(\mathcal{F} = \{ F(y;\omega),\ y \in \mathbb{R}^T,\ \omega \in \mathbb{R}^{s+2K} \}\) be a family of \(T\)-dimensional cdf's indexed by a parameter \(\omega\), such that \(F(y;\omega)\) is measurable in \(\mathbb{R}^T \times \mathbb{R}^{s+2K}\).
Then the cdf
\[
H(y) = \int_{\mathbb{R}^{s+2K}} F(y;\omega)\, dG(\omega)
\]
is the image, under this mapping \(Q\), of the \((s+2K)\)-dimensional cdf \(G\).
The distribution \(H\) is called the mixture of \(\mathcal{F}\), and \(G\) its mixing distribution.
Let \(\mathcal{G}\) denote the class of all \((s+2K)\)-dimensional cdf's \(G\) and \(\mathcal{H}\) the induced class of mixtures \(H\).
Then \(\mathcal{H}\) is identifiable if \(Q\) is a one-to-one map from \(\mathcal{G}\) onto \(\mathcal{H}\).
Characterization of identifiability
The set \(\mathcal{H}\) of all finite mixtures of the class \(\mathcal{F}\) of distributions is the convex hull of \(\mathcal{F}\):
\[
\mathcal{H} = \left\{ H(y) : H(y) = \sum_i c_i F(y, \omega_i),\ c_i > 0,\ \sum_i c_i = 1,\ F(y, \omega_i) \in \mathcal{F} \right\}. \quad (2)
\]
Theorem (Yakowitz and Spragins, 1968)
A necessary and sufficient condition for the class \(\mathcal{H}\) of all finite mixtures of the family \(\mathcal{F}\) to be identifiable is that \(\mathcal{F}\) is a linearly independent family over the field of real numbers.
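The linear-independence criterion can be probed numerically: evaluate candidate component densities on a grid and inspect the rank of the resulting matrix. Full row rank already implies the densities are linearly independent as functions, while a rank deficiency on one particular grid is not conclusive. A sketch with illustrative one-dimensional normal densities:

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at the points y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Evaluate each candidate component density on a common grid; a linear
# dependence among the functions would force a dependence among the rows.
grid = np.linspace(-5.0, 5.0, 200)
params = [(0.0, 1.0), (1.0, 1.0), (0.0, 2.0)]   # hypothetical (mean, sd)
M = np.array([normal_pdf(grid, m, s) for m, s in params])

print(np.linalg.matrix_rank(M))  # full rank: no linear dependence detected
```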
The Model
\[
Y_{it} = f(a_{it}; \beta^k, \delta^k) + \varepsilon_{it}^k = \beta^k A_{it} + \delta^k W_{it} + \varepsilon_{it}^k. \quad (3)
\]
We can write
\[
\mathcal{L}\left((Y_i)_{i \in I}\right) = \bigotimes_{i \in I} F_{A_i, W_i, J}. \quad (4)
\]
Identifiability of the model means that, knowing the data distribution \(\mathcal{L}(Y_i)\), \(i \in I\), one can uniquely identify the mixing distribution \(J\).
That is, no two distinct sets of parameters lead to the same data distribution.
Nagin’s base model
\[
\mathcal{C}_1 = \left\{ F_{A,J} : F_{A,J} = \bigotimes_{i \in I} F_{A_i, J},\ J \in \Omega_1 \right\}
\]
Theorem
Let \( h_j = \min \left\{ q : \{A_{ij},\ i \in I\} \subseteq \cup_{i=1}^{q} H_i,\ H_i \in \mathcal{H}_{n-1} \right\} \).
If there exists \(j\) such that \(|S(J)| < h_j\) for all \(J\), then \(\mathcal{C}_1\) is identifiable.
Adding covariates independent of cluster membership
\[
\mathcal{C}_2 = \left\{ F_{A,J} : F_{A,J} = \bigotimes_{i \in I} F_{A_i, W_i, J},\ J \in \Omega_1 \right\}, \quad (5)
\]
\[
\mathcal{C}_2^A = \left\{ F_{A,J} : F_{A,J} = \bigotimes_{i \in I} F_{A_i, J},\ J \in \Omega_1 \right\}, \quad (6)
\]
\[
\mathcal{C}_2^W = \left\{ F_{A,J} : F_{A,J} = \bigotimes_{i \in I} F_{W_i, J},\ J \in \Omega_1 \right\}. \quad (7)
\]
Theorem
If \(\mathcal{C}_2^A\) and \(\mathcal{C}_2^W\) are identifiable and \(W_{ij}\) is not a multiple of \(A_{ij}\), for all \(i, j\), then \(\mathcal{C}_2\) is identifiable.
Numerical Example
Two clusters with sizes \(\pi_1 = \pi_2 = 1/2\). Two time points, 1 and 2.
Same variability in both clusters: \(\sigma = 0.1\).
We simulate 50 samples of 100 trajectories with parameters
\(\beta^1 = (3, -2)\) and \(\beta^2 = (0, 2)\) (linear model),
\(\beta^1 = (10, -12.5, 3.5)\) and \(\beta^2 = (-2, 5, -1)\) (polynomial model).
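The linear-model experiment can be reproduced in a few lines; the sketch below uses scikit-learn's GaussianMixture as a stand-in estimator rather than the authors' software. With the parameters above, the implied cluster means of \((y_{i1}, y_{i2})\) at time points 1 and 2 are \((1, -1)\) and \((2, 4)\).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Simulate: pi_1 = pi_2 = 1/2, sigma = 0.1, beta^1 = (3, -2), beta^2 = (0, 2).
# At time points 1 and 2 the cluster means of (y_i1, y_i2) are (1, -1), (2, 4).
n = 100
g = rng.choice(2, size=n, p=[0.5, 0.5])
true_means = np.array([[1.0, -1.0], [2.0, 4.0]])
y = true_means[g] + rng.normal(0.0, 0.1, size=(n, 2))

# Refit the mixture; up to label switching the parameters are recovered,
# illustrating that this model is identified from the data distribution.
gm = GaussianMixture(n_components=2, random_state=0).fit(y)
fitted = gm.means_[np.argsort(gm.means_[:, 0])]   # sort to undo label switching
print(np.round(fitted, 2))
```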
Parallel coordinate plots of the estimated parameters
[Figure: two panels. Linear model, axes \(\pi\), intercept, slope, \(\sigma\); parabolic model, axes \(\pi\), intercept, slope, degree-2 coefficient, \(\sigma\). In both panels the estimates concentrate near the true parameter values.]
The generalized model
Theorem
The model is identifiable if
1. \(d_k < T\) for all \(1 \le k \le K\), and all \(a_{it}\) are distinct, for all \(i, t\);
2. \(W_k\) has full rank for all \(1 \le k \le K\);
3. \(\mathrm{rk}(A_k, W_k) = \mathrm{rk}(A_k) + \mathrm{rk}(W_k)\), for all \(1 \le k \le K\).
Numerical Example
Two clusters with sizes \(\pi_1 = \pi_2 = 1/2\). Two time points, 1 and 2.
Same variability in both clusters: \(\sigma = 0.1\).
Shape description parameters \(\beta^1 = (3, -2)\), \(\beta^2 = (0, 2)\), \(\delta^1 = 2\) and \(\delta^2 = -3\).
We simulate 50 samples of 100 trajectories for 3 types of models:
1. the covariate is independent of time and only takes values 0 or 1;
2. the covariate is time dependent, but in a nonlinear way;
3. the covariate is time dependent in a linear way.
Parallel coordinate plots of the estimated parameters
[Figure: three panels (Models 1–3), axes \(\pi\), intercept, slope, \(\sigma\), \(\delta\). For Models 1 and 2 the estimates concentrate near the true values; for Model 3 (covariate linear in time) they degenerate: \(\pi\) collapses towards 0 and 1 and the coefficient estimates are wildly dispersed.]
Bibliography
Nagin, D.S. (2005). Group-Based Modeling of Development. Cambridge, MA: Harvard University Press.
Schiltz, J. (2015). A generalization of Nagin's finite mixture model. In: M. Stemmler, A. von Eye & W. Wiedermann (Eds.), Dependent Data in Social Sciences Research: Forms, Issues, and Methods of Analysis. Springer.
Teicher, H. (1963). Identifiability of finite mixtures. Annals of Mathematical Statistics, 34(4), 1265–1269.
Yakowitz, S.J. and Spragins, J.D. (1968). On the identifiability of finite mixtures. Annals of Mathematical Statistics, 39, 209–214.
Hennig, C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17, 273–296.