HAL Id: hal-01649206
https://hal.archives-ouvertes.fr/hal-01649206
Submitted on 27 Nov 2017
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Mixture Models with Missing data Classification of Satellite Image Time Series
Serge Iovleff, Mathieu Fauvel, Stéphane Girard, Cristian Preda, Vincent Vandewalle
To cite this version:
Serge Iovleff, Mathieu Fauvel, Stéphane Girard, Cristian Preda, Vincent Vandewalle. Mixture Models with Missing data Classification of Satellite Image Time Series: QUALIMADOS: Atelier Qualité des masses de données scientifiques. Journées Science des Données MaDICS 2017, Jun 2017, Marseille, France. pp.1-60. ⟨hal-01649206⟩
Mixture Models with Missing data Classification of Satellite Image Time Series
QUALIMADOS: Atelier Qualité des masses de données scientifiques (workshop on the quality of large scientific data sets)
S. Iovleff, Mathieu Fauvel, Stéphane Girard, Cristian Preda, Vincent Vandewalle
Laboratoire Paul Painlevé
23 June 2017
Clustering using Mixture Models
Outline
Clustering using Mixture Models
  What is Clustering?
  Example
  Mixture Models
  EM Algorithm and variations
  Mixture Model and Mixed Data
Classification of Satellite Image Time Series
Clustering using Mixture Models What is Clustering?
Clustering is the cluster building process
- The term "Data Clustering" first appeared in 1954 (according to JSTOR), in an article dealing with anthropological data.
- Many, many existing methods (https://en.wikipedia.org/wiki/Category:Data_clustering_algorithms)
Clustering using Mixture Models What is Clustering?
New challenges
Algorithms are needed for Big Data and complex data, in particular for mixed features and missing values.
Clustering using Mixture Models Example
An example
Joint work with Christophe Biernacki (head of the Inria Modal team), Vincent Vandewalle, Komi Nagbe, ...
Contract with a large lingerie store: "Clustering cash receipts of the customers with a loyalty card"
- 28 variables related to products,
- 6 variables related to customers,
- 8 variables related to stores,
- n = 2,899,030 receipts.
Some meaningful variables have missing values.
Clustering using Mixture Models Example
An example (Variables)
Clustering using Mixture Models Example
An example (Results)
Clustering using Mixture Models Mixture Models
Mixture Models
Main Idea
x in cluster k ⟺ x arises from distribution P_k
x = (x_1, ..., x_n)  --clustering-->  ẑ = (ẑ_1, ..., ẑ_n), K̂ clusters
Model-based clustering is a probabilistic approach.
Clustering using Mixture Models Mixture Models
R package MixAll and SaaS MixtComp
Two software packages are available
- R package MixAll
  > library(MixAll)
  > data(geyser)
  > ## add missing values at random
  > x <- as.matrix(geyser); n <- nrow(x); p <- ncol(x)
  > indexes <- matrix(c(round(runif(5, 1, n)), round(runif(5, 1, p))), ncol = 2)
  > x[indexes] <- NA
  > ## estimate model
  > model <- clusterDiagGaussian(data = x, nbCluster = 2:3, models = c("gaussian_pk_sjk"))
  > plot(model)
  > missingValues(model)
      row col     value
    1 133   1  2.029661
    2  42   2 54.569144
    3  49   2 79.970973
    4 209   2 54.569144
    5 213   2 54.569144
- SaaS software MixtComp: https://massiccc.lille.inria.fr/
Clustering using Mixture Models Mixture Models
Hypothesis of mixture of parametric distributions
- Cluster k is modeled by a parametric distribution: x_i | z_i = k ∼ p(· | α_k)
- Cluster k has probability π_k: z_i ∼ M(1, π_1, ..., π_K).
Mixture model
The model parameters are θ = (π_1, ..., π_K, α_1, ..., α_K) and

  p(x_i) = Σ_{k=1}^K π_k p(x_i; α_k)
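As an illustration, the mixture density above can be evaluated directly once a component family p(·; α_k) is chosen. The sketch below uses diagonal Gaussian components; it is an illustrative example, not the MixAll implementation, and all function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate normal density, applied component-wise to a vector x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_density(x, pi, mu, sigma):
    """p(x) = sum_k pi_k * prod_d N(x_d | mu_kd, sigma_kd):
    the mixture density for diagonal Gaussian components."""
    return sum(p_k * np.prod(normal_pdf(x, m, s))
               for p_k, m, s in zip(pi, mu, sigma))
```

With a single standard normal component, `mixture_density` reduces to the normal density itself; with several components it is their π-weighted sum.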
Clustering using Mixture Models EM Algorithm and variations
EM Algorithm
Starting from an arbitrary initial parameter θ^0, the r-th iteration of the EM algorithm consists of repeating the following I, E and M steps.
- I step: impute the missing values x^m by their expectation, using x^o, θ^{r-1} and t_ik^{r-1}.
- E step: compute the conditional probabilities of z_i = k given x_i, using the current value θ^{r-1} of the parameter:

  t_ik^r = t_k^r(x_i | θ^{r-1}) = π_k^{r-1} h(x_i | α_k^{r-1}) / Σ_{l=1}^K π_l^{r-1} h(x_i | α_l^{r-1}).   (1)

- M step: update the ML estimate θ^r using the conditional probabilities t_ik^r as mixing weights:

  L(θ | x_1, ..., x_n, t^r) = Σ_{i=1}^n Σ_{k=1}^K t_ik^r ln[π_k h(x_i | α_k)].

- Iterate until convergence.
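For concreteness, here is a minimal sketch of the E and M steps for a one-dimensional Gaussian mixture with fully observed data (so no I step is needed). It illustrates the updates above; it is not the MixAll implementation, and the function name and initialization are illustrative choices.

```python
import numpy as np

def em_gaussian_1d(x, K, n_iter=100):
    """Plain EM for a 1-D Gaussian mixture: alternate the E step (eq. 1)
    and the weighted-ML M step until the iteration budget is spent."""
    n = len(x)
    pi = np.full(K, 1.0 / K)
    mu = np.linspace(x.min(), x.max(), K)  # spread initial means over the data range
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E step: t_ik proportional to pi_k * N(x_i | mu_k, var_k), normalized over k
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        t = pi * dens
        t /= t.sum(axis=1, keepdims=True)
        # M step: maximum-likelihood updates with t_ik as mixing weights
        nk = t.sum(axis=0)
        pi = nk / n
        mu = (t * x[:, None]).sum(axis=0) / nk
        var = (t * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```

On two well-separated clusters, the estimated means converge to the cluster centers in a few dozen iterations.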
Clustering using Mixture Models EM Algorithm and variations
SEM/SemiSEM Algorithms
Drawbacks
- The I step may be difficult.
- The EM algorithm may converge slowly and is slowed down by the imputation step.
- Biased estimators.
Solution: use Monte Carlo
- Replace the I step by a simulation step:
  - IS step: simulate missing values x^m using x^o, θ^{r-1}, t_ik^{r-1}.
- Replace the E step by a simulation step (optional):
  - S step: generate labels z^r = {z_1^r, ..., z_n^r} according to the categorical distribution (t_ik^r, 1 ≤ k ≤ K).
SEM and SemiSEM do not converge pointwise: they generate a Markov chain.
- θ̄ is estimated from the chain (θ^r)_{r=1,...,R}
- missing values are imputed using the empirical MAP value (or expectation)
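The S step above amounts to one categorical draw per observation from its row of posterior probabilities. A minimal inverse-CDF sketch (illustrative, not the MixAll/MixtComp code):

```python
import numpy as np

def s_step(t, rng):
    """S step of SEM: for each row i of t, draw one label z_i from the
    categorical distribution (t_i1, ..., t_iK) by inverse-CDF sampling."""
    cum = np.cumsum(t, axis=1)
    u = rng.random((t.shape[0], 1))
    # index of the first cumulative weight exceeding u (clipped for float safety)
    return np.minimum((u > cum).sum(axis=1), t.shape[1] - 1)
```

Rows with a degenerate distribution always return the same label; otherwise the empirical label frequencies match the posterior probabilities over repeated draws.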
Clustering using Mixture Models Mixture Model and Mixed Data
Mixed Data
Mixed data are handled using conditional independence of the variables given the cluster.
1. Observation space of the form X = X_1 × X_2 × ... × X_L
2. x_i arises from a mixture probability distribution with density

   f(x_i = (x_i^1, x_i^2, ..., x_i^L) | θ) = Σ_{k=1}^K π_k Π_{l=1}^L h_l(x_i^l | α_lk).

3. The density functions (or probability mass functions) h_l(· | α_lk) can be any implemented model.
MixAll implements Gaussian, Poisson, Categorical and Gamma distributions.
MixtComp implements Gaussian, Poisson, Categorical and specific distributions for rank and ordinal data.
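Under this conditional-independence assumption the density is a π-weighted sum of products of per-variable densities. The sketch below combines one Gaussian variable and one categorical variable per cluster; the component choices and names are illustrative, not the MixAll API.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixed_density(x_cont, x_cat, pi, mu, sigma, probs):
    """f(x) = sum_k pi_k * h_1(x_cont | mu_k, sigma_k) * h_2(x_cat | probs_k):
    a Gaussian block times a categorical block, independent given the cluster."""
    return sum(p_k * normal_pdf(x_cont, mu[k], sigma[k]) * probs[k][x_cat]
               for k, p_k in enumerate(pi))
```

Summing over the categorical levels and integrating over the continuous variable recovers total mass 1, as for any mixture.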
Classification of Satellite Image Time Series
Outline
Clustering using Mixture Models
Classification of Satellite Image Time Series
  Cube of Data
  Missing Data/Noisy Data/Sampling
  (Long term) Objective
  Modeling
  Missing Values?
Classification of Satellite Image Time Series
- Défi Mastodons: 2016 call for projects "Data quality"
- Creation of the CloHe (CLustering Of Heterogeneous data with applications to satellite data records) project
- Members: Mathieu Fauvel (INRA), Stéphane Girard (Inria Grenoble), Vincent Vandewalle (Lille 2), Cristian Preda (Université Lille 1)
https://modal.lille.inria.fr/CloHe/
Classification of Satellite Image Time Series Cube of Data
Formosat-2 is no longer operational.
Figure: Formosat-2 provided multi-spectral data (R, G, B, NIR) at 8-meter resolution; 17 complete images of France per year.
Sentinel-2A started service in 2016.
Figure: Sentinel-2 provides 13 spectral bands: 4 bands at 10-meter resolution and 6 bands at 20-meter resolution. A complete image of France every 5 days.
Classification of Satellite Image Time Series Cube of Data
Data Cube
Figure: Sentinel-2 provides approximately 20 TB of images per year and covers the whole of France every 5 days, with 1.6 billion pixels.
Data Cube

  X = (X_ikt), i ∈ I, k ∈ {r, v, b, ir}, Y = (Y_i), i ∈ J ⊂ I,

with
- i = (x, y) geographic position,
- k spectral band,
- t dates,
- missing values (clouds, cast shadows) at some dates and some positions,
- noisy data (undetected shadows, cloud veil, etc.),
- presence of mixels (mixed pixels).
S. Iovleff (Lille 1), Mixture Models with Missing data Classification of Satellite Image Time Series, 23 June 2017, 18 / 30
Classification of Satellite Image Time Series Missing Data/Noisy Data/Sampling
Missing data
Figure: Very cloudy. Figure: A small number of clouds.
Figure: "Sheep" clouds. Figure: Some clouds with a veil.
Classification of Satellite Image Time Series Missing Data/Noisy Data/Sampling
Noisy Data
Figure: Clouds and their shadows.
Classification of Satellite Image Time Series Missing Data/Noisy Data/Sampling
Non-Uniform sampling
Figure: Path-row grid for Landsat acquisitions. Every path (North-South track) is acquired on the same date every 16 days.
Figure: Map of the number of times each pixel sees the ground, taking into account satellite revisit and cloud cover.
Figure: Histogram of the number of times each pixel sees the ground, taking into account satellite revisit and cloud cover.
Open Access: http://www.mdpi.com/2072-4292/9/1/95/htm
Classification of Satellite Image Time Series (Long term) Objective
Objective
The aim is to be able to cluster the whole of France using Sentinel-2 data.
Figure: Classification of France
Classification of Satellite Image Time Series Modeling
Gaussian modeling
- Y_i ∈ {1, ..., G},
- L(X_i | Y_i = g) = N(μ_g, Σ_g)
- Two kinds of parsimony assumptions on the covariance matrices:
  - independence between spectra: Σ_{g,k} of size T × T (T = 17),
  - or independence between times: Σ_{g,t} of size K × K (K = 4).
- handle missing values for both models
- implementations and tests in an R package.
Classification of Satellite Image Time Series Modeling
Method
Missing values formation process
Missing At Random (MAR): the probability for a value to be missing does not depend on its value, conditionally on the other observations.
Denote x_ik^+ the vector x_i with its missing part x_ik^M imputed given the observed part x_i^O, and

  Σ̃_ik^+ = [ 0_i^O    0_i^OM ]
            [ 0_i^MO   Σ̃_ik^M ]

with 0 the null matrix and Σ̃_ik^M = Σ_ik^M − Σ_ik^MO (Σ_ik^O)^{-1} Σ_ik^OM. Then

  Σ_k^+ = (1/n_k) Σ_{i=1}^n [ (x_ik^+ − μ_k^+)(x_ik^+ − μ_k^+)' + Σ̃_ik^+ ]

Σ̃_ik^M corrects the variance for the bias due to imputation by the conditional mean.
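The imputation and the correction term follow the standard conditional-Gaussian formulas. A sketch under the stated MAR assumption (function and variable names are hypothetical; this is not the project's R code):

```python
import numpy as np

def impute_and_correct(x, obs, mu, Sigma):
    """Impute the missing block of x by its conditional mean E[x^M | x^O]
    and return the variance correction S_MM - S_MO S_OO^{-1} S_OM that is
    added when accumulating the cluster covariance."""
    mis = ~obs
    S_OO = Sigma[np.ix_(obs, obs)]
    S_MO = Sigma[np.ix_(mis, obs)]
    S_MM = Sigma[np.ix_(mis, mis)]
    W = S_MO @ np.linalg.inv(S_OO)                  # regression of missing on observed
    x_plus = x.copy()
    x_plus[mis] = mu[mis] + W @ (x[obs] - mu[obs])  # conditional-mean imputation
    Sigma_tilde = S_MM - W @ S_MO.T                 # conditional covariance of x^M
    return x_plus, Sigma_tilde
```

For a bivariate normal with unit variances and correlation 0.5, observing x_1 = 1 imputes x_2 = 0.5 with residual variance 0.75.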
Classification of Satellite Image Time Series Modeling
Gaussian modeling
Figure: Tree species classification with G = 13 classes (map labeling species such as Oak, Silver fir, Black pine, Silver birch, Black locust, European ash, Maritime pine and Douglas fir), from "Tree Species Classification using Formosat-2 Satellite Image Time Series". Source: Esri, DigitalGlobe, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community. February 8, 2017, scale 1:36,112.
Classification of Satellite Image Time Series Missing Values?
Continuous Model
Main assumption

  Y | Z = k ∼ GP(μ_k, C_k), k = 1, ..., K   (2)

where GP(μ_k, C_k) is a Gaussian process with mean μ_k ∈ L²(I) and covariance operator C_k: I × I → R.
- the mean functions belong to a J-dimensional subspace:

  μ_k(t) = Σ_{j=1}^J α_kj φ_j(t),

- covariance function:

  C_k(s, t)(h_k) = θ_k Q((t − s)/h_k),

- spectra are independent.
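The J-dimensional expansion of μ_k is just a basis matrix applied to a coefficient vector. The sketch below uses a polynomial basis φ_j(t) = t^(j−1) purely for illustration; the basis functions used in the project may differ.

```python
import numpy as np

def mean_function(t, alpha_k):
    """mu_k(t) = sum_j alpha_kj * phi_j(t) with the illustrative basis
    phi_j(t) = t**(j-1); t may be a scalar or a vector of time points."""
    J = len(alpha_k)
    Phi = np.vander(np.atleast_1d(np.asarray(t, dtype=float)), J, increasing=True)
    return Phi @ np.asarray(alpha_k)
```

With coefficients (1, 2) this evaluates μ(t) = 1 + 2t, e.g. μ(3) = 7.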
Classification of Satellite Image Time Series Missing Values?
Estimation of Continuous Model
For each i, let B^i_{l,j} = φ_j(t_l^i), m_ki = B_i α_k and

  Σ^i_{j,j'}(h_k) = θ_k Q((t_j^i − t_{j'}^i)/h_k) =: θ_k S^i_{j,j'}(h_k),

then

  y_i | Z_i = k ∼ N_{T_i}(m_ki, θ_k S_i(h_k)), k = 1, ..., K, i = 1, ..., n,

and we end up with K independent minimization problems:

  (α̂_k, ĥ_k) = arg min_{α_k, h_k, θ_k} Σ_{i: Z_i = k} [ log det S_i(h_k) + T_i log θ_k + (1/θ_k)(y_i − B_i α_k)' S_i(h_k)^{-1}(y_i − B_i α_k) ]
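The criterion for class k can be coded directly from the formula above, holding the bandwidth h_k fixed so that each S_i is a plain matrix. An illustrative sketch, not the CloHe implementation:

```python
import numpy as np

def class_nll(alpha_k, theta_k, S_list, B_list, y_list):
    """Criterion minimized for class k:
    sum_i [ log det S_i + T_i log theta_k
            + (1/theta_k) (y_i - B_i a_k)' S_i^{-1} (y_i - B_i a_k) ],
    where S_i stands for S_i(h_k) at a fixed bandwidth."""
    total = 0.0
    for S, B, y in zip(S_list, B_list, y_list):
        r = y - B @ alpha_k                       # residual of the mean expansion
        total += (np.linalg.slogdet(S)[1]         # log det S_i
                  + len(y) * np.log(theta_k)      # T_i log theta_k
                  + r @ np.linalg.solve(S, r) / theta_k)  # scaled Mahalanobis term
    return total
```

With S_i = I, B_i = I, θ_k = 1 and a zero residual the criterion is 0, and it grows with the squared residual, as expected for a negative log-likelihood.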
Classification of Satellite Image Time Series Missing Values?
Results
About 65% of the pixels are well classified.
Figure: the G = 13 spectra
Classification of Satellite Image Time Series Missing Values?
Mean values
Figure: 1st, 4th, 7th and 11th classes