Open Archive Toulouse Archive Ouverte (OATAO)
OATAO is an open access repository that collects the work of some Toulouse
researchers and makes it freely available over the web where possible.
This is an author's version published in: https://oatao.univ-toulouse.fr/19746
Official URL: https://informatique-mia.inra.fr/resste/sites/informatique-mia.inra.fr.resste/files/RESSTEmai2017_ALagrange.pdf#overlay-context=hierarchique
To cite this version: Lagrange, Adrien and Dobigeon, Nicolas and Fauvel, Mathieu and May, Stéphane. A Bayesian model for joint unmixing, clustering and classification of hyperspectral data. (2017) In: Séminaire des doctorants, 28 April 2017 (Toulouse, France). (Unpublished)

Any correspondence concerning this service should be sent to the repository administrator: tech-oatao@listes-diff.inp-toulouse.fr
A Bayesian model for joint unmixing, clustering
and classification of hyperspectral data
Adrien Lagrange
Ph.D. student at IRIT/INP-ENSEEIHT
Supervisors: Nicolas Dobigeon (IRIT/INP-ENSEEIHT), Mathieu Fauvel (DYNAFOR - INRA/INP-ENSAT) and Stéphane May (CNES)
Context
Hyperspectral imaging
Objective
Model
Spectral unmixing
Clustering
Classification
Experiments
Synthetic data
Real data
Nature of a hyperspectral image
A remote sensing hyperspectral image is:
- the same area imaged at different wavelengths → hundreds of measurements per pixel,
- of poor spatial resolution due to sensor limitations, e.g., a resolution around 10 × 10 m per pixel for aerial applications.
[Figure: two example pixel spectra, reflectance vs. band index (# bands)]
Hyperspectral image analysis
Spectral unmixing
$\mathbf{y}_p \approx \mathbf{M}\mathbf{a}_p$
- $\mathbf{y}_p$: $p$-th observation,
- $\mathbf{M}$: endmember matrix (spectra of elementary components),
- $\mathbf{a}_p$: $p$-th abundance vector.

Classification
Maximum a posteriori (MAP) rule:
$\mathbf{y}_p$ belongs to class $j$ $\Leftrightarrow j = \arg\max_{j \in \mathcal{J}} p(j \mid \mathbf{y}_p) \Leftrightarrow j = \arg\max_{j \in \mathcal{J}} p(j)\,p(\mathbf{y}_p \mid j)$.
[Figure: an observed spectrum $\mathbf{y}_p$ to be assigned to one of the classes buildings, water or vegetation]
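To make the rule concrete, here is a minimal sketch assuming Gaussian class-conditional likelihoods $p(\mathbf{y}_p \mid j)$; the class models (priors, means, covariances) are illustrative placeholders, not part of the presented method.

```python
# Minimal MAP classification sketch: assign y_p to arg max_j p(j) p(y_p | j).
# The Gaussian class-conditional models are hypothetical placeholders.
import numpy as np
from scipy.stats import multivariate_normal

def map_classify(y_p, priors, means, covs):
    """Return the MAP class index for one observed spectrum y_p."""
    log_scores = [np.log(priors[j]) + multivariate_normal.logpdf(y_p, means[j], covs[j])
                  for j in range(len(priors))]
    return int(np.argmax(log_scores))
```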
Spectral unmixing
One illustrative example
Image: 50 × 50 pixels (Moffett field), L = 224 bands,
3 materials: vegetation, water, soil.
[Figure: endmember spectra (vegetation spectrum, water spectrum, soil spectrum) vs. wavelength (µm), and the corresponding abundance maps (abundance of vegetation, abundance of water, abundance of soil)]
Spectral unmixing
A matrix factorization, latent factor modeling or blind source separation problem: $\mathbf{Y} \approx \mathbf{M}\mathbf{A}$
1. Principal Component Analysis (PCA)
- searches for orthogonal "principal components" (PCs) $\mathbf{m}_r$,
- PCs = directions of maximal variance in the data,
- generally used as a dimension-reduction procedure.
2. Independent Component Analysis (ICA) (of $\mathbf{Y}^T$)
- maximizes the statistical independence between the sources $\mathbf{m}_r$,
- several measures of independence ⇒ several algorithms.
3. Nonnegative Matrix Factorization (NMF)
- searches for $\mathbf{M}$ and $\mathbf{A}$ with positive entries,
- several measures of divergence $d(\cdot \mid \cdot)$ ⇒ several algorithms.
4. (Fully Constrained) Spectral Mixture Analysis (SMA)
- positivity constraints on $\mathbf{m}_r$ ⇒ positive "sources",
- positivity and sum-to-one constraints on $\mathbf{a}_p$ ⇒ mixing coefficients = proportions/concentrations/probabilities (a sketch of this constrained estimation follows the list).
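As an illustration of option 4, this minimal sketch estimates one abundance vector under the positivity and sum-to-one constraints with a generic constrained least-squares solver; it is a stand-in under these assumptions, not the algorithm used in the talk.

```python
# Fully constrained abundance estimation: min ||y - M a||^2, a >= 0, sum(a) = 1.
import numpy as np
from scipy.optimize import minimize

def fcls(y, M):
    """Least-squares abundances for y ≈ M a under positivity and sum-to-one."""
    R = M.shape[1]
    a0 = np.full(R, 1.0 / R)                       # start from uniform abundances
    res = minimize(lambda a: np.sum((y - M @ a) ** 2), a0,
                   bounds=[(0.0, 1.0)] * R,
                   constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
    return res.x

# Usage on synthetic data (L = 224 bands, R = 3 endmembers):
rng = np.random.default_rng(0)
M = rng.random((224, 3))
a_hat = fcls(M @ np.array([0.6, 0.3, 0.1]), M)     # recovers ~[0.6, 0.3, 0.1]
```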
Objective

Spectral unmixing                   | Classification
Low-level biophysical information   | High-level semantic information
Abundance vector per pixel          | Unique label per pixel
Unsupervised                        | Supervised

⇒ Scarcely considered jointly.

Objective: propose a unified framework to jointly estimate a classification map and a spectral unmixing from a hyperspectral image.
Bayesian model
- conventional linear mixing model;
- clustering of homogeneous abundance vectors;
- classification with a non-homogeneous Markov random field (MRF) to promote coherence between cluster and class labels.

[Figure: directed graph of the hierarchical model; observations: $\mathbf{Y}$; unmixing: $\mathbf{M}$, $\mathbf{A}$, $s^2$, $\delta$; clustering: $\boldsymbol{\psi}$, $\boldsymbol{\Sigma}$, $\mathbf{z}$, $\beta_1$, $\mathbf{Q}$; classification: $\boldsymbol{\omega}$, $\beta_2$, $\boldsymbol{\eta}$, $\mathbf{c}$, $\mathcal{L}$]
Spectral unmixing
Linear Mixture Model (1)
Linear combination of elementary signatures corrupted by an additive Gaussian noise:
$\mathbf{y}_p = \mathbf{M}\mathbf{a}_p + \mathbf{n}_p$
with
- $\mathbf{y}_p$: measured spectrum ($p = 1, \ldots, P$ where $P$ is the total number of pixels),
- $\mathbf{M}$: endmember matrix (i.e., spectra of the $R$ elementary components, assumed to be known here),
- $\mathbf{a}_p$: abundance vector,
- $\mathbf{n}_p$: noise.
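A minimal sketch of data generated under this model; the dimensions and the Dirichlet draw used for the abundances are illustrative choices, not taken from the slides.

```python
# Simulate Y = M A + N under the linear mixture model with Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
D, R, P = 224, 3, 2500                     # bands, endmembers, pixels (illustrative)
M = rng.random((D, R))                     # endmember matrix, assumed known
A = rng.dirichlet(np.ones(R), size=P).T    # abundances: nonnegative, sum to one
s2 = 1e-3                                  # noise variance s^2
Y = M @ A + rng.normal(0.0, np.sqrt(s2), size=(D, P))
```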
Spectral unmixing
Linear Mixture Model (2)
Prior model for the noise:
$\mathbf{n}_p \mid s^2 \sim \mathcal{N}(\mathbf{0}_D, s^2\mathbf{I}_D)$,
$s^2 \mid \delta \sim \mathcal{IG}(1, \delta)$,
$p(\delta \mid s^2) \propto \frac{1}{\delta}\,\mathbf{1}_{\mathbb{R}^+}(\delta)$.
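By standard conjugacy (a derivation not spelled out in the slides), these priors give a closed-form conditional $s^2 \mid \mathbf{Y}, \mathbf{M}, \mathbf{A}, \delta \sim \mathcal{IG}(1 + DP/2,\; \delta + \|\mathbf{Y} - \mathbf{M}\mathbf{A}\|_F^2 / 2)$, which a Gibbs sampler can draw directly; a minimal sketch:

```python
# Gibbs update of the noise variance s^2 under the IG(1, delta) prior.
import numpy as np

def sample_s2(Y, M, A, delta, rng):
    """Draw s^2 from IG(1 + D*P/2, delta + ||Y - M A||_F^2 / 2)."""
    D, P = Y.shape
    shape = 1.0 + D * P / 2.0
    scale = delta + 0.5 * np.sum((Y - M @ A) ** 2)
    return scale / rng.gamma(shape)   # b / Gamma(a, 1) is an InverseGamma(a, b) draw
```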
Clustering
Clustering (1)
Assumption: several unknown, spectrally coherent clusters with statistically homogeneous abundance vectors, $\forall k \in \{1, \ldots, K\}$,
$\mathbf{a}_p \mid z_p = k, \boldsymbol{\psi}_k, \boldsymbol{\Sigma}_k \sim \mathcal{N}(\boldsymbol{\psi}_k, \boldsymbol{\Sigma}_k)$ with $\boldsymbol{\Sigma}_k = \mathrm{diag}(\sigma_{k,1}, \ldots, \sigma_{k,R})$
where $z_1, \ldots, z_P$ are discrete labels indicating cluster membership.
Vague priors for the cluster parameters:
- $\boldsymbol{\psi}_k \sim \mathrm{Dir}(1)$ → ensures the nonnegativity and sum-to-one constraints on $\mathrm{E}[\mathbf{a}_p \mid z_p = k]$ (soft constraints on $\mathbf{a}_p$),
- $\sigma_{k,r} \sim \mathcal{IG}(1, 0.1)$.
Clustering
Clustering (2)
Clustering with a non-homogeneous Markov random field:
$\mathrm{P}[z_p = k \mid \mathbf{z}_{\mathcal{V}(p)}, \omega_p, q_{k,\omega_p}] \propto \exp\Big(V_1(k, \omega_p, q_{k,\omega_p}) + \sum_{p' \in \mathcal{V}(p)} V_2(k, z_{p'})\Big)$
with $\mathcal{V}(p)$ the neighborhood of $p$ and $\omega_p$ the classification label of $p$.
Two potentials (a numerical sketch follows below):
- to promote coherence with the classification → $V_1(k, j, q_{k,j}) = \log(q_{k,j})$;
- to promote spatial coherence (Potts-Markov potential) → $V_2(k, z_{p'}) = \beta_1\,\delta(k, z_{p'})$, with $\delta(\cdot, \cdot)$ the Kronecker function.
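A minimal sketch of this conditional for one pixel, assuming the interaction coefficients are stored as a $K \times J$ array `q` and the neighborhood is given as a list of neighbor cluster labels; the data layout is an illustrative assumption.

```python
# Conditional P[z_p = k | z_V(p), omega_p, q] ∝ exp(V1 + sum of V2 terms).
import numpy as np

def cluster_label_probs(q, omega_p, z_neighbors, beta1):
    """q: (K, J) interaction coefficients; omega_p: classification label of p;
    z_neighbors: cluster labels of the neighbors of p; beta1: Potts parameter."""
    K = q.shape[0]
    log_p = np.log(q[:, omega_p])                 # V1: coherence with the class
    for z_n in z_neighbors:
        log_p += beta1 * (np.arange(K) == z_n)    # V2: Potts spatial term
    p = np.exp(log_p - log_p.max())               # stable normalization
    return p / p.sum()
```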
Clustering
Clustering (3)
Estimation of the coefficients of interaction between high-level and low-level information:
$\mathbf{q}_j \sim \mathrm{Dir}(1) \;\rightarrow\; \mathbf{q}_j \mid \mathbf{z}, \boldsymbol{\omega} \sim \mathrm{Dir}(n_{1,j}, \ldots, n_{K,j})$
with $n_{k,j} = \#\{p \mid z_p = k, \omega_p = j\}$.
In particular:
$\mathrm{E}[q_{k,j} \mid \mathbf{z}, \boldsymbol{\omega}] = \frac{n_{k,j}}{\sum_{i=1}^{K} n_{i,j}} = \mathrm{P}[z_p = k \mid \omega_p = j]$
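A minimal sketch of these quantities, assuming the cluster and class labels are given as integer arrays; following the expectation formula above, the Dirichlet pseudo-counts are left implicit.

```python
# Interaction counts n_{k,j} and the posterior mean of q_{k,j}.
import numpy as np

def interaction_counts(z, omega, K, J):
    """n[k, j] = number of pixels p with z_p = k and omega_p = j."""
    n = np.zeros((K, J))
    np.add.at(n, (z, omega), 1)        # accumulate one count per pixel
    return n

def posterior_mean_q(n):
    """E[q_{k,j} | z, omega] = n_{k,j} / sum_i n_{i,j}."""
    return n / n.sum(axis=0, keepdims=True)
```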
Classification
Classification (1)
Classification rule with a Markov random field:
$\mathrm{P}[\omega_p = j \mid \boldsymbol{\omega}_{\mathcal{V}(p)}, c_p, \eta_p] \propto \exp\Big(W_1(j, c_p, \eta_p) + \sum_{p' \in \mathcal{V}(p)} W_2(j, \omega_{p'})\Big)$
Two potentials (a numerical sketch follows below):
- to promote coherence with the labeled data:
$W_1(j, c_p, \eta_p) = \begin{cases} \log(\eta_p) & \text{if } j = c_p \text{ and } p \in \mathcal{L}, \\ \log\!\big(\frac{1 - \eta_p}{J - 1}\big) & \text{if } j \neq c_p \text{ and } p \in \mathcal{L}, \\ -\log(J) & \text{otherwise;} \end{cases}$
- to promote spatial coherence → $W_2(j, \omega_{p'}) = \beta_2\,\delta(j, \omega_{p'})$.
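A minimal sketch of the potential $W_1$ as defined above; the `labeled` flag stands in for the membership test $p \in \mathcal{L}$.

```python
# Potential W1 promoting coherence with the labeled data.
import numpy as np

def W1(j, c_p, eta_p, J, labeled):
    """log(eta_p) for the annotated class, log((1 - eta_p)/(J - 1)) for the
    other classes, and the uninformative -log(J) when p is unlabeled."""
    if not labeled:
        return -np.log(J)
    return np.log(eta_p) if j == c_p else np.log((1.0 - eta_p) / (J - 1))
```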
Classification
Classification (2)
Robust classification:
- $\eta_p \in (0, 1)$ is the confidence in the label $c_p$ provided by the user,
- labeled data can be corrected when $\eta_p < 1$.
Synthetic data
Dataset
- 413 spectral bands,
- SNR = 30 dB,
- clustering generated with a Potts-Markov MRF,
- classes created by aggregating several clusters,
- image 1: 3 clusters, 2 classes, 3 endmembers, 100 × 100 px,
- image 2: 12 clusters, 5 classes, 9 endmembers, 200 × 200 px.

[Figure: classification maps of (a) image 1 and (b) image 2; cluster maps of (c) image 1 and (d) image 2]
Synthetic data
Results

[Figure: v-measure vs. # iterations, (a) clustering convergence for image 1, (b) clustering convergence for image 2. Proposed model in blue, model without classification stage (Eches et al.) for comparison]
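The clustering score plotted here can be computed with scikit-learn's v_measure_score; a minimal sketch with placeholder label maps standing in for the ground truth and the estimate:

```python
# v-measure between a ground-truth cluster map and an estimated one.
import numpy as np
from sklearn.metrics import v_measure_score

rng = np.random.default_rng(0)
z_true = rng.integers(0, 3, size=(100, 100))   # placeholder ground-truth clusters
z_hat = z_true.copy()                          # placeholder estimated clusters
score = v_measure_score(z_true.ravel(), z_hat.ravel())  # in [0, 1], 1 is perfect
```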
Synthetic data
Results
- labeled data deteriorated with 40% of erroneous labels,
- confidence set to 60%,
⇒ correction of the mislabeled pixels.

[Figure: provided labeled data vs. obtained classification]
Real data
Dataset
- 349 spectral bands,
- 10 endmembers extracted with VCA,
- 6 classes (straw cereal, summer crop, wooded area, artificial surfaces, bare soil, pasture),
- top half of the ground truth provided as labeled data.

[Figure: Muesli dataset: (a) color composite of the data, (b) ground truth, (c) obtained clustering and (d) obtained classification]