Formal Neural Networks
Christian Jutten
Lab. des Images et des Signaux (LIS)
UMR 5083 Centre National de la Recherche Scientifique, Institut National Polytechnique de Grenoble,
Université Joseph Fourier
Contents
• I. Introduction
• II. A few flashes of neurobiology
• III. Mathematical models
• IV. Cooperation and competition
• V. Linear associative memories
• VI. Multi-layer perceptrons
• VII. Hopfield models
• VIII. Kohonen self-organizing maps
• IX. Source separation
• X. Presentation of the lab session (BE) and the mini-projects
Chapter 9
Blind source separation
The problem
• The signal received by a sensor (electrode, antenna, microphone, etc.) is an intricate mixture of signals.
[Figure: two sources s1(t) and s2(t) impinge on one sensor, which receives the mixture a s1(t) + b s2(t).]
How can one retrieve the different signals (sources) from the mixture?
Solution
• It is possible using a few observations:
– more sensors than sources,
– different mixtures (observations).
[Figure: the two sources s1(t) and s2(t) reach two sensors, giving the mixtures:]
$$x_1(t) = a\, s_1(t) + b\, s_2(t), \qquad x_2(t) = c\, s_1(t) + d\, s_2(t), \qquad (a, b) \ne (c, d)$$
Summary
• Origin of the problem: motion decoding in vertebrates
• Neuromimetic approach: an intuitive solution
• Statement of the problem, assumptions and solutions
• Independence criteria (ICA)
• Algorithm principles
• Exploitation of prior information
• A few examples
• Demonstrations and discussion
Motion decoding
$$f_I(t) = a_{11}\, p(t) + a_{12}\, v(t)$$
$$f_{II}(t) = a_{21}\, p(t) + a_{22}\, v(t)$$
Linear model: the two fiber signals f_I and f_II are unknown linear combinations of position p(t) and velocity v(t).
Neuromimetic approach
One uses a neural network with recurrent connections, able to separate the unknown sources by tuning its weights:
$$y_i(t) = x_i(t) - \sum_{j \ne i} c_{ij}\, y_j(t)$$
[Figure: fully recurrent network with inputs x1(t), …, xn(t), outputs y1(t), …, yn(t) and adaptive weights c_ij.]
In matrix form:
$$Y(t) = X(t) - C\, Y(t) \;\Rightarrow\; (I + C)\, Y(t) = X(t) \;\Rightarrow\; Y(t) = (I + C)^{-1} X(t) = B\, X(t)$$
2-source problem
[Figure: two-unit recurrent network, inputs x1(t) and x2(t), cross-weights c12 and c21, outputs y1(t) and y2(t).]
Mixtures:
$$x_1(t) = a_{11}\, s_1(t) + a_{12}\, s_2(t), \qquad x_2(t) = a_{21}\, s_1(t) + a_{22}\, s_2(t)$$
Network equations:
$$y_1(t) = x_1(t) - c_{12}\, y_2(t), \qquad y_2(t) = x_2(t) - c_{21}\, y_1(t)$$
Substituting each output into the other:
$$y_1(t) = x_1(t) - c_{12}\,\bigl(x_2(t) - c_{21}\, y_1(t)\bigr), \qquad y_2(t) = x_2(t) - c_{21}\,\bigl(x_1(t) - c_{12}\, y_2(t)\bigr)$$
Solving for the outputs:
$$y_1(t) = \frac{x_1(t) - c_{12}\, x_2(t)}{1 - c_{12}\, c_{21}} = \frac{(a_{11} - c_{12}\, a_{21})\, s_1(t) + (a_{12} - c_{12}\, a_{22})\, s_2(t)}{1 - c_{12}\, c_{21}}$$
$$y_2(t) = \frac{x_2(t) - c_{21}\, x_1(t)}{1 - c_{12}\, c_{21}} = \frac{(a_{21} - c_{21}\, a_{11})\, s_1(t) + (a_{22} - c_{21}\, a_{12})\, s_2(t)}{1 - c_{12}\, c_{21}}$$
Two separating solutions:
$$c_{12} = \frac{a_{12}}{a_{22}} \;\text{ and }\; c_{21} = \frac{a_{21}}{a_{11}} \;\Rightarrow\; y_1(t) = \tilde a_1\, s_1(t), \quad y_2(t) = \tilde a_2\, s_2(t)$$
$$c_{12} = \frac{a_{11}}{a_{21}} \;\text{ and }\; c_{21} = \frac{a_{22}}{a_{12}} \;\Rightarrow\; y_1(t) = \tilde a'_1\, s_2(t), \quad y_2(t) = \tilde a'_2\, s_1(t) \quad\text{(permuted)}$$
2-source intuitive solution (1/2)
• There exists a solution (in fact, 2 solutions).
• How can one find the solution, i.e. the c_ij, since the a_ij are unknown?
• The idea is to estimate the c_ij so that the outputs y_j become independent.
• A first approach could be based on output decorrelation:
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,y_i(t)\, y_j(t)\,\bigr],$$
i.e., according to a stochastic iteration:
$$\Delta c_{ij}(t) = \mu\, y_i(t)\, y_j(t)$$
• Problem: $\Delta c_{ij}(t) = \Delta c_{ji}(t)$, i.e. the separation matrix is symmetric and cannot "invert" an arbitrary mixing matrix A.
2-source intuitive solution (2/2)
• To overcome this problem, one can break the symmetry by using:
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,f(y_i(t))\, g(y_j(t))\,\bigr] \quad\text{or}\quad \Delta c_{ij}(t) = \mu\, f(y_i(t))\, g(y_j(t)),$$
where f(.) and g(.) are two different odd nonlinear functions.
• The simplest algorithm is:
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,y_i^3(t)\, y_j(t)\,\bigr] \quad\text{or}\quad \Delta c_{ij}(t) = \mu\, y_i^3(t)\, y_j(t)$$
• The algorithm converges when, on average,
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,y_i^3(t)\, y_j(t)\,\bigr] \approx 0,$$
i.e. when the outputs are (approximately) independent.
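A minimal sketch of this adaptive rule, in the spirit of the Hérault-Jutten network, with f(y) = y³ and g(y) = y. The step size, the source distributions and the iteration scheme are illustrative choices, and convergence depends on them; this is a sketch, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 20000
S = rng.uniform(-1, 1, (2, T))                    # independent zero-mean sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])            # unknown mixing matrix
X = A @ S                                         # observations

C = np.zeros((2, 2))                              # recurrent weights (diagonal stays 0)
mu = 0.01                                         # illustrative step size
for t in range(T):
    y = np.linalg.solve(np.eye(2) + C, X[:, t])   # network output y = (I + C)^(-1) x
    C[0, 1] += mu * y[0] ** 3 * y[1]              # Delta c_ij = mu f(y_i) g(y_j)
    C[1, 0] += mu * y[1] ** 3 * y[0]              # with f(u) = u^3 and g(u) = u

# expected fixed point: c12 -> a12/a22 = 0.5 and c21 -> a21/a11 = 0.3
print(C)
```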
General problem
• Assumptions on the mixture model F:
– linear instantaneous (memoryless) model,
– linear convolutive (with memory) model,
– nonlinear model.
• One chooses a separation model G suited to the mixture model.
• The sources are assumed to be statistically independent.
[Figure: unknown sources S → mixture model F → mixtures X → separation model G → estimated sources.]
Separability? (1/3)
• Idea: one tunes G so that the Y_i are mutually independent.
• Question: does $Y_i$ independent, $\forall i$, imply $Y \approx S$?
• Theoretical results
– Let X be a linear instantaneous regular mixture (A is a regular matrix) of independent sources, of which at most one is Gaussian; the signals Y = BX are independent iff BA = DP (Comon, HOS 1991 and Signal Processing 1994).
Independence is equivalent to separation up to scale and permutation indeterminacies.
[Figure: unknown sources S → mixing matrix A → mixtures X → separation matrix B → estimated sources Y.]
Separability? (2/3)
• Theoretical results for convolutive mixtures
– Let X be a linear convolutive mixture (A(z) is a filter matrix) of independent sources, of which at most one is Gaussian; the signals Y = [B(z)]X are independent iff B(z)A(z) = D(z)P (Weinstein, Yellin, IEEE SP, 1994; Nguyen, Jutten, SP, 1995).
Independence is equivalent to separation up to filter and permutation indeterminacies.
[Figure: unknown sources S → filter matrix A(z) → mixtures X → separation filter matrix B(z) → estimated sources Y.]
Separability? (3/3)
• Theoretical results for nonlinear mixtures
– For general nonlinear mixtures, independence of Y does not ensure source separation (Darmois, 1953; Hyvärinen, Pajunen, 1998).
– Particular nonlinear mixtures are separable, e.g. the so-called post-nonlinear (PNL) models (Taleb, Jutten, IEEE SP 1999; Jutten et al., SP 2004; Achard, Jutten, IEEE SPL 2005).
[Figure: unknown sources S → nonlinear mixture → mixtures X → separation structure → estimated sources Y.]
Indeterminacies: another approach
• Scale and permutation indeterminacies can also be understood in another way. For a linear model, sensor i receives the mixture:
$$x_i(t) = a_{i1}\, s_1(t) + a_{i2}\, s_2(t) + \cdots + a_{in}\, s_n(t) = \sum_j a_{ij}\, s_j(t) = \sum_j \frac{a_{ij}}{\alpha_j}\,\bigl(\alpha_j\, s_j(t)\bigr),$$
so the rescaled sources $\alpha_j s_j(t)$, mixed with the rescaled coefficients $a_{ij}/\alpha_j$, produce exactly the same observations.
• Easy generalization to convolutive and nonlinear mixtures.
Statistical dependence measure
• Method based on source independence: ICA
• Independence?
– In a few words…
– The definition:
$$p_{Y_1 Y_2}(u, v) = p_{Y_1}(u)\, p_{Y_2}(v)$$
– A few more convenient criteria:
with characteristic functions: independence of 2 variables is equivalent to the cancellation of their cross-cumulants of any order (which requires an infinite number of equations!);
with the Kullback-Leibler divergence:
$$I\Bigl(p_Y, \prod_i p_{Y_i}\Bigr) = \int \cdots \int p_Y(u)\, \ln \frac{p_Y(u)}{\prod_i p_{Y_i}(u_i)}\, du$$
Independence criterion
• Properties of the Kullback-Leibler divergence:
$$I\Bigl(p_Y, \prod_i p_{Y_i}\Bigr) = \int \cdots \int p_Y(u)\, \ln \frac{p_Y(u)}{\prod_i p_{Y_i}(u_i)}\, du$$
is positive, and vanishes if and only if
$$p_Y(u) = \prod_i p_{Y_i}(u_i).$$
• The K-L divergence is thus a good independence criterion… but its estimation requires pdf estimates.
Independence criterion
• Using the second characteristic function (the ln of the Fourier transform of the pdf), the independence definition
$$p_Y(u) = \prod_i p_{Y_i}(u_i)$$
becomes:
$$\Phi_Y(u) = \prod_i \Phi_{Y_i}(u_i) \quad\Longleftrightarrow\quad \Psi_Y(u) = \ln \Phi_Y(u) = \sum_i \Psi_{Y_i}(u_i)$$
• Expanding in a Taylor series around 0 leads to conditions on the cross-cumulants (the Taylor coefficients of $\Psi_Y$) at every order.
Independence criterion
• What if we only approximate independence?
• At order 2… one only achieves decorrelation:
$$b_{ij} \leftarrow b_{ij} - \mu\, E\bigl[\,y_i\, y_j\,\bigr] \quad\text{or}\quad b_{ij} \leftarrow b_{ij} - \mu\, y_i\, y_j$$
• The algorithm converges if:
$$E\bigl[\,y_i\, y_j\,\bigr] = 0, \quad \forall i \ne j$$
• One observes that the updates satisfy:
$$\Delta b_{ij} = \Delta b_{ji}$$
• The separation matrix B is then symmetric, and cannot invert an arbitrary mixing matrix.
Independence criterion
• At the 2nd order… one only cancels the correlations.
• Algebraic explanation:
For 2 variables (mixtures), one computes the terms $E[y_1^2]$, $E[y_2^2]$ and $E[y_1 y_2]$.
Problem: there are 3 equations, but the separating matrix has 4 unknowns!
For n variables, one computes the terms $E[y_i^2]$ and $E[y_i y_j]$, $i \ne j$.
This gives n + (n² − n)/2 = (n² + n)/2 equations, but B contains n² unknowns.
• Consequently, decorrelation is not sufficient.
Independence criterion
• With a better independence approximation:
– higher-order statistics,
– avoid the symmetry.
• For instance, one proposes the following algorithm:
$$b_{ij} \leftarrow b_{ij} - \mu\, E\bigl[\,f(y_i)\, g(y_j)\,\bigr] \quad\text{or}\quad b_{ij} \leftarrow b_{ij} - \mu\, f(y_i)\, g(y_j), \qquad f \ne g$$
• The algorithm converges if:
$$E\bigl[\,f(y_i)\, g(y_j)\,\bigr] = 0$$
• Simplest case: $E[\,y_i^3\, y_j\,] = 0$. The symmetry is broken, since $E[\,y_i^3\, y_j\,] \ne E[\,y_j^3\, y_i\,]$: there are as many equations as unknowns.
MI-based Algorithm principle
• For a linear mixture, one has to estimate B.
• One computes the gradient of the criterion with respect to the parameters: $\partial I / \partial B$.
• The separation matrix is updated according to:
$$B(t+1) = B(t) - \mu\, \frac{\partial I}{\partial B}$$
• The algorithm converges when $\partial I / \partial B$ is equal to 0 (on average), i.e. when I is minimum.
• It is an unsupervised algorithm: independence is directly estimated from the outputs, without any priors.
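A sketch of such an unsupervised gradient descent on a linear mixture. The relative-gradient form of the update and the tanh nonlinearity (a common stand-in score function for super-Gaussian sources) are assumptions of this sketch, not the specific algorithm of the course.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000
S = rng.laplace(size=(2, T))                      # super-Gaussian independent sources
A = np.array([[1.0, 0.7], [0.4, 1.0]])            # unknown mixing matrix
X = A @ S

B = np.eye(2)                                     # separation matrix to be estimated
mu = 0.01
for _ in range(200):
    Y = B @ X
    # relative-gradient estimate of dI/dB: (E[phi(y) y^T] - I) B,
    # with phi = tanh standing in for the score function
    G = (np.tanh(Y) @ Y.T) / T - np.eye(2)
    B -= mu * G @ B                               # gradient step B <- B - mu dI/dB

print(B @ A)                                      # ~ scaled permutation matrix DP
```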
Mutual information algorithm (1/2)
• The mutual information writes:
$$I(Y) = \sum_i H(Y_i) - H(Y) = \sum_i H(Y_i) - H(X) - \ln\lvert\det B\rvert$$
• The estimation equation is then:
$$\frac{\partial I(Y)}{\partial \Theta} = 0,$$
where Θ represents the parameter vector.
• It leads to estimating equations involving the marginal score functions
$$\psi_i(y_i) = -\frac{d \ln p_{Y_i}(y_i)}{d y_i},$$
which requires the estimation of probability density functions (pdf).
Mutual information algorithm (2/2)
• For linear mixtures, F and G are modeled by matrices:
[Figure: S(t) → mixing matrix A → X(t) = A S(t) → separation matrix B → Y(t) = B X(t).]
• The estimation equations lead to:
$$E\left[\frac{d \ln p_{Y_i}(Y_i)}{d Y_i}\, Y_j\right] = 0, \qquad i \ne j$$
– For Gaussian sources, $d \ln p_{Y_i}(y_i)/d y_i \propto y_i$: second-order statistics,
– For non-Gaussian sources: higher (than 2) order statistics,
– Priors or good estimates of the source distributions lead to optimal statistics; different pdf approximations lead to different algorithm implementations (2nd order, cumulants, etc.).
Practical issues: pdf estimation
• The estimation equations require estimates of the pdfs or of the score functions.
• The pdfs can be estimated using various methods:
– expansions near Gaussianity: Gram-Charlier (Lacoume 91, Comon SP 94, Yang et al. SP 98), Edgeworth,
– kernel estimators (Pham IEEE Trans. SP 96; Taleb, Jutten IEEE SP 99),
… then the score functions are obtained by differentiation.
• The score function can also be estimated directly by minimizing an MSE cost (Pham et al. EUSIPCO 92; Taleb, Jutten ICANN 97, IEEE SP 99):
$$J(\mathbf{w}) = E\Bigl[\bigl(\hat\psi_{\mathbf{w}}(y) - \psi_Y(y)\bigr)^2\Bigr],$$
whose gradient can be computed without knowing the true score $\psi_Y$, since integration by parts gives
$$\frac{\partial J}{\partial \mathbf{w}} = 2\, E\left[\hat\psi_{\mathbf{w}}(y)\, \frac{\partial \hat\psi_{\mathbf{w}}(y)}{\partial \mathbf{w}} - \frac{\partial^2 \hat\psi_{\mathbf{w}}(y)}{\partial y\, \partial \mathbf{w}}\right]$$
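As an illustration of the kernel route, the sketch below estimates the score ψ(y) = −d ln p(y)/dy from samples with a Gaussian kernel density estimate differentiated analytically. The helper name kernel_score and the Silverman bandwidth rule are illustrative choices, a simplified stand-in for the estimators cited above.

```python
import numpy as np

def kernel_score(samples, y):
    """Estimate psi(y) = -d/dy ln p(y) with a Gaussian kernel density estimate.

    The normalization of the kernels cancels in the ratio -p'/p,
    so unnormalized Gaussians are used.
    """
    h = 1.06 * samples.std() * len(samples) ** (-1 / 5)   # Silverman bandwidth
    d = y[:, None] - samples[None, :]                     # pairwise differences
    K = np.exp(-0.5 * (d / h) ** 2)                       # kernel values
    p = K.mean(axis=1)                                    # ~ p(y)
    dp = (K * (-d / h ** 2)).mean(axis=1)                 # ~ p'(y)
    return -dp / p

# sanity check: for a standard Gaussian the true score is psi(y) = y
rng = np.random.default_rng(3)
samples = rng.normal(size=2000)
y = np.linspace(-2.0, 2.0, 5)
print(kernel_score(samples, y))                           # should be close to y
```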
Source separation in PNL mixtures: model and separability
• PNL mixtures are particular NL mixtures.
• Linear channels, then sensors with NL distortions.
• PNL mixtures are separable NL mixtures:
– if at most one source is Gaussian, if the mixing matrix has at least 2 nonzero entries per row and per column, and if the functions f_i are invertible, then the outputs are independent iff each $g_i \circ f_i$ is linear and BA = DP.
[Figure: unknown sources S → mixing matrix A → nonlinearities f1(.), f2(.) → observations X → compensating nonlinearities g1(.), g2(.) → separation matrix B → estimated sources Y.]
Source separation in PNL mixtures: criterion
• The mutual information writes:
$$I(Y) = \sum_i H(Y_i) - H(Y) = \sum_i H(Y_i) - H(X) - \ln\lvert\det B\rvert - \sum_i E\bigl[\ln\lvert g_i'(x_i)\rvert\bigr]$$
• The estimation equations are:
$$\frac{\partial I(Y)}{\partial \Theta} = 0,$$
– a set for the linear part,
– a set for the NL part.
Source separation in PNL mixtures: estimation equations
• The estimation equations for the linear part are:
$$E\bigl[\psi_Y(y)\, y^T\bigr] = I \quad\Longleftrightarrow\quad E\bigl[\psi_{Y_i}(y_i)\, y_j\bigr] = 0, \quad i \ne j, \qquad\text{with } Y = BZ$$
• The estimation equations for the nonlinear part are:
$$E\Bigl[\sum_i b_{ij}\, \psi_{Y_i}(y_i) \,\Big|\, z_j\Bigr] = \psi_{Z_j}(z_j), \qquad j = 1, \ldots, n, \qquad\text{with } Z_j = g_j(X_j)$$
• Both sets of equations require pdf or score function estimates for optimality.
Source separation in PNL mixtures: algorithm
• The algorithm is based on the estimation of 3 parts:
– the marginal score functions of the estimated sources,
– the nonlinear compensating functions g_i,
– the separation matrix B.
[Figure: unknown sources S → mixing matrix A → nonlinearities f1(.), f2(.) → observations X → compensating nonlinearities g1(.), g2(.) → separation matrix B → estimated sources Y; a score function estimator drives the parametric (or not) estimation algorithm.]
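To make the compensation structure concrete, the sketch below builds a toy PNL mixture and inverts it with the mirrored g-then-B structure. Here the compensating functions and the separating matrix are computed from the known mixture, only to illustrate the architecture; the blind algorithm has to estimate them from X alone.

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, (2, 1000))                 # unknown independent sources
A = np.array([[1.0, 0.5], [0.6, 1.0]])            # mixing matrix

f = [np.tanh, lambda u: u ** 3]                   # toy invertible sensor distortions
X = np.vstack([f[i](row) for i, row in enumerate(A @ S)])   # PNL observations

g = [np.arctanh, np.cbrt]                         # exact compensators: g_i o f_i linear
B = np.linalg.inv(A)                              # ideal separating matrix
Z = np.vstack([g[i](row) for i, row in enumerate(X)])
Y = B @ Z                                         # separation structure: g, then B
print(np.abs(Y - S).max())                        # ~ 0: sources recovered
```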
Blind inversion of Wiener systems: the model, classical approaches
• Wiener system: a linear filter h followed by a memoryless nonlinearity f.
• Hammerstein system: a memoryless nonlinearity g followed by a linear filter w.
[Figure: Wiener system s(t) → h → f → e(t); Hammerstein inversion structure e(t) → g → x(t) → w → y(t).]
• The Wiener system is a usual NL model in biology, in satellite communications, etc.
• Usually, the input signal is assumed to be iid Gaussian.
• Classical identification methods for nonlinear systems are based on higher-order cross-correlations.
• If the distortion input is available, the compensation of the nonlinearities is almost straightforward, after identification of the NL.
Blind inversion of Wiener systems: Wiener and PNL (1/2)
• With the following parameterization:
$$\mathbf{S}(t) = \bigl[\,\ldots,\, s(t-k+1),\, s(t-k),\, s(t-k-1),\, \ldots\,\bigr]^T$$
$$\mathbf{E}(t) = \bigl[\,\ldots,\, e(t-k+1),\, e(t-k),\, e(t-k-1),\, \ldots\,\bigr]^T$$
and $\mathbf{H}$ the infinite Toeplitz matrix with entries $H_{t\tau} = h(t - \tau)$ (each row is a shifted copy of the impulse response of h),
since the scalar input s(t) is iid, S(t) has independent components and consequently E(t) is a mixture of independent sources: the Wiener system is nothing but an infinite-dimensional PNL mixture.
Blind inversion of Wiener systems: Wiener and PNL (2/2)
• If the Wiener system satisfies:
– the subsystems h and f are unknown and invertible; h can be a non-minimum-phase filter,
– the input s(t) is an unknown (a priori) non-Gaussian iid process,
then it is equivalent to a PNL mixture, with a particular Toeplitz mixing matrix H and the same NL function f on each channel.
• PNL separability implies the invertibility of Wiener systems.
• PNL separability is only proved for finite dimensions; it is conjectured for infinite dimensions.
• In practice, the filter h(k) and its inverse w(k) are truncated.
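The sketch below simulates a short Wiener system and inverts it with the mirrored Hammerstein structure, using the known inverse nonlinearity and a truncated inverse filter. All components are known here, whereas the blind algorithm must estimate g and w from e(t) alone.

```python
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(-1, 1, 5000)                      # iid non-Gaussian input

h = np.array([1.0, 0.5])                          # toy minimum-phase FIR filter
e = np.tanh(np.convolve(s, h)[: len(s)])          # Wiener system: filter, then f = tanh

# Hammerstein inversion: nonlinearity g = arctanh, then linear filter w,
# where w is the truncated inverse of h: 1/(1 + 0.5 z^-1) = sum_k (-0.5)^k z^-k
w = (-0.5) ** np.arange(20)
y = np.convolve(np.arctanh(e), w)[: len(s)]
print(np.abs(y - s).max())                        # small: s(t) recovered up to truncation
```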
Blind inversion of Wiener systems: iid criterion
• Output of the inversion (Hammerstein) structure:
$$x(t) = g\bigl(e(t)\bigr), \qquad y(t) = (w * x)(t)$$
• Requiring that Y(t) be spatially independent amounts to requiring that the sequence {y(t)} be iid.
• The mutual information of infinite-dimensional stationary random vectors is defined from entropy rates (Cover, Thomas, John Wiley & Sons, 1991):
$$H(Y) = \lim_{T \to +\infty} \frac{1}{2T+1}\, H\bigl(y(-T), \ldots, y(T)\bigr)$$
$$I(Y) = \lim_{T \to +\infty} \frac{1}{2T+1} \sum_{t=-T}^{+T} H\bigl(y(t)\bigr) - H(Y)$$
• I(Y) is always positive and vanishes if and only if y(t) is an iid sequence.
Blind inversion of Wiener systems: estimation equations
• The estimation equations are then:
$$\frac{\partial I(Y)}{\partial \theta} = 0, \qquad\text{where } \theta \text{ is the parameter vector.}$$
• I(Y) must be differentiated with respect to the parameters of the linear part, w, and with respect to the nonlinear function, g.
• Linear part. Consider a small relative variation of w, in terms of a convolution by a small filter ε. The first-order variation of I(Y) is:
$$\Delta I(Y) = -\bigl[\bigl(\gamma_{y, \psi(y)} + \delta\bigr) * \varepsilon\bigr](0), \qquad\text{with } \gamma_{y, \psi(y)}(\tau) = E\bigl[y(t - \tau)\, \psi\bigl(y(t)\bigr)\bigr],$$
which leads to the update:
$$w \leftarrow w + \mu\, \bigl\{\gamma_{y, \psi(y)} + \delta\bigr\} * w$$
• With a Gaussian signal, $\gamma_{y, \psi(y)}$ reduces to second-order statistics…
Summary: BSS Advantages
• The method does not require reference signals.
• Very weak priors about the sources
– sources can be similar (noises with the same properties, deterministic signals, etc.)
– at most one Gaussian source
• Algorithms can be easily implemented (VLSI, DSP, etc.)
• Tracking is possible for mobile sources or time-varying mixtures
• Can be used with any kind of sensor
Summary: BSS Restrictions
• As many sensors as sources
• Sources are retrieved up to indeterminacies: scale and permutation (for linear instantaneous mixtures)
• The separating model must be suited to the mixing model
Prior information (1/3)
Sparsity: discrete sources
$$x_1 = a_{11}\, s_1 + a_{12}\, s_2, \qquad x_2 = a_{21}\, s_1 + a_{22}\, s_2$$
If $s_2 = 0$:
$$x_1 = a_{11}\, s_1, \qquad x_2 = a_{21}\, s_1,$$
so such samples directly reveal one column of A (up to scale).
Prior information (2/3)
Non-stationary sources
– silences: temporal sparsity (e.g. speech, ECG, etc.)
$$x_1 = a_{11}\, s_1 + a_{12}\, s_2, \qquad x_2 = a_{21}\, s_1 + a_{22}\, s_2$$
During a silence of $s_2$ ($s_2 = 0$):
$$x_1 = a_{11}\, s_1, \qquad x_2 = a_{21}\, s_1$$
Prior information (3/3)
Non-stationary sources with limited spectrum
– silences: temporal sparsity (e.g. speech),
– limited spectrum: frequential sparsity,
– time-frequency sparsity (Rosca et al.; Abrard and Deville; ICA'2001).
$$x_1(\omega, t) = a_{11}\, s_1(\omega, t) + a_{12}\, s_2(\omega, t), \qquad x_2(\omega, t) = a_{21}\, s_1(\omega, t) + a_{22}\, s_2(\omega, t)$$
If $s_2(\omega_0, t_0) = 0$:
$$x_1(\omega_0, t_0) = a_{11}\, s_1(\omega_0, t_0), \qquad x_2(\omega_0, t_0) = a_{21}\, s_1(\omega_0, t_0)$$
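A sketch of how time-frequency sparsity can be exploited, in the spirit of the cited approaches rather than their exact algorithms: at high-energy TF points where (by sparsity) a single source dominates, the ratio x2/x1 equals a column ratio of A, so the ratios cluster around the column directions.

```python
import numpy as np
from scipy.signal import stft

fs, T = 8000, 8000
t = np.arange(T) / fs
# toy sources that do not overlap in the time-frequency plane
s1 = np.sin(2 * np.pi * 440 * t) * (t < 0.5)
s2 = np.sin(2 * np.pi * 1300 * t) * (t >= 0.5)

a11, a12, a21, a22 = 1.0, 0.8, 0.3, 1.0
x1 = a11 * s1 + a12 * s2
x2 = a21 * s1 + a22 * s2

_, _, X1 = stft(x1, fs=fs, nperseg=256)
_, _, X2 = stft(x2, fs=fs, nperseg=256)

mask = np.abs(X1) > 0.1 * np.abs(X1).max()        # keep high-energy TF points
ratios = np.real(X2[mask] / X1[mask])             # = a21/a11 or a22/a12 by sparsity
# the histogram shows two clusters, near 0.3 and 1.25
print(np.histogram(ratios, bins=np.linspace(0.0, 2.0, 9))[0])
```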
Applications
• Biomedical signals: EEG, ECG, MEG, fMRI
– non-invasive techniques, localization, artifact compensation
• Communications and antenna processing
– sonar, radar, mobile phones
• Monitoring
• Sparse image coding
• Classification
• Smart sensor design
A few examples
1. Dam monitoring
• The dam wall moves according to the water level, the temperature, etc.
• Simple pendulums are hung on the wall at different locations: the pendulum deviations measure the wall motions, with different sensitivities to the water level, the temperature, etc.
G. d'Urso et al., Modélisation des déplacements de barrages, GRETSI 1997.
A few examples
2. Fetal ECG Extraction
[Figure: source signals (left) and estimated sources (right).]
L. De Lathauwer, B. De Moor, J. Vandewalle, Fetal electrocardiogram extraction by blind subspace separation, IEEE Trans. on BME, 47(5):567-572, 2000.
A few examples
3. Artefact removal in MEG (1/2)
A few examples
3. Artefact removal in MEG (2/2)
A few examples
4. Smart sensor arrays (1/2)
• Smart sensor arrays based on low-cost sensors
– unknown number of sources: it must be estimated,
– the sensors are very close to each other (200 µm): very weak spatial diversity
(Paraschiv-Ionescu, Jutten, Bouvier, IEEE Sensors Journal, Dec. 2002)
[Figure: Hall-type silicon sensor array (H1…H6) in an unknown sensor environment, with programmable current sources, biasing, amplifiers, S/H, multiplexer and interface electronics (gain ranging, addressing, enable), feeding a DSP card (ADC, DSP, DAC, RAM) whose program code performs source number estimation and source separation.]
A few examples
4. Smart sensor arrays (2/2)
A few examples
5. Multi-user access control
Y. Deville, L. Andry, Application of blind source separation techniques for multi-tag contactless identification systems (NOLTA 95; IEICE Trans. on Fundamentals of Electronics, 1996; French patent Sept. 1995, subsequently extended).
6. Smart chemical sensor array
(G. Bedoya, S. Bermejo, J. Cabestany, et al., IST SEWING project)
The drain current ID of the conventional MOSFET (in the linear region) is:
$$I_D = \alpha\,\bigl[(V_G - V_T) - 0.5\, V_D\bigr]\, V_D \qquad (1) \qquad\text{where } \alpha = \mu\, C_o\, W / L$$
• MOSFET: V_G is the ground-to-gate metal potential.
• ISFET: V_G* is the ground-to-membrane potential.
[Figure: ISFET cross-section with source S and drain D, reference electrode (Eref, Vref) and Nernst potential V(Vref-elec).]
ISFET modeling:
$$V_T = \Phi_{ms} + 2\,\Phi_F - \frac{Q_{ss} + Q_B}{C_o} \qquad\text{(}\Phi_{ms}\text{: metal-semiconductor work function)}$$
$$V_T^* = \Phi_{cs} + 2\,\Phi_F - \frac{Q_{ss} + Q_B}{C_o} + E_{ref} - E_o \qquad\text{(}\Phi_{cs}\text{: chemical membrane-semiconductor work function)}$$
$$V_G^* = V_{ref} + \frac{RT}{nF}\, \ln\bigl(a_i + K_{ij}\, a_j^{z_i / z_j}\bigr) \qquad\text{(empirical Nikolski-Eisenmann equation)}$$
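As a worked numeric reading of the Nikolski-Eisenmann equation, the sketch below evaluates the membrane potential for hypothetical ion activities and selectivity coefficient (all values are illustrative, not from the SEWING project); the primary-ion charge z_i plays the role of n in the slide's formula.

```python
import numpy as np

R, F = 8.314, 96485.0                             # gas and Faraday constants (SI units)

def nikolski_eisenmann(a_i, a_j, K_ij, z_i=1, z_j=1, V_ref=0.0, T=298.15):
    """V_G* = V_ref + RT/(z_i F) ln(a_i + K_ij a_j**(z_i/z_j))."""
    return V_ref + (R * T) / (z_i * F) * np.log(a_i + K_ij * a_j ** (z_i / z_j))

# hypothetical activities of the primary ion a_i and an interfering ion a_j
print(nikolski_eisenmann(a_i=1e-3, a_j=1e-2, K_ij=0.05))
```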