
(1)

Formal Neural Networks (Réseaux de neurones formels)

Christian Jutten

Lab. des Images et des Signaux (LIS)

UMR 5083 Centre National de la Recherche Scientifique, Institut National Polytechnique de Grenoble,

Université Joseph Fourier

(2)

Contents

• I. Introduction

• II. A few glimpses of neurobiology

• III. Mathematical models

• IV. Cooperation and competition

• V. Linear associative memories

• VI. Multilayer perceptrons

• VII. Hopfield models

• VIII. Kohonen self-organizing maps

• IX. Source separation

• X. Presentation of the lab session (BE) and mini-projects

(3)

Chapter 9

Blind source separation

(4)

The problem

• The signal received by a sensor (electrode, antenna, microphone, etc.) is an intricate mixture of several signals.

[Figure: two sources s_1(t) and s_2(t) impinge on a sensor, which receives the mixture a s_1(t) + b s_2(t).]

How can one retrieve the different signals (sources) from the mixture?

(5)

Solution

• It is possible using several observations:

– more sensors than sources,

– different mixtures (observations):

x_1(t) = a\, s_1(t) + b\, s_2(t)

x_2(t) = c\, s_1(t) + d\, s_2(t), \quad \text{with } \frac{a}{c} \ne \frac{b}{d}

[Figure: sources s_1(t) and s_2(t); sensor 1 receives a s_1(t) + b s_2(t), sensor 2 receives c s_1(t) + d s_2(t).]
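As a concrete aside (not in the original slides): if the mixing coefficients were known, recovering the sources would reduce to inverting a 2x2 matrix; BSS addresses the harder case where they are unknown. A minimal numpy sketch, with made-up signals and coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))     # source 1: square wave
s2 = rng.uniform(-1, 1, t.size)             # source 2: uniform noise
S = np.vstack([s1, s2])

A = np.array([[1.0, 0.5],                   # [[a, b],
              [0.3, 1.0]])                  #  [c, d]], with a*d != b*c
X = A @ S                                   # the two observed mixtures

S_hat = np.linalg.inv(A) @ X                # only possible because A is known here
print(np.allclose(S_hat, S))                # True
```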

(6)

Summary

• Origin of the problem: motion decoding in vertebrates

• Neuromimetic approach: an intuitive solution

• Statement of the problem, assumptions and solutions

• Independence criteria (ICA)

• Algorithm principles

• Exploitation of prior information

• A few examples

• Demonstrations and discussion

(7)

Motion decoding

The fiber responses f_I(t) and f_{II}(t) are linear mixtures of the position p(t) and the velocity v(t) of the motion:

f_I(t) = a_{11}\, p(t) + a_{12}\, v(t)

f_{II}(t) = a_{21}\, p(t) + a_{22}\, v(t)

Linear model

(8)

Neuromimetic approach

One uses a neural network with recurrent connections, able to separate the unknown sources by tuning the weights c_ij.

y_i(t) = x_i(t) - \sum_{j \ne i} c_{ij}\, y_j(t)

[Figure: fully recurrent network with inputs x_1(t), ..., x_n(t), outputs y_1(t), ..., y_n(t), and feedback weights c_{12}, ..., c_{1n}, c_{21}, ..., c_{2n}, ..., c_{n1}, c_{n2}.]

In matrix form:

Y(t) = X(t) - C\, Y(t) \;\Rightarrow\; (I + C)\, Y(t) = X(t) \;\Rightarrow\; Y(t) = (I + C)^{-1} X(t) = B\, X(t)

(9)

2-source problem

Mixtures:

x_1(t) = a_{11}\, s_1(t) + a_{12}\, s_2(t)

x_2(t) = a_{21}\, s_1(t) + a_{22}\, s_2(t)

Recurrent network with feedback weights c_{12}, c_{21}:

y_1(t) = x_1(t) - c_{12}\, y_2(t)

y_2(t) = x_2(t) - c_{21}\, y_1(t)

Substituting each output into the other:

y_1(t) = x_1(t) - c_{12}\left( x_2(t) - c_{21}\, y_1(t) \right)

y_2(t) = x_2(t) - c_{21}\left( x_1(t) - c_{12}\, y_2(t) \right)

Solving (for c_{12} c_{21} \ne 1):

y_1(t) = \frac{x_1(t) - c_{12}\, x_2(t)}{1 - c_{12} c_{21}} = \frac{(a_{11} - c_{12} a_{21})\, s_1(t) + (a_{12} - c_{12} a_{22})\, s_2(t)}{1 - c_{12} c_{21}}

y_2(t) = \frac{x_2(t) - c_{21}\, x_1(t)}{1 - c_{12} c_{21}} = \frac{(a_{21} - c_{21} a_{11})\, s_1(t) + (a_{22} - c_{21} a_{12})\, s_2(t)}{1 - c_{12} c_{21}}

Two separating solutions:

c_{12} = a_{12}/a_{22} \text{ and } c_{21} = a_{21}/a_{11} \;\Rightarrow\; y_1(t) = a_{11}\, s_1(t), \;\; y_2(t) = a_{22}\, s_2(t)

c_{12} = a_{11}/a_{21} \text{ and } c_{21} = a_{22}/a_{12} \;\Rightarrow\; y_1(t) = a_{12}\, s_2(t), \;\; y_2(t) = a_{21}\, s_1(t)
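A quick numerical check of the first solution above (an illustrative sketch; the mixing matrix and sources are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, (2, 5000))           # two independent sources
A = np.array([[1.0, 0.4],                   # [[a11, a12],
              [0.6, 1.0]])                  #  [a21, a22]]
X = A @ S

c12 = A[0, 1] / A[1, 1]                     # a12 / a22
c21 = A[1, 0] / A[0, 0]                     # a21 / a11
y1 = (X[0] - c12 * X[1]) / (1 - c12 * c21)  # closed-form network outputs
y2 = (X[1] - c21 * X[0]) / (1 - c12 * c21)

print(np.allclose(y1, A[0, 0] * S[0]))      # True: y1(t) = a11 s1(t)
print(np.allclose(y2, A[1, 1] * S[1]))      # True: y2(t) = a22 s2(t)
```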

(10)

2-source intuitive solution (1/2)

• There exists a solution (in fact, two solutions).

• How can one find the solution, i.e. the c_{ij}, since the a_{ij}'s are unknown?

• The idea is to estimate the c_{ij} so that the outputs y_j become independent.

• A first approach could be based on output decorrelation:

\Delta c_{ij}(t) = \mu\, E_t[\, y_i(t)\, y_j(t) \,]

i.e., as a stochastic iteration:

\Delta c_{ij}(t) = \mu\, y_i(t)\, y_j(t)

• Problem: \Delta c_{ij}(t) = \Delta c_{ji}(t), i.e. the separation matrix is symmetric and cannot "invert" an arbitrary mixing matrix A.

(11)

2-source intuitive solution (2/2)

• To overcome this problem, one can break the symmetry by using:

\Delta c_{ij}(t) = \mu\, E_t[\, f(y_i(t))\, g(y_j(t)) \,] \quad \text{or} \quad \Delta c_{ij}(t) = \mu\, f(y_i(t))\, g(y_j(t)),

where f(.) and g(.) are two different odd nonlinear functions.

• The simplest algorithm is:

\Delta c_{ij}(t) = \mu\, E_t[\, y_i^3(t)\, y_j(t) \,] \quad \text{or} \quad \Delta c_{ij}(t) = \mu\, y_i^3(t)\, y_j(t)

• The algorithm converges when, on average,

\Delta c_{ij}(t) = \mu\, E_t[\, y_i^3(t)\, y_j(t) \,] = 0,

i.e. when the outputs are (approximately) independent.
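A minimal sketch of this adaptive rule for two sources (illustrative throughout: signals and step size are made up, and convergence depends on the source statistics; both sources here are zero-mean and sub-Gaussian):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
S = np.vstack([np.sign(np.sin(0.05 * np.arange(n))),  # square wave
               rng.uniform(-1, 1, n)])                # uniform noise
A = np.array([[1.0, 0.5], [0.4, 1.0]])
X = A @ S

c12 = c21 = 0.0
mu = 0.01
for t in range(n):
    x1, x2 = X[:, t]
    det = 1.0 - c12 * c21
    y1 = (x1 - c12 * x2) / det        # closed-form recurrent-network outputs
    y2 = (x2 - c21 * x1) / det
    c12 += mu * y1**3 * y2            # delta c_ij = mu * y_i^3 * y_j
    c21 += mu * y2**3 * y1

print(round(c12, 2), A[0, 1] / A[1, 1])   # c12 should approach a12/a22 = 0.5
print(round(c21, 2), A[1, 0] / A[0, 0])   # c21 should approach a21/a11 = 0.4
```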

(12)

General problem

• Assumptions on the mixture model F:

– linear instantaneous (memoryless) model,

– linear convolutive (with memory) model,

– nonlinear model.

• One chooses a separation model G suited to the mixture model.

• Sources are assumed to be statistically independent.

[Diagram: unknown sources S → mixture model F → mixtures X → separation model G → estimated sources.]

(13)

Separability? (1/3)

• Idea: one tunes G so that the Y_i are mutually independent.

• Question: Y_i independent \forall i \;\overset{?}{\Leftrightarrow}\; Y \approx S

• Theoretical results

– Let X be a linear instantaneous regular mixture (A is a regular matrix) of independent sources, of which at most one is Gaussian. The signals Y = BX are independent iff BA = DP, with D diagonal and P a permutation (Comon, HOS 1991 and Signal Processing 1994).

Independence is equivalent to separation up to scale and permutation indeterminacies.

[Diagram: unknown sources S → A → mixtures X → B → estimated sources Y.]

(14)

Separability ? (2/3)

• Theoretical results for convolutive mixtures

– Let X be a linear convolutive mixture (A(z) is a filter matrix) of independent sources, of which at most one is Gaussian. The signals Y = [B(z)]X are independent iff B(z)A(z) = D(z)P (Weinstein, Yellin, IEEE SP, 1994; Nguyen, Jutten, SP, 1995).

Independence is equivalent to separation up to filter and permutation indeterminacies.

[Diagram: unknown sources S → A(z) → mixtures X → B(z) → estimated sources Y.]

(15)

Separability ? (3/3)

• Theoretical results for nonlinear mixtures

– For general nonlinear mixtures, independence of the Y_i does not ensure source separation (Darmois, 1953; Hyvarinen, Pajunen, 1998).

– Particular nonlinear mixtures are separable, e.g. the so-called post-nonlinear (PNL) models (Taleb, Jutten, IEEE SP 1999; Jutten et al., SP 2004; Achard, Jutten, IEEE SPL 2005).

[Diagram: unknown sources S → nonlinear mixture → mixtures X → separation structure → estimated sources Y.]

(16)

Indeterminacies: another viewpoint

• Scale and permutation indeterminacies can also be understood as follows.

For a linear model, sensor i receives the mixture:

x_i(t) = a_{i1}\, s_1(t) + a_{i2}\, s_2(t) + \dots + a_{in}\, s_n(t) = \sum_j a_{ij}\, s_j(t) = \sum_j \left( \frac{a_{ij}}{\alpha_j} \right) \left( \alpha_j\, s_j(t) \right)

so each source can only be recovered up to an unknown scale factor \alpha_j, and the ordering of the terms is arbitrary.

• Easy generalization to convolutive and nonlinear mixtures.
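A short numerical confirmation of the scale indeterminacy above (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.uniform(-1, 1, (2, 1000))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
alpha = np.array([2.0, -0.5])                     # arbitrary nonzero scale factors

X = A @ S
X_rescaled = (A / alpha) @ (alpha[:, None] * S)   # columns of A down, sources up
print(np.allclose(X, X_rescaled))                 # True: the scales are unobservable
```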

(17)

Measure of statistical dependence

• Method based on source independence: ICA

• Independence?

– In a few words…

– the definition:

p_{Y_1 Y_2}(u, v) = p_{Y_1}(u)\, p_{Y_2}(v)

– a few more convenient criteria:

with characteristic functions: independence of 2 variables is equivalent to the cancellation of their cross-cumulants at every order (which requires an infinite number of equations!)

with the Kullback-Leibler divergence:

I(p_Y) = \int \dots \int p_Y(u)\, \ln \frac{p_Y(u)}{\prod_i p_{Y_i}(u_i)}\, du

(18)

Independence criterion

• Properties of the Kullback-Leibler divergence:

I(p_Y) = \int \dots \int p_Y(u)\, \ln \frac{p_Y(u)}{\prod_i p_{Y_i}(u_i)}\, du

is positive and vanishes if and only if

p_Y(u) = \prod_i p_{Y_i}(u_i)

• The K-L divergence is thus a good independence criterion… but its estimation requires pdf estimates.
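A possible numeric illustration (not from the slides): a crude histogram plug-in estimate of this divergence, i.e. the mutual information between two signals. Histogram estimators are biased, so independent signals only give values near zero:

```python
import numpy as np

def mutual_information(y1, y2, bins=32):
    """Plug-in estimate of KL(p(y1,y2) || p(y1) p(y2)), in nats."""
    pxy, _, _ = np.histogram2d(y1, y2, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)    # marginal of y1
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y2
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(4)
s1, s2 = rng.uniform(-1, 1, (2, 100_000))
print(mutual_information(s1, s2))              # ~0: independent signals
print(mutual_information(s1, s1 + 0.5 * s2))   # clearly > 0: dependent signals
```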

(19)

Independence criterion

• Using the second characteristic function (the ln of the pdf's Fourier transform), the independence definition

p_Y(u) = \prod_i p_{Y_i}(u_i)

becomes:

\Phi_Y(u) = \prod_i \Phi_{Y_i}(u_i) \quad \Leftrightarrow \quad \Psi_Y(u) = \ln \Phi_Y(u) = \sum_i \Psi_{Y_i}(u_i)

• Expanding \Psi_Y in a Taylor series around 0 shows that independence is equivalent to the cancellation of the cross-cumulants at every order.

(20)

Independence criterion

• What if we only approximate independence?

• At second order… one only enforces decorrelation:

\Delta b_{ij} = \mu\, E[\, y_i\, y_j \,] \quad \text{or, stochastically,} \quad \Delta b_{ij} = \mu\, y_i\, y_j

• The algorithm converges if:

E[\, y_i\, y_j \,] = 0, \quad i \ne j

• One observes that:

\Delta b_{ij} = \Delta b_{ji}

• The separation matrix B is then symmetric, and cannot invert an arbitrary mixing matrix.

(21)

Independence criterion

• At the 2nd order… one only cancels the correlation.

• Algebraic explanation

For 2 variables (mixtures), one uses the terms:

E[y_1^2], \; E[y_2^2] \;\text{ and }\; E[y_1 y_2]

Problem: there are 3 equations, but the separating matrix has 4 unknowns!

For n variables, one uses the terms:

E[y_i^2] \;\text{ and }\; E[y_i y_j], \; i \ne j,

i.e. n + (n^2 - n)/2 = (n^2 + n)/2 equations, but B contains n^2 unknowns.

• Consequently, decorrelation is not sufficient.
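A small experiment that makes the deficit concrete (assumed setup, not from the slides): after whitening, the mixtures are perfectly decorrelated, yet any rotation of the whitened signals is equally decorrelated, so second-order statistics leave a rotation undetermined:

```python
import numpy as np

rng = np.random.default_rng(5)
S = rng.uniform(-1, 1, (2, 50_000))            # independent sources
X = np.array([[1.0, 0.8], [0.2, 1.0]]) @ S     # mixtures

d, E = np.linalg.eigh(np.cov(X))               # whitening from the covariance
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X         # cov(Z) = I

theta = 0.7                                    # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(np.cov(Z), 3))                  # identity
print(np.round(np.cov(R @ Z), 3))              # still identity: rotation unresolved
```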

(22)

Independence criterion

• For a better independence approximation:

– use higher-order statistics,

– avoid the symmetry.

• For instance, one proposes the following algorithm, with f \ne g:

\Delta b_{ij} = \mu\, E[\, f(y_i)\, g(y_j) \,] \quad \text{or, stochastically,} \quad \Delta b_{ij} = \mu\, f(y_i)\, g(y_j)

• The algorithm converges if:

E[\, f(y_i)\, g(y_j) \,] = 0

• Simplest case, E[\, y_i^3\, y_j \,] = 0: since in general

E[\, y_i^3\, y_j \,] \ne E[\, y_j^3\, y_i \,],

the symmetry is gone, and there are as many equations as unknowns.

(23)

MI-based Algorithm principle

• For a linear mixture, one has to estimate B.

• One computes the gradient of the criterion with respect to the parameters: \partial I / \partial B.

• The separation matrix is updated according to:

B(t+1) = B(t) - \mu\, \frac{\partial I}{\partial B}

• The algorithm converges when \partial I / \partial B equals 0 (on average), i.e. when I is at a minimum.

• It is an unsupervised algorithm: independence is estimated directly from the outputs, without any priors.

(24)

Mutual information algorithm (1/2)

• The mutual information writes:

I(Y) = \sum_i H(Y_i) - H(Y), \quad \text{with } H(Y) = H(X) + E[\ln |\det J_G|],

where J_G is the Jacobian of the separating model G.

• The estimation equation is then:

\frac{\partial I(Y)}{\partial \Theta} = 0, \quad \text{where } \Theta \text{ represents the parameter vector.}

• It leads to the following equation:

\sum_i E\left[ \frac{d \ln p_{Y_i}(y_i)}{d y_i}\, \frac{\partial y_i}{\partial \Theta} \right] + \frac{\partial}{\partial \Theta}\, E[\ln |\det J_G|] = 0,

which requires the estimation of probability density functions (pdf).

(25)

Mutual information algorithm (2/2)

• For linear mixtures, F and G are modeled by matrices:

X(t) = A\, S(t), \quad Y(t) = B\, X(t)

• The estimation equations lead to:

E\left[ \frac{d \ln p_{Y_i}(y_i)}{d y_i}\, y_j \right] = 0, \quad i \ne j

– For Gaussian sources: second-order statistics.

– For non-Gaussian sources: higher (than 2) order statistics.

– Priors or good estimates of the source distributions lead to optimal statistics; pdf approximations lead to different algorithm implementations (2nd order, cumulants, etc.).
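A compact batch sketch of this family of algorithms. Everything here is an illustrative assumption: the relative-gradient form ΔB = μ(I − E[φ(y) yᵀ])B stands in for the exact MI gradient, and the fixed nonlinearity φ(y) = y³ replaces the estimated score functions (a choice suited to sub-Gaussian sources):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
S = np.vstack([np.sign(np.sin(0.05 * np.arange(n))),  # two sub-Gaussian sources
               rng.uniform(-1, 1, n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

B = np.eye(2)
mu = 0.1
for _ in range(500):                     # batch relative-gradient iterations
    Y = B @ X
    phi = Y ** 3                         # stand-in for the score functions
    G = phi @ Y.T / n                    # E[phi(y) y^T]
    B += mu * (np.eye(2) - G) @ B        # stationary when E[phi(y_i) y_j] = delta_ij

print(np.round(B @ A, 2))                # ~ D P: one dominant entry per row/column
```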

(26)

Practical issues: pdf estimation

• The estimation equations require estimates of the pdfs or of the score functions.

• The pdfs can be estimated using various methods:

– expansions around Gaussianity: Gram-Charlier (Lacoume 91, Comon SP 94, Yang et al. SP 98), Edgeworth,

– kernel estimators (Pham IEEE Trans. SP 96; Taleb, Jutten IEEE SP 99),

… the score functions are then obtained by differentiation.

• The score function can also be estimated directly, by minimizing an MSE cost (Pham et al. EUSIPCO 92; Taleb, Jutten ICANN 97, IEEE SP 99):

J(w) = E\left[ \hat\psi_w(y)^2 + 2\, \hat\psi_w'(y) \right],

which, up to a constant independent of w, equals the mean square error E\big[ ( \hat\psi_w(y) - \psi_y(y) )^2 \big], so that J can be minimized without knowing the true score \psi_y.
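One possible sketch of the kernel route (assumptions: Gaussian kernels and Silverman's bandwidth rule, neither prescribed by the slides; differentiating the pdf estimate analytically yields the score ψ(y) = p'(y)/p(y)):

```python
import numpy as np

def kernel_score(samples, y, h=None):
    """Estimate psi(y) = d/dy ln p(y) from samples via a Gaussian kernel pdf."""
    if h is None:
        h = 1.06 * samples.std() * samples.size ** (-0.2)  # Silverman's rule
    d = (y[:, None] - samples[None, :]) / h
    k = np.exp(-0.5 * d ** 2)              # kernel values (constants cancel)
    p = k.sum(axis=1)                      # proportional to the pdf estimate
    dp = -(d * k).sum(axis=1) / h          # proportional to its derivative
    return dp / p

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 2000)
grid = np.linspace(-2, 2, 5)
print(np.round(kernel_score(x, grid), 2))  # close to -grid for a unit Gaussian
```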

(27)

Source separation in PNL mixtures: model and separability

• PNL mixtures are particular NL mixtures:

– linear channels, then sensors with NL distortions.

• PNL mixtures are separable NL mixtures:

– If at most one source is Gaussian, if the mixing matrix has at least 2 nonzero entries per row and per column, and if the functions f_i are invertible, then the outputs are independent iff each g_i ∘ f_i is linear and BA = DP.

[Diagram: unknown sources S → mixing matrix A → distortions f1(.), f2(.) → observations X → compensations g1(.), g2(.) → separation matrix B → estimated sources Y.]

(28)

Source separation in PNL mixtures Criterion

• The mutual information writes:

I(Y) = \sum_i H(Y_i) - H(Y) = \sum_i H(Y_i) - H(X) - \sum_i E[\ln |g_i'(x_i)|] - \ln |\det B|

• The estimation equations, \partial I(Y) / \partial \Theta = 0, split into:

– a set for the linear part,

– a set for the NL part.

(29)

Source separation in PNL mixtures Estimation equations

• The linear part estimation equations are:

E[\, \psi_{Y_i}(y_i)\, y_j \,] = \delta_{ij}, \quad \text{with } Y = BZ \text{ and } \psi_{Y_i} = -\, d \ln p_{Y_i} / d y_i

• The nonlinear part estimation equations relate, on each channel j = 1, \dots, n, the compensated observations z_j = g_j(x_j) to the weighted output scores \sum_i b_{ij}\, \psi_{Y_i}(y_i).

• Both sets of equations require pdf or score function estimates for optimality.

(30)

Source separation in PNL mixtures Algorithm

• The algorithm is based on the estimation of 3 parts:

– the marginal score functions of the estimated sources,

– the nonlinear compensating functions g_i,

– the separation matrix B.

[Diagram: unknown sources S → mixing matrix A → f1(.), f2(.) → observations X → g1(.), g2(.) → separation matrix B → estimated sources Y; a score-function estimation block drives the parametric (or not) estimation algorithm.]

(31)

Blind inversion of Wiener systems The model, classical approaches

Wiener system: a linear filter h followed by a memoryless nonlinearity f.

Hammerstein system: a memoryless nonlinearity followed by a linear filter; it is the natural inversion structure.

The Wiener system is a common NL model in biology, satellite communications, etc.

Usually, the input signal is assumed to be iid Gaussian.

Classical identification methods for nonlinear systems are based on higher-order cross-correlations.

If the input of the distortion is available, compensating the nonlinearity is almost straightforward after identification of the NL.

[Diagram: Wiener system: s(t) → h → f → e(t); inversion structure: e(t) → g → x(t) → w → y(t).]

(32)

Blind inversion of Wiener systems Wiener and PNL (1/2)

• With the following parameterization:

S(t) = [\dots, s(t+k), \dots, s(t), \dots, s(t-k), \dots]^T

E(t) = [\dots, e(t+k), \dots, e(t), \dots, e(t-k), \dots]^T

H = \begin{pmatrix} \ddots & \ddots & \ddots & & \\ \cdots & h(k+1) & h(k) & h(k-1) & \cdots \\ & \cdots & h(k+1) & h(k) & \cdots \\ & & \ddots & \ddots & \ddots \end{pmatrix}

since the scalar input s(t) is iid, S(t) has independent components, and consequently E(t) is a mixture of independent sources: the Wiener system is nothing but an infinite-dimensional PNL mixture.

(33)

Blind inversion of Wiener systems Wiener and PNL (2/2)

• If the Wiener system satisfies:

– the subsystems h and f are unknown and invertible; h can be a non-minimum-phase filter,

– the input s(t) is an unknown (a priori) non-Gaussian iid process,

then it is equivalent to a PNL mixture, with a particular Toeplitz mixing matrix H and the same NL function f on each channel.

• PNL separability implies Wiener system invertibility.

• PNL separability is only proved in finite dimension; it is conjectured in infinite dimension.

• In practice, the filter h(k) and its inverse w(k) are truncated.

(34)

Blind inversion of Wiener systems iid Criterion

• Output of the inversion (Hammerstein) structure:

x(t) = g(e(t)), \quad y(t) = (w * x)(t)

Y(t) is spatially independent \Leftrightarrow the sequence \{y(t)\} is iid.

• For infinite-dimensional stationary random vectors, the mutual information is defined from entropy rates (Cover, Thomas, John Wiley & Sons, 1991):

I(Y) = \lim_{T \to +\infty} \frac{1}{2T+1} \left[ \sum_{t=-T}^{T} H\big(y(t)\big) - H\big(y(-T), \dots, y(T)\big) \right]

I(Y) is always positive and vanishes if and only if \{y(t)\} is iid.

(35)

Blind inversion of Wiener systems Estimation equations

• The estimation equations are then:

\frac{\partial I(Y)}{\partial \theta} = 0, \quad \text{where } \theta \text{ is the parameter vector.}

I(Y) must be differentiated with respect to the parameters of the linear part, w, and with respect to the nonlinear function g.

• Linear part. Consider a small relative variation of w, expressed as a convolution with a small filter \varepsilon. The first-order variation of I(Y) is:

\Delta I(Y) = \big[ \left( \gamma_{y\psi(y)} + \delta \right) * \varepsilon \big](0), \quad \text{with } \gamma_{y\psi(y)}(\tau) = E\big[\, y(t)\, \psi_y\big(y(t-\tau)\big) \,\big]

where \delta is the Dirac impulse; it leads to the update:

w \leftarrow w + \mu \left( \gamma_{y\psi(y)} + \delta \right) * w

• With a Gaussian signal, \gamma_{y\psi(y)} reduces to second-order statistics...

(36)

Summary: BSS Advantages

• The method does not require reference signals.

• Very weak priors about the sources

– sources can be similar (noises with the same properties, deterministic signals, etc.)

– at most one Gaussian source

• Algorithms can be easily implemented (VLSI, DSP, etc.)

• Tracking is possible for mobile sources or time-varying mixtures

• Can be used for any kind of sensors

(37)

Summary: BSS Restrictions

• As many sensors as sources

• Sources are retrieved up to indeterminacies: scale and permutation (for linear instantaneous mixtures)

• The separating model must be suited to the mixing model

(38)

Prior information (1/3)

Sparsity: discrete sources

x_1 = a_{11}\, s_1 + a_{12}\, s_2

x_2 = a_{21}\, s_1 + a_{22}\, s_2

If s_2 = 0:

x_1 = a_{11}\, s_1

x_2 = a_{21}\, s_1

so the observed samples align along the direction of the first column of A.
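An illustrative sketch of how such sparsity reveals the mixing matrix (here the inactive samples are located using the true sources, purely for clarity; a blind method would detect them by clustering the observation ratios):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10_000
s1 = rng.choice([-1.0, 0.0, 1.0], n)      # discrete sources, often zero
s2 = rng.choice([-1.0, 0.0, 1.0], n)
A = np.array([[1.0, 0.7], [0.5, 1.0]])
x1, x2 = A @ np.vstack([s1, s2])

only_s1 = (s2 == 0) & (s1 != 0)           # samples where s2 is inactive
print((x2[only_s1] / x1[only_s1]).mean()) # = a21/a11 = 0.5 exactly here
```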

(39)

Prior information (2/3)

Non-stationary sources

– silence: temporal sparsity (e.g. speech, ECG, etc.)

x_1(t) = a_{11}\, s_1(t) + a_{12}\, s_2(t)

x_2(t) = a_{21}\, s_1(t) + a_{22}\, s_2(t)

During the periods where s_2(t) = 0:

x_1(t) = a_{11}\, s_1(t)

x_2(t) = a_{21}\, s_1(t)

(40)

Prior information (3/3)

Non-stationary sources with limited spectrum

– silence: temporal sparsity (e.g. speech)

– limited spectrum: frequency sparsity

– time-frequency sparsity (Rosca et al.; Abrard and Deville; ICA'2001)

x_1(\omega, t) = a_{11}\, s_1(\omega, t) + a_{12}\, s_2(\omega, t)

x_2(\omega, t) = a_{21}\, s_1(\omega, t) + a_{22}\, s_2(\omega, t)

At the time-frequency points (\omega_0, t_0) where s_2(\omega_0, t_0) = 0:

x_1(\omega_0, t_0) = a_{11}\, s_1(\omega_0, t_0)

x_2(\omega_0, t_0) = a_{21}\, s_1(\omega_0, t_0)
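A rough time-frequency sketch along these lines (all signals and parameters are illustrative; scipy.signal.stft is assumed available; the two histogram peaks recover the column ratios of A):

```python
import numpy as np
from scipy.signal import stft

fs, n = 8000, 8000
t = np.arange(n) / fs
s1 = np.sin(2 * np.pi * 440 * t) * (t < 0.5)    # disjoint supports => TF-sparse
s2 = np.sin(2 * np.pi * 1200 * t) * (t >= 0.5)
A = np.array([[1.0, 0.7], [0.5, 1.0]])
x1, x2 = A @ np.vstack([s1, s2])

_, _, X1 = stft(x1, fs=fs, nperseg=256)
_, _, X2 = stft(x2, fs=fs, nperseg=256)
strong = np.abs(X1) > 0.1 * np.abs(X1).max()    # keep energetic TF points
r = (X2[strong] / X1[strong]).real              # single-source points give ratios
hist, edges = np.histogram(r, bins=50, range=(0.0, 2.0))
print(np.sort(edges[np.argsort(hist)[-2:]]))    # near a21/a11 = 0.5, a22/a12 = 1.43
```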

(41)

Applications

• Biomedical signals: EEG, ECG, MEG, fMRI

– non-invasive techniques, localization, artifact compensation

• Communications and antenna array processing

– sonar, radar, mobile phones,

• Monitoring

• Sparse image coding

• Classification

• Smart sensor design

(42)

A few examples

1. Dam monitoring

• The dam wall moves according to the water level, the temperature, etc.

• Simple pendulums are hung on the wall at different locations: the pendulum deviations measure the wall motions, with different sensitivities to the water level, the temperature, etc.

G. d'Urso et al., Modélisation des déplacements de barrages, GRETSI 1997.

(43)

A few examples

2. Fetal ECG Extraction

[Figure: measured source signals and estimated sources.]

L. De Lathauwer, B. De Moor, J. Vandewalle, Fetal electrocardiogram extraction by blind subspace separation, IEEE Trans. on BME, 47(5):567-572, 2000.

(44)

A few examples

3. Artefact removal in MEG (1/2)

(45)

A few examples

3. Artefact removal in MEG (2/2)

(46)

A few examples

4. Smart sensor arrays (1/2)

• Smart sensor arrays based on low cost sensors

– unknown number of sources: it must be estimated,

– sensors are very close (200 µm): very weak spatial diversity

(Paraschiv-Ionescu, Jutten, Bouvier, IEEE Sensors Journal, Dec. 2002)

[Diagram: Hall-type silicon sensor array (H1…H6) in an unknown sensor environment, with programmable current sources and biasing; interface electronics (amplifier, gain ranging, S/H, Mux, addressing, enable) deliver VH(t) to a DSP card (ADC, DSP, DAC, RAM) whose program code performs source number estimation and source separation.]

(47)

A few examples

4. Smart sensor arrays (2/2)

(48)

A few examples

5. Multi-user access control

Y. Deville, L. Andry, Application of blind source separation techniques for multi-tag contactless identification systems (NOLTA 95; IEICE Trans. on Fundamentals of Electronics, 1996; French patent Sept. 1995, subsequently extended).

(49)

6. Smart chemical sensor array

(G. Bedoya, S. Bermejo, J. Cabestany, et al., IST SEWING project)

The drain current I_D of the conventional MOSFET (linear region) is:

I_D = \alpha \left[ (V_G - V_T) - 0.5\, V_D \right] V_D \quad (1), \quad \text{where } \alpha = \mu\, C_o\, W / L

– MOSFET: V_G is the ground-to-gate metal potential.

– ISFET: V_G^* is the ground-to-membrane potential.

[Diagram: ISFET cross-section with source S, drain D, reference electrode (V_ref, E_ref), and the Nernst potential V(Vref-elec).]

(50)

ISFET modeling

For the MOSFET, the threshold voltage involves the metal-semiconductor work function:

V_T = \Phi_{ms} + 2\Phi_F - \frac{Q_{ss} + Q_B}{C_o}

For the ISFET, it involves the chemical membrane-semiconductor work function:

V_T^* = \Phi_{cs} + 2\Phi_F - \frac{Q_{ss} + Q_B}{C_o} + E_{ref} - E_o

V_G^* = V_{ref} + \frac{RT}{nF} \ln \left( a_i + K_{ij}\, a_j^{z_i/z_j} \right) \quad \text{(empirical Nikolski-Eisenmann equation)}
