Formal Neural Networks
Christian Jutten
Lab. des Images et des Signaux (LIS)
UMR 5083 Centre National de la Recherche Scientifique, Institut National Polytechnique de Grenoble,
Université Joseph Fourier
Contents
• I. Introduction
• II. A few flashes of neurobiology
• III. Mathematical models
• IV. Cooperation and competition
• V. Linear associative memories
• VI. Multi-layer perceptrons
• VII. Hopfield models
• VIII. Kohonen self-organizing maps
• IX. Source separation
• X. Presentation of the lab session (BE) and the mini-projects
Chapter 9
Blind source separation
The problem
• The signal received by a sensor (electrode, antenna, microphone, etc.) is an intricate mixture of signals.
[Figure: two sources s1(t) and s2(t) impinge on one sensor, which receives the mixture a s1(t) + b s2(t).]
How can one retrieve the different signals (sources) from the mixture?
Solution
• It is possible using a few observations:
– more sensors than sources,
– different mixtures (observations).
[Figure: the two sources s1(t) and s2(t) reach two sensors, giving the mixtures:]
$$x_1(t) = a\, s_1(t) + b\, s_2(t), \qquad x_2(t) = c\, s_1(t) + d\, s_2(t), \qquad (a, b) \ne (c, d)$$
Summary
• Origin of the problem: motion decoding in vertebrates
• Neuromimetic approach: an intuitive solution
• Statement of the problem, assumptions and solutions
• Independence criteria (ICA)
• Algorithm principles
• Exploitation of prior information
• A few examples
• Demonstrations and discussion
Motion decoding
$$f_I(t) = a_{11}\, p(t) + a_{12}\, v(t)$$
$$f_{II}(t) = a_{21}\, p(t) + a_{22}\, v(t)$$
Linear model: the two fiber signals f_I and f_II are unknown linear combinations of position p(t) and velocity v(t).
Neuromimetic approach
One uses a neural network with recurrent connections, able to separate the unknown sources by tuning its weights:
$$y_i(t) = x_i(t) - \sum_{j \ne i} c_{ij}\, y_j(t)$$
[Figure: fully recurrent network with inputs x1(t), …, xn(t), outputs y1(t), …, yn(t) and adaptive weights c_ij.]
In matrix form:
$$Y(t) = X(t) - C\, Y(t) \;\Rightarrow\; (I + C)\, Y(t) = X(t) \;\Rightarrow\; Y(t) = (I + C)^{-1} X(t) = B\, X(t)$$
2-source problem
[Figure: two-unit recurrent network, inputs x1(t) and x2(t), cross-weights c12 and c21, outputs y1(t) and y2(t).]
Mixtures:
$$x_1(t) = a_{11}\, s_1(t) + a_{12}\, s_2(t), \qquad x_2(t) = a_{21}\, s_1(t) + a_{22}\, s_2(t)$$
Network equations:
$$y_1(t) = x_1(t) - c_{12}\, y_2(t), \qquad y_2(t) = x_2(t) - c_{21}\, y_1(t)$$
Substituting each output into the other:
$$y_1(t) = x_1(t) - c_{12}\,\bigl(x_2(t) - c_{21}\, y_1(t)\bigr), \qquad y_2(t) = x_2(t) - c_{21}\,\bigl(x_1(t) - c_{12}\, y_2(t)\bigr)$$
Solving for the outputs:
$$y_1(t) = \frac{x_1(t) - c_{12}\, x_2(t)}{1 - c_{12}\, c_{21}} = \frac{(a_{11} - c_{12}\, a_{21})\, s_1(t) + (a_{12} - c_{12}\, a_{22})\, s_2(t)}{1 - c_{12}\, c_{21}}$$
$$y_2(t) = \frac{x_2(t) - c_{21}\, x_1(t)}{1 - c_{12}\, c_{21}} = \frac{(a_{21} - c_{21}\, a_{11})\, s_1(t) + (a_{22} - c_{21}\, a_{12})\, s_2(t)}{1 - c_{12}\, c_{21}}$$
Two separating solutions:
$$c_{12} = \frac{a_{12}}{a_{22}} \;\text{ and }\; c_{21} = \frac{a_{21}}{a_{11}} \;\Rightarrow\; y_1(t) = \tilde a_1\, s_1(t), \quad y_2(t) = \tilde a_2\, s_2(t)$$
$$c_{12} = \frac{a_{11}}{a_{21}} \;\text{ and }\; c_{21} = \frac{a_{22}}{a_{12}} \;\Rightarrow\; y_1(t) = \tilde a'_1\, s_2(t), \quad y_2(t) = \tilde a'_2\, s_1(t) \quad\text{(permuted)}$$
2-source intuitive solution (1/2)
• There exists a solution (in fact, 2 solutions).
• How can one find the solution, i.e. the c_ij, since the a_ij are unknown?
• The idea is to estimate the c_ij so that the outputs y_j become independent.
• A first approach could be based on output decorrelation:
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,y_i(t)\, y_j(t)\,\bigr],$$
i.e., according to a stochastic iteration:
$$\Delta c_{ij}(t) = \mu\, y_i(t)\, y_j(t)$$
• Problem: $\Delta c_{ij}(t) = \Delta c_{ji}(t)$, i.e. the separation matrix is symmetric and cannot "invert" an arbitrary mixing matrix A.
2-source intuitive solution (2/2)
• To overcome this problem, one can break the symmetry by using:
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,f(y_i(t))\, g(y_j(t))\,\bigr] \quad\text{or}\quad \Delta c_{ij}(t) = \mu\, f(y_i(t))\, g(y_j(t)),$$
where f(.) and g(.) are two different odd nonlinear functions.
• The simplest algorithm is:
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,y_i^3(t)\, y_j(t)\,\bigr] \quad\text{or}\quad \Delta c_{ij}(t) = \mu\, y_i^3(t)\, y_j(t)$$
• The algorithm converges when, on average,
$$\Delta c_{ij}(t) = \mu\, E_t\bigl[\,y_i^3(t)\, y_j(t)\,\bigr] \approx 0,$$
i.e. when the outputs are (approximately) independent.
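A minimal sketch of this adaptive rule, in the spirit of the Hérault-Jutten network, with f(y) = y³ and g(y) = y. The step size, the source distributions and the iteration scheme are illustrative choices, and convergence depends on them; this is a sketch, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 20000
S = rng.uniform(-1, 1, (2, T))                    # independent zero-mean sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])            # unknown mixing matrix
X = A @ S                                         # observations

C = np.zeros((2, 2))                              # recurrent weights (diagonal stays 0)
mu = 0.01                                         # illustrative step size
for t in range(T):
    y = np.linalg.solve(np.eye(2) + C, X[:, t])   # network output y = (I + C)^(-1) x
    C[0, 1] += mu * y[0] ** 3 * y[1]              # Delta c_ij = mu f(y_i) g(y_j)
    C[1, 0] += mu * y[1] ** 3 * y[0]              # with f(u) = u^3 and g(u) = u

# expected fixed point: c12 -> a12/a22 = 0.5 and c21 -> a21/a11 = 0.3
print(C)
```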
General problem
• Assumptions on the mixture model F:
– linear instantaneous (memoryless) model,
– linear convolutive (with memory) model,
– nonlinear model.
• One chooses a separation model G suited to the mixture model.
• The sources are assumed to be statistically independent.
[Figure: unknown sources S → mixture model F → mixtures X → separation model G → estimated sources.]
Separability? (1/3)
• Idea: one tunes G so that the Y_i are mutually independent.
• Question: does $Y_i$ independent, $\forall i$, imply $Y \approx S$?
• Theoretical results
– Let X be a linear instantaneous regular mixture (A is a regular matrix) of independent sources, of which at most one is Gaussian; the signals Y = BX are independent iff BA = DP (Comon, HOS 1991 and Signal Processing 1994).
Independence is equivalent to separation up to scale and permutation indeterminacies.
[Figure: unknown sources S → mixing matrix A → mixtures X → separation matrix B → estimated sources Y.]
Separability? (2/3)
• Theoretical results for convolutive mixtures
– Let X be a linear convolutive mixture (A(z) is a filter matrix) of independent sources, of which at most one is Gaussian; the signals Y = [B(z)]X are independent iff B(z)A(z) = D(z)P (Weinstein, Yellin, IEEE SP, 1994; Nguyen, Jutten, SP, 1995).
Independence is equivalent to separation up to filter and permutation indeterminacies.
[Figure: unknown sources S → filter matrix A(z) → mixtures X → separation filter matrix B(z) → estimated sources Y.]
Separability? (3/3)
• Theoretical results for nonlinear mixtures
– For general nonlinear mixtures, independence of Y does not ensure source separation (Darmois, 1953; Hyvärinen, Pajunen, 1998).
– Particular nonlinear mixtures are separable, e.g. the so-called post-nonlinear (PNL) models (Taleb, Jutten, IEEE SP 1999; Jutten et al., SP 2004; Achard, Jutten, IEEE SPL 2005).
[Figure: unknown sources S → nonlinear mixture → mixtures X → separation structure → estimated sources Y.]
Indeterminacies: another approach
• Scale and permutation indeterminacies can also be understood in another way. For a linear model, sensor i receives the mixture:
$$x_i(t) = a_{i1}\, s_1(t) + a_{i2}\, s_2(t) + \cdots + a_{in}\, s_n(t) = \sum_j a_{ij}\, s_j(t) = \sum_j \frac{a_{ij}}{\alpha_j}\,\bigl(\alpha_j\, s_j(t)\bigr),$$
so the rescaled sources $\alpha_j s_j(t)$, mixed with the rescaled coefficients $a_{ij}/\alpha_j$, produce exactly the same observations.
• Easy generalization to convolutive and nonlinear mixtures.
Statistical dependence measure
• Method based on source independence: ICA
• Independence?
– In a few words…
– The definition:
$$p_{Y_1 Y_2}(u, v) = p_{Y_1}(u)\, p_{Y_2}(v)$$
– A few more convenient criteria:
with characteristic functions: independence of 2 variables is equivalent to the cancellation of their cross-cumulants of any order (which requires an infinite number of equations!);
with the Kullback-Leibler divergence:
$$I\Bigl(p_Y, \prod_i p_{Y_i}\Bigr) = \int \cdots \int p_Y(u)\, \ln \frac{p_Y(u)}{\prod_i p_{Y_i}(u_i)}\, du$$
Independence criterion
• Properties of the Kullback-Leibler divergence:
$$I\Bigl(p_Y, \prod_i p_{Y_i}\Bigr) = \int \cdots \int p_Y(u)\, \ln \frac{p_Y(u)}{\prod_i p_{Y_i}(u_i)}\, du$$
is positive, and vanishes if and only if
$$p_Y(u) = \prod_i p_{Y_i}(u_i).$$
• The K-L divergence is thus a good independence criterion… but its estimation requires pdf estimates.
Independence criterion
• Using the second characteristic function (the ln of the Fourier transform of the pdf), the independence definition
$$p_Y(u) = \prod_i p_{Y_i}(u_i)$$
becomes:
$$\Phi_Y(u) = \prod_i \Phi_{Y_i}(u_i) \quad\Longleftrightarrow\quad \Psi_Y(u) = \ln \Phi_Y(u) = \sum_i \Psi_{Y_i}(u_i)$$
• Expanding in a Taylor series around 0 leads to conditions on the cross-cumulants (the Taylor coefficients of $\Psi_Y$) at every order.
Independence criterion
• What if we only approximate independence?
• At order 2… one only achieves decorrelation:
$$b_{ij} \leftarrow b_{ij} - \mu\, E\bigl[\,y_i\, y_j\,\bigr] \quad\text{or}\quad b_{ij} \leftarrow b_{ij} - \mu\, y_i\, y_j$$
• The algorithm converges if:
$$E\bigl[\,y_i\, y_j\,\bigr] = 0, \quad \forall i \ne j$$
• One observes that the updates satisfy:
$$\Delta b_{ij} = \Delta b_{ji}$$
• The separation matrix B is then symmetric, and cannot invert an arbitrary mixing matrix.
Independence criterion
• At the 2nd order… one only cancels the correlations.
• Algebraic explanation:
For 2 variables (mixtures), one computes the terms $E[y_1^2]$, $E[y_2^2]$ and $E[y_1 y_2]$.
Problem: there are 3 equations, but the separating matrix has 4 unknowns!
For n variables, one computes the terms $E[y_i^2]$ and $E[y_i y_j]$, $i \ne j$.
This gives n + (n² − n)/2 = (n² + n)/2 equations, but B contains n² unknowns.
• Consequently, decorrelation is not sufficient.
Independence criterion
• With a better independence approximation:
– higher-order statistics,
– avoid the symmetry.
• For instance, one proposes the following algorithm:
$$b_{ij} \leftarrow b_{ij} - \mu\, E\bigl[\,f(y_i)\, g(y_j)\,\bigr] \quad\text{or}\quad b_{ij} \leftarrow b_{ij} - \mu\, f(y_i)\, g(y_j), \qquad f \ne g$$
• The algorithm converges if:
$$E\bigl[\,f(y_i)\, g(y_j)\,\bigr] = 0$$
• Simplest case: $E[\,y_i^3\, y_j\,] = 0$. The symmetry is broken, since $E[\,y_i^3\, y_j\,] \ne E[\,y_j^3\, y_i\,]$: there are as many equations as unknowns.
MI-based Algorithm principle
• For a linear mixture, one has to estimate B.
• One computes the gradient of the criterion with respect to the parameters: $\partial I / \partial B$.
• The separation matrix is updated according to:
$$B(t+1) = B(t) - \mu\, \frac{\partial I}{\partial B}$$
• The algorithm converges when $\partial I / \partial B$ is equal to 0 (on average), i.e. when I is minimum.
• It is an unsupervised algorithm: independence is directly estimated from the outputs, without any priors.
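A sketch of such an unsupervised gradient descent on a linear mixture. The relative-gradient form of the update and the tanh nonlinearity (a common stand-in score function for super-Gaussian sources) are assumptions of this sketch, not the specific algorithm of the course.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000
S = rng.laplace(size=(2, T))                      # super-Gaussian independent sources
A = np.array([[1.0, 0.7], [0.4, 1.0]])            # unknown mixing matrix
X = A @ S

B = np.eye(2)                                     # separation matrix to be estimated
mu = 0.01
for _ in range(200):
    Y = B @ X
    # relative-gradient estimate of dI/dB: (E[phi(y) y^T] - I) B,
    # with phi = tanh standing in for the score function
    G = (np.tanh(Y) @ Y.T) / T - np.eye(2)
    B -= mu * G @ B                               # gradient step B <- B - mu dI/dB

print(B @ A)                                      # ~ scaled permutation matrix DP
```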
Mutual information algorithm (1/2)
• The mutual information writes:
$$I(Y) = \sum_i H(Y_i) - H(Y) = \sum_i H(Y_i) - H(X) - \ln\lvert\det B\rvert$$
• The estimation equation is then:
$$\frac{\partial I(Y)}{\partial \Theta} = 0,$$
where Θ represents the parameter vector.
• It leads to estimating equations involving the marginal score functions
$$\psi_i(y_i) = -\frac{d \ln p_{Y_i}(y_i)}{d y_i},$$
which requires the estimation of probability density functions (pdf).
Mutual information algorithm (2/2)
• For linear mixtures, F and G are modeled by matrices:
[Figure: S(t) → mixing matrix A → X(t) = A S(t) → separation matrix B → Y(t) = B X(t).]
• The estimation equations lead to:
$$E\left[\frac{d \ln p_{Y_i}(Y_i)}{d Y_i}\, Y_j\right] = 0, \qquad i \ne j$$
– For Gaussian sources, $d \ln p_{Y_i}(y_i)/d y_i \propto y_i$: second-order statistics,
– For non-Gaussian sources: higher (than 2) order statistics,
– Priors or good estimates of the source distributions lead to optimal statistics; different pdf approximations lead to different algorithm implementations (2nd order, cumulants, etc.).
Practical issues: pdf estimation
• The estimation equations require estimates of the pdfs or of the score functions.
• The pdfs can be estimated using various methods:
– expansions near Gaussianity: Gram-Charlier (Lacoume 91, Comon SP 94, Yang et al. SP 98), Edgeworth,
– kernel estimators (Pham IEEE Trans. SP 96; Taleb, Jutten IEEE SP 99),
… then the score functions are obtained by differentiation.
• The score function can also be estimated directly by minimizing an MSE cost (Pham et al. EUSIPCO 92; Taleb, Jutten ICANN 97, IEEE SP 99):
$$J(\mathbf{w}) = E\Bigl[\bigl(\hat\psi_{\mathbf{w}}(y) - \psi_Y(y)\bigr)^2\Bigr],$$
whose gradient can be computed without knowing the true score $\psi_Y$, since integration by parts gives
$$\frac{\partial J}{\partial \mathbf{w}} = 2\, E\left[\hat\psi_{\mathbf{w}}(y)\, \frac{\partial \hat\psi_{\mathbf{w}}(y)}{\partial \mathbf{w}} - \frac{\partial^2 \hat\psi_{\mathbf{w}}(y)}{\partial y\, \partial \mathbf{w}}\right]$$
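As an illustration of the kernel route, the sketch below estimates the score ψ(y) = −d ln p(y)/dy from samples with a Gaussian kernel density estimate differentiated analytically. The helper name kernel_score and the Silverman bandwidth rule are illustrative choices, a simplified stand-in for the estimators cited above.

```python
import numpy as np

def kernel_score(samples, y):
    """Estimate psi(y) = -d/dy ln p(y) with a Gaussian kernel density estimate.

    The normalization of the kernels cancels in the ratio -p'/p,
    so unnormalized Gaussians are used.
    """
    h = 1.06 * samples.std() * len(samples) ** (-1 / 5)   # Silverman bandwidth
    d = y[:, None] - samples[None, :]                     # pairwise differences
    K = np.exp(-0.5 * (d / h) ** 2)                       # kernel values
    p = K.mean(axis=1)                                    # ~ p(y)
    dp = (K * (-d / h ** 2)).mean(axis=1)                 # ~ p'(y)
    return -dp / p

# sanity check: for a standard Gaussian the true score is psi(y) = y
rng = np.random.default_rng(3)
samples = rng.normal(size=2000)
y = np.linspace(-2.0, 2.0, 5)
print(kernel_score(samples, y))                           # should be close to y
```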
Source separation in PNL mixtures: model and separability
• PNL mixtures are particular NL mixtures.
• Linear channels, then sensors with NL distortions.
• PNL mixtures are separable NL mixtures:
– if at most one source is Gaussian, if the mixing matrix has at least 2 nonzero entries per row and per column, and if the functions f_i are invertible, then the outputs are independent iff each $g_i \circ f_i$ is linear and BA = DP.
[Figure: unknown sources S → mixing matrix A → nonlinearities f1(.), f2(.) → observations X → compensating nonlinearities g1(.), g2(.) → separation matrix B → estimated sources Y.]
Source separation in PNL mixtures: criterion
• The mutual information writes:
$$I(Y) = \sum_i H(Y_i) - H(Y) = \sum_i H(Y_i) - H(X) - \ln\lvert\det B\rvert - \sum_i E\bigl[\ln\lvert g_i'(x_i)\rvert\bigr]$$
• The estimation equations are:
$$\frac{\partial I(Y)}{\partial \Theta} = 0,$$
– a set for the linear part,
– a set for the NL part.
Source separation in PNL mixtures: estimation equations
• The estimation equations for the linear part are:
$$E\bigl[\psi_Y(y)\, y^T\bigr] = I \quad\Longleftrightarrow\quad E\bigl[\psi_{Y_i}(y_i)\, y_j\bigr] = 0, \quad i \ne j, \qquad\text{with } Y = BZ$$
• The estimation equations for the nonlinear part are:
$$E\Bigl[\sum_i b_{ij}\, \psi_{Y_i}(y_i) \,\Big|\, z_j\Bigr] = \psi_{Z_j}(z_j), \qquad j = 1, \ldots, n, \qquad\text{with } Z_j = g_j(X_j)$$
• Both sets of equations require pdf or score function estimates for optimality.
Source separation in PNL mixtures: algorithm
• The algorithm is based on the estimation of 3 parts:
– the marginal score functions of the estimated sources,
– the nonlinear compensating functions g_i,
– the separation matrix B.
[Figure: unknown sources S → mixing matrix A → nonlinearities f1(.), f2(.) → observations X → compensating nonlinearities g1(.), g2(.) → separation matrix B → estimated sources Y; a score function estimator drives the parametric (or not) estimation algorithm.]
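To make the compensation structure concrete, the sketch below builds a toy PNL mixture and inverts it with the mirrored g-then-B structure. Here the compensating functions and the separating matrix are computed from the known mixture, only to illustrate the architecture; the blind algorithm has to estimate them from X alone.

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, (2, 1000))                 # unknown independent sources
A = np.array([[1.0, 0.5], [0.6, 1.0]])            # mixing matrix

f = [np.tanh, lambda u: u ** 3]                   # toy invertible sensor distortions
X = np.vstack([f[i](row) for i, row in enumerate(A @ S)])   # PNL observations

g = [np.arctanh, np.cbrt]                         # exact compensators: g_i o f_i linear
B = np.linalg.inv(A)                              # ideal separating matrix
Z = np.vstack([g[i](row) for i, row in enumerate(X)])
Y = B @ Z                                         # separation structure: g, then B
print(np.abs(Y - S).max())                        # ~ 0: sources recovered
```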
Blind inversion of Wiener systems: the model, classical approaches
• Wiener system: a linear filter h followed by a memoryless nonlinearity f.
• Hammerstein system: a memoryless nonlinearity g followed by a linear filter w.
[Figure: Wiener system s(t) → h → f → e(t); Hammerstein inversion structure e(t) → g → x(t) → w → y(t).]
• The Wiener system is a usual NL model in biology, in satellite communications, etc.
• Usually, the input signal is assumed to be iid Gaussian.
• Classical identification methods for nonlinear systems are based on higher-order cross-correlations.
• If the distortion input is available, the compensation of the nonlinearities is almost straightforward, after identification of the NL.
Blind inversion of Wiener systems: Wiener and PNL (1/2)
• With the following parameterization:
$$\mathbf{S}(t) = \bigl[\,\ldots,\, s(t-k+1),\, s(t-k),\, s(t-k-1),\, \ldots\,\bigr]^T$$
$$\mathbf{E}(t) = \bigl[\,\ldots,\, e(t-k+1),\, e(t-k),\, e(t-k-1),\, \ldots\,\bigr]^T$$
and $\mathbf{H}$ the infinite Toeplitz matrix with entries $H_{t\tau} = h(t - \tau)$ (each row is a shifted copy of the impulse response of h),
since the scalar input s(t) is iid, S(t) has independent components and consequently E(t) is a mixture of independent sources: the Wiener system is nothing but an infinite-dimensional PNL mixture.
Blind inversion of Wiener systems: Wiener and PNL (2/2)
• If the Wiener system satisfies:
– the subsystems h and f are unknown and invertible; h can be a non-minimum-phase filter,
– the input s(t) is an unknown (a priori) non-Gaussian iid process,
then it is equivalent to a PNL mixture, with a particular Toeplitz mixing matrix H and the same NL function f on each channel.
• PNL separability implies the invertibility of Wiener systems.
• PNL separability is only proved for finite dimensions; it is conjectured for infinite dimensions.
• In practice, the filter h(k) and its inverse w(k) are truncated.
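The sketch below simulates a short Wiener system and inverts it with the mirrored Hammerstein structure, using the known inverse nonlinearity and a truncated inverse filter. All components are known here, whereas the blind algorithm must estimate g and w from e(t) alone.

```python
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(-1, 1, 5000)                      # iid non-Gaussian input

h = np.array([1.0, 0.5])                          # toy minimum-phase FIR filter
e = np.tanh(np.convolve(s, h)[: len(s)])          # Wiener system: filter, then f = tanh

# Hammerstein inversion: nonlinearity g = arctanh, then linear filter w,
# where w is the truncated inverse of h: 1/(1 + 0.5 z^-1) = sum_k (-0.5)^k z^-k
w = (-0.5) ** np.arange(20)
y = np.convolve(np.arctanh(e), w)[: len(s)]
print(np.abs(y - s).max())                        # small: s(t) recovered up to truncation
```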
Blind inversion of Wiener systems: iid criterion
• Output of the inversion (Hammerstein) structure:
$$x(t) = g\bigl(e(t)\bigr), \qquad y(t) = (w * x)(t)$$
• Requiring that Y(t) be spatially independent amounts to requiring that the sequence {y(t)} be iid.
• The mutual information of infinite-dimensional stationary random vectors is defined from entropy rates (Cover, Thomas, John Wiley & Sons, 1991):
$$H(Y) = \lim_{T \to +\infty} \frac{1}{2T+1}\, H\bigl(y(-T), \ldots, y(T)\bigr)$$
$$I(Y) = \lim_{T \to +\infty} \frac{1}{2T+1} \sum_{t=-T}^{+T} H\bigl(y(t)\bigr) - H(Y)$$
• I(Y) is always positive and vanishes if and only if y(t) is an iid sequence.
Blind inversion of Wiener systems: estimation equations
• The estimation equations are then:
$$\frac{\partial I(Y)}{\partial \theta} = 0, \qquad\text{where } \theta \text{ is the parameter vector.}$$
• I(Y) must be differentiated with respect to the parameters of the linear part, w, and with respect to the nonlinear function, g.
• Linear part. Consider a small relative variation of w, in terms of a convolution by a small filter ε. The first-order variation of I(Y) is:
$$\Delta I(Y) = -\bigl[\bigl(\gamma_{y, \psi(y)} + \delta\bigr) * \varepsilon\bigr](0), \qquad\text{with } \gamma_{y, \psi(y)}(\tau) = E\bigl[y(t - \tau)\, \psi\bigl(y(t)\bigr)\bigr],$$
which leads to the update:
$$w \leftarrow w + \mu\, \bigl\{\gamma_{y, \psi(y)} + \delta\bigr\} * w$$
• With a Gaussian signal, $\gamma_{y, \psi(y)}$ reduces to second-order statistics…
Summary: BSS Advantages
• The method does not require reference signals.
• Very weak priors about the sources
– sources can be similar (noises with the same properties, deterministic signals, etc.)
– at most one Gaussian source
• Algorithms can be easily implemented (VLSI, DSP, etc.)
• Tracking is possible for mobile sources or time-varying mixtures
• Can be used with any kind of sensor
Summary: BSS Restrictions
• As many sensors as sources
• Sources are retrieved up to indeterminacies: scale and permutation (for linear instantaneous mixtures)
• The separating model must be suited to the mixing model
Prior information (1/3)
Sparsity: discrete sources
$$x_1 = a_{11}\, s_1 + a_{12}\, s_2, \qquad x_2 = a_{21}\, s_1 + a_{22}\, s_2$$
If $s_2 = 0$:
$$x_1 = a_{11}\, s_1, \qquad x_2 = a_{21}\, s_1,$$
so such samples directly reveal one column of A (up to scale).
Prior information (2/3)
Non-stationary sources
– silences: temporal sparsity (e.g. speech, ECG, etc.)
$$x_1 = a_{11}\, s_1 + a_{12}\, s_2, \qquad x_2 = a_{21}\, s_1 + a_{22}\, s_2$$
During a silence of $s_2$ ($s_2 = 0$):
$$x_1 = a_{11}\, s_1, \qquad x_2 = a_{21}\, s_1$$
Prior information (3/3)
Non-stationary sources with limited spectrum
– silences: temporal sparsity (e.g. speech),
– limited spectrum: frequential sparsity,
– time-frequency sparsity (Rosca et al.; Abrard and Deville; ICA'2001).
$$x_1(\omega, t) = a_{11}\, s_1(\omega, t) + a_{12}\, s_2(\omega, t), \qquad x_2(\omega, t) = a_{21}\, s_1(\omega, t) + a_{22}\, s_2(\omega, t)$$
If $s_2(\omega_0, t_0) = 0$:
$$x_1(\omega_0, t_0) = a_{11}\, s_1(\omega_0, t_0), \qquad x_2(\omega_0, t_0) = a_{21}\, s_1(\omega_0, t_0)$$
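A sketch of how time-frequency sparsity can be exploited, in the spirit of the cited approaches rather than their exact algorithms: at high-energy TF points where (by sparsity) a single source dominates, the ratio x2/x1 equals a column ratio of A, so the ratios cluster around the column directions.

```python
import numpy as np
from scipy.signal import stft

fs, T = 8000, 8000
t = np.arange(T) / fs
# toy sources that do not overlap in the time-frequency plane
s1 = np.sin(2 * np.pi * 440 * t) * (t < 0.5)
s2 = np.sin(2 * np.pi * 1300 * t) * (t >= 0.5)

a11, a12, a21, a22 = 1.0, 0.8, 0.3, 1.0
x1 = a11 * s1 + a12 * s2
x2 = a21 * s1 + a22 * s2

_, _, X1 = stft(x1, fs=fs, nperseg=256)
_, _, X2 = stft(x2, fs=fs, nperseg=256)

mask = np.abs(X1) > 0.1 * np.abs(X1).max()        # keep high-energy TF points
ratios = np.real(X2[mask] / X1[mask])             # = a21/a11 or a22/a12 by sparsity
# the histogram shows two clusters, near 0.3 and 1.25
print(np.histogram(ratios, bins=np.linspace(0.0, 2.0, 9))[0])
```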
Applications
• Biomedical signals: EEG, ECG, MEG, fMRI
– non-invasive techniques, localization, artifact compensation
• Communications and antenna processing
– sonar, radar, mobile phones
• Monitoring
• Sparse image coding
• Classification
• Smart sensor design
A few examples
1. Dam monitoring
• The dam wall moves according to the water level, the temperature, etc.
• Simple pendulums are hung on the wall at different locations: the pendulum deviations measure the wall motions, with different sensitivities to the water level, the temperature, etc.
G. d'Urso et al., Modélisation des déplacements de barrages, GRETSI 1997.
A few examples
2. Fetal ECG Extraction
[Figure: source signals (left) and estimated sources (right).]
L. De Lathauwer, B. De Moor, J. Vandewalle, Fetal electrocardiogram extraction by blind subspace separation, IEEE Trans. on BME, 47(5):567-572, 2000.
A few examples
3. Artefact removal in MEG (1/2)
A few examples
3. Artefact removal in MEG (2/2)
A few examples
4. Smart sensor arrays (1/2)
• Smart sensor arrays based on low-cost sensors
– unknown number of sources: it must be estimated,
– the sensors are very close to each other (200 µm): very weak spatial diversity
(Paraschiv-Ionescu, Jutten, Bouvier, IEEE Sensors Journal, Dec. 2002)
[Figure: Hall-type silicon sensor array (H1…H6) in an unknown sensor environment, with programmable current sources, biasing, amplifiers, S/H, multiplexer and interface electronics (gain ranging, addressing, enable), feeding a DSP card (ADC, DSP, DAC, RAM) whose program code performs source number estimation and source separation.]
A few examples
4. Smart sensor arrays (2/2)
A few examples
5. Multi-user access control
Y. Deville, L. Andry, Application of blind source separation techniques for multi-tag contactless identification systems (NOLTA 95; IEICE Trans. on Fundamentals of Electronics, 1996; French patent Sept. 1995, subsequently extended).
6. Smart chemical sensor array
(G. Bedoya, S. Bermejo, J. Cabestany, et al., IST SEWING project)
The drain current ID of the conventional MOSFET (in the linear region) is:
$$I_D = \alpha\,\bigl[(V_G - V_T) - 0.5\, V_D\bigr]\, V_D \qquad (1) \qquad\text{where } \alpha = \mu\, C_o\, W / L$$
• MOSFET: V_G is the ground-to-gate metal potential.
• ISFET: V_G* is the ground-to-membrane potential.
[Figure: ISFET cross-section with source S and drain D, reference electrode (Eref, Vref) and Nernst potential V(Vref-elec).]
ISFET modeling:
$$V_T = \Phi_{ms} + 2\,\Phi_F - \frac{Q_{ss} + Q_B}{C_o} \qquad\text{(}\Phi_{ms}\text{: metal-semiconductor work function)}$$
$$V_T^* = \Phi_{cs} + 2\,\Phi_F - \frac{Q_{ss} + Q_B}{C_o} + E_{ref} - E_o \qquad\text{(}\Phi_{cs}\text{: chemical membrane-semiconductor work function)}$$
$$V_G^* = V_{ref} + \frac{RT}{nF}\, \ln\bigl(a_i + K_{ij}\, a_j^{z_i / z_j}\bigr) \qquad\text{(empirical Nikolski-Eisenmann equation)}$$
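As a worked numeric reading of the Nikolski-Eisenmann equation, the sketch below evaluates the membrane potential for hypothetical ion activities and selectivity coefficient (all values are illustrative, not from the SEWING project); the primary-ion charge z_i plays the role of n in the slide's formula.

```python
import numpy as np

R, F = 8.314, 96485.0                             # gas and Faraday constants (SI units)

def nikolski_eisenmann(a_i, a_j, K_ij, z_i=1, z_j=1, V_ref=0.0, T=298.15):
    """V_G* = V_ref + RT/(z_i F) ln(a_i + K_ij a_j**(z_i/z_j))."""
    return V_ref + (R * T) / (z_i * F) * np.log(a_i + K_ij * a_j ** (z_i / z_j))

# hypothetical activities of the primary ion a_i and an interfering ion a_j
print(nikolski_eisenmann(a_i=1e-3, a_j=1e-2, K_ij=0.05))
```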