Approximate Inference based Static and Dynamic Sparse Bayesian Learning with Kronecker Structured Dictionaries
Christo Kurisummoottil Thomas, Dirk Slock
Communication Systems Department, EURECOM, Sophia Antipolis, France
GdR ISIS Meeting (Réunion du GdR ISIS), May 27, 2020
Décompositions Tensorielles (Tensor Decompositions), Video Conference
Outline
1 Introduction
2 Combined BP-MF-EP Framework
3 Kronecker Structured Dictionary Learning using BP/VB
4 Numerical Results and Conclusion
GdR ISIS, Décompositions Tensorielles, 2020, Christo Kurisummoottil Thomas, Dirk Slock, EURECOM, FRANCE
Time Varying Sparse State Tracking
Sparse signal $x_t$ is modeled using an AR(1) process with diagonal correlation coefficient matrix $F$.
Define: $\Lambda = \mathrm{diag}(\lambda)$, $F = \mathrm{diag}(f)$.
$f_i$: correlation coefficient, and $x_{i,t} \sim \mathcal{CN}(0, \lambda_i^{-1})$. Further, $w_t \sim \mathcal{CN}(0, \Gamma^{-1} = \Lambda^{-1}(I - FF^H))$ and $v_t \sim \mathcal{CN}(0, \gamma^{-1} I)$. VB leads to Gaussian SAVE-Kalman Filtering (GS-KF).
Applications: localization, adaptive filtering.
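As a quick numerical illustration of this state model, the sketch below simulates one real-valued component with the stated stationary variance (the values of $f_i$ and $\lambda_i$ are illustrative, not the paper's settings):

```python
import numpy as np

# AR(1) state model per component: x_t = f * x_{t-1} + w_t,
# with w_t ~ N(0, (1 - f^2)/lam) so that the stationary
# variance of x_t is exactly 1/lam (real-valued sketch).
rng = np.random.default_rng(0)
f, lam = 0.9, 4.0          # illustrative correlation coefficient and precision
T = 200_000
x = np.zeros(T)
w = rng.normal(0.0, np.sqrt((1 - f**2) / lam), size=T)
for t in range(1, T):
    x[t] = f * x[t - 1] + w[t]

print(np.var(x))           # should be close to 1/lam = 0.25
```

The innovation variance $(1 - f^2)/\lambda$ mirrors the $\Gamma^{-1} = \Lambda^{-1}(I - FF^H)$ choice above, which decouples the stationary marginal variance from the temporal correlation.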
Approximate Inference based Static and Dynamic Sparse Bayesian Learning with Kronecker Structured Dictionaries
Sparsification of the Innovation Sequence (Sparse Bayesian Learning-SBL)
Bayesian compressed sensing: 2-layer hierarchical prior for $x$ as in [1], inducing sparsity for $x$.
$p(x_{i,t}|\lambda_i) = \mathcal{N}(x_{i,t}; 0, \lambda_i^{-1})$, $\quad p(\lambda_i|a,b) = \Gamma(a)^{-1} b^a \lambda_i^{a-1} e^{-b\lambda_i}$ $\Rightarrow$ sparsifying Student-t marginal
$p(x_{i,t}) = \frac{b^a\, \Gamma(a+\frac{1}{2})}{(2\pi)^{\frac{1}{2}}\, \Gamma(a)}\, \big(b + x_{i,t}^2/2\big)^{-(a+\frac{1}{2})}$.
We apply the (Gamma) prior not to the precision of the state $x$ but to its innovation $w$, allowing us to sparsify at the same time the components of $x$ AND their variation in time (innovation).
[1] Tipping'01
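The scale-mixture construction behind this marginal can be checked numerically (a sketch with illustrative shape/rate values $a$, $b$, not tied to the paper's choices): sampling $\lambda \sim \mathrm{Gamma}(a, b)$ (rate parameterization) and then $x \mid \lambda \sim \mathcal{N}(0, 1/\lambda)$ yields a Student-t marginal with $\nu = 2a$ degrees of freedom, whose variance is $b/(a-1)$ for $a > 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 3.0, 2.0                      # illustrative Gamma shape and rate
n = 500_000
lam = rng.gamma(shape=a, scale=1.0 / b, size=n)   # precision samples
x = rng.normal(0.0, 1.0 / np.sqrt(lam))           # x | lam ~ N(0, 1/lam)

# Marginal of x is Student-t with nu = 2a dof; Var(x) = E[1/lam] = b/(a-1).
print(np.var(x), b / (a - 1))        # the two should nearly match
```

Small $a$, $b$ make the marginal heavy-tailed and sharply peaked at zero, which is what induces sparsity.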
Tensor Representation (Channel Tracking in MaMIMO OFDM)
Sampling across Doppler space and stacking all the subcarrier and sampled (in Doppler) elements as a vector:
$\mathrm{vec}(H(t)) = \sum_{i=1}^{L} x_{i,t}\, h_t(\phi_i) \otimes h_r(\theta_i) \otimes v_f(\tau_i) \otimes v_t(f_i) = A(t)\, x_t$
4-D tensor model: delay, Doppler, and Tx/Rx spatial dimensions.
The array response itself has Kronecker structure in the case of polarization, or for 2D antenna arrays with separable structure [2].
User mobility changes the scattering geometry and path coefficients.
The tensor-based KF proposed here avoids off-grid basis mismatch issues.
[2] Qian, Fu, Sidiropoulos, Yang'18
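A small numerical check of this stacked sum-of-Kronecker representation (a sketch with random vectors standing in for the steering/delay/Doppler responses, and small illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
L = 3                                   # number of paths
dims = (2, 3, 4, 5)                     # illustrative Tx/Rx/delay/Doppler sizes

x = rng.standard_normal(L)              # path coefficients x_{i,t}
resp = [rng.standard_normal((d, L)) for d in dims]   # stand-in response vectors

# Channel tensor as a sum of rank-1 (outer-product) terms.
H = sum(
    x[i] * np.einsum('a,b,c,d->abcd',
                     resp[0][:, i], resp[1][:, i], resp[2][:, i], resp[3][:, i])
    for i in range(L)
)

# Dictionary with one Kronecker-structured column per path; vec(H) = A x.
cols = [
    np.kron(np.kron(np.kron(resp[0][:, i], resp[1][:, i]), resp[2][:, i]),
            resp[3][:, i])
    for i in range(L)
]
A = np.stack(cols, axis=1)
assert np.allclose(H.ravel(), A @ x)    # C-order vec matches the h1⊗h2⊗h3⊗h4 ordering
```

Here the C-order (row-major) flattening of the 4-way tensor matches the Kronecker ordering written above, with the first factor's index varying slowest.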
Kronecker Structured Tensor Models
(Figure: CPD of the data tensor Y as a sum of rank-1 terms.)
Tensor signals appear in many applications: massive multi-input multi-output (MIMO) radar, massive MIMO (MaMIMO) channel estimation, speech processing, image and video processing.
Exploiting the tensorial structure is beneficial compared to estimating an unstructured dictionary.
From Canonical Polyadic Decompositions (CPD) to Tucker Decompositions (TD).
The signal model for the recovery of a time-varying sparse signal under a Kronecker structured (KS) dictionary matrix can be formulated as (tensor representation: $Y_t = X_t + N_t$)
Observation: $y_t = \underbrace{(A_1(t) \otimes A_2(t) \otimes \cdots \otimes A_N(t))}_{A(t)}\, x_t + v_t$, $\quad y_t = \mathrm{vec}(Y_t)$. State update: $x_t = F x_{t-1} + w_t$.
$Y_t \in \mathbb{C}^{I_1 \times I_2 \times \cdots \times I_N}$ is the observed data at time $t$, $A_{j,i}(t) \in \mathbb{C}^{I_j}$ is the $i$-th column of the factor matrix $A_j(t) = [A_{j,1}(t), \ldots, A_{j,P_j}(t)]$, the overall unknown parameters are $[[A_1(t), \ldots, A_N(t); x_t]]$, $x_t$ is the $M\,(= \prod_{j=1}^{N} P_j)$-dimensional sparse core tensor, and $w_t$, $v_t$ are the state and measurement noise.
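The Kronecker-structured measurement matrix acting on the vectorized core is equivalent to Tucker-style mode products of the core tensor by the factor matrices. A small check (illustrative sizes; C-order vectorization, under which the factor order $A_1 \otimes \cdots \otimes A_N$ matches a row-major vec):

```python
import numpy as np

rng = np.random.default_rng(3)
I, P = (2, 3, 4), (2, 2, 3)             # observation and core dimensions
A = [rng.standard_normal((I[j], P[j])) for j in range(3)]
X = rng.standard_normal(P)               # core tensor (dense here for the check)

# Tucker-style mode products: Y = X x_1 A_1 x_2 A_2 x_3 A_3
Y = np.einsum('ia,jb,kc,abc->ijk', A[0], A[1], A[2], X)

# Kronecker form acting on the C-order (row-major) vectorization of X.
A_kron = np.kron(np.kron(A[0], A[1]), A[2])
assert np.allclose(Y.ravel(), A_kron @ X.ravel())
```

In the noisy model, $y_t$ adds $v_t$ to this product; the check above only concerns the dictionary structure.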
Identifiability of Tensor Decomposition
CPD is essentially unique up to scaling and permutation indeterminacies: if $T = [[A_1(t), \ldots, A_N(t); x_t]] = [[A'_1(t), \ldots, A'_N(t); x'_t]]$, then
$A'_j(t) = A_j(t)\, \Pi_j D_j, \ \forall j$,
with $\Pi_j$ a $P_j \times P_j$ permutation matrix and $D_j$ a non-singular diagonal matrix.
To circumvent the scaling indeterminacies, we model the columns of $A_j(t)$ as $A_{j,i}(t) = [1 \ a_{j,i}^T]^T$. There are recent results on local identifiability for the Kronecker structured dictionary matrix (so TD) case [3]: they derived sample complexity upper bounds for coordinate dictionary identification up to a specified error, under a separable sparsity model for $X$.
Global identifiability is still an open issue.
[3] Shakeri, Sarwate, Bajwa'18
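The scaling/permutation indeterminacy can be seen numerically: permuting the rank-1 terms and rescaling the factors (with the scalings multiplying to one within each term) leaves the tensor unchanged. A minimal 3-way sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
R, dims = 3, (4, 5, 6)
A1, A2, A3 = (rng.standard_normal((d, R)) for d in dims)

def cpd(a1, a2, a3):
    # Tensor from CPD factors: T = sum_r a1[:,r] o a2[:,r] o a3[:,r]
    return np.einsum('ir,jr,kr->ijk', a1, a2, a3)

T = cpd(A1, A2, A3)

perm = np.array([2, 0, 1])               # permutation of the rank-1 terms
d1, d2 = np.array([2.0, -1.0, 0.5]), np.array([3.0, 4.0, -2.0])
d3 = 1.0 / (d1 * d2)                     # scalings cancel within each term

T2 = cpd(A1[:, perm] * d1, A2[:, perm] * d2, A3[:, perm] * d3)
assert np.allclose(T, T2)                # same tensor, different factors
```

Anchoring the first entry of each column to 1, as above, removes the scaling ambiguity but not the permutation ambiguity.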
Our Contribution Here Compared to State of the Art
Extension of our previous work [4] on Kronecker structured (KS) dictionary learning (DL), called SAVED-KS DL (Space Alternating Variational Estimation for KS DL), to the case of a time-varying sparse state vector.
Variational Bayes (VB) at the scalar level (i.e. mean field (MF)) for $x_t$, and at the column level for the factor matrices in the measurement matrix, is quite suboptimal, since the computed posterior covariance does not converge to the true posterior covariance [5]. One potential solution: belief propagation (BP).
Also, better variational free energy (VFE) approximations, e.g. Belief Propagation (BP) or Expectation Propagation (EP) instead of MF.
Inspired by the framework of [6], we combine BP and MF approximations in such a way as to optimize the message passing (MP) framework.
The main focus: reduced-complexity algorithms (e.g. VB leads to low complexity) without sacrificing much performance (BP for performance improvement).
Dynamic autoregressive SBL (DAR-SBL): a case of joint Kalman filtering (KF) with a linear time-invariant diagonal state-space model, and parameter estimation $\Longrightarrow$ an instance of nonlinear filtering.
[4] Thomas, Slock'19
[5] Thomas, Slock'18
[6] Riegler, Kirkelund, Manchón, Badiu, Fleury'13
Variational Free Energy (VFE) Framework
Intractable joint posterior distribution of the parameters θ = {x , A, f, λ, γ}.
Actual posterior: $p(\theta) = \frac{1}{Z} \underbrace{\prod_{a \in \mathcal{A}_{BP}} f_a(\theta_a) \prod_{b \in \mathcal{A}_{MF}} f_b(\theta_b)}_{\text{factor nodes}}$, where $\mathcal{A}_{BP}$, $\mathcal{A}_{MF}$ are the sets of factor nodes belonging to the BP/MF part, with $\mathcal{A}_{BP} \cap \mathcal{A}_{MF} = \emptyset$.
The whole $\theta$ is partitioned into the sets $\theta_i$, and we want to approximate the true posterior $p(\theta)$ by an approximate posterior $q(\theta) = \prod_i q_i(\theta_i)$. The individual variable portions $\theta_i$ are the variable nodes in a factor graph.
$\mathcal{N}_{BP}(i)$, $\mathcal{N}_{MF}(i)$: the sets of neighbouring factor nodes of variable node $i$ which belong to the BP/MF part.
$\mathcal{I}_{MF} = \bigcup_{a \in \mathcal{A}_{MF}} \mathcal{N}(a)$, $\quad \mathcal{I}_{BP} = \bigcup_{a \in \mathcal{A}_{BP}} \mathcal{N}(a)$. $\mathcal{N}(a)$: the set of neighbouring variable nodes of factor node $a$.
The resulting free energy (entropy $-$ average energy) obtained by the combination of BP and MF is written as below (let $q_i(\theta_i)$ represent the belief about $\theta_i$, i.e. the approximate posterior):
$F_{BP,MF} = \sum_{a \in \mathcal{A}_{BP}} \sum_{\theta_a} q_a(\theta_a) \ln \frac{q_a(\theta_a)}{f_a(\theta_a)} - \sum_{b \in \mathcal{A}_{MF}} \sum_{\theta_b} \prod_{i \in \mathcal{N}(b)} q_i(\theta_i) \ln f_b(\theta_b) - \sum_{i \in \mathcal{I}} (|\mathcal{N}_{BP}(i)| - 1) \sum_{\theta_i} q_i(\theta_i) \ln q_i(\theta_i)$.
Message Passing Expressions
The beliefs have to satisfy the following normalization and marginalization constraints:
$\sum_{\theta_i} q_i(\theta_i) = 1, \ \forall i \in \mathcal{I}_{MF} \setminus \mathcal{I}_{BP}$, $\quad \sum_{\theta_a} q_a(\theta_a) = 1, \ \forall a \in \mathcal{A}_{BP}$, $\quad q_i(\theta_i) = \sum_{\theta_a \setminus \theta_i} q_a(\theta_a), \ \forall a \in \mathcal{A}_{BP},\ i \in \mathcal{N}(a)$.
The fixed-point equations corresponding to the constrained optimization of the approximate VFE can be written as follows [7]:
$q_i(\theta_i) = z_i \prod_{a \in \mathcal{N}_{BP}(i)} m^{BP}_{a \to i}(\theta_i) \prod_{a \in \mathcal{N}_{MF}(i)} m^{MF}_{a \to i}(\theta_i)$ $\Longrightarrow$ product of incoming beliefs,
$n_{i \to a}(\theta_i) = \prod_{c \in \mathcal{N}_{BP}(i) \setminus a} m^{BP}_{c \to i}(\theta_i) \prod_{d \in \mathcal{N}_{MF}(i)} m^{MF}_{d \to i}(\theta_i)$ $\Longrightarrow$ variable to factor nodes,
$m^{MF}_{a \to i}(\theta_i) = \exp\big( \langle \ln f_a(\theta_a) \rangle_{\prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(\theta_j)} \big)$, $\quad m^{BP}_{a \to i}(\theta_i) = \int \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(\theta_j)\, f_a(\theta_a) \prod_{j \neq i} d\theta_j$ $\Longrightarrow$ factor to variable nodes,
where $\langle \cdot \rangle_q$ represents the expectation w.r.t. the distribution $q$.
[7] Riegler, Kirkelund, Manchón, Badiu, Fleury'13
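For Gaussian factors these message integrals are available in closed form. A toy check of the BP factor-to-variable rule for a pairwise factor $f_a(\theta_i, \theta_j) = \mathcal{N}(\theta_i; c\,\theta_j, v)$ with an incoming Gaussian message $n_{j \to a} = \mathcal{N}(\mu_j, \sigma_j^2)$ (all numerical values illustrative), against brute-force numerical marginalization on a grid:

```python
import numpy as np

# Factor f_a(ti, tj) = N(ti; c*tj, v); incoming message n_{j->a} = N(mu_j, s2_j).
c, v = 1.5, 0.4
mu_j, s2_j = 0.7, 0.25

# Closed form: m_{a->i}(ti) = N(ti; c*mu_j, v + c^2 * s2_j)
mean_cf, var_cf = c * mu_j, v + c**2 * s2_j

def gauss(x, m, s2):
    return np.exp(-(x - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

# Brute force: integrate n_{j->a}(tj) * f_a(ti, tj) over tj on a grid.
ti = np.linspace(-8, 8, 1001)
tj = np.linspace(-8, 8, 2001)
dti, dtj = ti[1] - ti[0], tj[1] - tj[0]
m = (gauss(tj[None, :], mu_j, s2_j)
     * gauss(ti[:, None], c * tj[None, :], v)).sum(axis=1) * dtj

mean_num = (ti * m).sum() * dti           # the message integrates to 1 here
var_num = ((ti - mean_num) ** 2 * m).sum() * dti
assert abs(mean_num - mean_cf) < 1e-3 and abs(var_num - var_cf) < 1e-3
```

In the Gaussian case BP therefore reduces to propagating means and variances, which is what the Gaussian BP-MF-EP KF of the next slides exploits.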
Gaussian BP-MF-EP KF
Proposed method: alternating optimization between a nonlinear KF for the sparse states (plus the hyperparameters) and BP for dictionary learning (DL).
Diagonal AR(1) (DAR(1)) prediction stage: since there is no coupling between the scalars in the state update, it is enough to update the prediction stage using MF. However, the interaction between $x_{l,t}$ and $f_l$ requires a Gaussian projection, using expectation propagation (EP). More details in [8].
$y_n$: factor node, $x_l$: variable node. $(l,n)$ or $(n,l)$ represent the messages passed from $l$ to $n$ or vice versa. Gaussian messages from $y_n$ to $x_l$ are parameterized by mean $\hat{x}_{n,l}(t)$ and variance $\nu_{n,l}(t)$.
Measurement update (filtering) stage: the posterior for $x_t$ is inferred using BP. In the measurement stage, the prior for $x_{l,t}$ gets replaced by the belief from the prediction stage. We define
$d_{l,t} = \big( \sum_{n=1}^{N} \nu_{n,l}(t)^{-1} \big)^{-1}$, $\quad r_{l,t} = d_{l,t} \big( \sum_{n=1}^{N} \frac{\hat{x}_{n,l}(t)}{\nu_{n,l}(t)} + \frac{\hat{x}_{l,t|t-1}}{\sigma^2_{l,t|t-1}} \big)$,
$\sigma^{-2}_{l,t|t} = \lambda_{l,t} + d^{-1}_{l,t}$, $\quad \hat{x}_{l,t|t} = \frac{r_{l,t}}{1 + d_{l,t}\lambda_{l,t}}$.
[8] Thomas, Slock, Asilomar'19
Lag-1 Smoothing Stage for Correlation Coefficient f
$y_t = A(t) F x_{t-1} + \tilde{v}_t$, where $\tilde{v}_t = A(t) w_t + v_t$, $\quad \tilde{v}_t \sim \mathcal{CN}(0, \tilde{R}_t)$.
We show in Lemma 1 of [9] that KF is not enough to adapt the hyperparameters; instead we need at least lag-1 smoothing (i.e. the computation of $\hat{x}_{k,t-1|t}$, $\sigma^2_{k,t-1|t}$ through BP). For the smoothing stage, we use BP.
Gaussian posterior for $x_{t-1}$:
$\sigma^{-2,(i)}_{k,t-1|t} = (\hat{f}^2_{k|t} + \sigma^2_{f_k|t})\, A^H_k(t)\, \tilde{R}_t^{-1} A_k(t) + \sigma^{-2}_{k,t-1|t-1}$,
$\langle x^{(i)}_{k,t-1|t} \rangle = \sigma^{2,(i)}_{k,t-1|t} \big( \hat{f}^H_{k|t} A^H_k(t)\, \tilde{R}_t^{-1} (y_t - A_{\overline{k}}(t) F_{\overline{k}|t} \langle x^{(i-1)}_{\overline{k},t-1|t} \rangle) + \frac{\hat{x}_{k,t-1|t-1}}{\sigma^2_{k,t-1|t-1}} \big)$,
where $\overline{k}$ gathers all components other than $k$.
Applying the MF rule, the resulting Gaussian distribution for $f_i$ has variance $\sigma^2_{f_i|t}$ and mean $\hat{f}_{i|t}$; the detailed derivations are in our paper [10]:
$\sigma^{-2}_{f_i|t} = (|\hat{x}_{i,t-1|t}|^2 + \sigma^2_{i,t-1|t})\, A^H_i(t)\, \tilde{R}_t^{-1} A_i(t)$,
$\hat{f}_{i|t} = \sigma^2_{f_i|t}\, \hat{x}^H_{i,t-1|t} A^H_i(t)\, \tilde{R}_t^{-1} (y_t - A_{\overline{i}}(t) \hat{F}_{\overline{i}|t} \hat{x}_{\overline{i},t-1|t})$.
$\tilde{R}_t = A(t) \Lambda^{-1} A(t)^H + \frac{1}{\gamma} I$.
[9] Thomas, Slock, Asilomar'19
[10] Thomas, Slock'20
Dictionary Learning using Tensor Signal Processing
(Figure: CPD of the data tensor Y as a sum of rank-1 terms.)
Let $Y_{i_1,\ldots,i_N}$ represent the $(i_1, i_2, \ldots, i_N)$-th element of the tensor and $y = [Y_{1,1,\ldots,1}, Y_{1,1,\ldots,2}, \ldots, Y_{I_1,I_2,\ldots,I_N}]^T$; then it can be verified that [11]
$y_t = (A_1(t) \otimes A_2(t) \otimes \cdots \otimes A_N(t))\, x_t + w_t$, $\quad w_t \sim \mathcal{N}(0, \gamma^{-1} I)$.
Matrix unfolding: $Y_{(n)} = A_n(t)\, X_{(n)}\, (A_N(t) \otimes \cdots \otimes A_{n+1}(t) \otimes A_{n-1}(t) \otimes \cdots \otimes A_1(t))^T$. $A_j(t)$ is of dimension $I_j \times P_j$ and the resulting tensor is in $\mathbb{C}^{I_1 \times \cdots \times I_N}$.
Retaining the tensor structure in the dictionary matrix leads to better estimates than learning the matricized (unstructured) version of $A$.
Fewer free variables to be estimated in the tensor structured case.
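The unfolding identity above can be checked numerically. Note the vectorization convention: with the mode-$n$ unfolding in which the earliest mode index varies fastest (Fortran order), the Kronecker factors appear in the reverse order $A_N \otimes \cdots \otimes A_1$ skipping $A_n$, as written above (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
I, P = (3, 4, 5), (2, 3, 2)
A = [rng.standard_normal((I[j], P[j])) for j in range(3)]
X = rng.standard_normal(P)

# Full tensor: Y = X x_1 A_1 x_2 A_2 x_3 A_3
Y = np.einsum('ia,jb,kc,abc->ijk', A[0], A[1], A[2], X)

def unfold(T, n):
    # Mode-n unfolding: mode n to the front, remaining modes flattened
    # with the earliest remaining mode index varying fastest (F order).
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order='F')

for n in range(3):
    others = [A[k] for k in reversed(range(3)) if k != n]  # A_N ... A_1, skip n
    K = np.kron(others[0], others[1])
    assert np.allclose(unfold(Y, n), A[n] @ unfold(X, n) @ K.T)
```

The same identity underlies the per-mode updates in the DL algorithms below: each factor $A_n(t)$ multiplies its own unfolding of the core from the left.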
Variational Bayesian inference using one of the following approximate posteriors:
$q(x, \lambda, \gamma, A) = q_\gamma(\gamma) \prod_{i=1}^{M} q_{x_i}(x_i) \prod_{i=1}^{M} q_{\lambda_i}(\lambda_i) \prod_{j=1}^{N} \prod_{i=1}^{P_j} q_{A_{j,i}}(A_{j,i})$ $\Longrightarrow$ SAVED-KS DL, or
$q(x, \lambda, \gamma, A) = q_\gamma(\gamma) \prod_{i=1}^{M} q_{x_i}(x_i) \prod_{i=1}^{M} q_{\lambda_i}(\lambda_i) \prod_{j=1}^{N} q_{A_j}(A_j)$ $\Longrightarrow$ Joint VB for DL.
[11] Sidiropoulos, De Lathauwer, Fu, Huang, Papalexakis, Faloutsos'17
Suboptimality of SAVED-KS DL and Joint VB
From the expression for the error covariance in the estimation of the factor $A_{j,i}$ (SAVED-KS DL in [12]), $\mathrm{tr}\{ (\bigotimes_{k=N, k \neq j}^{1} \langle A^T_k(t) A^*_k(t) \rangle) \langle X_{(j)}^T X_{(j)} \rangle \}\, I$ $\Longrightarrow$ it does not take into account the estimation error in the other columns of $A_j(t)$. The columns of $A_j(t)$ can be correlated: e.g., if we consider two paths (say $i$, $j$) with the same DoA but different delays, the delay responses $v_f(\tau_i(t))$ and $v_f(\tau_j(t))$ may be correlated.
The joint VB estimates (mean and covariance) can be obtained as
$M_j^T = \hat{A}^T_{1,j}(t) = \langle \gamma \rangle \Psi_j^{-1} B_j^T$, $\quad \Psi_j = \langle \gamma \rangle \big\langle X_{(j)} \big(\bigotimes_{k=N, k \neq j}^{1} \langle A^T_k(t) A^*_k(t) \rangle\big) X_{(j)}^T \big\rangle$, (1)
where $V_j = \langle X_{(j)} \rangle \big\langle \big(\bigotimes_{k=N, k \neq j}^{1} A_k(t)\big)^T \big\rangle$ and $B_j$ is defined as $(Y_{(j)} V_j^T)$ with its first row removed.
However, joint VB involves a matrix inversion and is not recommended for large system dimensions.
Nevertheless, it is possible to estimate each column of $A_j(t)$ by BP, since each column estimate can be expressed as the solution of a linear system of equations from (1): $\hat{A}^T_{j,i}(t) = \Psi_j^{-1} b_{j,i}$, where $b_{j,i}$ represents the $i$-th column of $B_j^T$.
[12] Thomas, Slock'19
Optimal Partitioning of the Measurement Stage and KS DL
Lemma
For the measurement stage, an optimal partitioning is to apply BP for the sparse vector $x_t$ and VB (SAVED-KS) for the columns of the factor matrices $A_{j,i}(t)$, assuming the vectors $A_{j,i}(t)$ are independent and have zero mean. However, if the columns of $A_j(t)$ are correlated, then a joint VB, with the posteriors of the factor matrices assumed independent, should be done for optimal performance.
Proof: follows from Lemma 1 of [13], where the main message was that if the parameter partitioning in VB is such that the different parameter blocks are decoupled at the level of the FIM, then VB is not suboptimal in terms of the (mismatched) Cramér-Rao bound (mCRB).
$y_t = \underbrace{\big( \sum_{r=1}^{M} x_{r,t} F_r \big)}_{F(x_t)} \underbrace{\big( \bigotimes_{j=1}^{N} \Phi_{j,t} \big)}_{f(\Phi_t)} + w_t$. $\quad J(\Phi_t) = [J(\Phi_{1,t}) \ \ldots \ J(\Phi_{N,t})]$,
where $J(\Phi_{j,t}) = F(x_t)\, (\Phi_{1,t} \otimes \cdots I_{I_j P_j} \cdots \otimes \Phi_{N,t})$,
$\mathrm{FIM} = \mathrm{blockdiag}\big( E(\gamma) J(\Phi_t)^T J(\Phi_t),\ \ E(\gamma) J(x_t)^T J(x_t) + E(\Gamma^{-1}),\ \ a\, E(\Gamma^{-2}),\ \ (N+c-1)\, E(\gamma^{-2}) \big)$.
[13] Kalyan, Thomas, Slock'19
BP-MF-EP Outperforms SAVED-KS DL
(Figure: NMSE vs SNR in dB. Curves: BP-MF-EP with ALS for DL; SAVED-KS with DL; BP-MF-EP with BP for DL; BP-MF-EP with Joint VB for DL; BP-MF-EP with known dictionary matrix.)
Static SBL: NMSE as a function of SNR.
ALS: Alternating Least Squares.
Exponential power delay profile for $x_t$.
30 non-zero elements in $x_t$, same support across all time.
Dimensions: 3-D tensor $(4, 8, 8)$, with $M = 200$.
Conclusions and Thank You!
Further advancements from [14]: VB with a too fine variable partitioning is quite suboptimal.
Better approximations are message-passing based methods such as belief propagation (BP) and expectation propagation (EP).
BP or EP message passing can be implemented using low-complexity methods such as AMP/GAMP/VAMP (AMP: approximate message passing), which are proven to be Bayes optimal under certain conditions on $A$.
We also derived a Generalized Vector AMP (GVAMP-SBL) version of SBL to take care of a diagonal power delay profile.
Further work to be done on learning a combination of structured and unstructured Kronecker factor matrices.
[14] Thomas, Slock'19
References (and Questions?)
K. Gopala, C. K. Thomas, D. Slock, "Sparse Bayesian learning for a bilinear calibration model and mismatched CRB," EUSIPCO, Sept. 2019.
C. Qian, X. Fu, N. D. Sidiropoulos, Y. Yang, "Tensor-based Parameter Estimation of Double Directional Massive MIMO Channel with Dual-Polarized Antennas," ICASSP, Apr. 2018.
E. Riegler, G. E. Kirkelund, C. N. Manchón, M. A. Badiu, B. H. Fleury, "Merging Belief Propagation and the Mean Field Approximation: A Free Energy Approach," IEEE Trans. on Info. Theory, Jan. 2013.
Z. Shakeri, A. D. Sarwate, W. U. Bajwa, "Identifiability of Kronecker-Structured Dictionaries for Tensor Data," IEEE Journ. Sel. Top. in Sig. Process., Oct. 2018.
C. K. Thomas, D. Slock, "SAVE - Space alternating variational estimation for sparse Bayesian learning," IEEE Data Sci. Wkshp, June 2018.
C. K. Thomas, D. Slock, "Space alternating variational estimation and Kronecker structured dictionary learning," ICASSP, May 2019.
C. K. Thomas, D. Slock, "Low Complexity Static and Dynamic Sparse Bayesian Learning Combining BP, VB and EP Message Passing," IEEE Asilomar, Nov. 2019.
C. K. Thomas, D. Slock, "BP-VB-EP based Static and Dynamic Sparse Bayesian Learning with Kronecker Structured Dictionaries," ICASSP, May 2020.
M. E. Tipping, "Sparse Bayesian Learning and the Relevance Vector Machine," Journ. Mach. Learn. Rese., Jun. 2001.