
Space Alternating Variational Bayesian Learning for LMMSE Filtering

Christo Kurisummoottil Thomas, Dirk Slock

EURECOM, Sophia-Antipolis, France, Email: {kurisumm,slock}@eurecom.fr

Abstract—In this paper, we address the fundamental problem of sparse signal recovery for temporally correlated multiple measurement vectors (MMV) in a Bayesian framework. The temporal correlation of the sparse vector is modeled using a first order autoregressive process. In the case of time varying sparse signals, conventional tracking methods such as Kalman filtering fail to exploit the sparsity of the underlying signal. Moreover, the computational complexity associated with sparse Bayesian learning (SBL) renders it infeasible even for moderately large datasets. To address this issue, we utilize the variational approximation technique (which allows us to obtain analytical approximations to the posterior distributions of interest even when exact inference of these distributions is intractable) to propose a novel fast algorithm called space alternating variational estimation with Kalman filtering (SAVE-KF). As for SAGE (space-alternating generalized expectation maximization) compared to EM, the component-wise approach of VB appears to avoid many bad local optima, which explains the better performance, apart from the lower complexity. Simulation results also show that the proposed algorithm has a faster convergence rate and achieves lower mean square error (MSE) than other state of the art fast SBL methods for temporally correlated measurement vectors.

Keywords— Sparse Bayesian Learning, Variational Bayes, Kalman Filtering

I. INTRODUCTION

Sparse signal reconstruction and compressed sensing (CS) have received significant attention in recent years. The compressed sensing problem can be formulated as

$$y_t = A_t x_t + w_t, \qquad (1)$$

where $y_t$ is the observation (data) at time $t$, $A_t$ is the known measurement or sensing matrix of dimension $N \times M$ with $N < M$, $x_t$ is the $M$-dimensional sparse signal and $w_t$ is the additive noise. $x_t$ contains only $K$ non-zero entries, with $K \ll M$, and is modeled by an AR(1) (auto-regressive) process. $w_t$ is assumed to be white Gaussian noise, $w_t \sim \mathcal{N}(0, \gamma^{-1} I)$. Most MMV SBL algorithms are obtained by straightforward extension of the algorithms for the single measurement vector (SMV) case. In the SMV setting, a variety of algorithms such as orthogonal matching pursuit [1], the basis pursuit method [2] and the iterative re-weighted $l_1$ and $l_2$ algorithms [3]

exist in the literature. In a Bayesian setting, the aim is to calculate the posterior distribution of the parameters given some observations (data) and some a priori knowledge. The sparse Bayesian learning algorithm was first introduced in [4] and then proposed for the first time for sparse signal recovery in [5]. Performance can be further improved by exploiting the temporal correlation across the sparse vectors [6], [7]. However, most of these algorithms do offline or batch processing.

Moreover, the complexity of these solutions does not scale well with the problem size. In order to obtain low complexity or low latency solutions, online processing algorithms (which process a small set of measurement vectors at a time) are necessary.

In conventional CS, a time invariant sparse signal is estimated using fewer measurements than the size of the signal.

On the other hand, in sparse adaptive estimation, a time varying signal $x_t$ is estimated time-recursively by exploiting the sparsity of the signal. Conventional adaptive filtering methods such as LMS or recursive least squares (RLS) do not exploit the underlying sparseness of $x_t$ to improve the estimation performance. The Kalman filter focuses on the estimation of a dynamical state from noisy observations, where the dynamic and measurement processes are assumed to follow a linear Gaussian state space model. However, classical Kalman filtering assumes complete a priori knowledge of the model parameters and noise statistics.

In sparse Bayesian learning, the sparse signal $x_t$ is modeled using a prior distribution $p(x_t|\alpha)$, where $\alpha = [\alpha_1 \,\ldots\, \alpha_M]^T$ is the parameter vector and $\alpha_i$ is the inverse of the variance of $x_{t,i}$, also interpreted as the precision variable. Since most of the elements of $x_t$ are zero, most of the $\alpha_i$ should be very high, favoring solutions with few non-zero components.

In the Type II maximum likelihood method or evidence procedure [8], estimates of the parameters $\alpha$, $\gamma$ and the sparse signal $x_t$ are obtained iteratively using evidence maximization. In [9], the authors propose Fast Marginalized Maximum Likelihood (FMML) by alternating maximization over the hyperparameters $\alpha_i$.

SBL involves a matrix inversion step at each iteration, which makes it computationally complex even for moderately large datasets. An alternative approach to SBL is to use variational approximations for Bayesian inference [10]–[13]. Variational Bayesian (VB) inference tries to find an approximation of the posterior distribution which maximizes the variational lower bound on $\log p(y_t)$. We proposed a space alternating variational estimation based technique for single measurement vectors in [14].

A. Contributions of this paper

In this paper:


• We propose a novel Space Alternating Variational Estimation (SAVE) based SBL technique for LMMSE filtering, called SAVE-KF. The proposed solution addresses the multiple measurement case with an AR(1) process for the temporal correlation of the sparse signal. The update and prediction stages of the proposed algorithm reveal links to Kalman filtering.

• Numerical results suggest that our proposed solution has a faster convergence rate (and hence lower complexity) than (even) the existing fast SBL algorithms and performs better than them in terms of reconstruction error in the presence of noise.

In the following, boldface lower-case and upper-case characters denote vectors and matrices respectively. The operators $\mathrm{tr}(\cdot)$ and $(\cdot)^T$ represent trace and transpose respectively. The operator $(\cdot)^H$ represents the conjugate transpose of a matrix or the conjugate of a scalar. A complex Gaussian random vector with mean $\mu$ and covariance matrix $\Theta$ is distributed as $x \sim \mathcal{CN}(\mu, \Theta)$. $\mathrm{diag}(\cdot)$ represents the diagonal matrix created from the elements of a row or column vector. The operator $\langle x \rangle$ or $E(\cdot)$ represents the expectation of $x$. $\|\cdot\|$ represents the Frobenius norm. $\angle(\cdot)$ represents the angle or phase of a complex number. $\Re\{\cdot\}$ represents the real part. All variables are complex here unless specified otherwise.

II. STATE SPACE MODEL

The sparse signal $x_t$ is modeled using an AR(1) process with correlation coefficient matrix $F$, with $F$ diagonal. The state space model can be written as follows,

$$x_t = F x_{t-1} + v_t, \quad \text{(state update)}$$
$$y_t = A_t x_t + w_t, \quad \text{(observation)} \qquad (2)$$

where $x_t = [x_{t,1}, \ldots, x_{t,M}]^T$. The matrices $F$ and $\Gamma$ are defined as

$$F = \mathrm{diag}(f_1, \ldots, f_M), \qquad \Gamma = \mathrm{diag}\!\left(\tfrac{1}{\alpha_1}, \ldots, \tfrac{1}{\alpha_M}\right). \qquad (3)$$

Here $f_i$ represents the correlation coefficient and $\alpha_i$ the inverse variance of $x_{t,i} \sim \mathcal{CN}(0, \tfrac{1}{\alpha_i})$. Further, $v_t \sim \mathcal{CN}(0, \Gamma(I - FF^H))$ and $w_t \sim \mathcal{CN}(0, \tfrac{1}{\gamma} I)$. The $v_t$ are complex Gaussian, mutually uncorrelated innovation sequences, and $w_t$ is independent of the innovation process $v_t$. Further, we define $\Lambda = \Gamma(I - FF^H) = \mathrm{diag}(\lambda_1, \ldots, \lambda_M)$.
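For illustration, the following minimal Python sketch (not from the paper; the dimensions, support size and parameter values are arbitrary choices) generates data from the AR(1) sparse state-space model (2)-(3).

```python
# Sketch: simulate the AR(1) sparse state-space model of (2)-(3).
# All sizes and parameter values below are illustrative, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
N, M, K, T = 40, 100, 10, 50                  # observations, signal size, sparsity, time steps

support = rng.choice(M, size=K, replace=False)
alpha = np.full(M, 1e4)                       # large precision -> (near) zero component
alpha[support] = 1.0                          # active components have unit variance
f = np.zeros(M, dtype=complex)
f[support] = 0.95                             # AR(1) correlation coefficients on the support
lam = (1.0 / alpha) * (1.0 - np.abs(f) ** 2)  # diagonal of Lambda = Gamma (I - F F^H)
gamma = 100.0                                 # noise precision

def crandn(*shape):                           # circularly-symmetric complex Gaussian, unit variance
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

x = np.sqrt(1.0 / alpha) * crandn(M)          # x_0 ~ CN(0, Gamma)
X, Y, A_all = [], [], []
for t in range(T):
    x = f * x + np.sqrt(lam) * crandn(M)      # state update in (2)
    A = crandn(N, M) / np.sqrt(N)             # fresh measurement matrix A_t
    y = A @ x + crandn(N) / np.sqrt(gamma)    # observation in (2)
    X.append(x); Y.append(y); A_all.append(A)
```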

Although the above signal model seems simple, there are numerous applications:

• Bayesian adaptive filtering [15], [16].

• Wireless channel estimation: multi-path parameter estimation as in [17], [18]. In this case, $x_t$ is the FIR filter response, and $\Gamma$ represents e.g. the power delay profile.

III. VB-SBL

In Bayesian compressive sensing, a two-layer hierarchical prior is assumed for $x_t$, as in [4]. The hierarchical prior is such that it encourages the sparsity property of $x_t$ or of the innovation sequences $v_t$:

$$p(x_t|\Gamma) = \prod_{i=1}^{M} p(x_{t,i}|\alpha_i) = \prod_{i=1}^{M} \mathcal{CN}(0, \alpha_i^{-1}),$$
$$p(x_t|x_{t-1}, F, \Gamma) = \prod_{i=1}^{M} p(x_{t,i}|x_{t-1,i}, \alpha_i, f_i) = \prod_{i=1}^{M} \mathcal{CN}\!\left(f_i x_{t-1,i}, \tfrac{1}{\alpha_i}\right). \qquad (4)$$

Further, a Gamma prior is considered over $\Gamma$,

$$p(\Gamma) = \prod_{i=1}^{M} p(\alpha_i|a, b) = \prod_{i=1}^{M} \Gamma(a)^{-1} b^a \alpha_i^{a-1} e^{-b\alpha_i}. \qquad (5)$$

The inverse of the noise variance, $\gamma$, is also assumed to have a Gamma prior,

$$p(\gamma) = \Gamma(c)^{-1} d^c \gamma^{c-1} e^{-d\gamma}. \qquad (6)$$

Now the likelihood distribution can be written as,

$$p(y_t|x_t, \gamma) = (2\pi)^{-N} \gamma^{N} e^{-\gamma \|y_t - A_t x_t\|_2^2}. \qquad (7)$$
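To see why this two-layer prior promotes sparsity, one can integrate out $\alpha_i$: the marginal prior on $x_{t,i}$ is then a Student-t, which is more sharply peaked at zero and heavier-tailed than a Gaussian of the same variance. The short sketch below checks this numerically for the real-valued case; the hyperparameter values $a = 3$, $b = 2$ are arbitrary choices of mine, not values from the paper.

```python
# Sketch: the marginal prior p(x) = int N(x; 0, 1/alpha) Gamma(alpha; a, b) d(alpha)
# is a Student-t; compare its log-density with a Gaussian of equal variance.
import numpy as np
from scipy.special import gammaln

def log_marginal_prior(x, a=3.0, b=2.0):
    # log p(x) = a log b + log Gamma(a + 1/2) - log Gamma(a) - 0.5 log(2 pi)
    #            - (a + 1/2) log(b + x^2 / 2)
    return (a * np.log(b) + gammaln(a + 0.5) - gammaln(a)
            - 0.5 * np.log(2.0 * np.pi) - (a + 0.5) * np.log(b + x ** 2 / 2.0))

x = np.linspace(0.0, 6.0, 7)
log_gauss = -0.5 * np.log(2.0 * np.pi) - 0.5 * x ** 2   # N(0, 1) log-density
log_t = log_marginal_prior(x)                           # Student-t with variance b/(a-1) = 1
print(np.round(log_t - log_gauss, 2))  # positive at 0 (sharper peak) and for large |x| (heavier tails)
```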

A. Variational Bayes

The computation of the posterior distribution of the parameters is usually intractable. In order to address this issue, in the variational Bayesian framework the posterior distribution $p(x_t, \Gamma, \gamma|y_{1:t})$ is approximated by a variational distribution $q(x_t, \Gamma, \gamma)$ that has the factorized form:

$$q(x_t, \Gamma, \gamma) = q_\gamma(\gamma) \prod_{i=1}^{M} q_{x_{t,i}}(x_{t,i}) \prod_{i=1}^{M} q_{\alpha_i}(\alpha_i), \qquad (8)$$

where $y_{1:t}$ represents the observations up to time $t$, i.e. $(y_1, \ldots, y_t)$; $x_{1:t}$ is defined similarly. Variational Bayes computes the factors $q$ by minimizing the Kullback-Leibler distance between the true posterior distribution $p(x_t, \Gamma, \gamma|y_{1:t})$ and $q(x_t, \Gamma, \gamma)$. From [10],

$$KLD_{VB} = KL\left(p(x_t, \Gamma, \gamma|y_{1:t}) \,\|\, q(x_t, \Gamma, \gamma)\right). \qquad (9)$$

The KL divergence minimization is equivalent to maximizing the evidence lower bound (ELBO) [11]. To elaborate on this, we can write the marginal probability of the observed data as,

$$\ln p(y_t|y_{1:t-1}) = \mathcal{L}(q) + KLD_{VB}, \quad \text{where}$$
$$\mathcal{L}(q) = \int q(x_t, \theta) \ln \frac{p(y_t, x_t, \theta|y_{1:t-1})}{q(x_t, \theta)}\, dx_t\, d\theta, \qquad KLD_{VB} = -\int q(x_t, \theta) \ln \frac{p(x_t, \theta|y_{1:t})}{q(x_t, \theta)}\, dx_t\, d\theta, \qquad (10)$$

where $\theta = \{\Gamma, \gamma\}$ and $\theta_i$ represents each scalar in $\theta$. Since $KLD_{VB} \geq 0$, $\mathcal{L}(q)$ is a lower bound on $\ln p(y_t|y_{1:t-1})$. Moreover, $\ln p(y_t|y_{1:t-1})$ is independent of $q(x_t, \theta)$, and therefore maximizing $\mathcal{L}(q)$ is equivalent to minimizing $KLD_{VB}$. This is called ELBO maximization, and doing it in an alternating fashion for each variable in $x_t, \theta$ leads to,

$$\ln q_i(\theta_i) = \langle \ln p(y_t, x_t, \theta|y_{1:t-1}) \rangle_{\theta_{\bar{i}}, x_t} + c_i,$$
$$\ln q_i(x_{t,i}) = \langle \ln p(y_t, x_t, \theta|y_{1:t-1}) \rangle_{\theta, x_{t,\bar{i}}} + c_i,$$
$$p(y_t, x_t, \theta|y_{1:t-1}) = p(y_t|x_t, \gamma, y_{1:t-1})\, p(x_t|\Lambda, y_{1:t-1})\, p(\Lambda)\, p(\gamma). \qquad (11)$$


Here $\langle \cdot \rangle_{k \neq i}$ represents the expectation operator over the distributions $q_k$ for all $k \neq i$. $x_{t,\bar{i}}$ represents $x_t$ without $x_{t,i}$, and $\theta_{\bar{i}}$ represents $\theta$ without $\theta_i$.
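The component-wise updates in (11) follow from the standard mean-field argument (see e.g. [10], [11]); as a brief sketch, keeping all factors except $q_i$ fixed, the ELBO reduces, up to an additive constant and a normalizer $Z_i$, to a negative KL divergence,

$$\mathcal{L}(q) = \int q_i(\theta_i)\, \big\langle \ln p(y_t, x_t, \theta|y_{1:t-1}) \big\rangle_{q_{\bar{i}}}\, d\theta_i - \int q_i(\theta_i) \ln q_i(\theta_i)\, d\theta_i + \text{const} = -KL\!\left( q_i \,\Big\|\, \tfrac{1}{Z_i}\, e^{\langle \ln p(y_t, x_t, \theta|y_{1:t-1}) \rangle_{q_{\bar{i}}}} \right) + \text{const},$$

so the maximizing factor is $q_i^\star(\theta_i) \propto \exp\big\langle \ln p(y_t, x_t, \theta|y_{1:t-1}) \big\rangle_{q_{\bar{i}}}$, which is (11) in log form.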

IV. SAVE SPARSE BAYESIAN LEARNING AND KALMAN FILTERING

In this section, we propose a Space Alternating Variational Estimation (SAVE) based alternating optimization between each element of $x_t$ and $\gamma$. For SAVE, no particular structure of $A_t$ is assumed, in contrast to AMP, which performs poorly when $A_t$ is not i.i.d. or sub-Gaussian. The joint distribution w.r.t. the observation of (2) can be written as,

$$p(y_t, x_t, \theta|y_{1:t-1}) = p(y_t|x_t, \theta)\, p(x_t, \theta|y_{1:t-1}), \qquad (12)$$

where the predictive distribution $p(x_t, \theta|y_{1:t-1})$ can be assumed to be Gaussian with mean $\hat{x}_{t|t-1}$ and diagonal error covariance $\hat{P}_{t|t-1}$, i.e. $\mathcal{CN}(\hat{x}_{t|t-1}, \hat{P}_{t|t-1})$. Each diagonal entry of $\hat{P}_{t|t-1}$ is denoted by $\sigma^2_{t,k|t-1}$, and $\hat{x}_{t|t-1}$ denotes the prediction (the estimate of $x_t$ from $y_{1:t-1}$).

$$\ln p(y_t, x_t, \theta|y_{1:t-1}) = N \ln\gamma - \gamma \|y_t - A_t x_t\|^2 - \ln\det(\hat{P}_{t|t-1}) - \left(x_t - \hat{x}_{t|t-1}\right)^H \hat{P}_{t|t-1}^{-1} \left(x_t - \hat{x}_{t|t-1}\right) + (c-1)\ln\gamma + c\ln d - d\gamma + \text{constants}. \qquad (13)$$

In the following, $c_{x_{t,k}}$, $c'_{x_{t,k}}$, $c_{\alpha_k}$, $c_{\lambda_k}$ and $c_\gamma$ represent normalization constants for the respective pdfs.

A. Prediction Stage

In this stage, we compute the prediction of $x_t$ at time $t-1$, $\hat{x}_{t,k|t-1}$. As in the prediction step of the Kalman filter, like in [19], and based on the Chapman-Kolmogorov formula:

$$p(x_t, \Gamma, \gamma|y_{1:t-1}) = \int p(x_t|x_{t-1}, \Gamma, \gamma, y_{1:t-1})\, p(x_{t-1}, \Gamma, \gamma|y_{1:t-1})\, dx_{t-1}. \qquad (14)$$

Using the variational approximation to the posterior, $p(x_{t-1}, \Gamma, \gamma|y_{1:t-1}) \approx q(x_{t-1}|y_{1:t-1})\, q(\Gamma, \gamma|y_{1:t-1})$, the expression above results in

$$p(x_t, \Gamma, \gamma|y_{1:t-1}) \approx q(\Gamma, \gamma|y_{1:t-1}) \int p(x_t|x_{t-1}, \Gamma, y_{1:t-1})\, q(x_{t-1}|y_{1:t-1})\, dx_{t-1} \approx q(\Gamma, \gamma|y_{1:t-1})\, q(x_t|y_{1:t-1}). \qquad (15)$$

Using VB, we approximate $q(x_t|y_{1:t-1})$ as follows,

$$\ln q(x_t|y_{1:t-1}) \approx E_{q(\Gamma, \gamma|y_{1:t-1})} \ln \int p(x_t|x_{t-1}, \Gamma, \gamma, y_{1:t-1})\, q(x_{t-1}|y_{1:t-1})\, dx_{t-1}. \qquad (16)$$

We define $\hat{f}_{k|t-1}$ as the mean of the approximate posterior for $f_k$ and $\sigma^2_{f_k|t-1}$ as its variance at time instant $t-1$. From the time update equation of the standard Kalman filter,

$$x_{t,k} = f_k \hat{x}_{t-1,k|t-1} + v_k,$$
$$\left\langle \frac{1}{|f_k|^2 \sigma^2_{t-1,k|t-1} + \frac{1}{\lambda_k}} \left( |x_{t,k}|^2 - x_{t,k}^H f_k \hat{x}_{t-1,k|t-1} - f_k^H \hat{x}_{t-1,k|t-1}^H x_{t,k} + |f_k|^2 |\hat{x}_{t-1,k|t-1}|^2 \right) \right\rangle = \frac{1}{\sigma^2_{t,k|t-1}} \left|x_{t,k} - \hat{x}_{t,k|t-1}\right|^2. \qquad (17)$$

To compute the term $\left\langle \frac{f_k}{|f_k|^2 \sigma^2_{t-1,k|t-1} + \frac{1}{\lambda_k}} \right\rangle$, we write $f_k = \hat{f}_{k|t-1} + \tilde{f}_{k|t-1}$, where $\tilde{f}_{k|t-1}$ is a zero mean complex Gaussian random variable with variance $\sigma^2_{f_k|t-1}$. A first order expansion around $\hat{f}_{k|t-1}$ then gives

$$\left\langle \frac{f_k}{|f_k|^2 \sigma^2_{t-1,k|t-1} + \frac{1}{\lambda_k}} \right\rangle \approx \frac{\hat{f}_{k|t-1}\left(1 - \frac{\sigma^2_{t-1,k|t-1}\, \sigma^2_{f_k|t-1}}{\langle|f_k|^2\rangle \sigma^2_{t-1,k|t-1} + \frac{1}{\langle\lambda_k\rangle}}\right)}{\langle|f_k|^2\rangle\, \sigma^2_{t-1,k|t-1} + \frac{1}{\langle\lambda_k\rangle}} \approx \frac{\hat{f}_{k|t-1}}{\langle|f_k|^2\rangle\, \sigma^2_{t-1,k|t-1} + \frac{1}{\langle\lambda_k\rangle} + \sigma^2_{t-1,k|t-1}\, \sigma^2_{f_k|t-1}}. \qquad (18)$$

Finally, we obtain

$$\sigma^2_{t,k|t-1} = \left(|\hat{f}_{k|t-1}|^2 + \sigma^2_{f_k|t-1}\right) \sigma^2_{t-1,k|t-1} + \frac{1}{\hat{\lambda}_{k|t-1}}, \qquad \hat{x}_{t,k|t-1} = \hat{f}_{k|t-1}\, \hat{x}_{t-1,k|t-1}. \qquad (19)$$
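As a concrete illustration, the per-component prediction step (19) could be implemented as in the sketch below; the function and variable names are illustrative choices, and lam_hat is treated as a precision (so 1/lam_hat is the innovation variance), consistent with Section IV-D.

```python
import numpy as np

def predict(x_hat, sig2, f_hat, sig2_f, lam_hat):
    """Prediction stage, eq. (19), applied component-wise (all arguments are (M,) arrays).
    x_hat, sig2   : filtered mean / variance of x_{t-1} given y_{1:t-1}
    f_hat, sig2_f : posterior mean / variance of the AR(1) coefficients f_k
    lam_hat       : estimated innovation precisions lambda_k
    """
    x_pred = f_hat * x_hat
    sig2_pred = (np.abs(f_hat) ** 2 + sig2_f) * sig2 + 1.0 / lam_hat
    return x_pred, sig2_pred
```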

B. Measurement or Update Stage

Update of $q_{x_{t,k}}(x_{t,k})$: Using (11), $\ln q_{x_{t,k}}(x_{t,k})$ turns out to be quadratic in $x_{t,k}$ and thus can be represented as a Gaussian distribution as follows,

$$\ln q_{x_{t,k}}(x_{t,k}) = -\langle\gamma\rangle \left\{ -\left(y_t - A_{t,\bar{k}} \langle x_{t,\bar{k}}\rangle\right)^H A_{t,k}\, x_{t,k} - x_{t,k}^H A_{t,k}^H \left(y_t - A_{t,\bar{k}} \langle x_{t,\bar{k}}\rangle\right) + \|A_{t,k}\|^2 |x_{t,k}|^2 \right\} - \frac{1}{\sigma^2_{t,k|t-1}}\left( |x_{t,k}|^2 - x_{t,k}^H \hat{x}_{t,k|t-1} - x_{t,k}\, \hat{x}_{t,k|t-1}^H \right) + c_{x_{t,k}} = -\frac{1}{\sigma^2_{t,k|t}}\left|x_{t,k} - \hat{x}_{t,k|t}\right|^2 + c'_{x_{t,k}}. \qquad (20)$$

Note that we split $A_t x_t$ as $A_t x_t = A_{t,k} x_{t,k} + A_{t,\bar{k}} x_{t,\bar{k}}$, where $A_{t,k}$ represents the $k$-th column of $A_t$ and $A_{t,\bar{k}}$ represents the matrix with the $k$-th column of $A_t$ removed. Clearly, the mean and the variance of the resulting Gaussian distribution become,

$$\sigma^2_{t,k|t} = \frac{1}{\langle\gamma\rangle \|A_{t,k}\|^2 + \frac{1}{\sigma^2_{t,k|t-1}}}, \qquad \hat{x}_{t,k|t} = \sigma^2_{t,k|t}\left( A_{t,k}^H \left(y_t - A_{t,\bar{k}} \langle x_{t,\bar{k}}\rangle\right) \langle\gamma\rangle + \frac{\hat{x}_{t,k|t-1}}{\sigma^2_{t,k|t-1}} \right), \qquad (21)$$

where $\hat{x}_{t,k|t}$ represents the point estimate of $x_{t,k}$. One remark is that forcing a Gaussian posterior $q$ with diagonal covariance matrix on the original Kalman measurement equations gives the same result as SAVE.

Update of $q_\gamma(\gamma)$: Similarly, the Gamma distribution resulting from the variational Bayesian approximation for $q_\gamma(\gamma)$ can be written as,

$$\ln q_\gamma(\gamma) = (c - 1 + N)\ln\gamma - \gamma\left(\langle \|y_t - A_t x_t\|^2 \rangle + d\right) + c_\gamma, \qquad q_\gamma(\gamma) \propto \gamma^{c+N-1} e^{-\gamma\left(\langle \|y_t - A_t x_t\|^2\rangle + d\right)}. \qquad (22)$$

The mean of the Gamma distribution for $\gamma$ is given by,

$$\langle\gamma\rangle = \hat{\gamma}_t = \frac{c + N}{\xi_t + d}, \qquad \xi_t = \beta\, \xi_{t-1} + (1 - \beta)\, \langle \|y_t - A_t x_t\|^2 \rangle, \quad \text{where}$$
$$\langle \|y_t - A_t x_t\|^2 \rangle = \|y_t\|^2 - 2\,\Re\{y_t^H A_t \hat{x}_{t|t}\} + \mathrm{tr}\left(A_t^H A_t \left(\hat{x}_{t|t}\, \hat{x}_{t|t}^H + \Sigma_{t|t}\right)\right), \quad \Sigma_{t|t} = \mathrm{diag}(\sigma^2_{t,1|t}, \ldots, \sigma^2_{t,M|t}), \quad \hat{x}_{t|t} = [\hat{x}_{t,1|t}, \ldots, \hat{x}_{t,M|t}]^T, \qquad (23)$$

where we have also introduced temporal averaging, and $\beta < 1$ denotes the weighting coefficient.
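A direct transcription of (22)-(23) might look as follows; the default values of β, c and d are arbitrary choices here (the paper only requires β < 1 and typically near non-informative Gamma hyperparameters).

```python
import numpy as np

def update_noise_precision(y, A, x_hat, sig2, xi_prev, beta=0.9, c=1e-3, d=1e-3):
    """Noise precision update, eqs. (22)-(23), with exponential temporal averaging."""
    Sigma = np.diag(sig2)                                    # diagonal posterior covariance
    resid2 = (np.linalg.norm(y) ** 2
              - 2.0 * np.real(np.vdot(y, A @ x_hat))
              + np.trace(A.conj().T @ A @ (np.outer(x_hat, x_hat.conj()) + Sigma)).real)
    xi = beta * xi_prev + (1.0 - beta) * resid2              # temporal averaging of <||y - A x||^2>
    gamma_hat = (c + len(y)) / (xi + d)                      # <gamma> = (c + N) / (xi_t + d)
    return gamma_hat, xi
```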

C. Fixed Lag Smoothing

For fixed lag smoothing with delay $\Delta > 0$, we rewrite the state space model as follows,

$$y_t = A_t F^{\Delta} x_{t-\Delta} + \underbrace{\sum_{i=0}^{\Delta-1} A_t F^i v_{t-i} + w_t}_{\tilde{w}_t},$$
$$p(y_t, x_{t-\Delta}, \theta|y_{1:t-1}) = p(y_t|x_{t-\Delta}, \theta)\, p(x_{t-\Delta}, \theta|y_{1:t-1}), \qquad (24)$$

where $\tilde{w}_t \sim \mathcal{CN}(0, \tilde{R}_t)$, $\tilde{R}_t = A_t (I - |F|^{2\Delta})\Gamma A_t^H + \frac{1}{\gamma} I$. The posterior distribution $p(x_{t-\Delta}, \theta|y_{1:t-1})$ is approximated using the variational approximation as $q(x_{t-\Delta}, \theta|y_{1:t-1})$ with mean $\hat{x}_{t-\Delta|t-1}$ and covariance $\Sigma_{t-\Delta|t-1}$.

$$\ln p(y_t, x_{t-\Delta}, \theta|y_{1:t-1}) = -\tfrac{1}{2}\ln\det \tilde{R}_t - \left(y_t - A_t F^{\Delta} x_{t-\Delta}\right)^H \tilde{R}_t^{-1} \left(y_t - A_t F^{\Delta} x_{t-\Delta}\right) - \tfrac{1}{2}\ln\det(\Sigma_{t-\Delta|t-1}) - \left(x_{t-\Delta} - \hat{x}_{t-\Delta|t-1}\right)^H \Sigma^{-1}_{t-\Delta|t-1} \left(x_{t-\Delta} - \hat{x}_{t-\Delta|t-1}\right) + c_{x_{t-\Delta}}. \qquad (25)$$

Here $|F|$ represents the matrix obtained by taking the amplitudes of the entries of $F$, and $c_{x_{t-\Delta}}$ represents a normalization constant.

Update of $x_{t-\Delta}$: Using (11), $\ln q_{x_{t-\Delta}}(x_{t-\Delta}|y_{1:t})$ turns out to be quadratic in $x_{t-\Delta}$ and thus can be represented as a Gaussian distribution with mean $\hat{x}_{t-\Delta|t}$ and covariance $\Sigma_{t-\Delta|t}$ respectively,

$$\Sigma_{t-\Delta|t} = \left( F^{\Delta H} A_t^H \langle \tilde{R}_t^{-1} \rangle A_t F^{\Delta} + \Sigma^{-1}_{t-\Delta|t-1} \right)^{-1}, \qquad \hat{x}_{t-\Delta|t} = \Sigma_{t-\Delta|t}\left( \Sigma^{-1}_{t-\Delta|t-1}\, \hat{x}_{t-\Delta|t-1} + F^{\Delta H} A_t^H \langle \tilde{R}_t^{-1} \rangle\, y_t \right). \qquad (26)$$

D. VB for AR(1) Parameters

In this section, VB is used to learn the unknown correlation coefficient $f_k$ and $\lambda_k$. The goal is to maximize the posteriors $p(f_k|x_t, y_{1:t})$ and $p(\lambda_k|x_t, y_{1:t})$. We denote by $\hat{f}_{k|t-1}$ the mean or point estimate of $f_k$ at time instant $t-1$. From the state space model, $x_{t,k} = f_k x_{t-1,k} + \frac{1}{\sqrt{\lambda_k}} v_{t,k}$, so the update of the correlation coefficient at time $t$ can be obtained from

$$\ln p(x_t, x_{t-1}, \Gamma, f_k, y_t|y_{1:t-1}) = \ln\lambda_k - \lambda_k |x_{t,k} - f_k x_{t-1,k}|^2 + \left( (a-1)\ln\lambda_k + a\ln b - b\lambda_k \right) + \text{``constants''}, \qquad (27)$$

where "constants" denotes terms independent of $\lambda_k$ and $f_k$.

Update of $q_{\lambda_k}(\lambda_k)$: Using the variational approximation $\ln q(\lambda_k|y_{1:t}) \simeq E_{q(x_t, x_{t-1}, \gamma, f_k|y_{1:t})} \ln p(x_t, \Gamma, \gamma, f_k, y_{1:t})$,

$$\ln q_{\lambda_k}(\lambda_k) = \ln\lambda_k - \lambda_k \left( \langle |x_{t,k} - f_k x_{t-1,k}|^2 \rangle + b \right) + (a-1)\ln\lambda_k + c_{\lambda_k}, \qquad q_{\lambda_k}(\lambda_k) \propto \lambda_k^{a}\, e^{-\lambda_k\left( \langle |x_{t,k} - f_k x_{t-1,k}|^2 \rangle + b \right)}. \qquad (28)$$

The mean and variance of the resulting Gamma distribution can be written as,

$$\langle \lambda_k \rangle = \frac{a+1}{\langle |x_{t,k} - f_k x_{t-1,k}|^2 \rangle + b}, \qquad \mathrm{Var}(\lambda_k) = \frac{a+1}{\left( \langle |x_{t,k} - f_k x_{t-1,k}|^2 \rangle + b \right)^2}. \qquad (29)$$

Update of $q_{f_k}(f_k)$: Using the variational approximation $\ln q(f_k|y_{1:t}) \simeq E_{q(x_t, x_{t-1}, \gamma|y_{1:t})} \ln p(x_t, x_{t-1}, \Gamma, \gamma, y_{1:t})$,

$$q_{f_k}(f_k) \propto e^{-\langle |x_{t-1,k}|^2 \rangle \left| f_k - \frac{\langle x_{t-1,k}^H \rangle \langle x_{t,k} \rangle}{\langle |x_{t-1,k}|^2 \rangle} \right|^2}. \qquad (30)$$

Finally, we write the mean and variance as,

$$\sigma^2_{f_k|t} = \frac{1}{|\hat{x}_{t-1,k|t}|^2 + \sigma^2_{t-1,k|t}}, \qquad \hat{f}_{k|t} = \sigma^2_{f_k|t}\, \hat{x}_{t-1,k|t}^H\, \hat{x}_{t,k|t} = \frac{\hat{x}_{t-1,k|t}^H\, \hat{x}_{t,k|t}}{|\hat{x}_{t-1,k|t}|^2 + \sigma^2_{t-1,k|t}}. \qquad (31)$$

Algorithm 1: The Adaptive SAVE-KF Algorithm

Update Stage:
$$\sigma^2_{t,k|t} = \sigma^2_{t,k|t-1}\left( \sigma^2_{t,k|t-1}\, \hat{\gamma}_{t-1} \|A_{t,k}\|^2 + 1 \right)^{-1}, \qquad K_{t,k} = \sigma^2_{t,k|t}\, A_{t,k}^H\, \hat{\gamma}_{t-1},$$
$$\hat{x}_{t,k|t} = \frac{\sigma^2_{t,k|t}}{\sigma^2_{t,k|t-1}}\, \hat{x}_{t,k|t-1} + K_{t,k}\left( y_t - A_{t,\bar{k}}\, \hat{x}_{t,\bar{k}|t} \right),$$
$$\hat{\gamma}_t = \frac{c + N}{\xi_t + d}, \qquad \eta_{k|t} = \Re\{ \hat{f}_{k|t-1}^H\, \hat{x}_{t,k|t}\, \hat{x}_{t-1,k|t-1}^H \},$$
$$\hat{\lambda}_{k|t} = \frac{a+1}{|\hat{x}_{t,k|t}|^2 + \sigma^2_{t,k|t} + |\hat{f}_{k|t-1}|^2\left( |\hat{x}_{t-1,k|t-1}|^2 + \sigma^2_{t-1,k|t-1} \right) - 2\eta_{k|t} + b}.$$

Prediction Stage:
$$\sigma^2_{t+1,k|t} = \left( |\hat{f}_{k|t}|^2 + \sigma^2_{f_k|t} \right) \sigma^2_{t,k|t} + \frac{1}{\hat{\lambda}_{k|t}}, \qquad \hat{x}_{t+1,k|t} = \hat{f}_{k|t}\, \hat{x}_{t,k|t}.$$

Estimation of AR(1) Parameters:
$$\sigma^2_{f_k|t} = \frac{1}{|\hat{x}_{t-1,k|t}|^2 + \sigma^2_{t-1,k|t}}, \qquad \hat{f}_{k|t} = \sigma^2_{f_k|t}\, \hat{x}_{t-1,k|t}^H\, \hat{x}_{t,k|t} = \frac{\hat{x}_{t-1,k|t}^H\, \hat{x}_{t,k|t}}{|\hat{x}_{t-1,k|t}|^2 + \sigma^2_{t-1,k|t}}. \qquad (32)$$
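To make the recursions concrete, the sketch below implements one SAVE-KF time step as I read Algorithm 1. The hyperparameter values, the number of coordinate sweeps, and the use of the filtered estimates of $x_{t-1}$ (rather than smoothed ones) in the $\lambda_k$ and $f_k$ updates are simplifying assumptions of this sketch, not prescriptions from the paper.

```python
import numpy as np

def save_kf_step(y, A, x_pred, sig2_pred, x_prev, sig2_prev,
                 f_hat, sig2_f, gamma_hat, xi_prev,
                 a=1e-3, b=1e-3, c=1e-3, d=1e-3, beta=0.9, n_sweeps=3):
    """One recursion of the adaptive SAVE-KF algorithm (Algorithm 1), per my reading.
    x_pred, sig2_pred : (M,) predicted mean / variance of x_t given y_{1:t-1}
    x_prev, sig2_prev : (M,) filtered mean / variance of x_{t-1} given y_{1:t-1}
    f_hat, sig2_f     : (M,) AR(1) coefficient estimates and their variances
    gamma_hat, xi_prev: noise precision estimate and temporally averaged residual
    """
    N, M = A.shape
    col_energy = np.sum(np.abs(A) ** 2, axis=0)          # ||A_{t,k}||^2
    x_hat, sig2 = x_pred.astype(complex).copy(), sig2_pred.copy()

    # --- update stage: a few space-alternating sweeps over the components
    for _ in range(n_sweeps):
        for k in range(M):
            sig2[k] = sig2_pred[k] / (sig2_pred[k] * gamma_hat * col_energy[k] + 1.0)
            resid_k = y - A @ x_hat + A[:, k] * x_hat[k]  # y_t - A_{t,\bar k} x_{t,\bar k}
            x_hat[k] = (sig2[k] / sig2_pred[k]) * x_pred[k] \
                       + sig2[k] * gamma_hat * np.vdot(A[:, k], resid_k)

    # --- noise precision, eq. (23), with temporal averaging
    resid2 = np.linalg.norm(y - A @ x_hat) ** 2 + np.sum(col_energy * sig2)
    xi = beta * xi_prev + (1.0 - beta) * resid2
    gamma_hat = (c + N) / (xi + d)

    # --- innovation precision lambda_k (filtered quantities used; no smoothing here)
    eta = np.real(np.conj(f_hat) * x_hat * np.conj(x_prev))
    lam_hat = (a + 1.0) / (np.abs(x_hat) ** 2 + sig2
                           + np.abs(f_hat) ** 2 * (np.abs(x_prev) ** 2 + sig2_prev)
                           - 2.0 * eta + b)

    # --- AR(1) coefficient estimates, eq. (31)
    sig2_f = 1.0 / (np.abs(x_prev) ** 2 + sig2_prev)
    f_hat = sig2_f * np.conj(x_prev) * x_hat

    # --- prediction stage for time t+1, eq. (19)
    sig2_next = (np.abs(f_hat) ** 2 + sig2_f) * sig2 + 1.0 / lam_hat
    x_next = f_hat * x_hat

    return x_hat, sig2, x_next, sig2_next, f_hat, sig2_f, lam_hat, gamma_hat, xi
```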

E. Computational Complexity

For our proposed SAVE, it is evident that we do not need any matrix inversions, in contrast to [4]. Our computational complexity is similar to that of [12]. The updates of all the variables $x_t$, $\Gamma$, $\gamma$ involve simple addition and multiplication operations. We introduce the variables $q = y_t^H A_t$ and $B = A_t^H A_t$; $q$, $B$ and $\|y_t\|^2$ can be precomputed, so they are computed only once.

V. SIMULATION RESULTS

In this section, we present simulation results to validate the performance of our SAVE-KF SBL algorithm (Algorithm 1) compared to state of the art solutions. We consider the special case of uncorrelated measurements, $f_i = 0\ \forall i$, thus reducing it to the single measurement vector case. We compare our algorithm with the Fast Inverse-Free SBL (Fast IF SBL) of [12], the G-AMP based SBL of [20] and the fast version of SBL (FV SBL) of [13]. For the simulations, we have fixed $M = 200$ and $K = 30$. All the elements of $A_t$ and $x_t$ are generated i.i.d. from a normal distribution $\mathcal{N}(0,1)$. The SNR is fixed to 20 dB in the simulations.

A. MSE Performance

Fig. 1. NMSE vs. the number of observations N, for the proposed SAVE SBL, Fast IF-SBL, GAMP SBL and FV SBL.

From Figure 1, it is evident that the proposed SAVE algorithm performs better than the state of the art solutions in terms of the Normalized Mean Square Error (NMSE). The NMSE is defined as $NMSE = \frac{1}{M}\|\hat{x} - x\|^2$, where $\hat{x}$ represents the estimated value, and $NMSE_{dB} = 10\log_{10}(NMSE)$.
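For reference, a one-line helper computing this metric (the function name is mine):

```python
import numpy as np

def nmse_db(x_hat, x):
    """NMSE in dB, following the definition in the text: (1/M) ||x_hat - x||^2."""
    return 10.0 * np.log10(np.linalg.norm(x_hat - x) ** 2 / len(x))
```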

B. Complexity

Fig. 2. Execution time (in sec) vs. the number of observations N, for the proposed SAVE SBL, Fast IF-SBL, GAMP SBL and FV SBL.

Since the proposed SAVE has computational requirements similar to those of [12], [20], we plot the execution time in Matlab for all the algorithms. It is clear from Figure 2 that the proposed SAVE approach has a lower execution time and thus faster convergence compared to the existing fast SBL algorithms. However, the GAMP based method [20] and IF-SBL [12] have a better convergence rate than ours, but with a degradation in NMSE performance.

VI. CONCLUSION

We presented a fast SBL algorithm called SAVE-KF, which uses variational inference techniques to approximate the posteriors of the data and parameters and to track a time varying sparse signal. SAVE-KF helps to circumvent the matrix inversion operation required in conventional SBL based on the EM algorithm. We showed that the proposed algorithm has a faster execution time or convergence rate and better performance in terms of NMSE than even the state of the art fast SBL solutions. The SAVE-KF algorithm exploits the underlying sparsity in the signal, in contrast to classical Kalman filtering based methods.

REFERENCES

[1] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, December 2007.

[2] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.

[3] D. Wipf and S. Nagarajan, “Iterative reweighted l1 and l2 methods for finding sparse solutions,” IEEE J. Sel. Topics Sig. Process., vol. 4, no. 2, pp. 317–329, April 2010.

[4] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.

[5] D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Trans. on Sig. Process., vol. 52, no. 8, pp. 2153–2164, August 2004.

[6] Z. Zhang and B. D. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,” IEEE J. of Sel. Topics in Sig. Process., vol. 5, no. 5, pp. 912–926, September 2011.

[7] R. Prasad, C. Murthy, and B. Rao, “Joint approximately sparse channel estimation and data detection in OFDM systems using sparse Bayesian learning,” IEEE Trans. on Sig. Process., vol. 62, no. 14, July 2014.

[8] R. Giri and B. D. Rao, “Type I and Type II Bayesian methods for sparse signal recovery using scale mixtures,” IEEE Trans. on Sig. Process., vol. 64, no. 13, pp. 3418–3428, 2018.

[9] M. E. Tipping and A. C. Faul, “Fast marginal likelihood maximisation for sparse Bayesian models,” in AISTATS, January 2003.

[10] M. J. Beal, “Variational algorithms for approximate Bayesian inference,” Ph.D. thesis, University of Cambridge, UK, May 2003.

[11] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variational approximation for Bayesian inference,” IEEE Sig. Process. Mag., vol. 29, no. 6, pp. 131–146, November 2008.

[12] H. Duan, L. Yang, and H. Li, “Fast inverse-free sparse Bayesian learning via relaxed evidence lower bound maximization,” IEEE Sig. Process. Letters, vol. 24, no. 6, June 2017.

[13] D. Shutin, T. Buchgraber, S. R. Kulkarni, and H. V. Poor, “Fast variational sparse Bayesian learning with automatic relevance determination for superimposed signals,” IEEE Trans. on Sig. Process., vol. 59, no. 12, December 2011.

[14] C. K. Thomas and D. Slock, “SAVE - space alternating variational estimation for sparse Bayesian learning,” in Data Science Workshop, 2018.

[15] T. Sadiki and D. T. Slock, “Bayesian adaptive filtering: principles and practical approaches,” in EUSIPCO, 2004.

[16] S. Ciochina, C. Paleologu, and J. Benesty, “A family of optimized LMS-based algorithms for system identification,” in Proc. EUSIPCO, 2016, pp. 1803–1807.

[17] C. K. Thomas and D. Slock, “Variational Bayesian learning for channel estimation and transceiver determination,” in Information Theory and Applications Workshop, San Diego, USA, February 2018.

[18] B. H. Fleury, M. Tschudin, R. Heddergott, D. Dahlhaus, and K. I. Pedersen, “Channel parameter estimation in mobile radio environments using the SAGE algorithm,” IEEE J. on Sel. Areas in Commun., vol. 17, no. 3, pp. 434–450, March 1999.

[19] S. Sarkka and A. Nummenmaa, “Recursive noise adaptive Kalman filtering by variational Bayesian approximations,” IEEE Transactions on Automatic Control, vol. 54, no. 3, pp. 596–600, March 2009.

[20] M. Al-Shoukairi, P. Schniter, and B. D. Rao, “A GAMP-based low complexity sparse Bayesian learning algorithm,” IEEE Trans. on Sig. Process., vol. 66, no. 2, January 2018.
