Save - Space alternating variational estimation for sparse Bayesian learning

(1)

SAVE - SPACE ALTERNATING VARIATIONAL ESTIMATION FOR SPARSE BAYESIAN LEARNING

Christo Kurisummoottil Thomas, Dirk Slock

EURECOM, Sophia-Antipolis, France, Email:{kurisumm,slock}@eurecom.fr

ABSTRACT

In this paper, we address the fundamental problem of sparse signal recovery in a Bayesian framework. The computational complexity associated with Sparse Bayesian Learning (SBL) renders it infea- sible even for moderately large problem sizes. To address this issue, we propose a fast version of SBL using Variational Bayesian (VB) inference. VB allows one to obtain analytical approximations to the posterior distributions of interest even when exact inference of these distributions is intractable. We propose a novel fast algorithm called space alternating variational estimation (SAVE), which is a version of VB(-SBL) pushed to the scalar level. Similarly as for SAGE (space-alternating generalized expectation maximization) compared to EM, the component-wise approach of SAVE compared to SBL renders it less likely to get stuck in bad local optima and its inherent damping (more cautious progression) also leads to typically faster convergence of the non-convex optimization process. Simula- tion results show that the proposed algorithm has a faster convergence rate and achieves lower MSE than other state of the art fast SBL methods.

Index Terms— Sparse Bayesian Learning, Variational Bayes, Approximate Message Passing, Alternating Optimization

1. INTRODUCTION

Sparse signal reconstruction and compressed sensing has received significant attraction in recent years. The compressed sensing problem can be formulated as

y=Ax+w, (1) wherey is the observations or data,Ais called the measurement or the sensing matrix which is known and is of dimensionN×M withN < M,xis theM-dimensional sparse signal andwis the additive noise.xcontains onlyKnon-zero entries, withK << M. wis assumed to be a white Gaussian noise,w ∼ N(0, γ⁻¹I). To address this problem, a variety of algorithms such as the orthogonal matching pursuit [1], the basis pursuit method [2] and the iterative re-weightedl1 andl2 algorithms [3] exist in the literature. Com- pared to these algorithms, using Bayesian techniques for sparse signal recovery generally achieves the best performance. In a Bayesian setting, the aim is to calculate the posterior distribution of the parameters given some observations (data) and some a priori knowledge.

The Sparse Bayesian Learning algorithm was first introduced by [4]

and then proposed for the first time for sparse signal recovery by [5].

In sparse Bayesian learning, the sparse signalxis modeled using a prior distributionp(x|α) = QM

i=1p(xi|αi), where α = EURECOM’s research is partially supported by its industrial members:

ORANGE, BMW, ST Microelectronics, Symantec, SAP, Monaco Telecom, iABG, and by the projects DUPLEX (French ANR), MASS-START and GE- OLOC (French FUI).

[α1 ... αM]^Tandαiis the inverse of the variance ofxi, also called the precision variable. Since most of the elements ofxare zero, most of theαishould be very high, favoring solutions with few non-zero components.

In the empirical Bayesian approach, an estimate of the hyper parametersα, γand sparse signalxis performed iteratively using evidence maximization. The hyper-parameters are estimated first using an evidence maximization, which is referred to as Type II maximum likelihood method [6]. For a given estimate ofα, γ, the posterior ofxis formulated asp(x/y,α,b bγ)and the mean of this posterior distribution is used as a point estimate ofbx. In [7], the authors propose a Fast Marginalized Maximum Likelihood (FMML) by alternating maximization of the hyperparametersαi. Both previ- ous approaches allow for a greedy initialization (OMP-like) which improves convergence speed and handles initialization issues. Re- cently approximate message passing (AMP) [8], generalized AMP and vector AMP [9–11] were introduced to compute the posterior distributions in a message passing framework and with less complexity. It uses central limit theorem to represent all the messages in a factor graph in belief propogation as Gaussian random variables. It also uses taylor series approximations to reduce the number of messages exchanged between the factor nodes and the variable nodes.

But it suffers from the limitation that only for i.i.d GaussianA, the algorithm is guaranteed to converge.

SBL involves a matrix inversion step at each iteration, which makes it a computationally complex algorithm even for moderately large datasets. An alternative approach to SBL is using variational approximation for Bayesian inference [12, 13]. Variational Bayesian (VB) inference tries to find an approximation of the posterior distribution which maximizes the variational lower bound onlnp(y). [14] introduces a Fast version of SBL by alternatingly maximizing the variational posterior lower bound with respect to single (hyper)parameters. They analytically show that the station- ary points for theαiare the same as those of FMML, provide the pruning conditions and thus accelerate the convergence. [15] introduces inverse-free SBL via a Taylor series expansion. The authors propose a variational expectation-maximization (EM) scheme to maximize a relaxed-ELBO (evidence lower bound), which leads to a computationally efficient SBL algorithm.

1.1. Contributions of this paper In this paper:

• We propose a novel Space Alternating Variational Estimation based SBL technique called SAVE.

• We also propose an AMP-style approximation of SAVE, which reveals links to AMP algorithms.

• Numerical results suggest that our proposed solution has a faster convergence rate (and hence lower complexity) than

(2)

(even) the existing fast SBL and performs better than the existing fast SBL algorithms in terms of reconstruction error in the presence of noise.

In the following, boldface lower-case and upper-case characters de- note vectors and matrices respectively. The operatorstr(·), (·)^T represents trace, and transpose respectively. A real Gaussian random vector with meanµand covariance matrixΘis distributed as x∼ N(µ,Θ). diag(·)represents the diagonal matrix created by elements of a row or column vector. The operator< x >or E(·) represents the expectation ofx. k·krepresents the Frobenius norm.

All the variables are real here unless specified otherwise.

2. VB-SBL

In Bayesian compressive sensing, a two-layer hierarchical prior is assumed for thexas in [4]. The hierarchical prior is chosen such that it encourages the sparsity property ofx. xis assumed to have a Gaussian distribution parameterized byα = [α1 α2 ... αM], whereαirepresents the inverse variance or the precision parameter ofxi.

p(x/α) =

M

Y

i=1

p(xi/αi) =

M

Y

i=1

N(0, α_i⁻¹). (2) Further a Gamma prior is considered overα,

p(α) =

M

Y

i=1

p(αi/a, b) =

M

Y

i=1

Γ⁻¹(a)b^aα^a−1_i e^−bαⁱ. (3) The inverse of noise varianceγis also assumed to have a Gamma prior,p(γ) = Γ⁻¹(c)d^cα^c−1_i e^−dγ.Now the likelihood distribution can be written as,

p(y/x, γ) = (2π)^−N/2γ^N/2e^{−γ||y−Ax||}

2

2 . (4)

2.1. Variational Bayes

The computation of the posterior distribution of the parameters is usually intractable. In order to address this issue, in variational Bayesian framework, the posterior distributionp(x,α, γ/y)is approximated by a variational distributionq(x,α, γ)that has the fac- torized form:

q(x,α, γ) = qγ(γ)

M

Y

i=1

qx_i(xi)

M

Y

i=1

qα_i(αi) (5) Variational Bayes compute the factorsqby minimizing the Kullback- Leibler distance between the true posterior distributionp(x,α, γ/y) and theq(x,α, γ). From [12],

KLDV B = KL(p(x,α, γ/y)||q(x,α, γ)) (6) The KL divergence minimization is equivalent to maximizing the evidence lower bound (ELBO) [13]. To elaborate on this, we can write the marginal probability of the observed data as,

lnp(y) =L(q) +KLDV B, where, L(q) =R

q(θ) ln^p(y,θ)_q(θ) dθ, KLDV B=−R

q(θ) ln^p(θ/y)_q(θ) dθ.

(7) SinceKLDV B ≥ 0, it implies that L(q) is a lower bound on lnp(y). Moreover,lnp(y) is independent ofq(θ) and therefore maximizingL(q) is equivalent to minimizingKLDV B. This is called as ELBO maximization and doing this in an alternating fash- ion for each variable inθleads to,

ln(qi(θi)) =<lnp(y,θ)>k6=i+ci,

p(y,θ) =p(y/x,α, γ)p(x/α)p(α)p(γ). (8) whereθ = {x,α, γ}and θi represents each scalar inθ. Here

<>k6=irepresents the expectation operator over the distributionsqk

for allk6=i.

3. SAVE SPARSE BAYESIAN LEARNING

In this section, we propose a Space Alternating Variational Estima- tion (SAVE) based alternating optimization between each elements ofθ. For SAVE, not any particular structure ofAis assumed, in contrast to AMP which performs poorly whenAis not i.i.d or sub- Gaussian. The joint distribution can be written as,

lnp(y,θ) = ^N₂ lnγ−^γ₂||y−Ax| |²+

M

X

i=1

1

2lnαi−αi

2x²_i

+

M

X

i=1

((a−1) lnαi+alnb−bαi) +(c−1) lnγ+clnd−dγ+constants,

(9) In the following,cx_i, c⁰x_i, cα_i andcγrepresents normalization constants for the respective pdfs.

Update ofqx_i(xi):Using (8),lnqx_i(xi)turns out to be quadratic in xiand thus can be represented as a Gaussian distribution as follows,

lnqx_i(xi) =

−^<γ>₂ n

<||y−A¯ix¯i| |²> −(y−A¯i<x¯i>)^TAixi− xiA^Ti(y−A¯i<x¯i>) +||Ai| |²x²i

o

− ^<α₂ⁱ^>x²i+cx_i

= −_2σ¹2 i

(xi −µi)²+c⁰x_i.

(10) Note that we splitAxas,Ax = Aixi +A¯ix¯i, whereAirepre- sents thei^thcolumn ofA,A¯irepresents the matrix withi^thcolumn ofAremoved,xiis thei^thelement ofx, andx¯iis the vector with- outxi. Clearly, the mean and the variance of the resulting Gaussian distribution becomes,

σ²_i = <γ>||A¹_i||²+α_i,

< xi>=µi = σ_i²A^T_i (y −A¯i<x¯i>)< γ >, (11) whereµirepresents the point estimate ofxi.

Update ofqα_i(αi): The variational approximation leads to the following Gamma distribution for theqα_i(αi),

lnqα_i(αi) = (a−1 +¹₂) lnαi −αi

_<x2 i>

2 +b +cα_i, qα_i(αi) ∝ α^a+

1 2−1

i e^−αⁱ

_<x2 i>

2 +b

.

(12) The mean of the Gamma distribution is given by,

< αi>= _<x^a+₂ ¹² i>

2 +b

, where < x²_i >=µ²_i +σ_i². (13) Update of qγ(γ): Similarly, the Gamma distribution from the variational Bayesian approximation for theqγ(γ) can be written as, qγ(γ) ∝ γ^c+^N²⁻¹e^−γ

<||y−Ax||2>

2 +d

.The mean of the Gamma distribution forγis given by,

< γ >= ^c+

N 2 <||y−Ax||2>

2 +d

, (14) where, <||y −Ax| |²>= ||y| |² − 2y^TAµ+

tr A^TA(µµ^T +Σ)

,Σ = diag(σ²₁, σ²₂, ..., σ²_M), µ = [µ1, µ2, ..., µM]^T. From (11), it can be seen that the estimate of x = µ converges to the L-MMSE equalizer, xb = µ = (A^TA+_<γ>¹ Σ⁻¹)⁻¹A^Ty.

(3)

3.1. Computational Complexity

For our proposed SAVE, it is evident that we don’t need any matrix inversions compared to [14, 16]. Our computational complexity is similar to [15]. Update of all the variablex,α, γinvolves simple addition and multiplication operations. We introduce the following variables,q =y^TAandB=A^TA. q,Band||y| |²can be pre- computed, so only computed once. We also introduce the following notations,xi−= [x1...xi−1]^T,xi+ = [xi+1...xM]^T. Also we rep- resentγ^t =< γ >, α^t_i =< αi >,x^t_i = µi andΣ^t = Σin the following sections, wheretrepresents the iteration stage.

Algorithm 1SAVE SBL Algorithm Given:y,A, M, N.

Initialization: a, b, c, dare taken to be very low, on the order of 10⁻¹⁰.α⁰_i =a/b,∀i, γ⁰=c/dandσ_i²^,0=_||A ¹

i||²γ⁰+α⁰_i,x⁰=0.

At iterationt+ 1,

1. Updateσ_i²^,t+1,x^t+1_i =µi,∀ifrom (11) usingx^t+1_i− andx^ti+. 2. Compute< x²_i^,t+1>from (13) and updateα^t_i.

3. Update the noise variance,γ^t+1from (14).

4. Continue steps1−4till convergence of the algorithm.

4. RELATION BETWEEN SAVE AND AMP In this section, we interpret the computation ofx^ti =µiat iteration tin the message passing framework. The termA^T_i(y−A¯ix¯i)can be interpreted as a linear combination of the messages from each variable nodes. We show that the SAVE iterations can be written as update equations similar to the AMP.

Fig. 1. Factor Graph

In this sectiona ∈ A, whereA = {1,2....N}represents the indices of the variable nodesyaandi∈ B, whereB ={1,2....M}

represents the indices of the factor nodesxi. In the factor graph, factor nodefirepresents the computation of the prior distribution of xi. The message from factor nodeyato the variable nodexiwill be a Gaussian pdf distributed asVya→x_i ∼ N(Aa,ixi;za→i^t , c^ta→i), za→i^t =ya−X

j6=i

Aa,jx^tj→a, c^ta→i= 1 γ^t +X

j6=i

|Aa,j|² 1 α_j^t.Now using (11), consider the message passed from a variable nodexito the factor nodeya,

x^t+1_i→a=F(X

b6=a

Ab,izb→i^t ) = γ^t α^t_i+||Ai| |²γ^t

X

b6=a

Ab,izb→i^t . (15) Now we write the update equation forza→i^t as,

z^t+1_a→i=ya−

M

X

j=1

Aa,jx^t+1j→a+Aa,ix^t+1i→a

=z_a^t+1+δ^t+1_a→i, where, δ^t+1_a→i=Aa,ix^t+1_i→a.

(16)

Forx^t+1_i→a, a first order Taylor series approximation around

X

b

Ab,izb→i^t leads to,

x^t+1_i→a=F(X

b

Ab,iz^tb→i)

| {z }

x^t+1_i

−Aa,iz^taF⁰(X

b

Ab,izb→i^t )

| {z }

∆^t+1_i→a

+O(_M¹),

(17) Similar to [8, 17], the messages can be approximated as follows,

x^t+1_i→a=x^t+1_i + ∆^t+1_i→a+O(_M¹). (18) From the above expression, theupdate equation forx^t_iis,

x^t+1_i =F(X

b

Ab,izb→i^t ) =F(X

b

Ab,izb^t+X

b

A²b,ix^ti). (19)

In the large system limitAb,i ≈ N(0,1/N)and thusX

b

A²b,i = 1, leading to,x^t+1_i = F(X

b

Ab,iz^tb+x^ti).Substituting forx^t+1_i→a from (18),za^t+1gets simplified as,z^t+1a = ya−X

j

Aa,jx^t+1_j + (M/N)z^t_a<F⁰(X

b

Ab,jz^t_b+x^t_j)>,where, <F⁰(X

b

Ab,jz^t_b+ x^tj)>= 1

M X

j

F⁰(X

b

Ab,jzb^t+x^tj).^M_Nz^ta<F⁰(X

b

Ab,jz^tb+

x^t_j)>= 1 β

1 M

M

X

j=1

γ^t

||Ai| |²γ^t+α^t_i is the Onsager term [18] andβ is defined asβ= _M^N. Therefore the approximated SAVE algorithm can be written as,

Algorithm 2AMP SAVE Algorithm Definitions:

β≡ ^N_M, r^t≡A^Tz^t+x^t.

Foperates elementwise,F(ri^t) =_αt ^γ^t i+||A_i||²γ^tr^ti. Update Equations:

x^t+1=F(r^t).

z^t+1=y−Ax^t+1+ (1/β)z^t_M¹ PM j=1

γ^t

||A_i||²γ^t+α^t_i. Parameter tuning:

σ^2,t+1_i = ¹

α^t_i+||A_i||²γ^t,α^t+1_i = ^a+¹²

(xt+1

i )2 +σ2,t+1 i

2 +b

,∀i

γ^t+1 = ^c+

N 2

||y−Axt+1||2 +tr(ATAΣt+1)

2 +d

!.

It can be noted that the above SAVE AMP algorithm has more similarity to the optimally-tuned Non Parametric Equalizer (NOPE) proposed in [19, 20], which is an extended version of the AMP. It is to be noted that in [19], the variances of thexiare assumed to be the same for alli.

4.1. State Evolution

AMP based algorithms decouple the system of equations into par- allel AWGN channels with equal noise variance. This means that the quantityr^t+1_i = x^t_i+A^T_iz^tcan be expressed equivalently as xi+n^ti, wheren^ti∼ N(0, τt²)andτt²is the decoupled noise variance. In AMP, the decoupled noise variance can be tracked exactly by the SE framework.

Lemma 1. Considering the large system limit and a Lipschitz con- tinuous functionF, the decoupled noise varianceτ_t²andγ^tis given

(4)

by the following SE recursion,

τt+1² = _γ_t+1¹ +_β¹ ξ^t+ζ^tτt²

,

1

γ^t+1 =_N¹ ||y| |²+_β¹ ψ^t+τt²ζ^t

, ξ^t= E _αt i (γ^t+α^t_i)²

, ζ^t= E

(γ^t)² (γ^t+α^t_i)²

, ψ^t= E

(γ^t)² α^t_i(γ^t+α^t_i)²

.

(20) Proof:Following [18], we write the update equation ofr^tas,

r^t= (A^Tz^t+x^t)

=x+ (I−A^TA)(x^t−x) +A^Tw+r^t_Onsager, (21) wherer^t_Onsager = (1/β)A^Tz^t−1 <F⁰(r_j^t−1)>. For the conve- nience of the analysis, we define:

e^t≡x^t−xandn^t≡r^t−x,e^t+1=F(x+n^t)−x, n^t= (I−A^TA)e^t+A^Tw+r^t_Onsager, (22) wheren^t ∼ N(0, τt²I)and independent ofx. The SE for approximated SAVE AMP leads to the following recursion,

τ_t+1² =_β¹v²_t+1+ ¹

γ^t+1, where v²_t+1=_M¹tr E

(e^t+1)² = _M¹tr (I−Λ^t)²Ξ^t+ (Λ^t)²τ_t² , (23) where,Λ^t,diagonal with,(Λ^t)i,i=_||A ^γ^t

i||²γ^t+α^t_i,Ξ^t= diag( ¹

α^t₁, ¹

α^t₂, ..., ¹

α^t_M), also we made the approximation that A^Tw is a vector of i.i.d normal entries with mean 0 and variance(1/N)||w| |²which converges by the law of large numbers to

1

γ^t. Also we use theLemma4.2.1in [17] which show that each entry ofI−A^TAis approximately normal, with zero mean and variance1/N. Expanding forΛ^t,Ξ^tand||Ai| |²= 1, we can write the decoupled noise variance as,

τ_t+1² =_γ_t+1¹ + _βM¹

M

X

i=1

α^t_i+ (γ^t)²τ_t² (γ^t+α^t_i)²

!

. (24)

Now in the large system limit M, N → ∞ with a fixed β, τt+1² = _γt+1¹ + ¹_β ξ^t+ζ^tτt²

, where _M¹

M

X

i=1

α^ti

(γ^t+α^t_i)² and

1 M

M

X

i=1

(γ^t)²

(γ^t+α^t_i)² will converge to deterministic limits ξ^t and ζ^t. Now ast → ∞, the fixed point ofτt² can be evaluated as, τ∞² =

1 γ∞+^ξ^∞_β

1−^ζ_β^∞ .. From this, it can be concluded thatτ_t²will converge if^ζ_β^∞ <1. Similarly for theγ^t, a recursion can be obtained as follows,

1 γ^t+1 =

1

N||y||² + (_βM¹

M

X

i=1

(γ^t)²

α^t_i(γ^t+α^t_i)² + τt²

βM

M

X

i=1

(γ^t)² (γ^t+α^t_i)²),

(25) AsN, M→ ∞with fixedβ, this converges to (20).

5. SIMULATION RESULTS

In this section we present the simulation results to validate the performance of our SAVE SBL algorithm (Algorithm 1) compared to state of the art solutions. We compare our algorithm with the Fast

Inverse-Free SBL (Fast IF SBL) in [15], the G-AMP based SBL in [16] and the fast version of SBL (FV SBL) in [14]. For the sim- ulations, we have fixedM = 200andK = 30. All the elements ofAandxare generated i.i.d from a normal distribution,N(0,1).

The SNR is fixed to be20dBin the simulation.

5.1. MSE Performance

40 50 60 70 80 90 100 110 120 130 140 150

N, No of observations -45

-40 -35 -30 -25 -20 -15 -10 -5

NMSE

Proposed SAVE SBL FAST IF-SBL GAMP SBL FV SBL

Fig. 2. NMSE vs the number of observations.

From Figure 2, it is evident that the proposed SAVE algorithm performs better than the state of the art solutions in terms of the Normalized Mean Square Error (NMSE), which is defined as N M SE = _M¹ ||xb−x| |², xb represents the estimated value, N M SEdB= 10 log 10(N M SE).

5.2. Complexity

40 60 80 100 120 140 160

N, No of Observations 10^-4

10^-3 10^-2

Execution Time (in sec)

Proposed SAVE SBL FAST IF SBL GAMP SBL FV SBL

Fig. 3. Execution time vs the number of observations.

Since the proposed SAVE and [15,16] have similar computational re- quirements, we plot the execution time required for the convergence of the algorithms. It is clear from Figure 3 that proposed SAVE approach has a faster convergence rate than the existing fast SBL algorithm.

6. CONCLUSION

We presented a fast SBL algorithm called SAVE, which uses the variational inference techniques to approximate the posteriors of the data and parameters. SAVE helps to circumvent the matrix inversion operation required in conventional SBL using EM algorithm. We showed that the proposed algorithm has a faster convergence rate and better performance in terms of NMSE than even the state of the art fast SBL solutions. Possible extensions to the current work might include: i) the case in whichAis parametric in an unknownθ, ii) further analysis involving the mismatched CRBs for VB-SBL or SAVE and iii) SBL in the context of multiple measurement vectors case as in [21, 22].

(5)

7. REFERENCES

[1] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit ,”IEEE Trans.

Inf. Theory, vol. 53, no. 12, pp. 4655–4666, December 2007.

[2] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic de- composition by basis pursuit ,”SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1998.

[3] D. Wipf and S. Nagarajan, “Iterative reweighted l1 and l2

methods for finding sparse solutions ,”IEEE J. Sel. Topics Sig- nal Process., vol. 4, no. 2, pp. 317–329, April 2010.

[4] M. E. Tipping, “Sparse bayesian learning and the relevance vector machine,”J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.

[5] D. P. Wipf and B. D. Rao, “Sparse Bayesian Learning for Basis Selection ,”IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2153–2164, August 2004.

[6] R. Giri and B. D. Rao, “Type I and type II bayesian methods for sparse signal recovery using scale mixtures,”IEEE Trans- actions on Signal Processing, vol. 64, no. 13, pp. 3418–3428, 2018.

[7] M. E. Tipping and A. C. Faul, “Fast marginal likelihood max- imisation for sparse Bayesian models,” inAISTATS, January 2003.

[8] D. L. Donoho, A. Maleki, and A. Montanari, “Message- passing algorithms for compressed sensing ,”PNAS, vol. 106, no. 45, pp. 18 914–18 919, November 2009.

[9] R. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” inProc. IEEE Int.

Symp. Inf. Theory, Saint Petersburg, Russia, August 2011, p.

21682172.

[10] S. Rangan, P. Schniter, and A. Fletcher, “On the convergence of approximate message passing with arbitrary matrices,” in Proc. IEEE Int. Symp. Inf. Theory, 2014.

[11] ——, “Vector approximate message passing,” inProc. IEEE Int. Symp. Inf. Theory, 2017.

[12] M. J. Beal, “Variational algorithms for approximate bayesian inference,” inThesis, Univeristy of Cambridge, UK, May 2003.

[13] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variational approximation for Bayesian inference,”IEEE Signal Process. Mag., vol. 29, no. 6, pp. 131–146, November 2008.

[14] D. Shutin, T. Buchgraber, S. R. Kulkarni, and H. V. Poor, “Fast variational sparse bayesian learningwith automatic relevance determination for superimposed signals,” IEEE Transactions on Signal Processing, vol. 59, no. 12, December 2011.

[15] H. Duan, L. Yang, and H. Li, “Fast inverse-free sparse bayesian learning via relaxed evidence lower bound maximization,”

IEEE Signal Processing Letters, vol. 24, no. 6, June 2017.

[16] M. Al-Shoukairi, P. Schniter, and B. D. Rao, “GAMP-based low complexity sparse bayesian learning algorithm,” IEEE Transaction on Signal Processing, vol. 66, no. 2, January 2018.

[17] A. Maleki, “Approximate message passing algorithms for compressed sensing ,”Stanford University, November 2010.

[18] M. Bayati and A. Montanari, “The Dynamics of Message Pass- ing on Dense Graphs, with Applications to Compressed Sens- ing ,”IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 764–785, February 2011.

[19] R. Ghods, C. Jeon, G. Mirza, A. Maleki, and C. Studer,

“Optimally-tuned nonparametric linear equalization for mas- sive mu-mimo systems,” inProc. IEEE Int. Symp. Inf. Theory, June 2017.

[20] C. Jeon, A. Maleki, and C. Studer, “On the performance of mismatched data detection in large mimo systems,” inProc.

IEEE Int. Symp. Inf. Theory (ISIT), Honolulu, HI, USA, July 2016, pp. 180–184.

[21] Z. Zhang and B. D. Rao, “Sparse Signal Recovery with Tempo- rally Correlated Source Vectors Using Sparse Bayesian Learn- ing ,”IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 912 – 926, September 2011.

[22] M. Al-Shoukairi and B. Rao, “Sparse bayesian learning using approximate message passing,” in48th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, November 2014.