
Nonlinear MMSE using Linear MMSE Bricks and

Application to Compressed Sensing and Adaptive Kalman Filtering

Dirk Slock

Communication Systems Department, EURECOM, Sophia Antipolis, France

ICASSP 2020, May 4-8, ETON Talk


Thanks to

Anthony Schutz, Siouar Bensaid, Christo Thomas

This talk is about more than the title suggests; the title is a teaser.


Main Messages

There are many Bayesian estimation problems, many of which are LMMSE (Wiener, Kalman), and which contain hyperparameters to be tuned using various approaches.

Information combining: from weighted least-squares to message passing in a more general overall Bayesian formulation (e.g. cooperative location estimation).

Empirical Bayes (EB) as a principled framework for the bias-variance trade-off,

but not necessarily using empirical Bayes for hyperparameter estimation: SURE, Cross Validation; compressed sensing, sparse models; generalization of model order selection to support region, model complexity and structure.

Sparse Bayesian Learning (SBL) is one EB instance, allowing to exploit (approximate) sparsity for compressed sensing;

can be extended to time-varying scenarios, with sparse variations also;

can be extended to dictionary learning, in particular with Kronecker structured dictionaries.

Message passing (approximate iterative) inference techniques: easy to get the mean (estimate) correct, but more difficult to get correct posterior variances.


Main Messages (2)

Free energy optimization framework, guided by the mismatched Cramer-Rao Bound (mCRB), for the split into various MP simplification levels (Belief Propagation (BP), Variational Bayes (VB) - Mean Field (MF)), allowing a performance-complexity trade-off.

Large system analysis (LSA) yields simplified asymptotic performance analysis for certain measurement matrix models, allowing to show optimality and to justify algorithmic simplifications. Approximate Message Passing (AMP) is very similar to approximate large turbo receivers for CDMA, for which heuristic LSA was performed based on Replica Method Analysis.

AMP can be derived more rigorously from BP, using asymptotically justifiable first-order Taylor series expansions and Gaussian approximations.

LSA allows tracking of the AMP MSE through the iterations, called State Evolution (SE), showing convergence to MMSE and hence optimality.

SE requires statistical models for the measurement matrix A. Pushing these model assumptions completely through to the xAMP algorithms may be an unnecessary simplification. The main requirement is independent rows/columns, as in CDMA random spreading.

Most xAMP versions require i.i.d. x, which is not suited for SBL. We present a new LSA for SBL-AMP.


Outline

1 Introduction
2 Static SBL
3 Combined BP-MF-EP Framework
4 Posterior Variance Prediction: Bayes Optimality
5 Performance Analysis of Approximate Inference Techniques (mCRB)
6 Dynamic SBL
7 Kronecker Structured Dictionary Learning using BP/VB
8 Numerical Results and Conclusion


Kalman Filter

Linear state-space model:

state update equation: $x_{k+1} = F_k(\theta)\, x_k + G_k(\theta)\, w_k$

measurement equation: $y_k = H_k(\theta)\, x_k + v_k$

for $k = 1, 2, \ldots$, with uncorrelated initial state $x_0 \sim \mathcal{N}(\widehat{x}_0, P_0)$, measurement noise $v_k \sim \mathcal{N}(0, R_k(\theta))$ and state noise $w_k \sim \mathcal{N}(0, Q_k(\theta))$.

State model known up to some parameters $\theta$.

Often $F_k(\theta)$, $G_k(\theta)$, $H_k(\theta)$ are linear in $\theta$: bilinear case.
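As a concrete reference point, here is a minimal numpy sketch of one Kalman filter measurement/time update for this state-space model, assuming $\theta$ is fixed so that $F, G, H, Q, R$ are known numerical matrices (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def kalman_step(x_pred, P_pred, y, F, G, H, Q, R):
    """One measurement update with y_k, followed by one time update to k+1."""
    # Measurement update: combine the prior (x_pred, P_pred) with y_k = H x_k + v_k
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_filt = x_pred + K @ (y - H @ x_pred)
    P_filt = P_pred - K @ H @ P_pred
    # Time update: propagate through x_{k+1} = F x_k + G w_k
    x_next = F @ x_filt
    P_next = F @ P_filt @ F.T + G @ Q @ G.T
    return x_next, P_next
```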


Numerous Applications

LMMSE wireless channel estimation:
$x_k$ = FIR filter response, $\theta$: Power Delay Profile, AR(1) dynamics in e.g. diagonal $F$ and $Q$.

Bayesian adaptive filtering (BAF):
analogous to LMMSE channel estimation, except for the measurement equation: only one 1D measurement is available per instance. An extremely simplified form of BAF is the so-called Proportionate LMS (P-LMS) algorithm.

Position tracking (GPS):
$\theta$: acceleration model parameters (e.g. white noise, AR(1)).

Blind Audio Source Separation (BASS):
$x_k$ = source signals, $\theta$: (short+long term) AR parameters, reverb filters.


Static LMMSE (Wiener) Applications

Direction of Arrival (DoA) estimation: $x$ = decorrelated sources; apart from the DoA parameters there could also be antenna array calibration parameters or source and noise covariance parameters.

Blind and semi-blind channel estimation. In the techniques that exploit the (white) second-order statistics of $x$, (the unknown part of) $x$ gets modeled as Gaussian. Numerous variations: single-carrier, OFDM and CDMA versions, single- and multi-user, single- and multi-stream, with black box FIR channel models or propagation based parameterized channel models.

Image Deblurring, NMRI Imaging.

LMMSE receiver (Rx) design: $x$ = Tx symbol sequence to be recovered on the basis of the Rx signal, in single- or multi-user settings and other variations as in the channel estimation case. The cross-correlation between Tx and Rx signals depends on the channel response, which is part of the parameters. The Rx signal covariance matrix, on the other hand, can be modeled in various ways, non-parametric or parametric. Account for the channel estimation error in the LMMSE Rx design. Another approach: consider the LMMSE filter directly as the parameters.

LMMSE Tx design, partial CSIR/CSIT.


Adaptive Kalman Filter solutions

Extended Kalman Filter (EKF)

other generic nonlinear Kalman Filter extensions: Unscented Kalman Filter (UKF), Cubature Kalman Filter (CKF), Gaussian Sum Filter, Particle Filter (PF)

Recursive Prediction Error Method (RPEM) Kalman Filter

Second-Order Extended Kalman Filter (SOEKF)

Expectation-Maximization (EM) / Variational Bayes (VB) Kalman Filter


Time Varying Sparse State Tracking

The sparse signal $x_t$ is modeled using an AR(1) process with diagonal correlation coefficient matrix $F$.

Define $\Xi = \mathrm{diag}(\xi)$, $F = \mathrm{diag}(f)$.

$f_i$: correlation coefficient, and $x_{i,t} \sim \mathcal{CN}(x_{i,t}; 0, \xi_i^{-1})$. Further, $w_t \sim \mathcal{CN}(w_t; 0, \Gamma^{-1}(I - FF^H))$ and $v_t \sim \mathcal{CN}(v_t; 0, \gamma^{-1} I)$. VB leads to Gaussian SAVE-Kalman Filtering (GS-KF).

Applications: Localization, Adaptive Filtering.


Compressed Sensing Problem

Noiseless case: given underdetermined $y, A$, the optimization problem is $\min_x \|x\|_0$ subject to $y = Ax$.

Can recover $x$ and its support for small $N - \|x\|_0$ (small overdetermination if the support were known).

Noisy case: $\min_x \|x\|_0$ subject to $\|y - Ax\|_2 \le \epsilon$. $\ell_0$ norm minimization: an NP-hard problem.

Constrained problem $\Rightarrow$ Lagrangian, Convex Relaxation (using the $\ell_p$ norm, $p \ge 1$):
$\min_x \|y - Ax\|_2^2 + \lambda\|x\|_p$.

Restricted Isometry Property (RIP): $A^T A$ sufficiently diagonally dominant.

Most identifiability work considered noiseless data & exact sparsity.


Sparse Signal Recovery Algorithms

Convex Relaxation based Methods:
Basis Pursuit ($\ell_1$ norm)$^1$, LASSO ($\ell_1$ norm)$^2$, Dantzig selector$^3$, FOCUSS ($\ell_p$ norm, with $p < 1$).

Greedy Algorithms:
Matching Pursuit$^4$, Orthogonal Matching Pursuit (OMP)$^5$, CoSaMP$^6$.

Iterative Methods:
Iterative Shrinkage and Thresholding Algorithm (ISTA)$^7$ or Fast ISTA (a minimal ISTA sketch follows after the footnotes), Approximate Message Passing variants (xAMP) - more details to follow.

Very recent: Deep learning based methods such as Learned ISTA (LISTA)$^8$, Learned AMP (LAMP) and Learned Vector AMP (LVAMP)$^9$.

$^1$Chen, Donoho, Saunders'99, $^2$Tibshirani'96, $^3$Candes, Tao'07
$^4$Mallat, Zhang'93, $^5$Tropp, Gilbert'07, $^6$Needell, Tropp'09
$^7$Daubechies, Defrise, Mol'04, $^8$Gregor, Cun'10, $^9$Borgerding, Schniter, Rangan'17
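As referenced in the iterative-methods item above, here is a minimal numpy sketch of ISTA for $\min_x \|y - Ax\|_2^2 + \lambda\|x\|_1$ (step size from the Lipschitz constant; the problem sizes in the usage example are illustrative):

```python
import numpy as np

def ista(A, y, lam, n_iter=200):
    """Minimize ||y - A x||_2^2 + lam * ||x||_1 via iterative soft-thresholding."""
    L = 2 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient of ||y - Ax||^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ x - y)       # gradient of the quadratic term
        z = x - grad / L                   # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

# Example: recover a sparse x from noisy underdetermined measurements
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 200)) / np.sqrt(40)
x_true = np.zeros(200)
x_true[rng.choice(200, 10, replace=False)] = rng.normal(size=10)
y = A @ x_true + 0.01 * rng.normal(size=40)
x_hat = ista(A, y, lam=0.05)
```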


James-Stein Estimator

Bayesian interpretation of (possibly overdetermined) Compressed Sensing:
$\min_x \|y - Ax\|_2^2 - 2\sigma_v^2 \ln p(x)$.

Stein and James$^{10}$ showed that for the i.i.d. linear Gaussian model $p(x) = \mathcal{N}(x; 0, \xi^{-1} I)$, it is possible to construct a nonlinear estimate of $x$ with lower MSE than that of ML for all values of the true unknown $x$.

A popular design strategy is to minimize Stein's unbiased risk estimate (SURE), which is an unbiased estimate of the MSE.

SURE directly approximates the MSE of an estimate from the data, without requiring knowledge of the hyperparameters ($\xi$); it is an instance of empirical Bayes.

Stein's landmark discovery led to the study of biased estimators that outperform minimum variance unbiased (MVU) estimators in terms of MSE, e.g. work by Yonina Eldar$^{11}$: shrinkage estimators and penalized maximum likelihood (PML) estimators.

$^{10}$James, Stein'61
$^{11}$Eldar'08
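For the special case $A = I$ (direct observation of $x$ in white noise), here is a small sketch of the positive-part James-Stein shrinkage estimator together with a Monte Carlo MSE comparison against the unbiased ML estimate; the dimension and the true $x$ are illustrative choices:

```python
import numpy as np

def james_stein(y, sigma2):
    """Positive-part James-Stein shrinkage of x_ml = y for y = x + v, v ~ N(0, sigma2*I), dim >= 3."""
    M = y.size
    shrink = max(0.0, 1.0 - (M - 2) * sigma2 / np.dot(y, y))
    return shrink * y

rng = np.random.default_rng(0)
M, sigma2 = 50, 1.0
x = rng.normal(scale=0.5, size=M)                 # arbitrary "true" x
mse_ml, mse_js = [], []
for _ in range(2000):
    y = x + rng.normal(scale=np.sqrt(sigma2), size=M)
    mse_ml.append(np.sum((y - x) ** 2))
    mse_js.append(np.sum((james_stein(y, sigma2) - x) ** 2))
print(np.mean(mse_ml), np.mean(mse_js))           # JS risk is lower for every true x when M >= 3
```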


Kernel Methods in Automatic Control

Kernel methods in linear system identification$^{12}$ ($y = Ax + v$, $v \sim \mathcal{N}(v; 0, \gamma^{-1} I)$).

Traditional methods: maximum likelihood (ML) or prediction error methods (PEM); ML/PEM are optimal in the large data limit.

Questions: model structure design for ML/PEM; achieving a good bias-variance trade-off.

Solution: parameterized kernel design and hyperparameter estimation. Methods for hyperparameter estimation include cross-validation (CV), empirical Bayes (EB), $C_p$ statistics and Stein's unbiased risk estimate (SURE).

Regularized least squares estimator ($P$ is a symmetric, positive semidefinite kernel matrix):
$\widehat{x} = \arg\min_{x \in \mathbb{R}^M} \|y - Ax\|^2 + \frac{1}{\gamma}\, x^T P^{-1} x$.

Parameterized family of matrices $P(\eta)$, where $\eta \in \mathbb{R}^p$; the $\eta$ are the hyperparameters.

Can be interpreted as a constrained form of SBL, with a zero-mean Gaussian prior for $x$ whose covariance matrix is a linear combination of some fixed matrices (SBL being a special case with fixed matrices $e_i e_i^T$).

A good overview of kernel methods, and the connection with machine learning, is given in$^{13}$.

$^{12}$Pillonetto, Nicolao'10
$^{13}$Pillonetto, Dinuzzo, Chen, Nicolao, Ljung'14


Kernel Hyperparameter Estimation

Empirical Bayes (EB = Type II ML):
$\widehat{\eta}_{EB} = \arg\min_\eta f_{EB}(P(\eta))$, with $f_{EB}(P(\eta)) = y^T Q^{-1} y + \ln\det(Q)$, where $Q = A P A^T + \frac{1}{\gamma} I_N$.

Two SURE methods:

SURE 1: MSE of signal reconstruction ($MSE_x(P) = \mathrm{E}(\|\widehat{x} - x\|^2)$):
$\mathrm{SURE}_x$: $\widehat{\eta}_{Sx} = \arg\min_\eta f_{Sx}(P(\eta))$, with
$f_{Sx}(P(\eta)) = \frac{1}{\gamma^2}\, y^T Q^{-T} A (A^T A)^{-2} A^T Q^{-1} y + \frac{1}{\gamma}\,\mathrm{tr}\{2 R^{-1} - (A^T A)^{-1}\}$, $\quad R = A^T A + \frac{1}{\gamma} P^{-1}$.

SURE 2: MSE of output prediction ($MSE_y(P) = \mathrm{E}(\|A\widehat{x} + v - y\|^2)$, with a new noise realization $v$ independent from the one in $y$):
$\mathrm{SURE}_y$: $\widehat{\eta}_{Sy} = \arg\min_\eta f_{Sy}(P(\eta))$, with
$f_{Sy}(P(\eta)) = \frac{1}{\gamma^2}\, y^T Q^{-T} Q^{-1} y + \frac{2}{\gamma}\,\mathrm{tr}\{A P A^T Q^{-1}\}$.
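A small sketch evaluating the EB (Type II ML) criterion $f_{EB}$ above for an illustrative one-parameter kernel $P(\eta) = \eta I$ (this parameterization, the grid search, and all sizes are assumptions made for the example), followed by the corresponding regularized LS estimate:

```python
import numpy as np

def f_eb(eta, A, y, gamma):
    """EB criterion y^T Q^{-1} y + ln det Q for the illustrative kernel P(eta) = eta * I."""
    N = A.shape[0]
    Q = eta * (A @ A.T) + (1.0 / gamma) * np.eye(N)
    sign, logdet = np.linalg.slogdet(Q)
    return y @ np.linalg.solve(Q, y) + logdet

rng = np.random.default_rng(0)
N, M, gamma = 50, 20, 100.0
A = rng.normal(size=(N, M))
x = rng.normal(scale=0.3, size=M)
y = A @ x + rng.normal(scale=1 / np.sqrt(gamma), size=N)

# Grid search over the scalar hyperparameter eta
etas = np.logspace(-3, 1, 50)
eta_hat = etas[np.argmin([f_eb(e, A, y, gamma) for e in etas])]

# Regularized LS estimate with the selected kernel P(eta_hat) = eta_hat * I
x_hat = np.linalg.solve(A.T @ A + (1.0 / (gamma * eta_hat)) * np.eye(M), A.T @ y)
```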


Asymptotic Properties of Hyperparameter Estimators

Derived first order optimality conditions. In general, no closed form expression, except for special cases, e.g. diagonal $A$, or ridge regression with $A^T A = N I_M$.

Theorem 1$^{14}$

Assume that $P(\eta)$ is any positive definite parameterization of the kernel matrix and $\frac{A^T A}{N} \xrightarrow{N\to\infty} \Sigma$, where $\Sigma$ is positive definite. Then we have the following almost sure convergence:
$\widehat{\eta}_{MSE_x} \to \eta_x$, $\widehat{\eta}_{Sx} \to \eta_x$,
$\widehat{\eta}_{MSE_y} \to \eta_y$, $\widehat{\eta}_{Sy} \to \eta_y$,
$\widehat{\eta}_{EEB} \to \eta_{EB}$, $\widehat{\eta}_{EB} \to \eta_{EB}$,
with $\widehat{\eta}_{MSE_x}$, $\widehat{\eta}_{MSE_y}$, $\widehat{\eta}_{EEB}$ being the ORACLE estimators, which are optimal for any data length $N$.

The two SURE estimators converge to the best possible hyperparameter in terms of MSE in the asymptotic limit: they are "asymptotically consistent".

The EB estimator converges to another best hyperparameter, minimizing the expectation of the EB estimation criterion (EEB).

Convergence of EB is faster than that of the two SURE estimators.

$^{14}$Mu, Chen, Ljung'18


Sparse Bayesian Learning - SBL

Bayesian Compressed Sensing: 2-layer hierarchical prior for $x$ as in$^{15}$, inducing sparsity for $x$ (conjugate priors: posterior pdf of the same family as the prior pdf):
$p_x(x_{i,t}|\xi_i) = \mathcal{N}(x_{i,t}; 0, \xi_i^{-1})$, $\quad p(\xi_i|a,b) = \frac{b^a}{\Gamma(a)}\, \xi_i^{a-1} e^{-b\xi_i}$
$\Rightarrow$ sparsifying Student-t marginal
$p_x(x_{i,t}) = \frac{b^a\, \Gamma(a+\frac12)}{(2\pi)^{\frac12}\, \Gamma(a)}\, \big(b + x_{i,t}^2/2\big)^{-(a+\frac12)}$.

Sparsification of the Innovation Sequence: we apply the (Gamma) prior not to the precision of the state $x$ but to that of its innovation $w$, allowing to sparsify at the same time the components of $x$ AND their variation in time (innovation).

The inverse of the noise variance, $\gamma$, is also assumed to have a Gamma prior, $p_\gamma(\gamma|c,d) = \frac{d^c}{\Gamma(c)}\, \gamma^{c-1} e^{-d\gamma}$.

$^{15}$Tipping'01
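To see this hierarchy numerically, here is a small sketch (with illustrative hyperparameter values a = 2, b = 1, not taken from the slides) that samples from the Gaussian-Gamma prior and checks the sampled marginal against the Student-t formula above:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
a, b = 2.0, 1.0                                   # illustrative Gamma hyperparameters

# Sample the 2-layer prior: xi ~ Gamma(a, rate=b), x | xi ~ N(0, 1/xi)
xi = rng.gamma(shape=a, scale=1.0 / b, size=100_000)

def student_t_marginal(x):
    """Student-t marginal from the slide: b^a Gamma(a+1/2) / ((2 pi)^0.5 Gamma(a)) (b + x^2/2)^-(a+1/2)."""
    logp = a * np.log(b) + gammaln(a + 0.5) - 0.5 * np.log(2 * np.pi) - gammaln(a) \
           - (a + 0.5) * np.log(b + x ** 2 / 2)
    return np.exp(logp)

# Monte Carlo estimate of the marginal p(x) = E_xi[ N(x; 0, 1/xi) ] vs the closed form
x_grid = np.linspace(-3, 3, 7)
mc = np.mean(np.exp(-0.5 * np.outer(x_grid ** 2, xi)) * np.sqrt(xi / (2 * np.pi)), axis=1)
print(np.allclose(mc, student_t_marginal(x_grid), rtol=0.05))   # should print True
```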


Outline

1 Introduction
2 Static SBL
3 Combined BP-MF-EP Framework
4 Posterior Variance Prediction: Bayes Optimality
5 Performance Analysis of Approximate Inference Techniques (mCRB)
6 Dynamic SBL
7 Kronecker Structured Dictionary Learning using BP/VB
8 Numerical Results and Conclusion


Original SBL Algorithm (Type II ML)

In the original SBL$^{16}$, for a fixed estimate of the hyperparameters $(\widehat{\xi}, \widehat{\gamma})$, the posterior of $x$ is Gaussian, i.e. $p_x(x|y,\widehat{\xi},\widehat{\gamma}) = \mathcal{N}(x; \widehat{x}, \Sigma_L)$, leading to the (Linear) MMSE estimate for $x$:

$\widehat{x} = \widehat{\gamma}\,(\widehat{\gamma} A^T A + \widehat{\Xi})^{-1} A^T y$, $\quad \Sigma_L = (\widehat{\gamma} A^T A + \widehat{\Xi})^{-1}$.   (1)

The hyperparameters are estimated from the likelihood function by marginalizing over the sparse coefficients $x$, the marginalized likelihood being denoted as $p_y(y|\xi,\gamma)$. $\xi, \gamma$ are estimated by maximizing $p_y(y|\xi,\gamma)$; this procedure is called Type-II ML. Type-II ML is solved using EM, which leads to the following updates for the hyperparameters:

$\widehat{\xi}_i = \dfrac{a + \frac12}{\frac{\langle x_i^2\rangle}{2} + b}$, where $\langle x_i^2\rangle = \widehat{x}_i^2 + \sigma_i^2$;
$\quad \langle\gamma\rangle = \dfrac{c + \frac{N}{2}}{\frac{\langle\|y - Ax\|^2\rangle}{2} + d}$, where $\langle\|y - Ax\|^2\rangle = \|y\|^2 - 2\, y^T A\widehat{x} + \mathrm{tr}\big(A^T A\,(\widehat{x}\widehat{x}^T + \Sigma)\big)$,
$\Sigma = \mathrm{diag}(\Sigma_L) = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_M^2)$, $\quad \widehat{x} = [\widehat{x}_1, \widehat{x}_2, \ldots, \widehat{x}_M]^T$.

$^{16}$Tipping'01, Wipf, Rao'04
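The EM (Type II ML) updates above translate almost line by line into code. A minimal numpy sketch, where the near-non-informative values for a, b, c, d and the iteration count are assumptions, not from the slides:

```python
import numpy as np

def sbl_em(A, y, a=1e-6, b=1e-6, c=1e-6, d=1e-6, n_iter=100):
    """EM-based SBL: alternate the Gaussian posterior (1) with the xi / gamma updates."""
    N, M = A.shape
    xi = np.ones(M)                                      # precisions of the x_i
    gamma = 1.0                                          # noise precision
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(n_iter):
        Sigma = np.linalg.inv(gamma * AtA + np.diag(xi))  # Sigma_L
        x_hat = gamma * Sigma @ Aty                       # posterior mean
        x2 = x_hat ** 2 + np.diag(Sigma)                  # <x_i^2>
        xi = (a + 0.5) / (x2 / 2 + b)                     # update of the xi_i
        # <||y - A x||^2> with Sigma = diag(Sigma_L), as on the slide
        resid2 = y @ y - 2 * (y @ (A @ x_hat)) + x_hat @ AtA @ x_hat \
                 + np.sum(np.diag(AtA) * np.diag(Sigma))
        gamma = (c + N / 2) / (resid2 / 2 + d)            # update of gamma
    return x_hat, Sigma, xi, gamma
```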


Type I vs Type II ML

Type I $\Longrightarrow$ standard MAP estimation (the hyperparameters are integrated out):
$\widehat{x} = \arg\max_x\, [\log p_y(y|x) + \log p_x(x)]$.

Type II $\Longrightarrow$ the hyperparameters ($\Psi = \{\xi, \gamma\}$) are estimated using an evidence maximization approach:
$\widehat{\Psi} = \arg\max_\Psi p_\Psi(\Psi|y) = \arg\max_\Psi p_\Psi(\Psi)\, p_y(y|\Psi) = \arg\max_\Psi p_\Psi(\Psi)\displaystyle\int p_y(y|x,\gamma)\, p_x(x|\xi)\, dx$.

Why is Type II better than Type I?$^{17}$ In the evidence maximization framework, instead of looking for the mode of the true posterior $p_x(x|y)$, the true posterior is approximated as $p_x(x|y;\widehat{\Psi})$, where $\widehat{\Psi}$ is obtained by maximizing the true posterior mass over the subspaces spanned by the non-zero indices.

Type I methods seek the mode of the true posterior and use that as the point estimate of the desired coefficients. Hence, if the true posterior distribution has a skewed peak, then the Type I estimate (mode) is not a good representative of the whole posterior.

$^{17}$Giri, Rao'16


Variational Bayesian (VB) Inference

The computation of the posterior distribution of the parameters is usually intractable. As in SAGE, SAVE is simply VB with partitioning of the unknowns at the scalar level. Define $\theta = \{x, \xi, \gamma\}$; $\theta_i$ represents each scalar and $\theta_{\bar i}$ denotes $\theta$ excluding $\theta_i$.

$q(\theta) = q_\gamma(\gamma)\prod_{i=1}^M q_{x_i}(x_i)\prod_{i=1}^M q_{\xi_i}(\xi_i)$.

VB computes the factors $q$ by minimizing the Kullback-Leibler divergence between the true posterior distribution $p(\theta|y)$ and $q(\theta)$. From$^{18}$,
$KLD_{VB} = D_{KL}(q(\theta)\|p(\theta|y)) = \int q(\theta)\ln\frac{q(\theta)}{p(\theta|y)}\, d\theta$.

The KL divergence minimization is equivalent to maximizing the evidence lower bound (ELBO)$^{19}$:
$\ln p(y) = \mathcal{L}(q) + KLD_{VB} = -D_{KL}(q(\theta)\|p(\theta,y)) + D_{KL}(q(\theta)\|p(\theta|y))$,
where $\ln p(y)$ is the evidence, and minimizing $KLD_{VB}$ becomes equivalent to maximizing $\mathcal{L}(q)$, the ELBO.

We get for the element-wise VB recursions (Expectation Maximization (EM) = special case: $\theta = \{\theta_s, \theta_d\}$, $\theta_s$ random/hidden, $\theta_d$ deterministic):
$\ln q_i(\theta_i) = \langle\ln p(y,\theta)\rangle_{\theta_{\bar i}} + c_i$.

$^{18}$Beal'03, $^{19}$Tzikas, Likas, Galatsanos'08


Low Complexity-Space Alternating Variational Estimation (SAVE)

Mean Field (MF) approximation: VB partitioned to the scalar level (MF vs VB // SAGE vs EM), resulting in an SBL algorithm without any matrix inversions.

The resulting alternating optimization of the posteriors for each scalar in $\theta$ leads to
$\ln q_i(\theta_i) = \langle\ln p(y,\theta)\rangle_{k\ne i} + c_i$, $\quad p(y,\theta) = p_y(y|x,\xi,\gamma)\, p_x(x|\xi)\, p_\xi(\xi)\, p_\gamma(\gamma)$,
where $\theta = \{x, \xi, \gamma\}$ and $\theta_i$ represents each scalar in $\theta$.

$\ln p(y,\theta) = \frac{N}{2}\ln\gamma - \frac{\gamma}{2}\|y - Ax\|^2 + \sum_{i=1}^M\big(\tfrac12\ln\xi_i - \tfrac{\xi_i}{2}x_i^2\big) + \sum_{i=1}^M\big((a-1)\ln\xi_i + a\ln b - b\xi_i\big) + (c-1)\ln\gamma + c\ln d - d\gamma + \text{constants}$.

Gaussian approximate posterior for $x_i$:
$\ln q_{x_i}(x_i) = -\frac{\langle\gamma\rangle}{2}\big\{\langle\|y - A_{\bar i}x_{\bar i}\|^2\rangle - (y - A_{\bar i}\langle x_{\bar i}\rangle)^T A_i x_i - x_i A_i^T(y - A_{\bar i}\langle x_{\bar i}\rangle) + \|A_i\|^2 x_i^2\big\} - \frac{\langle\xi_i\rangle}{2}x_i^2 + c_{x_i} = -\frac{1}{2\sigma_i^2}(x_i - \widehat{x}_i)^2 + c'_{x_i}$.


SAVE Iterations Continued...

The SAVE iterations for $x$ are obtained as
$\sigma_i^2 = \dfrac{1}{\langle\gamma\rangle\|A_i\|^2 + \langle\xi_i\rangle}$, $\quad \widehat{x}_i = \sigma_i^2\, \langle\gamma\rangle\, A_i^T(y - A_{\bar i}\widehat{x}_{\bar i})$.

Hyperparameter estimates: same as the EM iterations; Gamma posteriors for $\xi_i$ and $\gamma$.

No matrix inversions.

Update of all the variables $x, \xi_i, \gamma$ requires simple addition and multiplication operations; $y^T A$, $A^T A$ and $\|y\|^2$ can be precomputed, so they only need to be computed once.

Remarks: from the LMMSE expression in (1), the $i$-th row of $\gamma A^T A\widehat{x} + \Xi\widehat{x} = \gamma A^T y$ is $\gamma A_i^T A\widehat{x} + \xi_i\widehat{x}_i = \gamma A_i^T y$, or $(\gamma\|A_i\|^2 + \xi_i)\,\widehat{x}_i = \gamma A_i^T(y - A_{\bar i}\widehat{x}_{\bar i})$.

Hence SAVE performs linear PIC (parallel interference cancellation) iterations to compute the LMMSE estimate.

However, for the posterior variances: $\sigma_i^2 = \big((\Sigma_L^{-1})_{i,i}\big)^{-1} \le (\Sigma_L)_{i,i}$, with equality only for diagonal $\Sigma_L$.
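A minimal numpy sketch of the SAVE recursions above (component-wise updates, no matrix inversion); the hyperparameter updates reuse the EM form from the earlier slide, and the values of a, b, c, d and the scheduling are illustrative assumptions:

```python
import numpy as np

def save_sbl(A, y, a=1e-6, b=1e-6, c=1e-6, d=1e-6, n_iter=200):
    """SAVE: component-wise VB SBL without matrix inversions (sketch)."""
    N, M = A.shape
    col_norm2 = np.sum(A ** 2, axis=0)            # ||A_i||^2, precomputed
    x_hat = np.zeros(M)
    sig2 = np.ones(M)
    xi = np.ones(M)
    gamma = 1.0
    r = y - A @ x_hat                              # running residual y - A x_hat
    for _ in range(n_iter):
        for i in range(M):
            r += A[:, i] * x_hat[i]                # remove component i -> y - A_{bar i} x_{bar i}
            sig2[i] = 1.0 / (gamma * col_norm2[i] + xi[i])
            x_hat[i] = sig2[i] * gamma * (A[:, i] @ r)
            r -= A[:, i] * x_hat[i]                # put the updated component back
        xi = (a + 0.5) / ((x_hat ** 2 + sig2) / 2 + b)      # EM-style xi updates
        resid2 = r @ r + np.sum(col_norm2 * sig2)           # <||y - A x||^2> with diagonal posterior
        gamma = (c + N / 2) / (resid2 / 2 + d)
    return x_hat, sig2
```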


Convergence of SAVE

Theorem 2

The convergence condition for the sparse coefficients $x_i$ of the SAVE algorithm$^a$ can be written as $\rho(D^{-1}H) < 1$, where $D = \mathrm{diag}(\widehat{\gamma} A^T A + \widehat{\Xi})$, $H = \mathrm{offdiag}(\widehat{\gamma} A^T A)$, and $\rho(\cdot)$ denotes the spectral radius.

Moreover, if SAVE converges, and assuming the hyperparameter estimates are consistent, the posterior mean (point estimate) always converges to the exact value (LMMSE). However, the predicted posterior variance is quite suboptimal.

$^a$Thomas, Slock'18

Remark: to fix the convergence of SAVE (when $\rho(D^{-1}H) > 1$), we can use the diagonal loading method$^{20}$. The modified iterations (with a diagonal loading factor matrix $\widetilde{\Xi}$) can be written as
$(D + \widetilde{\Xi})\, x^{(t+1)} = -(H - \widetilde{\Xi})\, x^{(t)} + \widehat{\gamma} A^T y \;\Longrightarrow\; x^{(t+1)} = -(D + \widetilde{\Xi})^{-1}(H - \widetilde{\Xi})\, x^{(t)} + (D + \widetilde{\Xi})^{-1}\widehat{\gamma} A^T y$.

The convergence condition gets modified to $\rho\big((D + \widetilde{\Xi})^{-1}(H - \widetilde{\Xi})\big) < 1$. Another point worth noting: if the power delay profile $\Xi$ is also estimated using MF, $\widehat{\gamma}\,\mathrm{diag}(A^T A) + \widehat{\Xi}$, where $\widehat{\Xi} = \Xi + \widetilde{\Xi}$ with $\widetilde{\Xi} > 0$. In this case, $\widetilde{\Xi}$ may represent an automatic correction factor (diagonal loading) that forces convergence of SAVE for cases where $\rho(D^{-1}H) > 1$.

$^{20}$Johnson, Bickson, Dolev'09
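A small sketch of the convergence check of Theorem 2, with the optional diagonal loading $\widetilde{\Xi}$ (function and variable names are illustrative):

```python
import numpy as np

def spectral_radius(X):
    return np.max(np.abs(np.linalg.eigvals(X)))

def save_convergence_check(A, gamma_hat, xi_hat, loading=None):
    """Theorem 2: SAVE converges if rho(D^{-1} H) < 1; optionally apply diagonal loading Xi_tilde."""
    G = gamma_hat * (A.T @ A)
    D = np.diag(np.diag(G) + xi_hat)            # D = diag(gamma A^T A + Xi)
    H = G - np.diag(np.diag(G))                 # H = offdiag(gamma A^T A)
    if loading is None:
        return spectral_radius(np.linalg.solve(D, H)) < 1
    Lt = np.diag(loading)                       # diagonal loading Xi_tilde > 0
    return spectral_radius(np.linalg.solve(D + Lt, H - Lt)) < 1
```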


NMSE Results

Figure 4: NMSE vs the number of observations N (M = 200, L = 40, L is the number of non-zero elements). Compared algorithms: Proposed SAVE SBL, FAST IF-SBL, GAMP SBL, FV SBL.

For a sufficient amount of data, SAVE has significantly lower MSE than the other fast algorithms.

Why? The resulting problem of alternating optimization of $x$ and the hyperparameters $\xi$ and $\gamma$ appears to be characterized by many local optima. Apparently the component-wise VB approach allows avoiding many bad local optima, explaining the better performance, apart from the lower complexity.

At a very low amount of data, suboptimal approaches such as AMP, which do not introduce individual hyperparameters per $x$ component and assume that the $x_i$ behave i.i.d., perform better because of the lower number of hyperparameters to be estimated.


An Overview of Fast SBL Algorithms

Fast SBL using Type II ML by Tipping$^{21}$: greedy approach handling one $x_i$ at a time, plus replacing precisions by their convergence values, leading to pruning of the small $x_i$ components, i.e. explicit sparsity.

Fast SBL using VB by Shutin et al.$^{22}$: Shutin uses VB while Tipping uses Type II ML as in the original SBL. They both replace precisions by their convergence values. Shutin also added some extra viewpoints, in terms of the pruning condition being interpreted as relating sparsity properties of SBL to a measure of SNR. The main message of both is faster convergence compared to original SBL, without much reduction in per-iteration complexity.

BP-SBL$^{23}$: in SBL, with fixed hyperparameters, the MAP or MMSE estimate (which follows from the Gaussian posterior) of $x$ can be efficiently computed using belief propagation (BP), since all the messages involved are Gaussian (without any approximation).

Inverse Free SBL (IF-SBL)$^{24}$: optimization using a relaxed ELBO.

Hyperparameter free sparse estimation$^{25}$: does not require hyperparameter tuning compared to SBL. Uses covariance matching, equivalent to square root LASSO.

$^{21}$Tipping, Faul'03, $^{22}$Shutin, Buchgraber, Kulkarni, Poor'11
$^{23}$Tan, Li'10, $^{24}$Duan, Yang, Fang, Li'17
$^{25}$Zachariah, Stoica'15


Complexity Comparisons-SBL Algorithms


Outline

1 Introduction
2 Static SBL
3 Combined BP-MF-EP Framework
4 Posterior Variance Prediction: Bayes Optimality
5 Performance Analysis of Approximate Inference Techniques (mCRB)
6 Dynamic SBL
7 Kronecker Structured Dictionary Learning using BP/VB
8 Numerical Results and Conclusion


Approximate Inference Cost Functions: An Overview

ML minimizes the KLD of $p_y(y|\theta)$ to the empirical distribution of $y$ ($\widehat{p}_y(y') = \delta(y' - y)$):
$\theta_{\min,KL} = \arg\min_\theta D_{KL}(\widehat{p}_y\,\|\,p_y(\cdot|\theta)) = \arg\max_\theta \ln p_y(y|\theta) = \theta_{MLE}$.

VB minimizes the KLD of the factored approximate posterior ($q(\theta) = \prod_i q_{\theta_i}(\theta_i)$):
$KLD_{VB} = D_{KL}(q(\theta)\|p(\theta|y))$.

Variational Free Energy (VFE) ($U(q)$ = average system energy, $H(q)$ = entropy). Assume the actual posterior factors as $p(\theta|y) = \frac{p(\theta,y)}{p(y)} = \frac{\prod_a p_a(\theta_a)}{Z}$, and let $F_H = -\ln Z$ (Helmholtz free energy; $\ln Z$ is the log-partition function). Then
$F(q(\theta)) = D_{KL}(q(\theta)\|p(\theta|y)) + F_H = \underbrace{-\sum_\theta q(\theta)\sum_a \ln p_a(\theta_a)}_{U(q)} + \underbrace{\sum_\theta q(\theta)\ln q(\theta)}_{-H(q)} = D_{KL}\big(q(\theta)\,\|\,\textstyle\prod_a p_a(\theta_a)\big)$.

Figure 5: A small factor graph representing the posterior $p(x_1,x_2,x_3,x_4) = \frac{1}{Z} f_A(x_1,x_2)\, f_B(x_2,x_3,x_4)\, f_C(x_4)$ $^{26}$.

$^{26}$Yedidia, Freeman, Weiss'05


Approximate Inference Cost Functions: An Overview (2)

$F(q) \ge F_H$, with equality only if $q(\theta) = p(\theta|y)$. Practical approach: upper bound $F_H$ by minimizing $F(q)$ over a restricted class of probability distributions, leading to the Kikuchi, BP or MF approximations.

Belief Propagation (BP) minimizes the Bethe Free Energy (BFE), Mean Field (MF) minimizes the MFFE (MF Free Energy). BP converges to the exact posterior when the factor graph is a tree. For MF (VB pushed to the scalar level), $q(\theta) = \prod_{i=1}^M q_{\theta_i}(\theta_i)$.

MFFE $\ge$ BFE $\ge$ VFE.

Region based Free Energy approximations (RFE) (more details on the next slide): the intuitive idea behind an RFE approximation is to break up the factor graph into a set of large regions that include every factor and variable node, and say that the overall free energy is the sum of the free energies of all the regions. BP is a special case of this.

Expectation Propagation (EP): derived using the BFE under moment matching constraints.


Region Based Free Energy

A region $R$ of a factor graph is a set $\mathcal{V}_R$ of variable nodes and a set $\mathcal{A}_R$ of factor nodes, such that $a \in \mathcal{A}_R$ implies that all variable nodes connected to $a$ are in $\mathcal{V}_R$. $\theta_R$ is defined as the set of all variable nodes belonging to the region $R$.

Region energy: $E_R(\theta_R) = -\sum_{a\in\mathcal{A}_R}\ln p_a(\theta_a)$.

Region free energy, using the region average energy and region entropy:
$U_R(q_R) = \sum_{\theta_R} q_R(\theta_R)\, E_R(\theta_R)$, $\quad H_R(q_R) = -\sum_{\theta_R} q_R(\theta_R)\ln q_R(\theta_R)$,
and $F_R(q_R) = U_R(q_R) - H_R(q_R)$.

Region-based free energy, using the region-based average energy and region-based entropy:
$U_{\mathcal{R}}(\{q_R\}) = \sum_{R\in\mathcal{R}} c_R\, U_R(q_R)$, $\quad H_{\mathcal{R}}(\{q_R\}) = \sum_{R\in\mathcal{R}} c_R\, H_R(q_R)$,
and $F_{\mathcal{R}}(\{q_R\}) = U_{\mathcal{R}}(\{q_R\}) - H_{\mathcal{R}}(\{q_R\})$.

The intuitive idea: break up the factor graph into a set of large regions that include every factor and variable node, and say that the overall VFE is the sum of the VFEs of all the regions. If some of the large regions overlap, then we will have erred by counting the free energy contributed by some nodes two or more times, so we then need to subtract out the free energies of these overlap regions in such a way that each factor and variable node is counted exactly once (the counting number $c_R$ takes care of this).

BP: each factor node (and its neighbouring variable nodes) forms one set of regions; another set of regions contains only one variable node each.


Variational Free Energy (VFE) Framework

Intractable joint posterior distribution of the parameters $\theta = \{x, A, f, \Gamma, \gamma\}$.

Actual posterior: $p(\theta) = \frac{1}{Z}\underbrace{\prod_{a\in\mathcal{A}_{BP}} p_a(\theta_a)\prod_{b\in\mathcal{A}_{MF}} p_b(\theta_b)}_{\text{factor nodes}}$, where $\mathcal{A}_{BP}, \mathcal{A}_{MF}$ are the sets of factor nodes belonging to the BP/MF part, with $\mathcal{A}_{BP}\cap\mathcal{A}_{MF} = \emptyset$.

The whole $\theta$ is partitioned into the set of $\theta_i$ (variable nodes), and we want to approximate the true posterior $p(\theta)$ by an approximate posterior $q(\theta) = \prod_i q_i(\theta_i)$.

$\mathcal{N}_{BP}(i), \mathcal{N}_{MF}(i)$: the set of neighbouring factor nodes of variable node $i$ which belong to the BP/MF part.

$\mathcal{I}_{MF} = \bigcup_{a\in\mathcal{A}_{MF}}\mathcal{N}(a)$, $\quad \mathcal{I}_{BP} = \bigcup_{a\in\mathcal{A}_{BP}}\mathcal{N}(a)$, where $\mathcal{N}(a)$ is the set of neighbouring variable nodes of a factor node $a$.

The resulting free energy (average energy $-$ entropy) obtained by the combination of BP and MF$^{27}$ is written as below (where $q_i(\theta_i)$ represents the belief about $\theta_i$, i.e. the approximate posterior):

$F_{BP,MF} = \sum_{a\in\mathcal{A}_{BP}}\Big[D_{KL}\big(q_a(\theta_a)\,\|\,p_a(\theta_a)\big) + D_{KL}\big(q_a(\theta_a)\,\|\,\prod_{i\in\mathcal{N}(a)} q_i(\theta_i)\big)\Big] + \sum_{b\in\mathcal{A}_{MF}} D_{KL}\big(\prod_{i\in\mathcal{N}(b)} q_i(\theta_i)\,\|\,p_b(\theta_b)\big)$.

$^{27}$Riegler, Kirkelund, Manchón, Fleury'13


Message Passing (MP) Expressions

The beliefs have to satisfy the following normalization and marginalization constraints:
$\sum_{\theta_i} q_i(\theta_i) = 1,\ \forall i\in\mathcal{I}_{MF}\setminus\mathcal{I}_{BP}$, $\quad \sum_{\theta_a} q_a(\theta_a) = 1,\ \forall a\in\mathcal{A}_{BP}$, $\quad q_i(\theta_i) = \sum_{\theta_a\setminus\theta_i} q_a(\theta_a),\ \forall a\in\mathcal{A}_{BP},\ i\in\mathcal{N}(a)$.

The fixed point equations of the constrained optimization of the approximate VFE:

$q_i(\theta_i) = z_i\prod_{a\in\mathcal{N}_{BP}(i)} m^{BP}_{a\to i}(\theta_i)\prod_{a\in\mathcal{N}_{MF}(i)} m^{MF}_{a\to i}(\theta_i)$ = product of incoming beliefs,

$n_{i\to a}(\theta_i) = \prod_{c\in\mathcal{N}_{BP}(i)\setminus a} m^{BP}_{c\to i}(\theta_i)\prod_{d\in\mathcal{N}_{MF}(i)} m^{MF}_{d\to i}(\theta_i)$: variable to factor nodes,

$m^{MF}_{a\to i}(\theta_i) = \exp\big(\langle\ln p_a(\theta_a)\rangle_{\prod_{j\in\mathcal{N}(a)\setminus i} n_{j\to a}(\theta_j)}\big)$, where $\langle\cdot\rangle_q$ is the expectation w.r.t. $q$,

$m^{BP}_{a\to i}(\theta_i) = \langle p_a(\theta_a)\rangle_{\prod_{j\in\mathcal{N}(a)\setminus i} n_{j\to a}(\theta_j)}$: factor to variable nodes.   (2)

Expectation Propagation (EP): the constraints in the BFE can often be too complex to yield computationally tractable messages; the following constraint relaxation leads to EP$^{28}$:
$\mathrm{E}_{q_a}(t(\theta_i)) = \mathrm{E}_{q_i}(t(\theta_i))$, $\quad m^{BP}_{a\to i}(\theta_i) = \dfrac{\mathrm{Proj}_\phi\big(\int q_a(\theta_a)\prod_{j\in\mathcal{N}(a),\, j\ne i} d\theta_j\big)}{n_{i\to a}(\theta_i)}$, $\quad q_a(\theta_a) = \frac{1}{z_a}\, p_a(\theta_a)\prod_{j\in\mathcal{N}(a)} n_{j\to a}(\theta_j)$,
where $\phi$ represents the family of distributions characterized by the sufficient statistics $t(\theta_i)$.

$^{28}$Minka'01


What do the MP Expressions Indicate?

BP-MF combo = alternating optimization of the Lagrangian$^{29}$:
$L = F_{BP,MF} + \sum_a\gamma_a\big[\sum_{\theta_a} q_a(\theta_a) - 1\big] + \sum_i\gamma_i\big[\sum_{\theta_i} q_i(\theta_i) - 1\big] + \sum_i\sum_{a\in\mathcal{N}(i)}\sum_{\theta_i}\lambda_{ai}(\theta_i)\big[q_i(\theta_i) - \sum_{\theta_a\setminus\theta_i} q_a(\theta_a)\big]$.

At any iteration or at convergence:

$q_a(\theta_a) = p_a(\theta_a)\Big(\prod_{i\in\mathcal{N}(a)} q_i(\theta_i)\exp[-\lambda_{ai}(\theta_i)]\Big)\exp[\gamma_a - 1] = \frac{1}{z_a}\, p_a(\theta_a)\prod_{i\in\mathcal{N}(a)}\underbrace{\frac{q_i(\theta_i)}{m_{a\to i}(\theta_i)}}_{n_{i\to a}(\theta_i)},\quad a\in\mathcal{A}_{BP}$,

$q_i(\theta_i) = \underbrace{\exp\big[\big(|\mathcal{N}_{BP}(i)| - 1 + \mathbb{I}_{\mathcal{I}_{MF}\setminus\mathcal{I}_{BP}}(i)\big)\gamma_i\big]}_{1/z_i}\prod_{a\in\mathcal{N}_{MF}(i)}\underbrace{\exp\big(\langle\ln p_a(\theta_a)\rangle_{q_j(\theta_j),\, j\in\mathcal{N}(a)\setminus i}\big)}_{m^{MF}_{a\to i}(\theta_i)}\prod_{a\in\mathcal{N}_{BP}(i)}\underbrace{\exp\big(\lambda_{ai}(\theta_i)\big)}_{m^{BP}_{a\to i}(\theta_i)}$,

where $\mathbb{I}_{\mathcal{A}}(i)$ is the indicator function for $i\in\mathcal{A}$.

Applying the marginalization constraint $q_i(\theta_i) = \sum_{\theta_a\setminus\theta_i} q_a(\theta_a),\ \forall a\in\mathcal{A}_{BP}$, leads to the expression for $m^{BP}_{a\to i}(\theta_i)$ as in (2).

The Lagrange multipliers $\lambda_{ai}$ are indeed the log of the BP messages, and $\gamma_a, \gamma_i$ lead to the normalization constants $z_a, z_i$ for the beliefs $q_a(\theta_a), q_i(\theta_i)$, respectively:
$\lambda_{ai}(\theta_i) = \ln m^{BP}_{a\to i}(\theta_i)$.

$^{29}$Yedidia, Freeman, Weiss'05


Outline

1 Introduction
2 Static SBL
3 Combined BP-MF-EP Framework
4 Posterior Variance Prediction: Bayes Optimality
5 Performance Analysis of Approximate Inference Techniques (mCRB)
6 Dynamic SBL
7 Kronecker Structured Dictionary Learning using BP/VB
8 Numerical Results and Conclusion


SBL using BP: Predictive Posterior Variance Bayes Optimality

Figure 6: Factor graph for the static SBL (illustration of message passing). Dark square nodes are the factor nodes and circle nodes represent the variable nodes.

All the messages (beliefs, or continuous pdfs) passed between them are Gaussian$^{30}$. So in message passing (MP) it suffices to represent them by two parameters, the mean and the variance of the beliefs.

We denote by $\sigma_{n,k}^{-2}$ the inverse variance (precision) of the message passed from variable node $n$ (corresponding to $x_n$) to factor node $k$ (corresponding to $y_k$), and by $\widehat{x}_{n,k}$ the mean of the message passed from $n$ to $k$; there are $NM$ of them in total.

Similarly $\sigma_{k,n}^{-2}, \widehat{x}_{k,n}$ for the messages from $k$ to $n$.

$^{30}$Tan, Li'10


SBL using BP: Message Passing Expressions

We start from the MP expressions derived in$^{36}$. Define the matrix $S$ with entries $\sigma_{k,n}^{-2}$. The Gaussian beliefs are parameterized as $m_{k\to n}(x_n) = \mathcal{N}(x_n; \widehat{x}_{k,n}, \sigma_{k,n}^2)$ and $n_{n\to k}(x_n) = \mathcal{N}(x_n; \widehat{x}_{n,k}, \sigma_{n,k}^2)$.

Interpretation of $n_{n\to k}(x_n)$: Bayesian information combining. At variable node $n$ we have
$\widehat{\mathbf{x}}_n = \begin{bmatrix}\widehat{x}_{1,n} \\ \vdots \\ \widehat{x}_{N,n}\end{bmatrix} = \begin{bmatrix}1 \\ \vdots \\ 1\end{bmatrix} x_n + \widetilde{\mathbf{x}}_n$, with $\widetilde{\mathbf{x}}_n \sim \mathcal{N}\big(\widetilde{\mathbf{x}}_n; 0, \mathrm{diag}(S_{:,n})^{-1}\big)$ and prior $\mathcal{N}(x_n; 0, \xi_n^{-1})$.

$x_n$ and $\widehat{x}_{\bar k,n}$ ($\widehat{\mathbf{x}}_n$ excluding $\widehat{x}_{k,n}$) are jointly Gaussian and hence lead to the "extrinsic" "posterior" message for node $k$:
$\widehat{x}_{n,k} = \sigma_{n,k}^2\sum_{i\ne k}\sigma_{i,n}^{-2}\,\widehat{x}_{i,n}$, $\quad \sigma_{n,k}^{-2} = \xi_n + \sum_{i\ne k}\sigma_{i,n}^{-2}$.

Interpretation of $m_{k\to n}(x_n)$: Interference Cancellation. Substituting $x_m = \widehat{x}_{m,k} + \widetilde{x}_{m,k}$ ("extrinsic" information from variables $m\ne n$ for measurement $k$) in $y_k = \sum_m A_{k,m} x_m + v_k$ leads to the 1-1 measurement
$\big(y_k - \sum_{m\ne n} A_{k,m}\widehat{x}_{m,k}\big) = A_{k,n}\, x_n + \big(v_k + \sum_{m\ne n} A_{k,m}\widetilde{x}_{m,k}\big)$,
with total "noise" $v_k + \sum_{m\ne n} A_{k,m}\widetilde{x}_{m,k}$ of variance $\gamma^{-1} + \sum_{m\ne n} A_{k,m}^2\sigma_{m,k}^2$. So the (deterministic) estimate and variance from this measurement by itself are
$\widehat{x}_{k,n} = A_{k,n}^{-1}\big(y_k - \sum_{m\ne n} A_{k,m}\widehat{x}_{m,k}\big)$ and $\sigma_{k,n}^{-2} = A_{k,n}^2\big(\tfrac{1}{\gamma} + \sum_{m\ne n} A_{k,m}^2\sigma_{m,k}^2\big)^{-1}$.

$^{36}$Tan, Li'10
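A numpy sketch of these scalar Gaussian BP updates for SBL with fixed hyperparameters ($\xi$, $\gamma$). The array layout, initialization and flooding schedule are illustrative assumptions, and it assumes all entries of $A$ are non-zero so that the division by $A_{k,n}$ is well defined:

```python
import numpy as np

def sbl_gabp(A, y, xi, gamma, n_iter=50):
    """Gaussian BP for SBL with fixed hyperparameters (sketch): scalar message updates."""
    N, M = A.shape
    xh_kn = np.zeros((N, M))              # means of factor k -> variable n messages
    pr_kn = np.full((N, M), 1e-6)         # precisions sigma_{k,n}^{-2}
    for _ in range(n_iter):
        # variable n -> factor k: "extrinsic" information combining at the variable nodes
        tot_pr = xi[None, :] + pr_kn.sum(axis=0, keepdims=True)          # xi_n + sum_i sigma_{i,n}^{-2}
        pr_nk = (tot_pr - pr_kn).T                                       # leave-one-out precisions, (M, N)
        xh_nk = (((pr_kn * xh_kn).sum(axis=0, keepdims=True) - pr_kn * xh_kn)
                 / (tot_pr - pr_kn)).T                                   # leave-one-out means, (M, N)
        # factor k -> variable n: interference cancellation at the measurements
        var_nk = 1.0 / pr_nk                                             # sigma_{n,k}^2
        noise = 1.0 / gamma + ((A ** 2) * var_nk.T).sum(axis=1, keepdims=True) - (A ** 2) * var_nk.T
        resid = y[:, None] - (A * xh_nk.T).sum(axis=1, keepdims=True) + A * xh_nk.T
        pr_kn = A ** 2 / noise
        xh_kn = resid / A
    # posterior marginals
    post_prec = xi + pr_kn.sum(axis=0)
    x_hat = (pr_kn * xh_kn).sum(axis=0) / post_prec
    return x_hat, 1.0 / post_prec
```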


SBL using BP: MP Expressions in Matrix Form

Posterior marginals: $x_n$ and $\widehat{\mathbf{x}}_n$ are jointly Gaussian and hence the MMSE estimate leads to the messages $\mathcal{CN}(x_n; \widehat{x}_n, \sigma_n^2)$:
$\sigma_n^2 = \big(\xi_n + \sum_i\sigma_{i,n}^{-2}\big)^{-1}$, $\quad \widehat{x}_n = \sigma_n^2\sum_i\sigma_{i,n}^{-2}\,\widehat{x}_{i,n}$.

In matrix form ($S', M'$ of dimension $M\times N$ with entries $\sigma_{n,k}^{-2}, \widehat{x}_{n,k}$; $S, M$ of dimension $N\times M$ with entries $\sigma_{k,n}^{-2}, \widehat{x}_{k,n}$, respectively):

$S' = \Xi\,\mathbf{1}_M\mathbf{1}_N^T + S^T(\mathbf{1}_N\mathbf{1}_N^T - I_N)$,
$L = \mathrm{diag}(S^T M)\,\mathbf{1}_M\mathbf{1}_N^T - (S\circ M)^T$, $\quad M' = S'_{\mathrm{inv}}\circ L$, $\quad M'_{n,k} = \widehat{x}_{n,k}$.

Similarly, for the messages at the factor nodes, define $C$ to be the matrix with entries $A_{k,n}^2\sigma_{k,n}^2$ ($\circ$ represents the Hadamard (element-wise) product, and $A_{\mathrm{inv}}$ denotes the element-wise inverse):
$C = \big(\tfrac{1}{\gamma} I_N + \mathrm{diag}(B S'_{\mathrm{inv}})\big)\big(\mathbf{1}_N\mathbf{1}_M^T\big) - B\circ S'^T_{\mathrm{inv}}$, $\quad S = C_{\mathrm{inv}}\circ B$, $\quad B = A\circ A$,
$V = \big(y - \mathrm{diag}(A M')\,\mathbf{1}_N\big)\mathbf{1}_M^T + A\circ M'^T$, $\quad M = A_{\mathrm{inv}}\circ V$.

Computational complexity: $O(dMN)$, with $d \ll M, N$.


Existing Convergence Conditions of Gaussian BP

In loopy GaBP, if the mean of the posterior belief converges, it converges to the true posterior mean$^{37}$. Independently analyzed in$^{38}$.

The posterior variances (if initialized with values $> 0$) always converge to a unique stationary point, but not necessarily to the true posterior variance.

Further,$^{39}$ shows that the convergence condition of GaBP can be related to the spectral radius of a matrix $|R|$ (element-wise absolute values), where $J = I - R$, with $J = \gamma A^T A + \Xi$, which is indeed the posterior precision matrix.

A diagonally dominant $J$ is one example which satisfies this condition.

$^{37}$Rusmevichientong, Van Roy'01
$^{38}$Weiss, Freeman'01
$^{39}$Malioutov, Johnson, Willsky'06
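A small sketch of a walk-summability style check of this condition, assuming the standard unit-diagonal normalization of $J$ before forming $R = I - J$ (this normalization is an assumption, not spelled out on the slide):

```python
import numpy as np

def walk_summable(A, gamma, xi):
    """Check rho(|R|) < 1 with J = I - R after unit-diagonal normalization of the posterior precision."""
    J = gamma * A.T @ A + np.diag(xi)               # posterior precision matrix
    d = np.sqrt(np.diag(J))
    J_norm = J / np.outer(d, d)                     # unit-diagonal normalization
    R = np.eye(J.shape[0]) - J_norm
    return np.max(np.abs(np.linalg.eigvals(np.abs(R)))) < 1
```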

Nonlinear MMSE using Linear MMSE Bricks and Application to Compressed Sensing and Adaptive Kalman Filtering

(40)

Existing Convergence Conditions of Gaussian BP (Cont’d)

Reference$^{40}$ shows that, depending on the underlying graphical structure (Gaussian Markov Random Field (GMRF) or factor graph based factorization), Gaussian BP (GaBP) may exhibit different convergence properties.

They prove that the convergence condition for the mean provided by the factor graph representation encompasses a much larger class of models than those given by the GMRF based walk-summable condition$^{41}$.

GaBP always converges if the factor graph is a union of a single loop and a forest (a forest is a disjoint union of trees).

Moreover, they also analyze the convergence of the inverse of the message variances (message information matrix) and analytically show that, with arbitrary positive semidefinite matrix initialization, the message information matrix converges to a unique positive definite matrix.

So we can conclude that for BP there is a decoupling between the dynamics of the variance updates and that of the mean updates.

(Generalized) approximate message passing (GAMP or AMP) and their variant vector approximate message passing (VAMP) exhibit convergence to the Bayes Optimal MMSE for i.i.d. or right orthogonally invariant matrices $A$.

$^{40}$Du, Ma, Wu, Kar, Moura'18, $^{41}$Malioutov, Johnson, Willsky'06
