Compressed sensing with uncertainty. The Bayesian estimation perspective


HAL Id: hal-01245392

https://hal.archives-ouvertes.fr/hal-01245392

Submitted on 3 Mar 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Compressed sensing with uncertainty. The Bayesian estimation perspective

Stéphanie Bernhardt, Rémy Boyer, Sylvie Marcos, Pascal Larzabal

To cite this version:

Stéphanie Bernhardt, Rémy Boyer, Sylvie Marcos, Pascal Larzabal. Compressed sensing with uncertainty. The Bayesian estimation perspective. IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2015), Dec 2015, Cancún, Mexico. hal-01245392

Compressed Sensing with Uncertainty - the Bayesian Estimation Perspective

Stéphanie Bernhardt, Rémy Boyer and Sylvie Marcos
L2S, Université Paris-Sud (UPS), CNRS, CentraleSupelec, Gif-Sur-Yvette, France
{stephanie.bernhardt, remy.boyer, sylvie.marcos}@lss.supelec.com

Pascal Larzabal
SATIE, ENS Cachan, Cachan, France
pascal.larzabal@satie.ens-cachan.fr

Abstract—The Compressed Sensing (CS) framework outperforms the sampling rate limits given by Shannon's theory. This gain is possible because the signal of interest is assumed to admit a linear decomposition over a few vectors of a given sparsifying basis (Fourier, wavelet, ...). Unfortunately, in realistic operating systems, uncertain knowledge of the CS model is inevitable and must be evaluated. Typically, this uncertainty drastically degrades the estimation performance of sparse-based estimators in the low noise variance regime. In this work, the Off-Grid (OG) and Basis Mismatch (BM) problems are compared from a Bayesian estimation perspective. At first glance, we are tempted to think that these two acronyms stand for the same problem. However, by comparing their Bayesian Cramér-Rao Bounds (BCRB) for the estimation of an L-sparse amplitude vector based on N measurements, it is shown that the BM problem has a lower BCRB than the OG one in a general context. To go further into the analysis, we provide, for i.i.d. Gaussian amplitudes and in the low noise variance regime, an interesting closed-form expression of a normalized 2-norm criterion of the difference of the two BCRB matrices. Based on the analysis of this closed-form expression, we obtain two conclusions.
Firstly, the two uncertainty problems cannot be confused for a non-zero mismatch error variance and with finite N and L.

Secondly, the two problems turn out to be similar for any mismatch error variance in the large system regime, i.e., for $N, L \to \infty$ with constant aspect ratio $N/L \to \rho$.
Keywords—Compressed Sensing with Uncertainty, Off-Grid effect and Basis Mismatch, Bayesian estimation
I. INTRODUCTION

Compressed Sensing (CS) has been a hot research domain over the last decade [1], [2]. CS potentially outperforms the limit sampling rate predicted by classical sampling theory. This is done by exploiting the a priori knowledge that many natural measurement signals admit a sparse representation in a given redundant dictionary. In today's world, which has to face more and more data, this technique has opened many perspectives in a wide range of application domains, for instance direction of arrival estimation, sparse radar, etc. The CS framework has thus driven the design of a plethora of sparse-based estimation algorithms, for instance OMP [3], CoSaMP, LASSO or BPDN [4], etc.
In the context of non-ideal CS, there exist two types of uncertainty. The first one is the well-known Off-Grid (OG) problem [5], [6], which occurs in sparse estimation. In this case, the continuous estimation parameters cannot, with high probability, match the pre-fixed discretization of the parameter set. Consider a redundant dictionary obtained from this pre-fixed discretization: since the parameters of interest do not belong to the grid, a "quantization error"-type effect appears [7]. To take this effect into account, the measurement vector, denoted by $\tilde{y}$, is given by a linear combination of a few vectors extracted from the initial dictionary corrupted by an additive perturbation, while the initial dictionary itself is known. The Basis Mismatch (BM) problem [8], [6] is described in the following way. The measurement vector, denoted by $y$, is given by a linear combination of a few vectors extracted from the initial dictionary, but we only have uncertain knowledge of the initial dictionary, or equivalently, a corrupted dictionary is assumed known. As a consequence, sparse-based estimators exhibit very poor estimation accuracy [7] even if the noise variance is low and the support is perfectly estimated (or a priori known). Note that the interest of the CRB in the context of noisy CS with deterministic amplitudes and perfect knowledge of the support has been demonstrated in [9], [10], [11]. In particular, it is demonstrated that the CRB can be reached by some sparse-based estimators knowing only the support cardinality $L$ and in the low variance regime. We think that this context is well adapted to the BM and OG problems. Based on these arguments, the main concern of this work is to compare these two types of uncertainty from the Bayesian estimation point of view, i.e., based on the derivation and analysis of the BCRBs associated with the OG and BM problems.

This work was supported by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM# (Grant agreement no. 318306), and by the two following projects: ANR MAGELLAN and MI-TITAN.

In the second section, we present both uncertainty models, and we derive the corresponding BCRBs in Section III. The BCRB is first derived in the general case; we then consider the case of an i.i.d. Gaussian amplitude vector in the low noise variance regime, and finally we compare both bounds. In Section IV, we show by simulation the difference between the two bounds.
II. COMPRESSED SENSING MODELS WITH UNCERTAINTY

Assume that the dictionary is corrupted according to $\tilde{H} = H + E$, where $H$ is a random but time-invariant matrix [10] whose entries are i.i.d. zero-mean with variance $1/N$, where $N$ is the number of measurements, i.e., the number of rows of $H$. The matrix $E$ models the mismatch error, following a Gaussian pdf with zero mean and variance $\sigma_e^2 I$. In addition, we define the two following observations:

BM problem: $y = Hx + n = \tilde{H}x - Ex + n$, where the estimator knows $\tilde{H}$, whose entries are stochastic processes;

OG problem: $\tilde{y} = \tilde{H}x + n = Hx + Ex + n$, where the estimator knows $H$, whose entries are random but non-stochastic.

The two models are depicted in Fig. 1.

Fig. 1. (a) BM problem, (b) OG problem.

In the two above models, the noise term $Ex + n$ is data-dependent due to the mismatch error $E$. At first glance, the two observations seem to describe the same formalism, and intuitively we are tempted to conclude that the two considered frameworks correspond to the same estimation problem. We will show in the sequel that this intuition is wrong.
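As a quick illustration of the two observation models above, the following NumPy sketch draws a random dictionary and mismatch matrix and builds both observations; all numerical values ($N$, $L$, the variances) are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and variances (assumptions, not the paper's settings).
N, L = 20, 4
sigma_e2 = 0.1   # mismatch error variance sigma_e^2
sigma2 = 1e-3    # additive noise variance sigma^2
sigma_x2 = 1.0   # amplitude variance sigma_x^2

H = rng.normal(0.0, 1.0 / np.sqrt(N), (N, L))    # nominal dictionary, entries ~ N(0, 1/N)
E = rng.normal(0.0, np.sqrt(sigma_e2), (N, L))   # mismatch error matrix
H_tilde = H + E                                  # corrupted dictionary
x = rng.normal(0.0, np.sqrt(sigma_x2 / L), L)    # i.i.d. Gaussian amplitudes, variance sigma_x^2/L
n = rng.normal(0.0, np.sqrt(sigma2), N)          # additive noise

# BM problem: data generated with the true dictionary H,
# but the estimator only knows the corrupted dictionary H_tilde.
y_bm = H @ x + n          # equivalently H_tilde @ x - E @ x + n

# OG problem: data generated with the perturbed dictionary H_tilde,
# while the estimator knows the nominal (grid) dictionary H.
y_og = H_tilde @ x + n    # equivalently H @ x + E @ x + n

# Both decompositions agree with the direct constructions.
assert np.allclose(y_bm, H_tilde @ x - E @ x + n)
assert np.allclose(y_og, H @ x + E @ x + n)
```

The sketch makes the formal symmetry of the two models concrete: the observations differ only through which dictionary generated the data and which one the estimator knows.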
III. BCRB WITH UNCERTAINTY

The Van Trees Bayesian bound [12] is a benchmark against which any Bayesian estimator can be compared. More precisely, let $\hat{x}$ be an estimate of $x$; the BMSEs for the two contexts of interest are
\[
\mathrm{BMSE}_{\mathrm{OG}} = \mathbb{E}_{\tilde{y},x|H}\,\|x-\hat{x}(\tilde{y})\|^2 \ge \operatorname{Tr} B_{\mathrm{OG}}, \tag{1}
\]
\[
\mathrm{BMSE}_{\mathrm{BM}} = \mathbb{E}_{y,\tilde{H},x}\,\|x-\hat{x}(y,\tilde{H})\|^2 \ge \operatorname{Tr} B_{\mathrm{BM}}, \tag{2}
\]
where, for $1 \le i,j \le L$,
\[
\left[B_{\mathrm{OG}}^{-1}\right]_{ij} = \left[\operatorname{Var}\!\left(\frac{\partial \log p(\tilde{y},x|H)}{\partial x}\right)\right]_{ij}, \tag{3}
\]
\[
\left[B_{\mathrm{BM}}^{-1}\right]_{ij} = \left[\operatorname{Var}\!\left(\frac{\partial \log p(y,\tilde{H},x)}{\partial x}\right)\right]_{ij}, \tag{4}
\]
where $\operatorname{Var}(x)$ denotes the variance of $x$.

A. BCRB in case of the off-grid problem

The off-grid problem is described according to the following scenario: estimate the amplitude vector $x$ based on the observation $\tilde{y}$, knowing the uncorrupted dictionary $H$. So, the conditional observation is
\[
\tilde{y}\,|\,H,x \sim \mathcal{N}\!\left(Hx,\ R=(\sigma_e^2\|x\|^2+\sigma^2)I\right). \tag{5}
\]

Using the Slepian-Bangs formula [13], we obtain the following Fisher Information Matrix (FIM):
\[
F_{\mathrm{OG}} = \frac{H^T H}{\sigma_e^2\|x\|^2+\sigma^2} + \frac{2\sigma_e^4 N\, x x^T}{\left(\sigma_e^2\|x\|^2+\sigma^2\right)^2}, \tag{6}
\]
and thus the BCRB takes the following expression:
\[
B_{\mathrm{OG}} = \left[H^T H\, P(\sigma_e^2,\sigma^2) + \Sigma(\sigma_e^2,\sigma^2) + P_x\right]^{-1}, \tag{7}
\]
where $P_x = \operatorname{Var}\!\left(\frac{\partial \log p(x)}{\partial x}\right)$ and
\[
P(\sigma_e^2,\sigma^2) = \mathbb{E}_x\!\left[\frac{1}{\sigma_e^2\|x\|^2+\sigma^2}\right], \tag{8}
\]
\[
\Sigma(\sigma_e^2,\sigma^2) = \mathbb{E}_x\!\left[\frac{2\sigma_e^4 N\, x x^T}{\left(\sigma_e^2\|x\|^2+\sigma^2\right)^2}\right]. \tag{9}
\]

B. BCRB in case of the Basis Mismatch (BM) problem

The Basis Mismatch (BM) problem is described according to the following scenario: estimate the amplitude vector $x$ based on the observation $y$, knowing the corrupted dictionary $\tilde{H}$. The log-joint pdf is given by
\[
\log p(y,\tilde{H},x) = \log p(y|\tilde{H},x) + \log p(\tilde{H}) + \log p(x), \tag{10}
\]
where the conditional observation is given by
\[
y\,|\,\tilde{H},x \sim \mathcal{N}\!\left(\tilde{H}x,\ (\sigma_e^2\|x\|^2+\sigma^2)I\right). \tag{11}
\]

Observe that $\partial \log p(\tilde{H})/\partial x$ vanishes [14], and thus the BCRB matrix is given by
\[
B_{\mathrm{BM}} = \left[\mathbb{E}_{x,\tilde{H}}\!\left(F_{\mathrm{BM}}\right) + P_x\right]^{-1}, \tag{12}
\]
where the FIM $F_{\mathrm{BM}}$ is given by the Slepian-Bangs formula according to
\[
F_{\mathrm{BM}} = \frac{\tilde{H}^T \tilde{H}}{\sigma_e^2\|x\|^2+\sigma^2} + \frac{2\sigma_e^4 N\, x x^T}{\left(\sigma_e^2\|x\|^2+\sigma^2\right)^2}. \tag{13}
\]
Noting that $x$ and $\tilde{H}$ are two multidimensional independent processes, we have
\[
\mathbb{E}_{x,\tilde{H}}\!\left(F_{\mathrm{BM}}\right) = \mathbb{E}\!\left(\tilde{H}^T \tilde{H}\right) \mathbb{E}_x\!\left[\frac{1}{\sigma_e^2\|x\|^2+\sigma^2}\right] + \mathbb{E}_x\!\left[\frac{2\sigma_e^4 N\, x x^T}{\left(\sigma_e^2\|x\|^2+\sigma^2\right)^2}\right], \tag{14}
\]
where $\mathbb{E}\!\left(\tilde{H}^T \tilde{H}\right) = H^T H + \sigma_e^2 I$. Finally, we obtain
\[
B_{\mathrm{BM}} = \left[\left(H^T H + \sigma_e^2 I\right) P(\sigma_e^2,\sigma^2) + \Sigma(\sigma_e^2,\sigma^2) + P_x\right]^{-1}. \tag{15}
\]

C. Comparison

Result 3.1: From expressions (7) and (15), we deduce
\[
B_{\mathrm{BM}} < B_{\mathrm{OG}} \tag{16}
\]
for $\sigma_e^2 \neq 0$.

Proof: The proof is direct, considering that all the matrices involved in the two bounds are positive definite.
Remark that we verify the trivial property that if $\sigma_e^2 \to 0$, then $B_{\mathrm{BM}} = B_{\mathrm{OG}}$. The above result shows that, for a given noise power, the BM problem exhibits a better accuracy than the OG problem. In the following sections, we provide qualitative analytic expressions to precisely characterize the "distance" between the two studied uncertainty problems. To reach this goal, we add some realistic assumptions.

D. Closed-form expressions for an i.i.d. Gaussian amplitude vector in the low noise variance regime

In this section:

A1. We consider the low noise variance regime, since it is well known that the uncertainty context, including the Off-Grid and Basis Mismatch problems, appears only in the regime where the noise variance is sufficiently low to be dominated by $\sigma_e^2$. This fact directly implies that any sparse-based estimator in this context cannot be statistically efficient. As this property is a highly desired feature in estimation theory, the reader can measure the importance of the analysis of this context. On the contrary, when the noise variance is high with respect to $\sigma_e^2$, the uncertainty cannot be measured and thus can be ignored.

A2. The second assumption is that the $L$ amplitudes belonging to the support, i.e., taking non-zero values, follow an i.i.d. centered Normal distribution with variance $\sigma_x^2/L$. So, the sparse amplitude vector is composed of $K-L$ zero values and $L$ random non-zero amplitudes. In this case, the prior matrix of the BIM is given by $P_x = \frac{L}{\sigma_x^2} I_L$.

Lemma 3.2: In the low noise variance regime, we have
\[
\lim_{\sigma^2 \to 0} P(\sigma_e^2,\sigma^2) = \frac{L}{(L-2)\,\sigma_e^2\,\sigma_x^2}, \tag{17}
\]
\[
\lim_{\sigma^2 \to 0} \Sigma(\sigma_e^2,\sigma^2) = \frac{2N}{(L-2)\,\sigma_x^2}\, I_L. \tag{18}
\]

Proof: For centered i.i.d. Gaussian amplitudes of variance $\sigma_x^2/L$, the quantity $1/\|x\|^2$ follows a scaled inverse chi-squared distribution with $L$ degrees of freedom. Thus, for $L > 2$, $\mathbb{E}_x\!\left[1/\|x\|^2\right] = \frac{L}{(L-2)\sigma_x^2}$. Using the fact that $\lim_{\sigma^2\to 0} P(\sigma_e^2,\sigma^2) = \frac{1}{\sigma_e^2}\,\mathbb{E}_x\!\left[1/\|x\|^2\right]$, we obtain expression (17).

It is easy to see that $\mathbb{E}_x\!\left[\frac{x x^T}{\|x\|^4}\right]$ is a diagonal matrix proportional to the identity, i.e., $\mathbb{E}_x\!\left[\frac{x x^T}{\|x\|^4}\right] = t\, I_L$. Note that $\operatorname{Tr} \mathbb{E}_x\!\left[\frac{x x^T}{\|x\|^4}\right] = \mathbb{E}_x\!\left[1/\|x\|^2\right] = \frac{L}{(L-2)\sigma_x^2}$, so $t = \frac{1}{(L-2)\sigma_x^2}$. Using the fact that $\lim_{\sigma^2\to 0} \Sigma(\sigma_e^2,\sigma^2) = 2N\, \mathbb{E}_x\!\left[\frac{x x^T}{\|x\|^4}\right]$, we obtain expression (18).
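The key moment used in the proof, $\mathbb{E}_x[1/\|x\|^2] = \frac{L}{(L-2)\sigma_x^2}$, can be checked by simulation. The sketch below uses assumed values of $L$ and $\sigma_x^2$ and a Monte Carlo sample size chosen for speed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative values: support size L > 2 and amplitude variance sigma_x^2.
L = 8
sigma_x2 = 2.0
trials = 200_000

# Amplitudes drawn as in assumption A2: i.i.d. N(0, sigma_x^2 / L).
x = rng.normal(0.0, np.sqrt(sigma_x2 / L), (trials, L))
mc = np.mean(1.0 / np.sum(x**2, axis=1))   # Monte Carlo estimate of E[1/||x||^2]
closed_form = L / ((L - 2) * sigma_x2)     # scaled inverse chi-squared mean from Lemma 3.2

# The two should agree to within Monte Carlo error.
print(mc, closed_form)
assert abs(mc - closed_form) / closed_form < 0.05
```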

Using the above Lemma, we can give the following result.

Result 3.3: For an i.i.d. Gaussian amplitude vector and in the low noise variance regime, the traces of the BCRB matrices are given by
\[
\operatorname{Tr} B_{\mathrm{OG}} = \frac{\sigma_x^2 (L-2)}{L} \operatorname{Tr}\!\left[\left(\frac{H^T H}{\sigma_e^2} + S_{N,L}\, I\right)^{-1}\right], \tag{19}
\]
\[
\operatorname{Tr} B_{\mathrm{BM}} = \frac{\sigma_x^2 (L-2)}{L} \operatorname{Tr}\!\left[\left(\frac{H^T H}{\sigma_e^2} + (S_{N,L}+1)\, I\right)^{-1}\right], \tag{20}
\]
respectively, where $S_{N,L} = \frac{2N}{L} - 2 + L$.
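A minimal numerical sketch of Result 3.3, under assumed values of $N$, $L$, $\sigma_e^2$ and $\sigma_x^2$: it evaluates the two closed-form traces for one random dictionary and checks the ordering announced in Result 3.1.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed illustrative setting: N measurements, L-sparse amplitudes.
N, L = 20, 4
sigma_e2 = 0.1
sigma_x2 = 1.0

H = rng.normal(0.0, 1.0 / np.sqrt(N), (N, L))   # dictionary entries ~ N(0, 1/N)
HtH = H.T @ H
S_NL = 2 * N / L - 2 + L                        # S_{N,L} from Result 3.3

scale = sigma_x2 * (L - 2) / L
tr_B_og = scale * np.trace(np.linalg.inv(HtH / sigma_e2 + S_NL * np.eye(L)))
tr_B_bm = scale * np.trace(np.linalg.inv(HtH / sigma_e2 + (S_NL + 1) * np.eye(L)))

# Consistent with Result 3.1: the BM bound lies below the OG bound for sigma_e^2 != 0.
print(tr_B_og, tr_B_bm)
assert tr_B_bm < tr_B_og
```

The ordering holds for any dictionary here, since the BM expression only adds $+1$ to the diagonal loading inside the inverse.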

E. Comparison of the two bounds

Based on the above result, we can easily compare the two bounds. Define the following normalized 2-norm criterion:
\[
\eta \overset{\text{def.}}{=} \left\|B_{\mathrm{BM}}^{-1}\left(B_{\mathrm{BM}} - B_{\mathrm{OG}}\right)\right\|_2. \tag{21}
\]
To compare $B_{\mathrm{BM}}$ and $B_{\mathrm{OG}}$, the 2-norm of the difference $B_{\mathrm{BM}} - B_{\mathrm{OG}}$ is naturally used. However, this 2-norm can be numerically small even for distant bounds, due to the small values of $B_{\mathrm{BM}}$ and $B_{\mathrm{OG}}$. Thus, to improve the robustness of the 2-norm criterion, we normalize the difference by $B_{\mathrm{BM}}^{-1}$, which takes large values when the bounds are small; this yields the criterion in (21).

Result 3.4: The normalized 2-norm criterion introduced in (21) takes the following simple form:
\[
\eta = \frac{\sigma_e^2}{\lambda_{\min}(H^T H) + \sigma_e^2\, S_{N,L}}, \tag{22}
\]
where $\lambda_{\min}(\cdot)$ stands for the minimal singular value.

Proof: Taking into account $B_{\mathrm{BM}} = \left[B_{\mathrm{OG}}^{-1} + \frac{L}{\sigma_x^2 (L-2)}\, I\right]^{-1}$ and using the definition of the 2-norm involving the maximal singular value, denoted by $\lambda_{\max}(\cdot)$, we obtain
\[
\eta = \frac{L}{\sigma_x^2 (L-2)}\, \left\|B_{\mathrm{OG}}\right\|_2 = \lambda_{\max}\!\left(\left(\frac{1}{\sigma_e^2} H^T H + S_{N,L}\, I\right)^{-1}\right).
\]
Now, using the fact that $\lambda_{\max}\!\left(Q^{-1}\right) = 1/\lambda_{\min}(Q)$ in the above expression, we obtain (22).

1) Analysis of $\eta$ w.r.t. $\sigma_e^2$ with finite $N$ and $L$: As intuitively expected, one can verify that $\lim_{\sigma_e^2 \to 0} \eta = 0$. For growing $\sigma_e^2$, recalling that since $L < N$ we have $\lambda_{\min}(H^T H) > 0$, expression (22) is a monotonically increasing function of $\sigma_e^2$. This result means that the two considered uncertainty problems are not equivalent for sufficiently large $\sigma_e^2$ with finite $N$ and $L$.

2) Asymptotic behavior of $\eta$ in the large system scenario: In Random Matrix Theory (RMT) [15], [16], [17], it is assumed that $N, L \to \infty$ with $N/L \to \rho$. It is also usual in the CS framework to assume that the dictionary $H$ is a random and time-invariant matrix whose entries follow a given distribution with zero mean and variance $1/N$. This construction verifies the RIP with high probability [2], [1]. We can now give the following result:

Result 3.5: For any $\sigma_e^2$, if $N, L \to \infty$ with $N/L \to \rho$, then $\eta \to 0$. This means that the OG and BM problems are identical from the Bayesian estimation point of view in the large system scenario.

Proof: It is well known that $\lambda_{\min}(H^T H) \to \left(1 - \sqrt{1/\rho}\right)^2$ in the almost sure sense [15], [16], [17], meaning that $\lambda_{\min}(H^T H)$ converges to a finite value. Now, remark that $S_{N,L} > L$, due to the fact that, to avoid singularity of the FIM, we have $L < N$. So, for infinite $L$, the sequence $S_{N,L}$ diverges. Thus, the criterion given in (22) converges to 0 for any $\sigma_e^2$.
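Result 3.5 can be illustrated by evaluating the closed form (22) for growing $L$ at a fixed aspect ratio; the values of $\rho$ and $\sigma_e^2$ below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Fixed aspect ratio rho = N/L and mismatch variance (assumed illustrative values).
rho = 5.0
sigma_e2 = 0.5

def eta(L):
    """Closed-form criterion (22) for one random dictionary with N = rho * L."""
    N = int(rho * L)
    H = rng.normal(0.0, 1.0 / np.sqrt(N), (N, L))   # entries ~ N(0, 1/N)
    S_NL = 2 * N / L - 2 + L
    lam_min = np.linalg.eigvalsh(H.T @ H)[0]
    return sigma_e2 / (lam_min + sigma_e2 * S_NL)

# As L grows with N/L -> rho, lam_min stays bounded (Marchenko-Pastur edge)
# while S_{N,L} ~ L diverges, so eta -> 0 as stated in Result 3.5.
vals = [eta(L) for L in (4, 40, 400)]
print(vals)
assert vals[0] > vals[1] > vals[2]
```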

IV. NUMERICAL ILLUSTRATIONS

In this section, we assume an i.i.d. Gaussian amplitude vector and the low noise variance regime. Fig. 2 compares the traces of expressions (7) and (15), obtained from 2000 Monte Carlo trials, with expressions (19) and (20), obtained under the low noise variance assumption. We can verify the good agreement of our approximations in the low noise variance regime. It is also possible to numerically check Result (16). In addition, we can see that the two bounds tend to be very close for small $\sigma_e^2$, as intuitively expected.

Fig. 2. Trace-BCRB vs. $\sigma_e^2$, with $N = 20$ measurements, $L = 4$ non-zero amplitudes and $\sigma_x^2 = 1$. Expressions (7) and (15) are obtained from 2000 Monte Carlo trials.

Fig. 3 compares expression (21), obtained by means of Monte Carlo simulations, with expression (22). As noticed before, the quantity $\eta$ increases when the error variance $\sigma_e^2$ increases. Furthermore, Fig. 3 shows that expression (22) matches the observed Monte Carlo results.

Fig. 3. Criterion $\eta$ vs. $\sigma_e^2$.

V. CONCLUSION

In this work, we compare two non-ideal CS frameworks from a Bayesian estimation perspective. More precisely, we focus our study on two well-known uncertainty models known under the acronyms of Off-Grid (OG) and Basis Mismatch (BM) problems. These two problems drastically degrade the estimation accuracy of any sparse-based estimator, even if the noise variance is low and the support set of the non-zero amplitudes is perfectly estimated or a priori known. At first glance, the OG and BM problems seem to share more similarities than differences. In this work, we first show that the BM problem yields a better estimation accuracy than the OG problem in a general context. In a second part, we provide a closed-form expression of a normalized 2-norm criterion for an i.i.d. Gaussian amplitude vector in the low noise variance regime. This closed-form expression allows us to precisely quantify the difference between the two problems, and shows that they cannot be confused even for a small mismatch error variance and a finite number of measurements and support cardinality. Finally, we also show that the two problems turn out to be identical from the Bayesian estimation perspective when the number of measurements and the support cardinality grow to infinity with a constant ratio.

REFERENCES

[1] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.

[2] E. Candès and M. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, March 2008.

[3] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," pp. 40–44, 1993.

[4] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.

[5] H. Zhu, G. Leus, and G. Giannakis, "Sparsity-cognizant total least-squares for perturbed compressive sampling," IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2002–2016, 2011.

[6] M. Herman and T. Strohmer, "General deviants: An analysis of perturbations in compressed sensing," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 342–349, 2010.

[7] S. Bernhardt, R. Boyer, B. Zhang, S. Marcos, and P. Larzabal, "Performance analysis for sparse based biased estimator: Application to line spectra analysis," in IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM), invited paper, La Coruña, Spain, 2014, pp. 365–368.

[8] Y. Chi, L. Scharf, A. Pezeshki, and A. Calderbank, "Sensitivity to basis mismatch in compressed sensing," IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2182–2195, 2011.

[9] Z. Ben-Haim and Y. Eldar, "The Cramér-Rao bound for estimating a sparse parameter vector," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3384–3389, June 2010.

[10] R. Niazadeh, M. Babaie-Zadeh, and C. Jutten, "On the achievability of the Cramér–Rao bound in noisy compressed sensing," IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 518–526, 2012.

[11] B. Babadi, N. Kalouptsidis, and V. Tarokh, "Asymptotic achievability of the Cramér–Rao bound for noisy compressive sampling," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 1233–1236, 2009.

[12] E. L. Lehmann and G. Casella, Theory of Point Estimation. Springer Science & Business Media, 1998, vol. 31.

[13] P. Stoica and R. L. Moses, Spectral Analysis of Signals. Pearson/Prentice Hall, Upper Saddle River, NJ, 2005.

[14] A. Wiesel, Y. Eldar, and A. Yeredor, "Linear regression with Gaussian model uncertainty: Algorithms and bounds," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2194–2205, June 2008.

[15] R. Couillet and M. Debbah, Random Matrix Methods for Wireless Communications. Cambridge University Press, 2011.

[16] W. Hachem, P. Loubaton, X. Mestre, J. Najim, and P. Vallet, "Large information plus noise random matrix models and consistent subspace estimation in large sensor networks," Random Matrices: Theory and Applications, vol. 1, no. 02, 2012.

[17] A. M. Tulino and S. Verdú, "Random matrix theory and wireless communications," Commun. Inf. Theory, vol. 1, no. 1, pp. 1–182, Jun. 2004.
