
Large deviation analysis of the CPD detection problem based on random tensor theory

Rémy Boyer, Philippe Loubaton

To cite this version: Rémy Boyer, Philippe Loubaton. Large deviation analysis of the CPD detection problem based on random tensor theory. 25th European Signal Processing Conference (EUSIPCO 2017), Aug 2017, Kos Island, Greece. DOI: 10.23919/eusipco.2017.8081289. HAL Id: hal-01572144, https://hal-centralesupelec.archives-ouvertes.fr/hal-01572144 (submitted 4 Aug 2017).


LARGE DEVIATION ANALYSIS OF THE CPD DETECTION PROBLEM BASED ON RANDOM TENSOR THEORY

Rémy Boyer
L2S, Department of Signals and Statistics
Université Paris-Sud, CNRS, CentraleSupélec, France
remy.boyer@l2s.centralesupelec.fr

Philippe Loubaton
LIGM
Université de Marne-la-Vallée, France
philippe.loubaton@u-pem.fr

ABSTRACT

The performance in terms of minimal Bayes' error probability for the detection of a random tensor is a fundamental, difficult, and under-studied problem. In this work, we assume that under the alternative hypothesis we observe a noisy rank-$R$ tensor admitting a $Q$-order Canonical Polyadic Decomposition (CPD) with large factors of size $N_q \times R$, i.e., for $1 \leq q \leq Q$, $R, N_q \to \infty$ with $R^{1/Q}/N_q$ converging to a finite constant.

The detection of the random entries of the core tensor is hard to study since an analytic expression of the error probability is not easily tractable. To mitigate this technical difficulty, the Chernoff Upper Bound (CUB) on the error probability and the associated error exponent are derived and studied for the considered tensor-based detection problem. Both quantities rest on the Chernoff $s$-divergence, a key quantity for this detection problem owing to its strong link with the moment generating function of the log-likelihood ratio test. The tightest CUB is reached for the value, denoted by $s^\star$, which minimizes the error exponent. Two methodologies are standard in the literature for this step. The first one is based on a costly numerical optimization algorithm. An alternative strategy is to consider the Bhattacharyya Upper Bound (BUB), obtained by fixing $s^\star = 1/2$.

In this last scenario, the costly numerical optimization step is avoided, but no guarantee exists on the optimality of the BUB. Based on powerful random matrix theory tools, a simple analytical expression of $s^\star$ is provided as a function of the Signal-to-Noise Ratio (SNR) for low-rank CPD. Together with a compact expression of the CUB, easily tractable expressions of the tightest CUB and of the error exponent are provided and analyzed. A main conclusion of this work is that the BUB is the tightest bound at low SNR; this property no longer holds at higher SNR.

1. INTRODUCTION

Detection of random parameters in noise is well known to be a difficult problem. Indeed, the optimal Bayes' decision rule can often only be derived at the price of a costly numerical computation of the log posterior-odds ratio, and an exact calculation of the minimal Bayes' error probability is often intractable [1, 2]. This is particularly true in the context of the under-studied tensor detection problem, i.e., when the parameters of interest are multidimensional and random. Specifically, this work deals with random core tensor detection when the noise-free tensor follows a Canonical Polyadic Decomposition (CPD) [17] with large factors. Following the methodology presented in [10] for the detection of one-dimensional data, we exploit well-known information-geometric divergences [9, 11]. The Chernoff upper bound (CUB) is an upper bound on the minimal Bayes' error probability, and the error exponent characterizes the asymptotic exponential decay of the Bayes' error probability. These two metrics turn out to be useful in many problems of practical importance, for instance distributed sparse detection [13], sparse support recovery [14], energy detection, MIMO radar processing, network secrecy [15], the Angular Resolution Limit in array processing [16], detection performance for informed communication systems, etc. The theory of low-rank CPD is a timely and important research topic [3, 12]. This class of tensor-based model is useful to extract and analyze relevant information confined to a small-dimensional subspace from a massive volume of measurements.

Random Matrix Theory (RMT) provides a powerful formalism to study the asymptotic performance limits of large-scale systems [4, 6]. While there exists a plethora of well-known results for linear systems, there is a lack of results for the structured linear systems encountered with the CPD.

2. RANDOM TENSOR DETECTION

2.1. CPD and noisy structured linear system

2.1.1. Preliminary definitions

The rank-$R$ CPD of order $Q$ is defined according to
$$\mathcal{X} = \sum_{r=1}^{R} s_r \underbrace{\boldsymbol{\phi}^{(1)}_r \circ \cdots \circ \boldsymbol{\phi}^{(Q)}_r}_{\mathcal{X}_r}, \quad \text{with } \operatorname{rank} \mathcal{X}_r = 1,$$

where $\circ$ is the outer product [12], $\boldsymbol{\phi}^{(q)}_r \in \mathbb{R}^{N_q \times 1}$, and $s_r$ is a real scalar. An equivalent formulation using the mode product [12] is
$$\mathcal{X} = \mathcal{S} \times_1 \boldsymbol{\Phi}_1 \times_2 \cdots \times_Q \boldsymbol{\Phi}_Q \in \mathbb{R}^{N_1 \times \cdots \times N_Q} \quad (1)$$
where $\times_q$ stands for the $q$-mode product, $\mathcal{S}$ is the $R \times \cdots \times R$ diagonal core tensor with $[\mathcal{S}]_{r,\ldots,r} = s_r$, and $\boldsymbol{\Phi}_q = [\boldsymbol{\phi}^{(q)}_1 \ldots \boldsymbol{\phi}^{(q)}_R]$ is the $q$-th factor matrix of size $N_q \times R$.

The $q$-mode unfolding matrix of tensor $\mathcal{X}$ is given by
$$\mathbf{X}_{(q)} = \boldsymbol{\Phi}_q \mathbf{S} \left(\boldsymbol{\Phi}_Q \odot \cdots \odot \boldsymbol{\Phi}_{q+1} \odot \boldsymbol{\Phi}_{q-1} \odot \cdots \odot \boldsymbol{\Phi}_1\right)^T \quad (2)$$
where $\mathbf{S} = \operatorname{diag}(\mathbf{s})$ with $\mathbf{s} = [s_1 \ldots s_R]^T$, and $\odot$ stands for the Khatri-Rao product.
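As a concrete illustration of the outer-product form and of the unfolding identity of eq. (2), here is a minimal numpy sketch (not the authors' code; the sizes and the column-major reshaping convention are our assumptions, chosen to be consistent with the Khatri-Rao formulas above):

```python
import numpy as np
from scipy.linalg import khatri_rao  # column-wise Kronecker product

rng = np.random.default_rng(0)
N, R = (4, 5, 6), 3                                 # illustrative sizes: Q = 3, rank R = 3
s = rng.standard_normal(R)                          # diagonal core entries s_r
Phi = [rng.standard_normal((n, R)) for n in N]      # factor matrices Phi_q of size N_q x R

# X = sum_r s_r * phi_r^(1) o phi_r^(2) o phi_r^(3)   (sum of rank-1 outer products)
X = np.einsum('r,ir,jr,kr->ijk', s, Phi[0], Phi[1], Phi[2])

# 1-mode unfolding: X_(1) = Phi_1 S (Phi_3 ⊙ Phi_2)^T, i.e., eq. (2) with q = 1
X1 = Phi[0] @ np.diag(s) @ khatri_rao(Phi[2], Phi[1]).T
print(np.allclose(X.reshape(N[0], -1, order='F'), X1))   # True
```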

2.1.2. Vectorized CPD

Assume that the measurement tensor is a noisy $Q$-order tensor of size $N_1 \times \cdots \times N_Q$ given by
$$\mathcal{Y} = \mathcal{X} + \mathcal{N} \quad (3)$$
where $\mathcal{N}$ is the noise tensor, whose entries are assumed centered i.i.d. Gaussian, i.e., $[\mathcal{N}]_{n_1,\ldots,n_Q} \sim \mathcal{N}(0, \sigma^2)$, and the noise-free tensor $\mathcal{X}$ follows the rank-$R$ CPD defined in eq. (1). We assume that each diagonal entry of the core tensor is centered i.i.d. Gaussian, i.e., $[\mathcal{S}]_{r,\ldots,r} = s_r \sim \mathcal{N}(0, \sigma_s^2)$.

Let $N = N_1 \cdots N_Q$. The vectorization of eq. (3) is given by
$$\mathbf{y}_N = \operatorname{vec} \mathbf{Y}_{(1)} = \mathbf{x} + \mathbf{n} \quad (4)$$
where $\mathbf{n} = \operatorname{vec} \mathbf{N}_{(1)}$ and $\mathbf{x} = \operatorname{vec} \mathbf{X}_{(1)}$. Using eq. (2) and the property $\operatorname{vec}\{\mathbf{A}\operatorname{diag}\{\mathbf{c}\}\mathbf{B}^T\} = (\mathbf{B} \odot \mathbf{A})\mathbf{c}$, we obtain
$$\mathbf{x} = \operatorname{vec}\left\{\boldsymbol{\Phi}_1 \mathbf{S} \left(\boldsymbol{\Phi}_Q \odot \cdots \odot \boldsymbol{\Phi}_2\right)^T\right\} = \boldsymbol{\Phi}\mathbf{s}$$
where $\boldsymbol{\Phi} = \boldsymbol{\Phi}_Q \odot \cdots \odot \boldsymbol{\Phi}_1$ is an $N \times R$ structured matrix. At this point, it is important to note that the CPD formalism implies that vector $\mathbf{x}$ in eq. (4) is related to the structured linear system $\boldsymbol{\Phi}$.
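The identity $\mathbf{x} = \boldsymbol{\Phi}\mathbf{s}$ can be checked numerically; a self-contained sketch under the same assumed column-major vectorization convention as above:

```python
import numpy as np
from scipy.linalg import khatri_rao

rng = np.random.default_rng(0)
N, R = (4, 5, 6), 3
s = rng.standard_normal(R)
Phi = [rng.standard_normal((n, R)) for n in N]
X = np.einsum('r,ir,jr,kr->ijk', s, Phi[0], Phi[1], Phi[2])

# Phi = Phi_3 ⊙ Phi_2 ⊙ Phi_1: an N x R structured matrix with N = N1*N2*N3
Phi_big = khatri_rao(Phi[2], khatri_rao(Phi[1], Phi[0]))
x = X.reshape(-1, order='F')           # vec of the 1-mode unfolding (column-major)
print(np.allclose(x, Phi_big @ s))     # True: x = Phi s, the noise-free part of eq. (4)
```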

2.2. Binary hypothesis test formulation for structured linear systems

2.2.1. Formulation based on an SNR-type criterion

Let $\mathrm{SNR} = \sigma_s^2/\sigma^2$ and $p_i(\cdot) = p(\cdot|\mathcal{H}_i)$ with $i \in \{0,1\}$. The equi-probable binary hypothesis test for the detection of the random signal $\mathbf{s}$ is
$$\mathcal{H}_0: p_0(\mathbf{y}_N; \boldsymbol{\Phi}, \mathrm{SNR} = 0) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_0),$$
$$\mathcal{H}_1: p_1(\mathbf{y}_N; \boldsymbol{\Phi}, \mathrm{SNR} \neq 0) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_1) \quad (5)$$
where $\boldsymbol{\Sigma}_0 = \sigma^2 \mathbf{I}_N$ and $\boldsymbol{\Sigma}_1 = \sigma^2\left(\mathrm{SNR}\cdot\boldsymbol{\Phi}\boldsymbol{\Phi}^T + \mathbf{I}_N\right)$. The data-space for the null hypothesis ($\mathcal{H}_0$) is given by $\mathcal{X}_0 = \mathcal{X} \setminus \mathcal{X}_1$, where
$$\mathcal{X}_1 = \left\{\mathbf{y}_N : \Lambda(\mathbf{y}_N) = \log\frac{p_1(\mathbf{y}_N)}{p_0(\mathbf{y}_N)} > \tau_0\right\}$$
is the data-space for the alternative hypothesis ($\mathcal{H}_1$). In the above test, $\Lambda(\mathbf{y}_N)$ is the log-likelihood ratio and $\tau_0$ is the detection threshold, given by the following two expressions:
$$\Lambda(\mathbf{y}_N) = \frac{\mathbf{y}_N^T \boldsymbol{\Phi}\left(\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \mathrm{SNR}\cdot\mathbf{I}\right)^{-1}\boldsymbol{\Phi}^T \mathbf{y}_N}{\sigma^2},$$
$$\tau_0 = -\log\det\left(\mathrm{SNR}\cdot\boldsymbol{\Phi}\boldsymbol{\Phi}^T + \mathbf{I}_N\right)$$
where $\det(\cdot)$ and $\log(\cdot)$ stand for the determinant and the natural logarithm, respectively.
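The following sketch transcribes the statistic and threshold displayed above (the sizes and the flat matrix `Phi_big`, standing for $\boldsymbol{\Phi}$, are illustrative assumptions, not the paper's simulation setup):

```python
import numpy as np

rng = np.random.default_rng(1)
Nbig, R, snr, sigma2 = 60, 4, 2.0, 1.0
Phi_big = rng.standard_normal((Nbig, R)) / np.sqrt(Nbig)   # stands for the structured Phi

# draw y_N under H1: y = Phi s + n, s ~ N(0, snr*sigma2*I), n ~ N(0, sigma2*I)
s_core = np.sqrt(snr * sigma2) * rng.standard_normal(R)
y = Phi_big @ s_core + np.sqrt(sigma2) * rng.standard_normal(Nbig)

# Lambda(y_N) = y^T Phi (Phi^T Phi + SNR*I)^{-1} Phi^T y / sigma^2
G = Phi_big.T @ Phi_big
Lam = y @ Phi_big @ np.linalg.solve(G + snr * np.eye(R), Phi_big.T @ y) / sigma2

# tau_0 = -log det(SNR * Phi Phi^T + I_N), computed stably via slogdet
tau0 = -np.linalg.slogdet(snr * Phi_big @ Phi_big.T + np.eye(Nbig))[1]
print('decide H1' if Lam > tau0 else 'decide H0')
```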

2.2.2. Geometry of the expected log-likelihood ratio

Consider $p(\mathbf{y}_N|\hat{\mathcal{H}}) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ associated with the estimated hypothesis $\hat{\mathcal{H}}$. The expected log-likelihood ratio is given by
$$\mathbb{E}_{\mathbf{y}_N|\hat{\mathcal{H}}}\,\Lambda(\mathbf{y}_N) = \int_{\mathcal{X}} p(\mathbf{y}_N|\hat{\mathcal{H}}) \log\frac{p_1(\mathbf{y}_N)}{p_0(\mathbf{y}_N)}\,d\mathbf{y}_N = \mathrm{KL}(\hat{\mathcal{H}}\|\mathcal{H}_0) - \mathrm{KL}(\hat{\mathcal{H}}\|\mathcal{H}_1) = \frac{1}{\sigma^2}\operatorname{Tr}\left[\left(\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \mathrm{SNR}\cdot\mathbf{I}\right)^{-1}\boldsymbol{\Phi}^T\boldsymbol{\Sigma}\boldsymbol{\Phi}\right]$$
where
$$\mathrm{KL}(\hat{\mathcal{H}}\|\mathcal{H}_i) = \int_{\mathcal{X}} p(\mathbf{y}_N|\hat{\mathcal{H}}) \log\frac{p(\mathbf{y}_N|\hat{\mathcal{H}})}{p_i(\mathbf{y}_N)}\,d\mathbf{y}_N$$
is the Kullback-Leibler Divergence (KLD) [9]. The expected log-likelihood ratio test thus admits a simple geometric characterization as the difference of two KLDs [2]. However, the performance of the detector of interest in terms of the minimal Bayes' error probability, denoted by $P_e(N)$, is quite often difficult to determine analytically [1, 2].
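The KL-difference characterization can be verified numerically for zero-mean Gaussians, for which the KLD has the standard closed form $\mathrm{KL} = \frac{1}{2}\left(\operatorname{tr}(\boldsymbol{\Sigma}_b^{-1}\boldsymbol{\Sigma}_a) - N + \log\det\boldsymbol{\Sigma}_b - \log\det\boldsymbol{\Sigma}_a\right)$. A sketch under assumed illustrative sizes, comparing a Monte Carlo estimate of $\mathbb{E}\log\frac{p_1}{p_0}$ with the two-KLD difference:

```python
import numpy as np

rng = np.random.default_rng(2)
Nbig, R, snr, sigma2, T = 40, 3, 1.5, 1.0, 50_000
Phi_big = rng.standard_normal((Nbig, R)) / np.sqrt(Nbig)
S0 = sigma2 * np.eye(Nbig)
S1 = sigma2 * (snr * Phi_big @ Phi_big.T + np.eye(Nbig))
Sig = S1                                  # take the estimated hypothesis to be H1

def kl_zero_mean(Sa, Sb):
    """KL(N(0,Sa) || N(0,Sb)) in nats, standard Gaussian closed form."""
    n = Sa.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(Sb, Sa)) - n
                  + np.linalg.slogdet(Sb)[1] - np.linalg.slogdet(Sa)[1])

# Monte Carlo estimate of E[log p1(y)/p0(y)] for y ~ N(0, Sig)
y = np.linalg.cholesky(Sig) @ rng.standard_normal((Nbig, T))
q0 = np.einsum('it,it->t', y, np.linalg.solve(S0, y))   # y^T S0^{-1} y per sample
q1 = np.einsum('it,it->t', y, np.linalg.solve(S1, y))
llr = 0.5 * (q0 - q1) + 0.5 * (np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S1)[1])

print(llr.mean())                                        # ~ KL(H||H0) - KL(H||H1)
print(kl_zero_mean(Sig, S0) - kl_zero_mean(Sig, S1))
```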

3. CHERNOFF UPPER BOUND (CUB) AND ERROR EXPONENT

Define the minimal Bayes' error probability conditionally on vector $\mathbf{y}_N$ according to
$$\Pr(\mathrm{Error}|\mathbf{y}_N) = \frac{1}{2}\min\{P_{1,0}, P_{0,1}\}$$
where $P_{i,i'} = \Pr(\mathcal{H}_i|\mathbf{y}_N \in \mathcal{X}_{i'})$. The (average) minimal Bayes' error probability, defined by $P_e(N) = \mathbb{E}\Pr(\mathrm{Error}|\mathbf{y}_N)$, is upper bounded according to the CUB [11] as
$$P_e(N) \leq \frac{1}{2}\exp[-\mu_N(s)] \quad (6)$$
where the (Chernoff) $s$-divergence for $s \in (0,1)$ is given by
$$\mu_N(s) = -\log M_{\Lambda(\mathbf{y}_N|\mathcal{H}_1)}(-s) \quad (7)$$
in which $M_X(t) = \mathbb{E}\exp[t \cdot X]$ is the moment generating function (mgf) of variable $X$. The error exponent, denoted by $\mu(s)$ and given by the Chernoff information, is an asymptotic characterization of the exponential decay of the minimal Bayes' error probability. It is derived thanks to Stein's lemma according to [18]:
$$-\lim_{N\to\infty} \frac{\log P_e(N)}{N} = \lim_{N\to\infty} \frac{\mu_N(s)}{N} \stackrel{\text{def.}}{=} \mu(s).$$
As parameter $s \in (0,1)$ is free, the CUB can be tightened thanks to the unique minimizer
$$s^\star = \arg\min_{s\in(0,1)} \mu(s). \quad (8)$$
Finally, using eq. (6) and eq. (8), we obtain the tightest Chernoff Upper Bound (CUB). The Bhattacharyya Upper Bound (BUB) is obtained from eq. (6) by fixing $s = 1/2$ instead of solving eq. (8). We have the following ordering:
$$P_e(N) \leq \frac{1}{2}\exp[-\mu_N(s^\star)] \leq \frac{1}{2}\exp[-\mu_N(1/2)].$$

3.1. Error exponent expression based on the RMT

3.1.1. The CUB and the error exponent

In this section, we first recall, in the following lemma, the closed-form expression of the CUB for the test of eq. (5). Next, the error exponent $\mu(s)$ is derived in Result 3.2 in the RMT context.

Lemma 3.1 ([10]) The log-mgf given by eq. (7) for the test of eq. (5) is given by
$$\mu_N(s) = \frac{1-s}{2}\log\det\left(\mathrm{SNR}\cdot\boldsymbol{\Phi}\boldsymbol{\Phi}^T + \mathbf{I}\right) - \frac{1}{2}\log\det\left(\mathrm{SNR}\cdot(1-s)\,\boldsymbol{\Phi}\boldsymbol{\Phi}^T + \mathbf{I}\right).$$

In the following, we assume that the matrices $(\boldsymbol{\Phi}_q)_{q=1,\ldots,Q}$ are random matrices with Gaussian $\mathcal{N}(0, \frac{1}{N_q})$ entries, and we evaluate the behaviour of $\mu_N(s)/N$ when $(N_q)_{q=1,\ldots,Q}$ converge towards $+\infty$ at the same rate and $\frac{R}{N}$ converges towards a non-zero limit.

Result 3.2 In the asymptotic regime where $N_1, \ldots, N_Q$ converge towards $+\infty$ at the same rate and where $R \to +\infty$ in such a way that $c_R = \frac{R}{N}$ converges towards a finite constant $c > 0$, it holds that
$$\frac{\mu_N(s)}{N} \xrightarrow{\text{a.s.}} \mu(s) = \frac{1-s}{2}\Psi_c(\mathrm{SNR}) - \frac{1}{2}\Psi_c\left((1-s)\cdot\mathrm{SNR}\right) \quad (9)$$
with a.s. standing for "almost sure convergence" and
$$\Psi_c(x) = \log\left(1 + \frac{2c}{u(x) + (1-c)}\right) + c\cdot\log\left(1 + \frac{2}{u(x) - (1-c)}\right) - \frac{4c}{x\left(u(x)^2 - (1-c)^2\right)} \quad (10)$$
with $u(x) = \frac{1}{x} + \sqrt{\left(\frac{1}{x} + \lambda_c^+\right)\left(\frac{1}{x} + \lambda_c^-\right)}$, where $\lambda_c^\pm = (1 \pm \sqrt{c})^2$.

Proof See Appendix 6.1.
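Result 3.2 is easy to implement. The sketch below transcribes eqs. (9)-(10) as reconstructed above, so it inherits that reconstruction as an assumption; the function names are ours:

```python
import numpy as np

def u(x, c):
    lam_m, lam_p = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
    return 1.0 / x + np.sqrt((1.0 / x + lam_m) * (1.0 / x + lam_p))

def psi(x, c):
    """Psi_c(x) of eq. (10)."""
    ux = u(x, c)
    return (np.log(1 + 2 * c / (ux + (1 - c)))
            + c * np.log(1 + 2 / (ux - (1 - c)))
            - 4 * c / (x * (ux**2 - (1 - c)**2)))

def mu(s, snr, c):
    """Asymptotic error exponent mu(s) of eq. (9)."""
    return 0.5 * (1 - s) * psi(snr, c) - 0.5 * psi((1 - s) * snr, c)
```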

3.1.2. Approximate analytical expressions for $c \ll 1$

For low-rank CPD we have $R \ll N$, and thus it is realistic to assume $c \ll 1$.

Result 3.3 In this context, the error exponent can be approximated according to
$$\mu(s) \underset{c\ll 1}{\approx} \frac{c}{2}\left[(1-s)\log(1+\mathrm{SNR}) - \log\left(1+(1-s)\mathrm{SNR}\right)\right].$$

Proof See Appendix 6.2.

As the second-order derivative of $\mu(s)$ is strictly positive, $\mu(s)$ is a strictly convex function over the interval $(0,1)$. In addition, as a strictly convex function has at most one global minimum, we deduce that the stationary point $s^\star$ is the global minimizer, obtained by zeroing the first-order derivative of the error exponent. This optimal value is given by
$$s^\star \underset{c\ll 1}{\approx} 1 + \frac{1}{\mathrm{SNR}} - \frac{1}{\log(1+\mathrm{SNR})}. \quad (11)$$
We can identify the two following limit scenarios:

• At low SNR, the error exponent associated with the tightest CUB, denoted by $\mu(s^\star)$, coincides with the error exponent associated with the BUB. Indeed, the optimal value in eq. (11) admits a second-order approximation for $c \ll 1$: since $\frac{1}{\log(1+\mathrm{SNR})} \approx \frac{1}{\mathrm{SNR}}\left(1 + \frac{\mathrm{SNR}}{2}\right)$,
$$s^\star \approx 1 + \frac{1}{\mathrm{SNR}}\left[1 - \left(1 + \frac{\mathrm{SNR}}{2}\right)\right] = \frac{1}{2}.$$
Using Result 3.2 and the above approximation, the best error exponent at low SNR and for $c \ll 1$ is given by
$$\mu\left(\frac{1}{2}\right) \underset{\mathrm{SNR}\ll 1}{\approx} \frac{1}{4}\Psi_{c\ll 1}(\mathrm{SNR}) - \frac{1}{2}\Psi_{c\ll 1}\left(\frac{\mathrm{SNR}}{2}\right) = \frac{c}{2}\log\frac{\sqrt{1+\mathrm{SNR}}}{1+\frac{\mathrm{SNR}}{2}}.$$

• On the contrary, for $\mathrm{SNR} \to \infty$ we have $s^\star \to 1$, so the error exponent associated with the BUB cannot be considered optimal in this regime. Using eq. (11) in Result 3.3 and assuming that $\frac{\log \mathrm{SNR}}{\mathrm{SNR}} \to 0$, the optimal error exponent for large SNR can be approximated according to
$$\mu(s^\star) \underset{\mathrm{SNR}\gg 1}{\approx} \frac{c}{2}\left(1 - \log \mathrm{SNR} + \log\log(1+\mathrm{SNR})\right).$$
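Both limit regimes can be checked numerically. The sketch below reuses `mu()` from the Result 3.2 sketch above and compares a bounded scalar minimization of $\mu(s)$ against eq. (11):

```python
import numpy as np
from scipy.optimize import minimize_scalar

c = 2e-5                                   # low-rank regime, as in Section 4
for snr in (0.01, 1.0, 100.0):
    num = minimize_scalar(lambda s: mu(s, snr, c),
                          bounds=(1e-6, 1 - 1e-6), method='bounded').x
    ana = 1 + 1 / snr - 1 / np.log1p(snr)  # eq. (11)
    print(f'SNR = {snr:6g}: numerical s* = {num:.4f}, eq. (11) s* = {ana:.4f}')
# low SNR: s* -> 1/2 (BUB optimal); high SNR: s* -> 1
```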


4. NUMERICAL SIMULATIONS

In this simulation part, we consider a cubic tensor of order $Q = 3$ with $N_1 = N_2 = N_3 = 100$. The factors $\boldsymbol{\Phi}_1$, $\boldsymbol{\Phi}_2$ and $\boldsymbol{\Phi}_3$ are generated as a single i.i.d. Gaussian realization with rank $R = 20$. We can check that $c = 2 \times 10^{-5} \ll 1$. In Fig. 1, the parameter $s^\star$ is plotted with respect to the SNR in dB. The parameter $s^\star$ is obtained by three different methods. The first one is based on brute-force/exhaustive computation of the CUB over a fine discretization grid of the $s$ parameter. This approach has two drawbacks. First, $s^\star$ can only be estimated with an accuracy governed by the grid precision. Second, this exhaustive procedure has a very high computational cost, especially in our asymptotic context. The second approach is based on the numerical optimization of the closed form of $\mu(s)$ given in Result 3.3. In this scenario, the drawback in terms of computational cost is mitigated, but the precision limitation due to the grid design remains. Finally, the last approach is based on the analytic expression given in eq. (11) under the hypothesis that $c \ll 1$. This last strategy has a negligible computational cost and does not suffer from the grid precision limitation. We can check that the three methods coincide with high accuracy. We can also verify the limit values of $s^\star$ given in Section 3.1.2 in the small and large SNR regimes. In Fig. 2, the same three scenarios are considered. Here again, we can observe the good agreement of the three approaches.
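For reference, the quoted ratio is simply $c = R/N = 20/100^3$:

```python
Q, Nq, R = 3, 100, 20
print(R / Nq**Q)    # 2e-05, hence c << 1 as claimed
```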

Fig. 1. Optimal $s$-parameter vs. SNR in dB. Curves: numerical optimization, analytical expression, and the approximation for $c \ll 1$; the BUB level $s = 1/2$ is shown for reference.

5. CONCLUSION

The derivation and analysis of the asymptotic performance, in terms of minimal Bayes' error probability, for the detection of random parameters is addressed in this work. More precisely, we assume that under the alternative hypothesis we observe a noisy $Q$-order tensor following a rank-$R$ CPD with large factors and an unknown random core tensor of interest. The term "large" means that the number of available measurements, $N$, and the number of desired random parameters, $R$, grow jointly to infinity with an asymptotically constant ratio. The CUB and the error exponent are provided in closed form. In addition, analytical expressions are given for the optimal parameter $s$ for which the CUB is a tight upper bound on the Bayes' error probability.

Fig. 2. Error exponent $\mu(s)$ vs. SNR in dB (values on the order of $10^{-5}$). Curves: expression in Lemma 3.1, analytical expression, and the approximation for $c \ll 1$.

6. APPENDIX

6.1. Proof of Result 3.2

Large random matrix theory allows one to evaluate the asymptotic behavior of $\frac{\mu_N(s)}{N}$ when $N_q \to +\infty$ for each $q = 1, \ldots, Q$ and $R \to +\infty$ in such a way that $\frac{R^{1/Q}}{N_q}$ converges towards a non-zero constant for each $q = 1, \ldots, Q$. In other words, $N_1, \ldots, N_Q$ converge towards $+\infty$ at the same rate (i.e., $\frac{N_q}{N_p}$ converges towards a non-zero constant for each pair $(p,q)$), and $c_R = \frac{R}{N}$ converges towards a constant $c > 0$. In this context, the empirical eigenvalue distribution of the matrix $\boldsymbol{\Phi}\boldsymbol{\Phi}^T$ converges towards a relevant Marchenko-Pastur distribution.

More precisely, we define the Marchenko-Pastur distribution $\mu_c(d\lambda)$ as the probability distribution given by
$$\mu_c(d\lambda) = [1-c]^+\,\delta(\lambda) + \frac{\sqrt{(\lambda - \lambda_c^-)(\lambda_c^+ - \lambda)}}{2\pi\lambda}\,\mathbb{1}_{[\lambda_c^-, \lambda_c^+]}(\lambda)\,d\lambda$$
where $\lambda_c^- = (1-\sqrt{c})^2$ and $\lambda_c^+ = (1+\sqrt{c})^2$. The Stieltjes transform of $\mu_c$, defined as $t_c(z) = \int_{\mathbb{R}^+}\frac{\mu_c(d\lambda)}{\lambda - z}$, is known to satisfy the equation
$$t_c(z) = \left(-z + \frac{c}{1 + t_c(z)}\right)^{-1}.$$
When $z \in \mathbb{R}^{-*}$, i.e., $z = -\rho$ with $\rho > 0$, it is well known that $t_c(-\rho)$ is given by
$$t_c(-\rho) = \frac{2}{\rho - (1-c) + \sqrt{(\rho + \lambda_c^-)(\rho + \lambda_c^+)}}. \quad (12)$$
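As a quick consistency check (a sketch of ours, not from the paper), eq. (12) can be verified against the fixed-point equation at $z = -\rho$:

```python
import numpy as np

def t_c(rho, c):
    """Stieltjes transform t_c(-rho) of the Marchenko-Pastur law, eq. (12)."""
    lam_m, lam_p = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
    return 2.0 / (rho - (1 - c) + np.sqrt((rho + lam_m) * (rho + lam_p)))

rho, c = 0.7, 0.3
t = t_c(rho, c)
print(t, 1.0 / (rho + c / (1 + t)))   # both sides of t = (rho + c/(1+t))^{-1} agree
```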


It was established for the first time in [6] that if $\mathbf{X}$ represents an $M \times P$ random matrix with zero-mean, $\frac{1}{M}$-variance i.i.d. entries, and if $(\lambda_m)_{m=1,\ldots,M}$ represent the eigenvalues of $\mathbf{X}\mathbf{X}^T$ arranged in decreasing order, then the so-called empirical eigenvalue distribution of $\mathbf{X}\mathbf{X}^T$, defined as $\frac{1}{M}\sum_{m=1}^{M}\delta(\lambda - \lambda_m)$, converges weakly almost surely towards $\mu_c$ in the asymptotic regime where $M \to +\infty$, $P \to +\infty$, $\frac{P}{M} \to c$. In particular, for each continuous function $f(\lambda)$, it holds that
$$\frac{1}{M}\sum_{m=1}^{M} f(\lambda_m) \xrightarrow{\text{a.s.}} \int_{\mathbb{R}^+} f(\lambda)\,\mu_c(d\lambda). \quad (13)$$
In practice, this result means that if $M$ and $P$ are large enough, then the histogram of the eigenvalues of each realization of $\mathbf{X}\mathbf{X}^T$ tends to accumulate around the graph of the probability density of $\mu_c$.

The columns $(\boldsymbol{\phi}_r)_{r=1,\ldots,R}$ of $\boldsymbol{\Phi}$ are the vectors $(\boldsymbol{\phi}_r^{(Q)} \otimes \cdots \otimes \boldsymbol{\phi}_r^{(1)})_{r=1,\ldots,R}$. These vectors are mutually independent, identically distributed, and satisfy $\mathbb{E}(\boldsymbol{\phi}_r\boldsymbol{\phi}_r^T) = \frac{\mathbf{I}_N}{N}$. However, the elements of $\boldsymbol{\Phi}$ are not mutually independent, because the components of each column $\boldsymbol{\phi}_r$ are not independent. In the asymptotic regime considered in this paper, the results of [8] (see also [5]) allow one to establish that the empirical eigenvalue distribution of $\boldsymbol{\Phi}\boldsymbol{\Phi}^T$ still converges almost surely towards $\mu_c$, where we recall that $\frac{R}{N} \to c$. Using (13) for $f(\lambda) = \log(1+\lambda/\rho)$, as well as a well-known formula expressing $\int_{\mathbb{R}^+}\log(1+\lambda/\rho)\,\mu_c(d\lambda)$ in terms of $t_c(-\rho)$ given by (12) (see e.g. [7]), we obtain the result.
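The convergence of the structured matrix $\boldsymbol{\Phi}\boldsymbol{\Phi}^T$ to the same Marchenko-Pastur law can be illustrated empirically; a sketch with assumed small sizes, checking only the bulk edge $\lambda_c^+$:

```python
import numpy as np
from functools import reduce
from scipy.linalg import khatri_rao

rng = np.random.default_rng(4)
Nq, Q, R = 20, 2, 150                              # N = Nq**Q = 400, c = R/N = 0.375
Phis = [rng.standard_normal((Nq, R)) / np.sqrt(Nq) for _ in range(Q)]
Phi_big = reduce(khatri_rao, Phis[::-1])           # Phi = Phi_Q ⊙ ... ⊙ Phi_1
lam = np.linalg.eigvalsh(Phi_big @ Phi_big.T)      # eigenvalues of Phi Phi^T

c = R / Nq**Q
print(lam.max(), (1 + np.sqrt(c))**2)              # largest eigenvalue ~ lambda_c^+
```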

6.2. Proof of Result 3.3

We have $u(x) \approx \frac{1}{x} + \sqrt{\left(\frac{1}{x}+1\right)^2} = \frac{2}{x} + 1$, and thus $u(x) + (1-c) \approx 2\left(\frac{1}{x}+1\right)$, $u(x) - (1-c) \approx \frac{2}{x}$, and $u(x)^2 - (1-c)^2 \approx \frac{4}{x}\left(\frac{1}{x}+1\right)$. Using these first-order approximations, eq. (10) becomes
$$\Psi_{c\ll 1}(x) \approx c\cdot\frac{x}{1+x} + c\log(1+x) - c\cdot\frac{x}{1+x} = c\log(1+x).$$
Using the above approximation and eq. (9), we obtain Result 3.3.

REFERENCES

[1] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, PTR Prentice-Hall, Englewood Cliffs, NJ, 1993.

[2] Y. Cheng, X. Hua, H. Wang, Y. Qin and X. Li, "The Geometry of Signal Detection with Applications to Radar Signal Processing," Entropy, vol. 18, no. 11, 381, 2016.

[3] P. Comon, "Tensors: a brief introduction," IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 44-53, 2014.

[4] Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, Springer Series in Statistics, 2nd ed., 2010.

[5] A. Ambainis, A. W. Harrow and M. B. Hastings, "Random matrix theory: extending random matrix theory to mixtures of random product states," Commun. Math. Phys., vol. 310, no. 1, pp. 25-74, 2012.

[6] V. A. Marchenko and L. A. Pastur, "Distribution of eigenvalues for some sets of random matrices," Mat. Sb. (N.S.), 72(114):4, pp. 507-536, 1967.

[7] A. M. Tulino and S. Verdu, Random Matrix Theory and Wireless Communications, Hanover, MA, USA: Now Publishers Inc., Jun. 2004, vol. 1, no. 1.

[8] A. Pajor and L. A. Pastur, "On the limiting empirical measure of the sum of rank one matrices with log-concave distribution," Studia Math., vol. 195, pp. 11-29, 2009.

[9] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 2012.

[10] R. Boyer and F. Nielsen, "Information Geometry Metric for Random Signal Detection in Large Random Sensing Systems," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017.

[11] F. Nielsen, "An information-geometric characterization of Chernoff information," IEEE Signal Processing Letters, vol. 20, no. 3, pp. 269-272, 2013.

[12] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, and H. A. Phan, "Tensor decompositions for signal processing applications: From two-way to multiway component analysis," IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 145-163, 2015.

[13] S. P. Chepuri and G. Leus, "Sparse sensing for distributed detection," IEEE Transactions on Signal Processing, vol. 64, no. 6, pp. 1446-1460, 2015.

[14] G. Tang and A. Nehorai, "Performance analysis for sparse support recovery," IEEE Transactions on Information Theory, vol. 56, no. 3, pp. 1383-1399, 2010.

[15] R. Boyer and C. Delpha, "Relative-entropy based beamforming for secret key transmission," IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), June 2012.

[16] N. D. Tran, R. Boyer, S. Marcos, and P. Larzabal, "Angular Resolution Limit for array processing: Estimation and information theory approaches," European Signal Processing Conference (EUSIPCO), Aug 2012.

[17] R. A. Harshman, "Foundations of the PARAFAC procedure: Models and conditions for an 'explanatory' multi-modal factor analysis," UCLA Working Papers in Phonetics, vol. 16, pp. 1-84, 1970.

[18] S. Sinanovic and D. H. Johnson, "Toward a theory of information processing," Signal Processing, vol. 87, no. 6, pp. 1326-1344, 2007.
