Low-Complexity Polynomial Channel Estimation in Large-Scale MIMO With Arbitrary Statistics


HAL Id: hal-01098879

https://hal.archives-ouvertes.fr/hal-01098879

Submitted on 29 Dec 2014


Nafiseh Shariati, Emil Björnson, Mats Bengtsson, Mérouane Debbah

To cite this version:

Nafiseh Shariati, Emil Björnson, Mats Bengtsson, Mérouane Debbah. Low-Complexity Polynomial Channel Estimation in Large-Scale MIMO With Arbitrary Statistics. IEEE Journal of Selected Topics in Signal Processing, IEEE, 2014, 8 (5), pp. 815–830. ⟨10.1109/JSTSP.2014.2316063⟩. ⟨hal-01098879⟩


arXiv:1401.5703v1 [cs.IT] 22 Jan 2014


Nafiseh Shariati, Student Member, IEEE, Emil Björnson, Member, IEEE, Mats Bengtsson, Senior Member, IEEE, and Mérouane Debbah, Senior Member, IEEE

Abstract—This paper considers pilot-based channel estimation in large-scale multiple-input multiple-output (MIMO) communication systems, also known as “massive MIMO”, where there are hundreds of antennas at one side of the link. Motivated by the fact that computational complexity is one of the main challenges in such systems, a set of low-complexity Bayesian channel estimators, coined Polynomial ExpAnsion CHannel (PEACH) estimators, are introduced for arbitrary channel and interference statistics.

While the conventional minimum mean square error (MMSE) estimator has cubic complexity in the dimension of the covariance matrices, due to an inversion operation, our proposed estimators reduce this to quadratic complexity by approximating the inverse by an L-degree matrix polynomial. The coefficients of the polynomial are optimized to minimize the mean square error (MSE) of the estimate.

We show numerically that near-optimal MSEs are achieved with low polynomial degrees. We also derive the exact computational complexity of the proposed estimators, in terms of floating-point operations (FLOPs), by which we prove that the proposed estimators require fewer operations than the conventional estimators in large-scale MIMO systems of practical dimensions while providing reasonable MSEs. Moreover, we show that L need not scale with the system dimensions to maintain a certain normalized MSE. By analyzing different interference scenarios, we observe that the relative MSE loss of using the low-complexity PEACH estimators is smaller in realistic scenarios with pilot contamination. On the other hand, PEACH estimators are not well suited for noise-limited scenarios with high pilot power; therefore, we also introduce the low-complexity diagonalized estimator that performs well in this regime. Finally, we investigate numerically how the estimation performance is affected by having imperfect statistical knowledge. High robustness is achieved for large-dimensional matrices by using a new covariance estimate which is an affine function of the sample covariance matrix and a regularization term.

Index Terms—Channel estimation, large-scale MIMO, polynomial expansion, pilot contamination, spatial correlation.

I. INTRODUCTION

MIMO techniques can bring huge improvements in spectral efficiency to wireless systems by increasing the spatial reuse through spatial multiplexing [2]. While 8 × 8 MIMO transmissions have found their way into recent communication standards, such as LTE-Advanced [3], there is an increasing interest from academia and industry in equipping base stations (BSs) with much larger arrays with several hundreds of antenna elements [4]–[9]. Such large-scale MIMO, or “massive MIMO”, techniques can give unprecedented spatial resolution and array gain, thus enabling a very dense spatial reuse that can potentially keep up with the rapidly increasing demand for wireless connectivity and the need for high energy efficiency.

N. Shariati, E. Björnson, and M. Bengtsson are with the Signal Processing Lab, ACCESS Linnaeus Centre, KTH Royal Institute of Technology, Stockholm, Sweden (e-mail: {nafiseh, emil.bjornson, mats.bengtsson}@ee.kth.se).

E. Björnson and M. Debbah are with the Alcatel-Lucent Chair on Flexible Radio, SUPELEC, Gif-sur-Yvette, France (e-mail: {emil.bjornson, merouane.debbah}@supelec.fr).

This work was presented in part at the IEEE Symposium on Personal, Indoor, Mobile and Radio Communications (PIMRC), London, UK, Sept. 2013 [1].

E. Björnson is funded by the International Postdoc Grant 2012-228 from The Swedish Research Council. This research has been supported by the ERC Starting Grant 305123 MORE (Advanced Mathematical Tools for Complex Network Engineering).

The antenna elements in large-scale MIMO can be either collocated in one- or multi-dimensional arrays or distributed over a larger area (e.g., on the facade or the windows of buildings) [8]. Apart from increasing the spectral efficiency of conventional wireless systems, which operate at carrier frequencies of one or a few GHz, the use of massive antenna configurations is also a key enabler for high-rate transmissions in mm-Wave bands, where there is plenty of unused spectrum today [9]. In particular, the array gain of large-scale MIMO mitigates the large propagation losses at such high frequencies, and 256 antenna elements with half-wavelength minimal spacing can be packed into 6 × 6 cm at 80 GHz [9].

The majority of previous works on large-scale MIMO (see [4]–[8] and references therein) consider scenarios where BSs equipped with many antennas communicate with single-antenna user terminals (UTs). While this assumption allows for closed-form characterizations of the asymptotic throughput (when the number of antennas and UTs grow large), we can expect practical UTs to be equipped with multiple antennas as well; this is indeed the case already in LTE-Advanced [3]. The limited form factor of terminals typically allows for fewer antennas than at the BSs, but the number might still be unconventionally large in mm-Wave communications.

A major limiting factor in large-scale MIMO is the availability of accurate instantaneous channel state information (CSI). This is because high spatial resolution can only be exploited if the propagation environment is precisely known. CSI is typically acquired by transmitting predefined pilot signals and estimating the channel coefficients from the received signals [10]–[15]. The pilot overhead is proportional to the number of transmit antennas, so it is commonly assumed that the pilots are sent from the array with the smallest number of antennas and used for transmission in both directions by exploiting channel reciprocity in time-division duplex (TDD) mode.

The instantaneous channel matrix is acquired from the received pilot signal by applying an appropriate estimation scheme. The Bayesian MMSE estimator is optimal if the channel statistics are known [12]–[16], while the minimum-variance unbiased (MVU) estimator is applied otherwise [12]. These channel estimators basically solve a linear system of equations or, equivalently, multiply the received pilot signal with an inverse of the covariance matrices. This is a mathematical operation with cubic computational complexity in the matrix dimension, which is the product of the number of antennas at the receiver (on the order of 100) and the length of the pilot sequence (on the order of 10). Evidently, this operation is extremely computationally expensive in large-scale MIMO systems, so the MMSE and MVU channel estimates cannot be computed within a reasonable period of time. The high computational complexity can be avoided under propagation conditions where all covariance matrices are diagonal, but large-scale MIMO channels typically have a distinct spatial channel correlation due to insufficient antenna spacing and limited richness of the propagation environment [7]. The spatial correlation decreases the estimation errors [15], but only if an appropriate estimator is applied. Moreover, the necessary pilot reuse in cellular networks creates spatially correlated inter-cell interference, known as pilot contamination, which reduces the estimation performance and spectral efficiency [5]–[7], [10], [11].

Polynomial expansion (PE) is a well-known technique to reduce the complexity of large-dimensional matrix inversions [17]. Similar to classic Taylor series expansions for scalar functions, PE approximates a matrix function by an L-degree matrix polynomial. PE has a long history in the field of signal processing for multiuser detection/equalization, where both the decorrelating detector and the linear MMSE detector involve matrix inversions [17]–[22]. PE-based detectors are versatile since the structure enables simple multistage/pipelined hardware implementation [17] using only additions and multiplications. The degree L basically describes the accuracy to which the inversion of each eigenvalue is approximated, so the degree need not scale with the system dimensions to achieve near-optimal performance [20]. Instead, L is simply selected to balance between computational complexity and detection performance. A main problem is to select the coefficients of the polynomial to achieve high performance at small L; the optimal coefficients are expensive to compute [17], but alternatives based on appropriate scalings [18], [21], [23] and asymptotic analysis [19], [22] exist. Recently, PE has also been used to reduce the precoding complexity in large-scale MIMO systems [24]–[26], and high performance was achieved by optimizing the matrix polynomials using asymptotic analysis.

The optimization of the polynomial coefficients is the key to high performance when using PE. Since the system models and performance metrics are fundamentally different in multiuser detection and precoding, the derivation of optimal and low-complexity suboptimal coefficients becomes two very different problems in these two applications. In this paper, we consider a new signal processing application for PE, namely pilot-based estimation of MIMO channels. We apply the PE technique to approximate the MMSE estimator and thereby obtain a new set of low-complexity channel estimators that we coin Polynomial ExpAnsion CHannel (PEACH) estimators.¹ A main contribution of the paper is to optimize the coefficients of the polynomial to yield low MSE at any fixed polynomial degree L, while keeping the low complexity. The PEACH estimators are evaluated under different propagation/interference conditions and show remarkably good performance at low polynomial degrees. An important property is that L need not scale with the number of antennas to maintain a fixed normalized MSE loss (as compared to MMSE estimation). However, L should increase with the transmit power to keep a fixed loss, while it can actually be decreased as the interference becomes stronger. The computational complexity of the PEACH estimators and conventional MMSE/MVU estimators are compared analytically. This reveals that the proposed estimators have smaller complexity exponents. The numerical results confirm that far fewer FLOPs are required to compute the PEACH estimators in large-scale MIMO systems of practical dimensions. Finally, the diagonalized estimator is introduced with even lower complexity, and it is shown in which scenarios it is suitable.

A. Outline

The organization of this paper is as follows. In Section II, we describe the system model and formulate the problem of estimating channel coefficients for a large-scale MIMO communication system where the computational complexity is a major issue. Following the Bayesian philosophy, we propose a set of low-complexity estimators in Section III and provide an exact complexity analysis. In Section IV, we numerically evaluate the performance of the proposed estimators in different interference scenarios where comparison is performed with respect to conventional estimators. Finally, conclusions are drawn in Section V.

B. Notation

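Boldface (lower case) is used for column vectors, $\mathbf{x}$, and (upper case) for matrices, $\mathbf{X}$. Let $\mathbf{X}^T$, $\mathbf{X}^H$, and $\mathbf{X}^{-1}$ denote the transpose, the conjugate transpose, and the inverse of $\mathbf{X}$, respectively. The Kronecker product of $\mathbf{X}$ and $\mathbf{Y}$ is denoted $\mathbf{X} \otimes \mathbf{Y}$, $\mathrm{vec}(\mathbf{X})$ is the vector obtained by stacking the columns of $\mathbf{X}$, $\mathrm{tr}(\mathbf{X})$ denotes the trace, $\|\mathbf{X}\|_F$ is the Frobenius norm, and $\|\mathbf{X}\|_2$ is the spectral norm. The notation $\triangleq$ denotes definitions, while the big-$\mathcal{O}$ notation $\mathcal{O}(M^x)$ describes that the complexity is bounded by $C M^x$ for some $0 < C < \infty$. A circularly symmetric complex Gaussian random vector $\mathbf{x}$ is denoted $\mathbf{x} \sim \mathcal{CN}(\bar{\mathbf{x}}, \mathbf{Q})$, where $\bar{\mathbf{x}}$ is the mean and $\mathbf{Q}$ is the covariance matrix.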

II. PROBLEM FORMULATION

We consider a MIMO channel where the receiver and the transmitter are equipped with $N_r$ and $N_t$ antennas, respectively. This can be one of the links in a multi-cell multiuser network of arbitrary size. The problem of estimating the instantaneous MIMO channel coefficients for a quasi-static flat-fading channel $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is investigated. The channel matrix $\mathbf{H}$ is modeled as Rician fading with $\mathrm{vec}(\mathbf{H}) \sim \mathcal{CN}(\mathrm{vec}(\bar{\mathbf{H}}), \mathbf{R})$, where the non-zero mean matrix $\bar{\mathbf{H}}$ implies that there might be line-of-sight propagation and the channel covariance matrix $\mathbf{R} \in \mathbb{C}^{N_t N_r \times N_t N_r}$ is positive semi-definite.

[Figure 1: a transmitter (few antennas) sends a pilot signal to a receiver (very many antennas).]

Fig. 1. Illustration of pilot signaling in a large-scale $N_t \times N_r$ MIMO system, where typically $N_r \gg N_t$. The complexity of conventional channel estimators is very large in these systems, which calls for low-complexity alternatives.

¹ After the submission of this paper, we became aware of the concurrent work of [27], which also applies PE to reduce the complexity of MMSE estimation. However, orthogonal frequency division multiplexing (OFDM) systems with a large number of subcarriers are considered in [27], while large-scale single-carrier MIMO systems are our focus. This makes the system models, analysis, and results non-overlapping.

Observe that $\mathbf{R}$ is generally not a scaled identity matrix, but describes the spatial propagation environment. In order to estimate the channel coefficients, we exploit pilot signals similar to [13]–[15]. This means that the transmitter sends the columns of a fixed predefined pilot matrix $\mathbf{P} \in \mathbb{C}^{N_t \times B}$ over $B$ channel uses; see Fig. 1. The integer $B$ is the length of the pilot sequence and usually satisfies $B \geq N_t$.²

During the pilot signaling, the received matrix $\mathbf{Y} \triangleq [\mathbf{y}(1), \ldots, \mathbf{y}(B)]$ equals

$$\mathbf{Y} = \mathbf{H}\mathbf{P} + \mathbf{N} \tag{1}$$

where the disturbance $\mathbf{N} \in \mathbb{C}^{N_r \times B}$ is assumed to be circularly symmetric complex Gaussian distributed and modeled as $\mathrm{vec}(\mathbf{N}) \sim \mathcal{CN}(\mathrm{vec}(\bar{\mathbf{N}}), \mathbf{S})$. Here, $\bar{\mathbf{N}} \in \mathbb{C}^{N_r \times B}$ is the mean disturbance and $\mathbf{S} \in \mathbb{C}^{N_r B \times N_r B}$ is the positive definite covariance matrix. The additive disturbance term describes the receiver noise and the interference from all other concurrent transmissions, which might involve the same or other receivers. The latter is commonly referred to as pilot contamination in the large-scale MIMO literature [4]–[8] and can in general have a non-zero line-of-sight component. The analysis herein holds for any $\bar{\mathbf{N}}$ and $\mathbf{S}$, but some typical special cases are described and evaluated numerically in Section IV.

² Pilot sequences shorter than $N_t$ are optimal in highly correlated channels where the pilot matrix $\mathbf{P}$ is tailored to the channel and interference statistics [15]. The analysis herein permits any $B \geq 1$, but we stress that $B \geq N_t$ is the case of main interest. This is due to the fact that pilot matrix optimization is cumbersome in large-scale MIMO systems since the transmitter and receiver need to acquire the same statistical information to agree on the pilot matrix.

Vectorizing the received matrix in (1) yields

$$\mathbf{y} = \tilde{\mathbf{P}}\mathbf{h} + \mathbf{n}$$

where $\mathbf{y} = \mathrm{vec}(\mathbf{Y})$, $\tilde{\mathbf{P}} \triangleq (\mathbf{P}^T \otimes \mathbf{I})$, $\mathbf{h} = \mathrm{vec}(\mathbf{H})$, and $\mathbf{n} = \mathrm{vec}(\mathbf{N})$. This transforms the matrix estimation in (1) into the canonical form of vector estimation in [12], which enables the use of classical estimation results.
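The vectorization step relies on the identity vec(HP) = (P^T ⊗ I) vec(H). The following is a minimal NumPy sketch that checks it numerically; the small dimensions, the random test matrices, and the helper name `vec` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_r, N_t, B = 4, 2, 3  # small illustrative sizes (assumed, not from the paper)

# Random complex channel and pilot matrices, used only to test the identity.
H = rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))
P = rng.standard_normal((N_t, B)) + 1j * rng.standard_normal((N_t, B))

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

# P_tilde = (P^T kron I); note the plain transpose, not the conjugate transpose.
P_tilde = np.kron(P.T, np.eye(N_r))

lhs = vec(H @ P)          # vec(HP), noise-free part of vec(Y)
rhs = P_tilde @ vec(H)    # P_tilde h
identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

Column-major stacking matters here: NumPy reshapes row by row unless `order="F"` is given, and the Kronecker form above assumes the column-stacking convention from the Notation section.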

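If the channel and disturbance statistics (i.e., $\bar{\mathbf{H}}$, $\mathbf{R}$, $\bar{\mathbf{N}}$, and $\mathbf{S}$) are perfectly known at the receiver, the Bayesian MMSE estimator of the MIMO channel is [12]–[15]

$$\hat{\mathbf{h}}_{\mathrm{MMSE}} = \mathrm{vec}(\hat{\mathbf{H}}_{\mathrm{MMSE}}) = \bar{\mathbf{h}} + \mathbf{R}\tilde{\mathbf{P}}^H \left( \tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S} \right)^{-1} \mathbf{d} \tag{2}$$

where $\bar{\mathbf{h}} = \mathrm{vec}(\bar{\mathbf{H}})$, $\bar{\mathbf{n}} = \mathrm{vec}(\bar{\mathbf{N}})$, and $\mathbf{d} = \mathbf{y} - \tilde{\mathbf{P}}\bar{\mathbf{h}} - \bar{\mathbf{n}}$.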
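As a rough illustration of (2), the sketch below evaluates the MMSE estimate by solving a linear system instead of forming the inverse explicitly. The function name `mmse_estimate`, the identity pilot matrix, and the nearly noise-free toy setup are assumptions made for the example only.

```python
import numpy as np

def mmse_estimate(y, P_tilde, R, S, h_bar, n_bar):
    """Evaluate (2); the inverse is applied through a linear solve."""
    d = y - P_tilde @ h_bar - n_bar
    Z = P_tilde @ R @ P_tilde.conj().T + S
    return h_bar + R @ P_tilde.conj().T @ np.linalg.solve(Z, d)

# Toy setup (assumed): P = I with B = N_t, white channel, almost no noise.
rng = np.random.default_rng(1)
N_r, N_t = 3, 2
M = N_r * N_t
P_tilde = np.kron(np.eye(N_t), np.eye(N_r))
R = np.eye(M)
S = 1e-10 * np.eye(M)
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
y = P_tilde @ h                      # noise-free observation for the check

h_hat = mmse_estimate(y, P_tilde, R, S, np.zeros(M), np.zeros(M))
print(np.max(np.abs(h_hat - h)))     # close to zero in this toy case
```

Either route (solve or explicit inverse) has the cubic cost discussed in Section II-A; the solve is used here only because it is the more common numerical practice.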

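We measure the performance in terms of the estimation MSE. Using the MMSE estimator, it follows that

$$\mathrm{MSE} = \mathbb{E}\{\|\mathbf{H} - \hat{\mathbf{H}}_{\mathrm{MMSE}}\|_F^2\} = \mathrm{tr}\left( \left( \mathbf{R}^{-1} + \tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}} \right)^{-1} \right). \tag{3}$$

Alternatively, if the channel distribution is unknown to the receiver, the classic MVU estimator is [12, Chapter 4]

$$\hat{\mathbf{h}}_{\mathrm{MVU}} = \mathrm{vec}(\hat{\mathbf{H}}_{\mathrm{MVU}}) = \left( \tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}} \right)^{-1} \tilde{\mathbf{P}}^H \mathbf{S}^{-1} (\mathbf{y} - \bar{\mathbf{n}}). \tag{4}$$

The corresponding performance measure is then the estimation variance $\mathbb{E}\{\|\mathbf{H} - \hat{\mathbf{H}}_{\mathrm{MVU}}\|_F^2\} = \mathrm{tr}\left( (\tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}})^{-1} \right)$.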

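Note that the mean matrices of the channel and the disturbance have no impact on the performance with MMSE and MVU estimation. Moreover,

$$\mathrm{tr}\left( \left( \mathbf{R}^{-1} + \tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}} \right)^{-1} \right) < \mathrm{tr}\left( \left( \tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}} \right)^{-1} \right) \tag{5}$$

for any $\mathbf{R} \neq \mathbf{0}$, so the MMSE estimator achieves a better average estimation performance than the MVU estimator since it utilizes the channel statistics.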
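The inequality (5) can be checked numerically on a small example. The sketch below evaluates (3) and the MVU variance for randomly drawn positive-definite covariance matrices, and also cross-checks (3) against the equivalent form tr(R − R P̃^H (P̃ R P̃^H + S)^{-1} P̃ R) given by the matrix inversion lemma. The helper `random_covariance`, the dimensions, and the identity pilot matrix are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N_r, N_t = 3, 2
B = N_t
M = N_r * N_t

def random_covariance(n, rng):
    """A random Hermitian positive-definite matrix (illustrative only)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T / n + 0.1 * np.eye(n)

R = random_covariance(M, rng)
S = random_covariance(N_r * B, rng)
P_tilde = np.kron(np.eye(N_t).T, np.eye(N_r))   # assumes P = I

# Q = P_tilde^H S^{-1} P_tilde
Q = P_tilde.conj().T @ np.linalg.solve(S, P_tilde)

mse_mmse = np.trace(np.linalg.inv(np.linalg.inv(R) + Q)).real   # (3)
var_mvu = np.trace(np.linalg.inv(Q)).real                       # MVU variance

# Same MSE via the matrix inversion lemma.
Z = P_tilde @ R @ P_tilde.conj().T + S
mse_alt = np.trace(R - R @ P_tilde.conj().T @ np.linalg.solve(Z, P_tilde @ R)).real

print(mse_mmse, var_mvu, mse_alt)
```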

Remark 1 (Arbitrary Statistics). While having Gaussian channels and disturbance is a well-accepted assumption in conventional MIMO systems, the channel modeling for large-scale MIMO is still in its infancy. By increasing the number of antennas we improve the spatial resolution of the array, which eventually may invalidate the rich-scattering assumption that is behind the use of Gaussian channel distributions [7]. However, we stress that the results of this paper can be applied and give reasonable performance under arbitrary statistical distributions of the channel and disturbance; this is because (2) is also the linear MMSE estimator and (4) is the best linear unbiased estimator (BLUE) in cases when only the first two moments of $\mathbf{H}$ and/or $\mathbf{N}$ are known [12], [15].

Recall that we assumed that the statistical parameters $\bar{\mathbf{H}}$, $\mathbf{R}$, $\bar{\mathbf{N}}$, and $\mathbf{S}$ of the channel and disturbance are known at the receiver. Since user mobility and large-scale fading cause continuous changes in the statistics, this implicitly means that the receiver can keep track of these changes. Such tracking can, for example, be achieved by exploiting the pilot signals on multiple flat-fading subcarriers since the large-scale fading properties can be transformed between different adjacent subcarriers [28], [29]. Interestingly, the coherence time of the long-term statistics is relatively short; the measurements in [30] observe coherence times of 5–23 seconds, depending on the propagation environment. High user velocity or rapid scheduling decisions in neighboring systems can further reduce the coherence time. More importantly, the number of channel realizations within each coherence time of the statistics is around 13–126, according to [30]. This means that the matrix inversion in the MMSE estimator has to be recomputed frequently.

A. Complexity Issues in Large-Scale MIMO Systems

The main computational complexity when computing the MMSE and MVU estimators in (2) and (4) lies in solving a linear system of equations or, equivalently, in computing the matrix inversions directly. Both approaches have computational complexities that scale as $\mathcal{O}(M^3)$, where $M \triangleq B N_r$ is the matrix dimension.³ This complexity is relatively modest in conventional MIMO communication systems where 2 × 2, 4 × 4, or 8 × 8 are typical configurations.

Recently, there is an increasing interest in large-scale MIMO systems where there might be hundreds of antennas at one side of the link [4]–[9]. To excite all channel dimensions, the pilot length B should be of the same order as N_t. Large-scale MIMO systems are therefore envisioned to operate in TDD mode and exploit channel reciprocity to always have N_t< N_r in the channel estimation phase—N_r can even be orders of magnitude larger than N_t without degrading the estimation performance per antenna element.

Observe that in a potential future large-scale MIMO system with $N_r = 200$ and $N_t = B = 20$, the MMSE and MVU estimators would require inverting matrices of size 4000 × 4000 (or similarly, solving a linear system of equations with 4000 unknown variables), which has a complexity on the order of $3.4 \cdot 10^{11}$ floating-point operations; see Section III-E for details. This massive matrix manipulation needs to be redone every few seconds since $\mathbf{R}$ and $\mathbf{S}$ change due to mobility.

Motivated by these facts, the purpose of this paper is to develop alternative channel estimators that allow for balancing between computational/hardware complexity and estimation performance.
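To make the dimensions in this example concrete, the short sketch below computes the matrix size M = B N_r and the leading-order terms M² and M³. It deliberately omits constant factors: the figure of 3.4 · 10¹¹ FLOPs quoted above comes from the exact operation count in Section III-E, which is not reproduced here, so the numbers below indicate scaling only.

```python
# Leading-order scaling only; constant factors from the exact FLOP count
# in Section III-E are intentionally left out.
N_r, N_t, B = 200, 20, 20

M = B * N_r                  # dimension of the matrix to invert
entries = M ** 2             # number of entries in one M x M covariance matrix
cubic_term = M ** 3          # growth rate of inversion / linear solve

print(f"M = {M}")
print(f"M^2 = {entries:.2e}")
print(f"M^3 = {cubic_term:.2e}")
```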

B. A Diagonalization Approach to Complexity Reduction

There is a special case when the computational complexity of MMSE estimation can be greatly reduced, namely when the matrices $\mathbf{R}$, $\mathbf{S}$, and $\mathbf{P}$ are all diagonal. The matrix $\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}$ is then also diagonal, which allows for computing $(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})^{-1}$ by simply inverting each diagonal element. The corresponding complexity is only $8M - 1 = \mathcal{O}(M)$ FLOPs. This special case is, unfortunately, of limited practical interest for large-scale MIMO systems, which are prone to non-negligible spatial channel correlation and pilot contamination.⁴ Inspired by this special case, a simple approach to complexity reduction is to diagonalize the covariance matrices $\mathbf{R}$ and $\mathbf{S}$ by replacing all off-diagonal elements by zero. Let $\mathbf{R}_{\mathrm{diag}}$ and $\mathbf{S}_{\mathrm{diag}}$ denote the corresponding matrices, assume $B = N_t$, and set $\mathbf{P} = \sqrt{P_t}\,\mathbf{I}$ where $P_t$ is the average pilot power. The MMSE estimator in (2) is approximated as

$$\hat{\tilde{\mathbf{h}}} = \bar{\mathbf{h}} + \sqrt{P_t}\, \mathbf{R}_{\mathrm{diag}} \left( P_t \mathbf{R}_{\mathrm{diag}} + \mathbf{S}_{\mathrm{diag}} \right)^{-1} \mathbf{d} \tag{6}$$

where the matrix $\mathbf{R}_{\mathrm{diag}} (P_t \mathbf{R}_{\mathrm{diag}} + \mathbf{S}_{\mathrm{diag}})^{-1}$ can be precomputed with a computational complexity proportional to $M$. From now on, we refer to (6) as the diagonalized estimator. It achieves the following MSE.

³ Note that $\mathcal{O}(M^3)$ refers to the complexity scaling of the classical inversion algorithms, such as Gaussian elimination and inversion based on Cholesky decomposition [31]. The exponent is reduced to $\mathcal{O}(M^{2.8074})$ by Strassen's algorithm in [32], which is a divide-and-conquer algorithm that exploits that 2 × 2 matrices can be multiplied efficiently. Using the complexity expressions in [32], it is easy to show that the algorithm is only computationally beneficial for very large matrices (e.g., $M \gtrsim 8000$) due to heavy overhead computations. It also has other drawbacks, such as lower computational accuracy and that the matrix dimensions must be $M = 2^k$ for some integer $k$. The exponent can be further reduced to $\mathcal{O}(M^{2.373})$ [33], but at the cost of more overhead that pushes the break-even point to even higher values of $M$. In this paper, we propose new estimators with the complexity scaling $\mathcal{O}(M^2)$, which is both asymptotically better and proved to be beneficial at large but practical $M$.

Theorem 1. The diagonalized estimator in (6) with $\mathbf{P} = \sqrt{P_t}\,\mathbf{I}$ achieves the MSE

$$\mathrm{tr}\left( \left( \mathbf{R}_{\mathrm{diag}}^{-1} + P_t \mathbf{S}_{\mathrm{diag}}^{-1} \right)^{-1} \right). \tag{7}$$

In noise-limited scenarios with $\mathbf{S} = \sigma^2 \mathbf{I}$, the MSE of the diagonalized estimator goes to zero as the power $P_t \to \infty$.

Proof: The diagonalized estimator in (6) estimates each channel element separately, so the MSE is equivalent to that of MMSE estimation with $\mathbf{R}_{\mathrm{diag}}$ as channel covariance matrix and $\mathbf{S}_{\mathrm{diag}}$ as disturbance covariance matrix [15]. This gives the MSE expression in (7). By letting $P_t \to \infty$ in (7), it follows directly that the MSE approaches zero asymptotically.
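Because every matrix in (6) and (7) is diagonal, both expressions reduce to element-wise operations on the diagonals. The sketch below shows this and illustrates the high-power behavior stated in Theorem 1; the function names and the numerical values of the variances are hypothetical.

```python
import numpy as np

def diagonalized_estimate(d, r_diag, s_diag, P_t, h_bar):
    """Evaluate (6) element-wise; r_diag and s_diag hold the diagonals of R and S."""
    gain = np.sqrt(P_t) * r_diag / (P_t * r_diag + s_diag)
    return h_bar + gain * d

def diagonalized_mse(r_diag, s_diag, P_t):
    """Evaluate (7): tr((R_diag^{-1} + P_t S_diag^{-1})^{-1})."""
    return np.sum(1.0 / (1.0 / r_diag + P_t / s_diag))

r_diag = np.array([1.0, 0.5, 2.0, 1.5])    # hypothetical channel variances
sigma2 = 0.1                               # hypothetical noise variance
s_diag = sigma2 * np.ones_like(r_diag)     # noise-limited case S = sigma^2 I

powers = (1.0, 10.0, 100.0, 1000.0)
mse_by_power = [diagonalized_mse(r_diag, s_diag, P_t) for P_t in powers]
for P_t, mse in zip(powers, mse_by_power):
    print(f"P_t = {P_t:7.1f}  MSE = {mse:.3e}")
```

The per-element cost is what gives the O(M) complexity; no matrix products or solves appear anywhere in the sketch.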

This theorem shows that the diagonalized estimator performs well in noise-limited scenarios with high signal-to-noise ratio (SNR). Unfortunately, the simulations in Section IV reveal that this is the only operating regime where it is comparable to the MMSE estimator. More precisely, the drawback of the diagonalized estimator is that it exploits neither the statistical dependence between the received pilot signals nor that between the channel coefficients. We recall from [15] that exploiting such dependence (e.g., spatial correlation) can give great MSE improvements. Therefore, the next section develops a new type of channel estimator that reduces the computational complexity of MMSE estimation while retaining the full statistical information. These estimators complement the diagonalized estimator well since they perform particularly well at low to medium SNRs and under interference.

III. LOW-COMPLEXITY BAYESIAN PEACH ESTIMATORS

In this section, we propose several low-complexity Bayesian channel estimators based on the concept of polynomial expansion. To understand the main idea, we first state the following lemma which is easily proved by using standard Taylor series.

⁴ The elements of each column of $\mathbf{H}$ are highly correlated due to the insufficient antenna spacing and limited richness of the scattering around the large array at the receiver. The correlation between the columns depends more on the scattering and size of the small array at the transmitter, so the correlation might be weaker, but complete independence is seldom seen in practice. In the ideal case of exactly independent columns, the covariance matrix $\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}$ is block-diagonal, which can be exploited for complexity reduction. The complexity scaling of the MMSE estimation is, however, still cubic in $N_r$, and the proposed estimators have a computational advantage when $N_r$ is sufficiently large; see Section III-E.


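Lemma 1. For any Hermitian matrix $\mathbf{X} \in \mathbb{C}^{N \times N}$, with bounded eigenvalues $|\lambda_n(\mathbf{X})| < 1$ for all $n$, it holds that

$$(\mathbf{I} - \mathbf{X})^{-1} = \sum_{l=0}^{\infty} \mathbf{X}^l. \tag{8}$$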

Observe that the impact of $\mathbf{X}^l$ in (8) reduces with $l$, as $\lambda_n(\mathbf{X})^l$ for each eigenvalue. It therefore makes sense to consider L-degree polynomial expansions of the matrix inverse using only the terms $l = 0, \ldots, L$. In principle, the inverse of each eigenvalue is then approximated by an L-degree Taylor polynomial, so L need not scale with the matrix dimension to achieve a certain accuracy per element. Instead, L can be selected to balance between low approximation error and low complexity. To verify this independence in the area of estimation, we investigate the MSE performance of large-scale MIMO systems of different dimensions in Section IV. We observe an almost identical performance for a fixed L when we vary the number of antennas. Note that a similar remark was made in [20], where the authors show that their system performance metric does not depend on the system dimensions but only on the filter rank.
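The truncation idea behind Lemma 1 can be illustrated with a small numerical experiment. The sketch below builds a random symmetric matrix, rescales it so that its spectral radius is 0.5 (an arbitrary choice for the example), and compares the truncated series with the exact inverse for a few degrees L; the helper name `truncated_series` is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
G = rng.standard_normal((N, N))
X = (G + G.T) / 2                                   # real symmetric, hence Hermitian
X *= 0.5 / np.max(np.abs(np.linalg.eigvalsh(X)))    # spectral radius set to 0.5

exact = np.linalg.inv(np.eye(N) - X)

def truncated_series(X, L):
    """Sum_{l=0}^{L} X^l, accumulated term by term."""
    total = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for _ in range(L):
        term = term @ X
        total = total + term
    return total

degrees = (1, 3, 7, 15)
errors = [np.linalg.norm(exact - truncated_series(X, L), 2) for L in degrees]
for L, err in zip(degrees, errors):
    print(f"L = {L:2d}  spectral-norm error = {err:.2e}")
```

In this setup the error is governed by the eigenvalue magnitudes raised to the power L + 1, which is why the decay rate does not depend on the matrix dimension N.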

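In order to apply Lemma 1 to matrices with any eigenvalue structure, we obtain the next result, which is similar to [21].

Proposition 1. For any positive-definite Hermitian matrix $\mathbf{X}$ and any $0 < \alpha < \frac{2}{\max_n \lambda_n(\mathbf{X})}$, it holds that

$$\mathbf{X}^{-1} = \alpha \left( \mathbf{I} - (\mathbf{I} - \alpha \mathbf{X}) \right)^{-1} = \alpha \sum_{l=0}^{L} (\mathbf{I} - \alpha \mathbf{X})^l + \mathbf{E} \tag{9}$$

where $\alpha \sum_{l=0}^{L} (\mathbf{I} - \alpha \mathbf{X})^l$ is an L-degree polynomial approximation and the error term $\mathbf{E}$ is bounded as $\|\mathbf{E}\|_2 = \mathcal{O}\left( \|\mathbf{I} - \alpha \mathbf{X}\|_2^{L+1} \right)$. The error vanishes as $L \to \infty$.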
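A minimal numerical sketch of Proposition 1 follows. It forms the L-degree approximation in (9) for a random positive-definite matrix and compares the spectral-norm error with the quantity ‖I − αX‖₂^{L+1} ‖X⁻¹‖₂, which follows from writing the error as E = X⁻¹ (I − αX)^{L+1}. The choice α = 1/λ_max is just one admissible value, and the helper name `poly_inverse` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = G @ G.conj().T / N + 0.5 * np.eye(N)     # Hermitian positive definite

lam = np.linalg.eigvalsh(X)                  # eigenvalues in ascending order
alpha = 1.0 / lam[-1]                        # satisfies 0 < alpha < 2 / lambda_max

def poly_inverse(X, alpha, L):
    """alpha * sum_{l=0}^{L} (I - alpha X)^l, the approximation in (9)."""
    I = np.eye(X.shape[0])
    Y = I - alpha * X
    total, term = I, I
    for _ in range(L):
        term = term @ Y
        total = total + term
    return alpha * total

exact = np.linalg.inv(X)
rho = np.linalg.norm(np.eye(N) - alpha * X, 2)

degrees = (2, 5, 10, 20)
errors = [np.linalg.norm(exact - poly_inverse(X, alpha, L), 2) for L in degrees]
bounds = [rho ** (L + 1) / lam[0] for L in degrees]
for L, err, bnd in zip(degrees, errors, bounds):
    print(f"L = {L:2d}  error = {err:.3e}  bound = {bnd:.3e}")
```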

A. Unweighted PEACH Estimator

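Applying the approximation in Proposition 1 to the MMSE estimator in (2) gives the low-complexity L-degree Polynomial ExpAnsion CHannel (PEACH) estimator, which we denote by $\hat{\mathbf{h}}_{\mathrm{PEACH}} = \mathrm{vec}(\hat{\mathbf{H}}_{\mathrm{PEACH}})$ and define as

$$\hat{\mathbf{h}}_{\mathrm{PEACH}} \triangleq \bar{\mathbf{h}} + \mathbf{R}\tilde{\mathbf{P}}^H \sum_{l=0}^{L} \alpha \left( \mathbf{I} - \alpha (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}) \right)^l \mathbf{d}. \tag{10}$$

Note that (10) does not involve any inversions. Furthermore, the polynomial structure $\sum_{l=0}^{L} \mathbf{X}^l \mathbf{d}$ lends itself to a recursive computation

$$\sum_{l=0}^{L} \mathbf{X}^l \mathbf{d} = \mathbf{d} + \mathbf{X}\Big( \mathbf{d} + \mathbf{X}\big( \mathbf{d} + \mathbf{X}( \mathbf{d} + \mathbf{X}(\ldots) ) \big) \Big) \tag{11}$$

where $\mathbf{X} = \mathbf{I} - \alpha (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})$ for the PEACH estimator.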

The key property of (11) is that it only involves matrix-vector multiplications, which have a complexity of $\mathcal{O}(M^2)$ instead of the cubic complexity of matrix-matrix multiplications [31]. The computational complexity of (10) is therefore $\mathcal{O}(L M^2)$ where $M \triangleq B N_r$. Whenever $L \ll M$, $\mathcal{O}(L M^2)$ is a large complexity reduction as compared to $\mathcal{O}(M^3)$ for the original MMSE estimator. Furthermore, the recursive structure enables an efficient multistage hardware implementation similar to the detection implementation illustrated in [17, Fig. 1].
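The nested form (11) can be sketched as a short loop in which each stage applies X to a vector, so that no matrix-matrix product is ever formed. The function name `peach_estimate`, the exponential correlation model for R, the white-noise S, and the identity pilot matrix are all assumptions made for this example; α is taken from the extreme eigenvalues of P̃RP̃^H + S purely for convenience.

```python
import numpy as np

def peach_estimate(d, P_tilde, R, S, h_bar, alpha, L):
    """Evaluate (10) through the recursion (11) using matrix-vector products only."""
    def apply_X(v):
        # X v = v - alpha * (P_tilde R P_tilde^H + S) v, evaluated right to left
        Zv = P_tilde @ (R @ (P_tilde.conj().T @ v)) + S @ v
        return v - alpha * Zv

    acc = d.copy()                 # degree 0: sum is just d
    for _ in range(L):
        acc = d + apply_X(acc)     # acc <- d + X acc
    return h_bar + R @ (P_tilde.conj().T @ (alpha * acc))

rng = np.random.default_rng(5)
M = 6
idx = np.arange(M)
R = 0.7 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical correlation model
S = 0.5 * np.eye(M)                              # hypothetical white noise
P_tilde = np.eye(M)                              # assumes P = I with B = N_t
h_bar = np.zeros(M)
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)

lam = np.linalg.eigvalsh(P_tilde @ R @ P_tilde.conj().T + S)
alpha = 2.0 / (lam[-1] + lam[0])
L = 3
h_hat = peach_estimate(d, P_tilde, R, S, h_bar, alpha, L)
print(h_hat)
```

Each pass through the loop costs a handful of matrix-vector products, which matches the O(LM²) scaling stated above.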

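Theorem 2. The PEACH estimator in (10) achieves the MSE

$$\mathrm{tr}\left( \mathbf{R} + \mathbf{R}\tilde{\mathbf{P}}^H \mathbf{A}_L (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}) \mathbf{A}_L^H \tilde{\mathbf{P}}\mathbf{R} - 2 \mathbf{R}\tilde{\mathbf{P}}^H \mathbf{A}_L \tilde{\mathbf{P}}\mathbf{R} \right) \tag{12}$$

where $\mathbf{A}_L = \sum_{l=0}^{L} \alpha \left( \mathbf{I} - \alpha (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}) \right)^l$.

Proof: This theorem follows from direct computation of the MSE using the definition $\mathrm{MSE} = \mathbb{E}\{\|\mathbf{h} - \hat{\mathbf{h}}_{\mathrm{PEACH}}\|^2\}$.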
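The expression (12) can be evaluated directly for small matrices and compared with the MMSE value, here computed in the form tr(R − R P̃^H (P̃RP̃^H + S)^{-1} P̃ R), which equals (3) by the matrix inversion lemma. The sketch below assumes an exponential correlation model, unit-variance white noise, and P̃ = I; the function names are hypothetical.

```python
import numpy as np

def peach_mse(P_tilde, R, S, alpha, L):
    """Evaluate (12) for a degree-L PEACH estimator."""
    Z = P_tilde @ R @ P_tilde.conj().T + S
    I = np.eye(Z.shape[0])
    X = I - alpha * Z
    A_L = np.zeros_like(Z)
    term = I
    for _ in range(L + 1):
        A_L = A_L + alpha * term
        term = term @ X
    G = R @ P_tilde.conj().T @ A_L          # R P_tilde^H A_L
    return np.trace(R + G @ Z @ G.conj().T - 2 * G @ P_tilde @ R).real

def mmse_mse(P_tilde, R, S):
    """MMSE value of (3), written via the matrix inversion lemma."""
    Z = P_tilde @ R @ P_tilde.conj().T + S
    return np.trace(R - R @ P_tilde.conj().T @ np.linalg.solve(Z, P_tilde @ R)).real

M = 4
idx = np.arange(M)
R = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical correlation model
S = np.eye(M)                                    # hypothetical white noise
P_tilde = np.eye(M)                              # assumes P = I with B = N_t

lam = np.linalg.eigvalsh(P_tilde @ R @ P_tilde.conj().T + S)
alpha = 2.0 / (lam[-1] + lam[0])

degrees = (0, 1, 2, 4, 8)
mses = [peach_mse(P_tilde, R, S, alpha, L) for L in degrees]
floor = mmse_mse(P_tilde, R, S)
for L, mse in zip(degrees, mses):
    print(f"L = {L}  MSE = {mse:.6f}  (MMSE = {floor:.6f})")
```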

It remains to select the scaling parameter $\alpha$ to satisfy the convergence condition in Proposition 1. From a pure complexity point of view, we can select $\alpha$ to be equal to $\frac{2}{\mathrm{tr}(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})}$ [18]. However, the choice of $\alpha$ also determines the convergence speed of the polynomial expansion. Among the values that satisfy the condition in Proposition 1, the choice

$$\alpha = \frac{2}{\max_n \lambda_n(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}) + \min_n \lambda_n(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})} \tag{13}$$

minimizes the spectral radius of $\mathbf{I} - \alpha (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})$ and therefore provides the fastest asymptotic convergence speed [21].⁵ Although the computation of the extreme eigenvalues is generally quite expensive, these eigenvalues can be approximated with lower complexity. For example, as mentioned earlier, if the convergence speed is not the main concern, $\max_n \lambda_n(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}) + \min_n \lambda_n(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})$ can simply be estimated by $\mathrm{tr}(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})$. Alternatively, the smallest eigenvalue can be taken as the noise variance and the largest eigenvalue can be approximated using some upper bound on the pilot power and on the average channel attenuation to the receiver. In general, a low-complexity method to approximate the extreme eigenvalues of an arbitrary covariance matrix was proposed in [21], based on the Gershgorin circle theorem [34]. This approach exploits the structure of the matrix imposed by the system setup to improve the convergence speed. For more details on how to choose $\alpha$ with low complexity and compute the extreme eigenvalues, we refer to [21].
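To illustrate the trade-off between the two choices of α discussed above, the sketch below compares the spectral radius of I − αZ, with Z = P̃RP̃^H + S, for the trace-based value and for (13). The test matrix is a hypothetical exponential correlation model plus white noise, and the eigenvalues are computed exactly only so that the two radii can be compared; the Gershgorin-based approximation from [21] is not reproduced here.

```python
import numpy as np

M = 6
idx = np.arange(M)
R = 0.7 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical correlation model
S = 0.5 * np.eye(M)                              # hypothetical white noise
Z = R + S                                        # assumes P_tilde = I

lam = np.linalg.eigvalsh(Z)                      # ascending eigenvalues of Z

alpha_trace = 2.0 / np.trace(Z).real             # cheap choice, no eigenvalues needed
alpha_opt = 2.0 / (lam[-1] + lam[0])             # choice (13)

def spectral_radius(alpha):
    """Spectral radius of I - alpha Z for Hermitian Z."""
    return np.max(np.abs(1.0 - alpha * lam))

rho_trace = spectral_radius(alpha_trace)
rho_opt = spectral_radius(alpha_opt)
print(f"trace-based alpha = {alpha_trace:.4f}, spectral radius = {rho_trace:.4f}")
print(f"alpha from (13)   = {alpha_opt:.4f}, spectral radius = {rho_opt:.4f}")
```

A smaller spectral radius means the error term in Proposition 1 decays faster with L, so fewer polynomial terms are needed for a given accuracy.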

B. Weighted PEACH Estimator

Although the PEACH estimator (10) converges to the MMSE estimator as $L \to \infty$, it is generally not the best L-degree polynomial estimator at any finite L. More specifically, instead of multiplying each term in the sum with $\alpha$, we can assign different weights and optimize these for the specific degree L. In this way, we obtain the weighted PEACH estimator, which we denote as $\hat{\mathbf{h}}_{\text{W-PEACH}} = \mathrm{vec}(\hat{\mathbf{H}}_{\text{W-PEACH}})$ and define as

$$\hat{\mathbf{h}}_{\text{W-PEACH}} \triangleq \bar{\mathbf{h}} + \mathbf{R}\tilde{\mathbf{P}}^H \sum_{l=0}^{L} w_l \alpha_w^{l+1} \left( \tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S} \right)^l \mathbf{d} \tag{14}$$

where $\mathbf{w} = [w_0, \ldots, w_L]^T$ are scalar weighting coefficients.⁶ Observe that the $\alpha$-parameter, now denoted $\alpha_w$, is redundant and can be set to one. For numerical reasons, it might still be good to select

$$\alpha_w \leq \frac{1}{\max_n \lambda_n(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})} \tag{15}$$

since this makes all the eigenvalues of $\alpha_w^{l+1} (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})^l$ smaller than one and thus prevents them from growing unboundedly as $l$ becomes large. This simplifies the implementation of the following theorem, which finds the weighting coefficients that minimize the MSE.

⁵ The error term in Proposition 1 is bounded by $\mathcal{O}\left( \|\mathbf{I} - \alpha \mathbf{X}\|_2^{L+1} \right)$. The spectral norm is minimized by making the largest and smallest eigenvalues symmetric around the origin [21]: $\max_n \lambda_n(\mathbf{I} - \alpha \mathbf{X}) = -\min_n \lambda_n(\mathbf{I} - \alpha \mathbf{X})$. By solving for $\alpha$ we obtain $\alpha = 2 / (\max_n \lambda_n(\mathbf{X}) + \min_n \lambda_n(\mathbf{X}))$, which becomes (13) for the problem at hand.

⁶ W-PEACH is obtained by expanding each $(\mathbf{I} - \alpha (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}))^l$ as a binomial series, collecting terms, and replacing constant factors with weights.

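Theorem 3. The MSE $\mathbb{E}\{\|\mathbf{h} - \hat{\mathbf{h}}_{\text{W-PEACH}}\|^2\}$ is minimized by

$$\mathbf{w}_{\mathrm{opt}} = [w_0^{\mathrm{opt}} \ldots w_L^{\mathrm{opt}}]^T = \mathbf{A}^{-1}\mathbf{b} \tag{16}$$

where the $ij$th element of $\mathbf{A} \in \mathbb{C}^{(L+1) \times (L+1)}$ and the $i$th element of $\mathbf{b} \in \mathbb{C}^{L+1}$ are

$$[\mathbf{A}]_{ij} = \alpha_w^{i+j}\, \mathrm{tr}\left( \mathbf{R}\tilde{\mathbf{P}}^H (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})^{i+j-1} \tilde{\mathbf{P}}\mathbf{R} \right), \qquad [\mathbf{b}]_i = \alpha_w^{i}\, \mathrm{tr}\left( \mathbf{R}\tilde{\mathbf{P}}^H (\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S})^{i-1} \tilde{\mathbf{P}}\mathbf{R} \right). \tag{17}$$

The resulting MSE of the W-PEACH estimator is

$$\mathrm{MSE} = \mathrm{tr}(\mathbf{R}) - \mathbf{b}^H \mathbf{A}^{-1} \mathbf{b}. \tag{18}$$

Proof: The W-PEACH estimator achieves an MSE of

$$\mathrm{MSE} = \mathbb{E}\{\|\mathrm{vec}(\mathbf{H}) - \mathrm{vec}(\hat{\mathbf{H}}_{\text{W-PEACH}})\|_F^2\} = \mathrm{tr}\left( \mathbf{R} - \mathbf{R}\tilde{\mathbf{P}}^H \sum_{l=0}^{L} (w_l + w_l^*) \alpha_w^{l+1} \mathbf{Z}^l \tilde{\mathbf{P}}\mathbf{R} + \sum_{l_1=0}^{L} \sum_{l_2=0}^{L} w_{l_1} w_{l_2}^* \alpha_w^{l_1+l_2+2}\, \mathbf{R}\tilde{\mathbf{P}}^H \mathbf{Z}^{l_1+l_2+1} \tilde{\mathbf{P}}\mathbf{R} \right) \tag{19}$$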

where Z = ePR eP^H + S. For a given pilot matrix P and polynomial degree L, the coefficients w₀, . . . , wL can be selected to minimize the MSE as

minimize

w0,...,wL

MSE. (20)

The solution to this unconstrained optimization problem is achieved by computing the partial derivatives with respect to each coefficient and looking for stationary points:

$$\frac{\partial}{\partial w_l}\mathrm{MSE} = -\alpha_w^{l+1}\, \mathrm{tr}\left(R\tilde{P}^H Z^{l}\tilde{P}R\right) + \sum_{l_2=0}^{L} w_{l_2}^*\, \mathrm{tr}\left(R\tilde{P}^H \alpha_w^{l+l_2+2} Z^{l+l_2+1}\tilde{P}R\right). \tag{21}$$

By equating to zero for each $l = 0, \ldots, L$, we obtain $L+1$ linear equations that involve the $L+1$ unknown coefficients. These are $Aw = b$ with $A$, $b$ as in (17); note that we made a change of variables $i = l+1$ and $j = l_2+1$ for $A$ and $i = l+1$ for $b$, because the sums in (21) begin at 0 while the indices of matrices/vectors usually begin at 1. The MSE minimizing weights are now computed as in (16).

Finally, we note that, using $A$, $b$ in (17), the MSE expression in (19) can be expressed as $\mathrm{tr}(R) + w^H A w - b^H w - w^H b$. For optimal weights $w^{\text{opt}} = A^{-1}b$, the minimum MSE becomes (18).
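As an illustration of Theorem 3, the following minimal NumPy sketch builds $A$ and $b$ from (17) by the direct (cubic-cost) route, solves for the weights in (16), and compares (18) with the MSE evaluated from the error covariance of the linear estimator. The function names, the toy dimensions, and the randomly drawn $R$, $\tilde{P}$, $S$ are assumptions made only for this example; they are not taken from the paper.

```python
import numpy as np

def wpeach_system(R, P, S, L, alpha_w):
    """Return A, b of (17) using the direct (cubic-cost) trace formulas."""
    Z = P @ R @ P.conj().T + S
    PR = P @ R
    traces = []                      # traces[k] = tr(R P^H Z^k P R)
    Zk_PR = PR.copy()
    for _ in range(2 * L + 2):
        traces.append(np.trace(PR.conj().T @ Zk_PR).real)
        Zk_PR = Z @ Zk_PR
    A = np.empty((L + 1, L + 1))
    b = np.empty(L + 1)
    for a in range(L + 1):           # a = i - 1 in the 1-based indexing of (17)
        b[a] = alpha_w ** (a + 1) * traces[a]
        for c in range(L + 1):       # c = j - 1
            A[a, c] = alpha_w ** (a + c + 2) * traces[a + c + 1]
    return A, b

def direct_mse(R, P, S, w, alpha_w):
    """MSE of h_bar + G d, evaluated from the error covariance matrix."""
    Z = P @ R @ P.conj().T + S
    poly = sum(w[l] * alpha_w ** (l + 1) * np.linalg.matrix_power(Z, l)
               for l in range(len(w)))
    G = R @ P.conj().T @ poly
    E = R - G @ P @ R - R @ P.conj().T @ G.conj().T + G @ Z @ G.conj().T
    return np.trace(E).real

# Hypothetical toy setup (sizes and statistics are illustrative only).
rng = np.random.default_rng(0)
N, M, L = 4, 6, 2
P = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
G0 = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R = G0 @ G0.conj().T / N
S = np.eye(M)
Z = P @ R @ P.conj().T + S
alpha_w = 1.0 / np.linalg.eigvalsh(Z).max()        # scaling suggested by (15)

A, b = wpeach_system(R, P, S, L, alpha_w)
w_opt = np.linalg.solve(A, b)                      # (16)
mse_opt = np.trace(R).real - b @ w_opt             # (18)
mse_check = direct_mse(R, P, S, w_opt, alpha_w)    # same quantity, from (19)
mse_mmse = np.trace(R - R @ P.conj().T @ np.linalg.solve(Z, P @ R)).real
mse_unit = direct_mse(R, P, S, np.ones(L + 1), alpha_w)
```

In this sketch the optimized weights can do no better than the MMSE estimator and no worse than any other fixed weight vector of the same degree, which is what the comparison values `mse_mmse` and `mse_unit` are meant to illustrate.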

Observe that the MSE expressions of PEACH and W-PEACH in (12) and (18), respectively, are independent of the mean matrices of the channel and the disturbance. Therefore, the performance is the same as in our conference paper [1], where we assumed zero-mean channel and disturbance.

From (19) in the proof of Theorem 3, we also obtain the MSE expression

$$\mathrm{MSE}(w) = \mathrm{tr}(R) + w^H A w - b^H w - w^H b \tag{22}$$

for the W-PEACH estimator with any choice of the weighting coefficients.

Remark 2 (Weights of the PEACH estimator). The PEACH estimator can also be expressed as a W-PEACH estimator using certain weights. To find these weights, we observe that

$$\sum_{l=0}^{L} \alpha\left(I - \alpha(\tilde{P}R\tilde{P}^H + S)\right)^l = \sum_{l=0}^{L} \alpha \sum_{n=0}^{l} \binom{l}{n} (-\alpha)^n (\tilde{P}R\tilde{P}^H + S)^n I^{l-n} = \sum_{l=0}^{L}\sum_{n=0}^{l} \binom{l}{n} (-1)^n \alpha^{n+1} (\tilde{P}R\tilde{P}^H + S)^n.$$

By gathering all terms that belong to a certain exponent $n$, we see that

$$w_n = (-1)^n \sum_{l=n}^{L} \binom{l}{n}. \tag{23}$$

Plugging these weights into (22) yields an alternative way of computing the MSE of the PEACH estimator.
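The weights in (23) are easy to tabulate, and the binomial regrouping can be checked numerically. The short sketch below does both; the helper names and the random test matrix are assumptions for illustration. It also compares (23) with the closed form $(-1)^n\binom{L+1}{n+1}$, which follows from the standard "hockey-stick" identity for binomial sums and is not stated in the text above.

```python
import numpy as np
from math import comb

def peach_weights(L):
    """Weights (23): w_n = (-1)^n * sum_{l=n}^{L} binom(l, n)."""
    return np.array([(-1) ** n * sum(comb(l, n) for l in range(n, L + 1))
                     for n in range(L + 1)])

def peach_poly(Z, alpha, L):
    """Left-hand side of the expansion: sum_l alpha (I - alpha Z)^l."""
    I = np.eye(Z.shape[0])
    return sum(alpha * np.linalg.matrix_power(I - alpha * Z, l)
               for l in range(L + 1))

def weighted_poly(Z, alpha, w):
    """W-PEACH form: sum_n w_n alpha^(n+1) Z^n."""
    return sum(w[n] * alpha ** (n + 1) * np.linalg.matrix_power(Z, n)
               for n in range(len(w)))

# Hypothetical test matrix standing in for P R P^H + S.
rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5))
Z = G @ G.T + np.eye(5)
alpha, L = 0.1, 3

w = peach_weights(L)
hockey = np.array([(-1) ** n * comb(L + 1, n + 1) for n in range(L + 1)])
```

For example, `peach_weights(2)` gives the coefficients of $1 + (1-x) + (1-x)^2 = 3 - 3x + x^2$.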

Although Theorem 3 provides the optimal weights, the computational complexity is $O(M^3)$ since it involves pure matrix multiplications of the form $Z^i$. This means that computing the optimal weights for the W-PEACH estimator has the same asymptotic complexity scaling as computing the conventional MMSE estimator. To benefit from the weight optimization we thus need to find an approximate low-complexity approach to compute the weights, which is done in the next subsection.

Note that the weights cannot be optimized by random matrix theory (as was done for multiuser detection in [19], [22] and precoding in [24]–[26]) due to lack of randomness in the MMSE estimation expression in (2).

Remark 3 (Low-Complexity Classical PEACH Estimators). Following the same approach as used to derive low-complexity PEACH estimators for the Bayesian case, we form the corresponding low-complexity estimators to approximate the classic MVU estimator in (4). Note that if the quality of the channel covariance matrix estimate is very poor, then the MVU estimator performs better than the MMSE estimator.

First, we define a regularization factor $\epsilon > 0$, which in the form of $\epsilon I$ is added to $(\tilde{P}^H S^{-1}\tilde{P})$. Then, we use the matrix inversion lemma, which results in

$$\hat{h}^{\epsilon}_{\text{MVU}} = \left(\epsilon I + \tilde{P}^H S^{-1}\tilde{P}\right)^{-1}\tilde{P}^H S^{-1}(y - \bar{n}) = \tilde{P}^H\left(\tilde{P}\tilde{P}^H + \epsilon S\right)^{-1}(y - \bar{n}) \;\xrightarrow{\;\epsilon \to 0\;}\; \hat{h}_{\text{MVU}}. \tag{24}$$


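The approximation in Proposition 1 can now be applied. The set of low-complexity PEACH estimators obtained by this approach are

$$\hat{h}^{\text{MVU}}_{\text{PEACH}} = \tilde{P}^H \sum_{l=0}^{L} \alpha\left(I - \alpha(\tilde{P}\tilde{P}^H + \epsilon S)\right)^l (y - \bar{n}) \tag{25}$$

and

$$\hat{h}^{\text{MVU}}_{\text{W-PEACH}} = \tilde{P}^H \sum_{l=0}^{L} w_l \alpha_w^{l+1} (\tilde{P}\tilde{P}^H + \epsilon S)^l (y - \bar{n}). \tag{26}$$

Observe that the last expression in (24) equals (2) if $R = \frac{1}{\epsilon} I$; therefore, all the results presented in Theorems 2 and 3 can be derived for $\hat{h}^{\text{MVU}}_{\text{PEACH}}$ and $\hat{h}^{\text{MVU}}_{\text{W-PEACH}}$ in a similar way.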
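The matrix-inversion-lemma step behind (24) can be checked numerically. The sketch below evaluates both forms of the regularized estimator and shows that the regularized estimate approaches the unregularized one for small $\epsilon$. The function names, the random $\tilde{P}$ and $S$, and the vector `r` (standing in for $y - \bar{n}$) are assumptions for this example, and $M > N$ is chosen so that $\tilde{P}^H S^{-1}\tilde{P}$ is invertible.

```python
import numpy as np

def mvu_regularized_small(P, S, r, eps):
    """First form in (24): solve with the N x N matrix eps*I + P^H S^-1 P."""
    Sinv_P = np.linalg.solve(S, P)
    B = P.conj().T @ Sinv_P                  # P^H S^-1 P
    rhs = Sinv_P.conj().T @ r                # P^H S^-1 r (S is Hermitian)
    return np.linalg.solve(eps * np.eye(P.shape[1]) + B, rhs)

def mvu_regularized_large(P, S, r, eps):
    """Second form in (24): P^H (P P^H + eps*S)^-1 r."""
    return P.conj().T @ np.linalg.solve(P @ P.conj().T + eps * S, r)

# Hypothetical toy data (illustrative only).
rng = np.random.default_rng(2)
M, N = 8, 3
P = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
S = G @ G.conj().T / M + np.eye(M)
r = rng.standard_normal(M) + 1j * rng.standard_normal(M)

h_small = mvu_regularized_small(P, S, r, 1e-2)
h_large = mvu_regularized_large(P, S, r, 1e-2)
h_mvu = mvu_regularized_small(P, S, r, 0.0)      # unregularized MVU form
h_near = mvu_regularized_small(P, S, r, 1e-8)    # small regularization
```

The second form is the one to which the polynomial expansion is applied, since it has the same structure as the MMSE expression.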

Remark 4 (Other PEACH estimators). The PE technique can be applied to any type of channel estimators that involve matrix inversions. For example, [35] derives a robust estimator, the minimax regret estimator, under certain uncertainty and statistical assumptions. This estimator has a similar expression as the MMSE estimator, but involves other matrices. Hence, the PE technique is straightforward to apply and the weights can be optimized similarly to what is described herein.

C. Low-Complexity Weights

Next, we propose a low-complexity algorithm to compute weights for the W-PEACH estimator. We exploit that

$$(\tilde{P}R\tilde{P}^H + S) = E\{\mathrm{vec}(Y)\mathrm{vec}(Y)^H\} = \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} y_t y_t^H \tag{27}$$

where $y_t = \mathrm{vec}(Y)$ denotes the received signal at estimation time instant $t$. This means that $(\tilde{P}R\tilde{P}^H + S)$ is closely approximated by the sample covariance matrix $\frac{1}{T}\sum_{t=1}^{T} y_t y_t^H$ if the number of samples $T$ is large. Although one generally needs $T \gg BN_r$ to get a consistent approximation, we can get away with much smaller $T$ since we only use it to compute traces; this is verified numerically in Section IV.

For any fixed $T \geq 1$ and $i \geq 1$, we now observe that

$$\mathrm{tr}\left(R\tilde{P}^H(\tilde{P}R\tilde{P}^H + S)^{i}\tilde{P}R\right) \tag{28}$$

$$\approx \mathrm{tr}\left(R\tilde{P}^H(\tilde{P}R\tilde{P}^H + S)^{i-1}\left(\frac{1}{T}\sum_{t=1}^{T} y_t y_t^H\right)\tilde{P}R\right) \tag{29}$$

$$= \frac{1}{T}\sum_{t=1}^{T} y_t^H\left(\tilde{P}R^2\tilde{P}^H(\tilde{P}R\tilde{P}^H + S)^{i-1}\right) y_t. \tag{30}$$

Since the elements of $A$ and $b$ in (17) are of the form in (28), we can approximate each element using (30).⁷ By computing/updating these approximations over a sliding time window of length $T$, we obtain Algorithm 1. At any time instant $t$, this algorithm computes approximations of $A$, $b$, denoted by $\tilde{A}_t$, $\tilde{b}_t$, by using the received signals $y_t, \ldots, y_{t-T+1}$. These are used to compute approximate weights $w_{\text{approx},t}$. To reduce the amount of computations, $\tilde{A}_t$, $\tilde{b}_t$ are obtained from $\tilde{A}_{t-1}$, $\tilde{b}_{t-1}$ by adding one term per element based on the current received signal $y_t$ and removing the impact of the old received signal $y_{t-T}$ (which is now outside the time window). The algorithm can be initialized in any way; for example, by accumulating $T$ received signals to fill the time window.

7 Note that $[b]_1 = \alpha_w \mathrm{tr}(\tilde{P}R^2\tilde{P}^H)$ needs to be treated differently since there is no $(\tilde{P}R\tilde{P}^H + S)$ term. In the case when $\tilde{P}^H\tilde{P}$ is a scaled identity matrix, we only need to compute $\mathrm{tr}(R^2)$, which can be done efficiently since only the diagonal elements of $R^2$ are of interest. Otherwise, one can select a set of $T$ vectors $v_i \sim \mathcal{CN}(0, I)$ and apply the approximation $\alpha_w \mathrm{tr}(\tilde{P}R^2\tilde{P}^H) \approx \frac{\alpha_w}{T}\sum_{i=1}^{T} v_i^H \tilde{P}R^2\tilde{P}^H v_i$. This is the approach included in Algorithm 1.

Algorithm 1: Low-complexity weights for W-PEACH
Input: Polynomial degree $L$ and time window $T$;
Input: Current time $t$;
Input: New and old received signals $y_t$, $y_{t-T}$;
Input: Approximations $\tilde{A}_{t-1}$, $\tilde{b}_{t-1}$ at previous time $t-1$;
1 Set $[\tilde{A}_t]_{ij} = [\tilde{A}_{t-1}]_{ij} + \frac{\alpha_w^{i+j}}{T}\, y_t^H\left(\tilde{P}R^2\tilde{P}^H(\tilde{P}R\tilde{P}^H + S)^{i+j-2}\right) y_t - \frac{\alpha_w^{i+j}}{T}\, y_{t-T}^H\left(\tilde{P}R^2\tilde{P}^H(\tilde{P}R\tilde{P}^H + S)^{i+j-2}\right) y_{t-T} \quad \forall i, j$
2 Set $[\tilde{b}_t]_{i} = [\tilde{b}_{t-1}]_{i} + \frac{\alpha_w^{i}}{T}\, y_t^H\left(\tilde{P}R^2\tilde{P}^H(\tilde{P}R\tilde{P}^H + S)^{i-2}\right) y_t - \frac{\alpha_w^{i}}{T}\, y_{t-T}^H\left(\tilde{P}R^2\tilde{P}^H(\tilde{P}R\tilde{P}^H + S)^{i-2}\right) y_{t-T} \quad \forall i \geq 2$
3 Set $[\tilde{b}_t]_{1} = \frac{\alpha_w}{T}\sum_{i=1}^{T} v_i^H \tilde{P}R^2\tilde{P}^H v_i$ for $v_i \sim \mathcal{CN}(0, I)$;
4 Compute $w_{\text{approx},t} = \tilde{A}_t^{-1}\tilde{b}_t$;
Output: Approximate weights $w_{\text{approx},t}$ at time $t$;
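The sliding-window update of Algorithm 1 can be sketched as follows, using only matrix-vector products per received signal. This is a minimal illustration under stated assumptions rather than a reference implementation: the class and method names are hypothetical, the toy statistics are random, the probe vectors of step 3 are drawn once at construction, and only the real part of each per-sample quadratic form is kept (the exact traces in (17) are real, but a single-sample term need not be).

```python
import numpy as np

class WPeachWeightTracker:
    """Sliding-window approximations of A and b (hypothetical helper class)."""

    def __init__(self, R, P, S, L, T, alpha_w, rng):
        self.R, self.P, self.S = R, P, S
        self.L, self.T, self.alpha_w = L, T, alpha_w
        self.A = np.zeros((L + 1, L + 1))
        self.b = np.zeros(L + 1)
        # Step 3: random-probe estimate of alpha_w * tr(P R^2 P^H).
        M = P.shape[0]
        V = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
        W = R @ (P.conj().T @ V)          # v^H P R^2 P^H v = ||R P^H v||^2
        self.b[0] = alpha_w / T * np.sum(np.abs(W) ** 2)

    def _apply_Z(self, z):
        """(P R P^H + S) z using matrix-vector products only."""
        return self.P @ (self.R @ (self.P.conj().T @ z)) + self.S @ z

    def contribution(self, y):
        """Terms that one received signal adds in steps 1 and 2."""
        g = self.P @ (self.R @ (self.R @ (self.P.conj().T @ y)))   # P R^2 P^H y
        q, z = [], y.copy()
        for _ in range(2 * self.L + 1):   # q[k] = Re{y^H P R^2 P^H Z^k y}
            q.append(np.vdot(g, z).real)
            z = self._apply_Z(z)
        dA = np.zeros_like(self.A)
        db = np.zeros_like(self.b)
        for a in range(self.L + 1):       # a = i - 1, c = j - 1
            if a >= 1:
                db[a] = self.alpha_w ** (a + 1) * q[a - 1] / self.T
            for c in range(self.L + 1):
                dA[a, c] = self.alpha_w ** (a + c + 2) * q[a + c] / self.T
        return dA, db

    def update(self, y_new, y_old=None):
        """Add the newest signal and drop the one leaving the window."""
        dA, db = self.contribution(y_new)
        self.A += dA
        self.b += db
        if y_old is not None:
            dA, db = self.contribution(y_old)
            self.A -= dA
            self.b -= db

    def weights(self):
        """Step 4: solve the small (L+1)-dimensional system."""
        return np.linalg.solve(self.A, self.b)

# Hypothetical toy run: 12 received signals, window of T = 8.
rng = np.random.default_rng(3)
N, M, L, T = 3, 5, 2, 8
P = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
G0 = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R = G0 @ G0.conj().T / N
S = np.eye(M)
Z = P @ R @ P.conj().T + S
alpha_w = 1.0 / np.linalg.eigvalsh(Z).max()
Y = np.linalg.cholesky(Z) @ (rng.standard_normal((M, 12))
                             + 1j * rng.standard_normal((M, 12))) / np.sqrt(2)

tracker = WPeachWeightTracker(R, P, S, L, T, alpha_w, rng)
for t in range(Y.shape[1]):
    tracker.update(Y[:, t], Y[:, t - T] if t >= T else None)
A_batch = sum(tracker.contribution(Y[:, t])[0] for t in range(Y.shape[1] - T, Y.shape[1]))
w_approx = tracker.weights()
```

After the loop, the tracked matrix coincides with a batch recomputation over the last $T$ signals, which is the property the add-and-remove update is designed to preserve.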

The asymptotic complexity of computing the elements in $\tilde{A}_t$ and $\tilde{b}_t$ is $O(LM^2)$ FLOPs per time instant. For each element, we need to compute a series of multiplications between vectors and matrices of complexity $O(M^2)$. This is explained in detail in Section III-E where we derive the exact computational complexity. Next, $w_{\text{approx},t}$ is obtained by solving an $(L+1)$-dimensional system of equations, which has complexity $O(L^3)$. Finally, the W-PEACH estimate is computed in the recursive manner described in Section III-A with a complexity of $O(LM^2)$. To summarize, the W-PEACH estimator along with Algorithm 1 has a computational complexity of $O(LM^2 + L^3)$.

One additional feature of Algorithm 1 is that it can easily be extended to practical scenarios where only imperfect estimates of the covariance matrices R and S are available. Apart from enabling adaptive tracking of the slow variations in the channel and disturbance statistics, this practical scenario is relevant to understand how sensitive Bayesian channel estimators are to mismatches in the statistical knowledge. We perform a numerical study in Section IV, based on the statistical estimation described in the next subsection.

D. Imperfect Covariance Matrix Estimation

Suppose we want to obtain some covariance matrix $C$ from $N$ observations $c_1, \ldots, c_N$, where $C$ might be $R$ or $\tilde{P}R\tilde{P}^H + S$. The sample covariance matrix $C_{\text{sample}} \triangleq \frac{1}{N}\sum_{i=1}^{N} c_i c_i^H$ is conventionally used to estimate $C$. However, this approach is unsuitable for large-scale systems where it can be hard to accumulate more samples than the dimension of $C$, which is $N_t N_r$ for the channel covariance matrix $R$. In fact, the sample covariance matrix is not even invertible if the number of samples is smaller than the matrix dimension. Instead of using the pure sample covariance matrix, we suggest following an approach similar to [36] and using a new estimator $\hat{C}$ which is an affine function of the sample covariance matrix $C_{\text{sample}}$. In [36], the authors have shown that this estimator is a better fit for large-dimensional covariance matrices.

Here, different from the diagonal loading approach in [36], where an affine combination of the identity matrix and the sample covariance matrix is considered, we assume $\hat{C} = \kappa C_d + (1-\kappa)C_{\text{sample}}$, where $C_d$ is the diagonal matrix comprising the diagonal elements of $C_{\text{sample}}$ and $\kappa$ is chosen to minimize the squared difference $E\{\|\hat{C} - C\|_F^2\}$. The advantage of $\hat{C}$ is that the diagonal elements converge quickly with $N$ to their true values, while the reliance on the off-diagonal elements is controlled by the parameter $\kappa$. The optimal $\kappa$ is given by the following theorem.

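Theorem 4. The solution $\kappa^\star$ to the optimization problem $\min_{\kappa} E\{\|\hat{C} - C\|_F^2\}$, where $\hat{C} = \kappa C_d + (1-\kappa)C_{\text{sample}}$, is

$$\kappa^\star = \frac{\Phi(C_{\text{sample}}) - \frac{1}{2}\Psi(C_d, C_{\text{sample}})}{\Phi(C_{\text{sample}}) + \Phi(C_d) - \Psi(C_d, C_{\text{sample}})} \tag{31}$$

where $\Phi(C_{\text{sample}}) = E\{\|C_{\text{sample}} - C\|_F^2\}$, $\Phi(C_d) = E\{\|C_d - C\|_F^2\}$ and $\Psi(C_d, C_{\text{sample}}) = 2E\{\mathrm{tr}\left((C_d - C)(C_{\text{sample}} - C)\right)\}$.

Proof: The objective function can be rewritten as

$$E\{\|\kappa C_d + (1-\kappa)C_{\text{sample}} - C - \kappa C + \kappa C\|_F^2\} = E\{\|\kappa(C_d - C)\|_F^2\} + E\{\|(1-\kappa)(C_{\text{sample}} - C)\|_F^2\} + 2\kappa(1-\kappa)E\{\mathrm{tr}\left((C_d - C)(C_{\text{sample}} - C)\right)\}.$$

Considering $\Phi(C_{\text{sample}})$, $\Phi(C_d)$, and $\Psi(C_d, C_{\text{sample}})$, the first-order optimality condition is

$$2\kappa\Phi(C_d) - 2(1-\kappa)\Phi(C_{\text{sample}}) + (1-2\kappa)\Psi(C_d, C_{\text{sample}}) = 0,$$

which yields the optimal solution $\kappa^\star$ in (31).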

Note that as the number of samples $N$ grows large, the optimal $\kappa^\star$ becomes smaller, which implies that we put larger trust in the sample covariance matrix. In Section IV, we apply this theory to the channel covariance matrix and compare the estimation performance when using $\hat{R}$ with the performance obtained with the true covariance matrix $R$. Interestingly, we observe that the proposed W-PEACH estimator adapts itself very well to imperfect statistics.
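The behaviour of $\kappa^\star$ can be illustrated with a small Monte Carlo sketch in which the expectations in (31) are replaced by averages over simulated trials. Everything here is an assumption made for the example: the true covariance matrix is a hypothetical exponential-correlation model, the samples are complex Gaussian, and $\Psi$ is taken as twice the averaged trace cross term so that the first-order condition of the quadratic objective is met. In practice the true $C$ is unknown, so this only illustrates the trend and does not give a usable estimator.

```python
import numpy as np

def shrinkage_kappa(C, N, trials, rng):
    """Monte Carlo version of (31); returns kappa and the simulated (C_s, C_d) pairs."""
    dim = C.shape[0]
    Lc = np.linalg.cholesky(C)
    phi_s = phi_d = cross = 0.0
    pairs = []
    for _ in range(trials):
        X = Lc @ (rng.standard_normal((dim, N))
                  + 1j * rng.standard_normal((dim, N))) / np.sqrt(2)
        C_s = X @ X.conj().T / N                 # sample covariance matrix
        C_d = np.diag(np.diag(C_s))              # its diagonal part
        phi_s += np.linalg.norm(C_s - C, 'fro') ** 2 / trials
        phi_d += np.linalg.norm(C_d - C, 'fro') ** 2 / trials
        cross += np.trace((C_d - C) @ (C_s - C)).real / trials
        pairs.append((C_s, C_d))
    psi = 2.0 * cross                            # assumption: Psi includes the factor 2
    kappa = (phi_s - 0.5 * psi) / (phi_s + phi_d - psi)
    return kappa, pairs

def empirical_objective(kappa, pairs, C):
    """Average of ||kappa*C_d + (1-kappa)*C_s - C||_F^2 over the simulated trials."""
    return np.mean([np.linalg.norm(kappa * C_d + (1 - kappa) * C_s - C, 'fro') ** 2
                    for C_s, C_d in pairs])

# Hypothetical true covariance: exponential correlation with coefficient 0.7.
rng = np.random.default_rng(4)
dim = 6
idx = np.arange(dim)
C = 0.7 ** np.abs(idx[:, None] - idx[None, :])

kappa_few, pairs_few = shrinkage_kappa(C, N=4, trials=200, rng=rng)
kappa_many, pairs_many = shrinkage_kappa(C, N=64, trials=200, rng=rng)
```

With few samples the estimate leans on the diagonal matrix, whereas with many samples `kappa_many` is much smaller, in line with the remark that more samples justify more trust in the sample covariance matrix.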

E. Asymptotic and Exact Computational Complexity

The asymptotic complexity of the conventional estimators, the diagonalized estimator described in Section II-B, and the proposed PEACH estimators are summarized as follows:

Channel Estimators    Computational Complexity
MMSE and MVU          $O(B^3 N_r^3)$
Diagonalized          $O(B N_r)$
PEACH                 $O(L B^2 N_r^2)$
W-PEACH               $O(L B^2 N_r^2 + L^3)$

These asymptotic complexity numbers are supported by an exact complexity analysis below. We note that the cubic complexity scaling in $BN_r$ for the conventional MMSE and MVU estimators is reduced to linear complexity in the diagonalized approach and squared complexity for the proposed PEACH estimators. The degree $L$ of the polynomial expansion has a clear impact on the complexity, but recall that it need not scale with $BN_r$ [20]. This property is illustrated in the next section, where we also show that small values of $L$ yield good performance.

The high complexity of the conventional estimators is not an issue if the channel and disturbance statistics are fixed over a very long time horizon; the system can then simply compute the inverse and then use it over and over again. As described in Section II-A, the statistics change continuously in practice and it is thus necessary to redo the inversion every few seconds.⁸ To make a precise and fair comparison, we need to consider the relationship between the coherence time of the long-term statistics, $\tau_s$, and the channel coherence time, denoted by $\tau_c$. The analysis below reveals how the computational complexity, in terms of the number of FLOPs, depends on the system dimensions, polynomial degree $L$, and the coherence times $\tau_s$ and $\tau_c$. For the sake of brevity, we consider complex-valued FLOPs and neglect the small computational complexity of scalar multiplications and additions of matrices and vectors.

The ratio $Q = \tau_s/\tau_c$ describes how stationary the channel statistics are [30], in terms of how many channel realizations fit into the coherence time of the statistics. The propagation environment has significant impact on this ratio; for example, in [30] the authors have shown that $Q$ equals 13, 108 and 126 for indoor, rural and urban environments, respectively, under their measurement setup. Smaller numbers are expected when the transmitter/receiver travel with high velocity. Similarly, the disturbance statistics can change rapidly if they contain interference from other systems (particularly if adaptive scheduling is performed) [38]. For a given total time $T_{\text{tot}}$, the computational complexity for each of the estimators consists of two parts: one part which can be precomputed once per coherence time of the statistics (i.e., $k_s = T_{\text{tot}}/\tau_s$ times) and one part that is computed at each channel realization (i.e., $k_c = T_{\text{tot}}/\tau_c$ times). Note that $k_c = Qk_s$.

We use the notation $M = N_r B$ and $N = N_r N_t$. For given vectors $x, y \in \mathbb{C}^{N \times 1}$ and matrices $A \in \mathbb{C}^{M \times N}$ and $B \in \mathbb{C}^{N \times P}$, there are $MP(2N-1)$, $M(2N-1)$ and $2N-1$ FLOPs required for the matrix-matrix product $AB$, matrix-vector product $Ax$, and vector-vector product $x^H y$, respectively. In the special case of $M = P$ and $C = AB$ being symmetric, only $\frac{1}{2}M(M+1)(2N-1)$ FLOPs are required to obtain $C$. Moreover, the Cholesky factorization of a positive definite matrix $A \in \mathbb{C}^{M \times M}$ is computed using $\frac{1}{3}M^3$ FLOPs. To solve a linear system of equations $Ax = b$, where $b \in \mathbb{C}^{M \times 1}$, by exploiting Cholesky factorization and back-substitution, a total of $\frac{1}{3}M^3 + 2M^2$ FLOPs is needed [31].

8 The MMSE estimator can be implemented recursively [37], which is suitable for tracking variations in the covariance matrices. The complexity of each recursion is $O(M^2)$, but we need more than $M$ recursions (per long-term statistics coherence time) to obtain a stable covariance estimate [37]. Hence, the recursive implementation also has a cubic complexity.

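We denote the total computational complexity in FLOPs by $\chi$. For the MMSE estimator, the two parts $U_{\text{MMSE}} = R\tilde{P}^H\left(\tilde{P}R\tilde{P}^H + S\right)^{-1}$ and $v = \tilde{P}\bar{h} + \bar{n}$ are computed once per $\tau_s$ and the parts $d = y - v$ and $\bar{h} + U_{\text{MMSE}}d$ once per $\tau_c$. This results in a total computational complexity of

$$\chi_{\text{MMSE}} = k_c\, N(2M-1) + k_s\left(\frac{1}{3}M^3 + (3N - 0.5)M^2 + \left(2N^2 + 2N - \frac{3}{2}\right)M\right)$$

in FLOPs.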

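For the MVU estimator, there is $U_{\text{MVU}} = \left(\tilde{P}^H S^{-1}\tilde{P}\right)^{-1}\tilde{P}^H S^{-1}$, which is computed once per $\tau_s$, and the parts $y - \bar{n}$ (neglected) and $U_{\text{MVU}}(y - \bar{n})$ computed once per $\tau_c$, yielding

$$\chi_{\text{MVU}} = k_c\, N(2M-1) + k_s\left(\frac{1}{3}M^3 + 2NM^2 + (3N^2 + N)M + \frac{1}{3}N^3 - 0.5N^2 - 0.5N\right).$$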

For the proposed PEACH and W-PEACH estimators, only $v$ is computed once per $\tau_s$. The rest of the computations take place once per $\tau_c$. As described in (11), the polynomial $\sum_{l=0}^{L} X^l d$, where $X = I - \alpha(\tilde{P}R\tilde{P}^H + S)$, is computed recursively. The first term $d$ is readily available. The second term $Xd$ is computed as a series of matrix-vector products. First, we compute $Sd$ and $\tilde{P}^H d$. Next, we multiply $R$ with the resulting vector $(\tilde{P}^H d)$, and then $\tilde{P}$ is multiplied with the vector $(R\tilde{P}^H d)$. The vector $d - \alpha\tilde{P}R\tilde{P}^H d - \alpha Sd$ is then computed. We repeat this procedure $L$ times and exploit $X^l d$ to compute $X^{l+1}d$. For the PEACH estimator, the total computational complexity is

$$\chi_{\text{PEACH}} = k_c\left(2LM^2 + ((4L+2)N - 2L)M + 2(L+1)N^2 - 2(L+1)N\right) + k_s\, M(2N-1) \text{ FLOPs.}$$
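The recursion described above can be sketched in a few lines, using only matrix-vector products inside the loop. The function name, the toy matrices, and the choice $\alpha = 2/(\lambda_{\max} + \lambda_{\min})$ of $\tilde{P}R\tilde{P}^H + S$ (computed here by an explicit eigenvalue decomposition purely for the demonstration) are assumptions for this example. The toy setup is built so that the eigenvalues of $\tilde{P}R\tilde{P}^H + S$ lie in $[1, 3]$, which makes the convergence towards the MMSE estimate easy to observe.

```python
import numpy as np

def peach_estimate(y, h_bar, n_bar, R, P, S, alpha, L):
    """PEACH estimate via the recursion X^{l+1} d = (I - alpha Z) X^l d."""
    d = y - P @ h_bar - n_bar
    term = d.copy()                  # X^0 d
    acc = d.copy()                   # running sum of X^l d
    for _ in range(L):
        # Z term = P (R (P^H term)) + S term: matrix-vector products only.
        Z_term = P @ (R @ (P.conj().T @ term)) + S @ term
        term = term - alpha * Z_term
        acc = acc + term
    return h_bar + alpha * (R @ (P.conj().T @ acc))

# Hypothetical, well-conditioned toy setup.
rng = np.random.default_rng(5)
M, N = 6, 4
P, _ = np.linalg.qr(rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
R = U @ np.diag([0.5, 1.0, 1.5, 2.0]) @ U.conj().T
S = np.eye(M)
Z = P @ R @ P.conj().T + S
eig = np.linalg.eigvalsh(Z)
alpha = 2.0 / (eig.max() + eig.min())      # demo-only eigenvalue computation

h_bar = rng.standard_normal(N) + 1j * rng.standard_normal(N)
n_bar = np.zeros(M, dtype=complex)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

d = y - P @ h_bar - n_bar
h_mmse = h_bar + R @ P.conj().T @ np.linalg.solve(Z, d)   # reference with inversion
errors = {L: np.linalg.norm(peach_estimate(y, h_bar, n_bar, R, P, S, alpha, L) - h_mmse)
          for L in (2, 10, 40)}
```

Each loop iteration costs four matrix-vector products, which is the source of the $2LM^2$-type leading term in the FLOP count above.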

The polynomial structure of the W-PEACH estimator requires the same number of FLOPs as the PEACH estimator, but there are two additional sources of computations: solving the linear system of equations $Aw = b$ to compute the weight vector $w^{\text{opt}}$ (which requires $\frac{1}{3}(L+1)^3 + 2(L+1)^2$ FLOPs) and using Algorithm 1 to find the approximated elements of $A$ and $b$.

The computational complexity of Algorithm 1 is counted by considering the following. Firstly, we only need to obtain the elements in $\tilde{A}_t$, since all the elements of $\tilde{b}_t$ can be extracted from $\tilde{A}_t$. In particular, all the elements contain similar terms $Z^k$ with $Z = \tilde{P}R\tilde{P}^H + S$, where $0 \leq k \leq 2L$ in $\tilde{A}_t$ and $0 \leq k \leq L-1$ in $\tilde{b}_t$. Secondly, we exploit the fact that $Z^k y_t$ for $0 \leq k \leq L$ has already been computed in the estimator expression $\sum_{l=0}^{L} Z^l y_t$. Thirdly, to determine all the elements in $\tilde{A}_t$, we first need to compute $Z^k y_t$ for $L+1 \leq k \leq 2L$, which results in doing a recursive matrix-vector multiplication $L$ times (i.e., $L[M(2M-1) + N(2M-1) + N(2N-1) + M(2N-1)]$ FLOPs), and then compute $y_t^H \tilde{P}R^2\tilde{P}^H$. Note that this term can be considered as the multiplication of $y_t^H \tilde{P}R$ and $R\tilde{P}^H$, where the first term $y_t^H \tilde{P}R$ has already been computed. This results in two matrix-vector products (i.e., $N(2N-1) + M(2N-1)$ FLOPs). Finally, for each element, we have the vector-vector multiplication $(y_t^H \tilde{P}R^2\tilde{P}^H)(Z^k y_t)$, resulting in $(2L+1)(2M-1)$ FLOPs. To summarize, for the W-PEACH estimator, we have

$$\chi_{\text{W-PEACH}} = k_c\left(4LM^2 + (8L+4)MN + (4L+4)N^2 + M - (4L+3)N + \frac{1}{3}L^3 + 3L^2 + 3L + \frac{4}{3}\right) + k_s\, M(2N-1) \text{ FLOPs.}$$

In the following table we summarize the exact total computational complexity of the different estimators when $B = N_t$, which makes $M = N$.

Estimators    FLOPs
MMSE          $k_c\left(2M^2 - M\right) + k_s\left(\frac{16}{3}M^3 + \frac{3}{2}M^2 - \frac{3}{2}M\right)$
MVU           $k_c\left(2M^2 - M\right) + k_s\left(\frac{17}{3}M^3 + \frac{1}{2}M^2 - \frac{1}{2}M\right)$
PEACH         $k_c\left((8L+4)M^2 - (4L+2)M\right) + k_s\left(2M^2 - M\right)$
W-PEACH       $k_c\left((16L+8)M^2 - (4L+2)M + \frac{1}{3}L^3 + 3L^2 + 3L + \frac{4}{3}\right) + k_s\left(2M^2 - M\right)$

Now, recalling $k_c = Qk_s$ and comparing the dominating terms of the MMSE and PEACH estimators, we can obtain a condition (the relation between the values $L$, $Q$ and $M$) for when the PEACH estimators are less complex than the MMSE estimator. This condition is

$$\frac{16}{3}M \geq 8QL + 2Q \;\Rightarrow\; M \geq Q\left(\frac{3}{2}L + \frac{3}{8}\right) \tag{32}$$

for the PEACH estimator, and

$$\frac{16}{3}M \geq 16QL + 6Q \;\Rightarrow\; M \geq Q\left(3L + \frac{9}{8}\right) \tag{33}$$

for the W-PEACH estimator. This implies that the PEACH estimators are less complex than the MMSE estimator, while still providing reasonable performance, only for certain combinations of channel stationarity, polynomial degree, and number of antennas. For the practical values of $Q = 50$ and $L = 2$, (32) and (33) show that the PEACH and W-PEACH estimators outperform the MMSE estimator in terms of complexity for $M = N_t N_r \geq 169$ and $M \geq 357$, respectively. Hence, the PEACH estimator is practically useful for setups such as $N_t = 2$ and $N_r = 100$ or $N_t = 1$ and $N_r = 200$, and similarly the W-PEACH estimator for $N_t = 4$ and $N_r = 100$ or $N_t = 1$ and $N_r = 400$.
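The break-even points can be reproduced with a few lines of exact rational arithmetic. The sketch below encodes the FLOP expressions of the table for $M = N$ and the thresholds (32) and (33); the function names are hypothetical, and $k_s = 1$, $k_c = Q$ is used as a convenient normalization. Evaluating the right-hand sides for $Q = 50$ and $L = 2$ gives 168.75 and 356.25, so the smallest integer dimensions satisfying the conditions are 169 and 357.

```python
from fractions import Fraction
from math import ceil

def flops_mmse(M, k_c, k_s):
    """Exact FLOPs of the MMSE estimator for M = N (table above)."""
    return k_c * (2 * M**2 - M) + k_s * (Fraction(16, 3) * M**3
                                         + Fraction(3, 2) * M**2 - Fraction(3, 2) * M)

def flops_peach(M, L, k_c, k_s):
    """Exact FLOPs of the PEACH estimator for M = N."""
    return k_c * ((8 * L + 4) * M**2 - (4 * L + 2) * M) + k_s * (2 * M**2 - M)

def flops_wpeach(M, L, k_c, k_s):
    """Exact FLOPs of the W-PEACH estimator for M = N."""
    extra = Fraction(1, 3) * L**3 + 3 * L**2 + 3 * L + Fraction(4, 3)
    return k_c * ((16 * L + 8) * M**2 - (4 * L + 2) * M + extra) + k_s * (2 * M**2 - M)

def min_M_peach(Q, L):
    """Smallest integer M satisfying (32)."""
    return ceil(Q * (Fraction(3, 2) * L + Fraction(3, 8)))

def min_M_wpeach(Q, L):
    """Smallest integer M satisfying (33)."""
    return ceil(Q * (3 * L + Fraction(9, 8)))

Q, L = 50, 2
m_peach = min_M_peach(Q, L)
m_wpeach = min_M_wpeach(Q, L)
# Normalization k_s = 1, k_c = Q: one statistics coherence interval.
peach_wins_at_200 = flops_peach(200, L, Q, 1) < flops_mmse(200, Q, 1)
wpeach_wins_at_400 = flops_wpeach(400, L, Q, 1) < flops_mmse(400, Q, 1)
```

Because (32) and (33) keep only the dominating terms, the exact FLOP expressions can be used in the same way to check a specific configuration before relying on the thresholds.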

As demonstrated by the complexity analysis, the PEACH estimators are computed using only matrix-vector multiplications. This is a standard operation that can easily be parallelized and implemented using efficient integrated circuits. On the contrary, the matrix inversions in the MMSE/MVU estimators are known to be complicated to implement in hardware [39]. Consequently, whenever the PEACH estimators and MMSE/MVU estimators are similar in terms of FLOPs, the computational delays and energy consumption are probably lower when implementing the proposed PEACH estimators.

IV. PERFORMANCE EVALUATION

In this section, we analyze and illustrate the performance of the proposed diagonalized, PEACH, and W-PEACH estimators. The analysis so far has been generic with respect to the disturbance covariance matrix S. Here, we consider two scenarios: noise-limited and cellular networks with pilot contamination. We describe the latter scenario in more detail since