Linear precoding for spatial multiplexing MIMO systems: blind channel estimation aspects

(1)

Linear Precoding for Spatial Multiplexing MIMO Systems:

Blind Channel Estimation Aspects

Abdelkader Medles, Dirk T.M. Slock

Eurecom Institute¹ 2229 route des Crˆetes, B.P. 193

06904 Sophia Antipolis Cedex FRANCE

Abstract - For the case of white uncorrelated inputs, most of the blind multichannel identification techniques are not very robust and only allow to estimate the channel up to a number of ambiguities, especially in the MIMO case.

On the other hand, all current standardized communica- tion systems employ some form of known inputs to allow channel estimation. The channel estimation performance in those cases can be optimized by a semiblind approach which exploits both training and blind information. When the inputs are colored and have sufficiently different spec- tra, the MIMO channel may become blindly identifiable up to one constant phase factor per input, and this under looser conditions on the channel. For the case of spatial multiplexing, possible cooperation between the channel in- puts allows for more complex MIMO source prefiltering that may allow blind MIMO channel identification up to just one global constant phase factor. We introduce semib- lind criteria that are motivated by the Gaussian ML ap- proach. They combine a training based weighted least- squares criterion with a blind criterion based on linear pre- diction. A variety of blind criteria are considered for the various cases of source coloring.

I. INTRODUCTION

The multichannel aspect has led to the development of a wealth of blind channel estimation techniques over the last decade. In this paper, blind identification shall mean channel identification on the basis of the second-order statistics of the received signal. Consider linear digital modulation over a linear channel with additive Gaussian noise. Assume that we have

p

transmitters and

m > p

receiving channels (e.g. antennas in BLAST or SDMA). For more details on the used notation in this paper refer to [1].

II. BLINDIDENTIFICATION FORCOLOREDINPUTS

In this section we aim to improve the Second-Order Statis- tics (SOS) based blind channel identification by exploiting correlation in the inputs. In the context of digital communications, the inputs are symbol sequences which are typically uncorrelated. Correlation can be introduced by linear convolutive precoding, which corresponds to MIMO prefiltering of the actual vector sequence b

k

of symbols to be transmit- ted with a MIMO prefilter T

(z)

such that the transmitted vec- tor signal becomes a

k =

^T

(q)

^b

k

. In this paper we consider full rate linear precoding so that T

(z)

^{is a}

p

^{square ma-}

trix transfer function (in [2] an example of low rate precoding appears since the same symbol sequence gets distributed over all TX antennas). We get for the transmitted signal spectrum

S

^aa

(z) =

^T

(z)S

bb

(z)

^T^y

(z) = _b

²^T

(z)

^T^y

(z)

^{and for}

1Eurecom’s research is partially supported by its industrial partners: As- com, Swisscom, France Télécom, La Fondation CEGETEL, Bouygues Télé- com, Thales, ST Microelectronics, Motorola, Hitachi Europe and Texas In- struments. The work reported here was also partially supported by the French RNRT project ERMITAGES.

the received signal spectrum

S

yy

(z) =

^H

(z)S

aa

(z)

^H^y

(z) + S

vv

(z) =

²

_b

^H

(z)

^T

(z)

^T^y

(z)

^H^y

(z) +

²

_v I m

. The choice of appropriate prefiltering, as we shall see below, may reduce the non-identifiability to a phase factor per source or even to a global phase factor. In the context of wireless communications, two scenarios may be distinguished:

Noncooperative scenario: this scenario corresponds to the multi-user case (on the transmitter side) without cooperation between users. We shall consider the simple case in which the users transmit through only one antenna. This noncooperative scenario can also arise in other source separation applications since natural sources tend to have different spectra. In this sce- nario, H

(z)

has no structure, other than possibly being FIR, and T

(z)

^and

S

aa

(z)

are diagonal. This scenario has been considered in [3],[4].

Cooperative scenario: this is the single-user spatial multiplex- ing case. In this case, since transmit antennas are near each other and also receive antennas are near each other, all (FIR) entries in H

(z)

have the same delay spread and hence are polynomial of the same order.

S

aa

(z)

is allowed to be nondiagonal.

In the noncooperative case the channel will tend to be irreducible, a characteristic we have assumed so far, due to the fact that the users tend to be spread out in space. In the spatial multiplexing case however, in which the TX antennas are essentially colocated, the irreducibility of the channel depends on the richness of the scattering environment. In general, we need to consider a reducible channel. Such a channel can be factored as H

(z) =

^G

(z)

^C

(z)

^{where G}

(z)

is irreducible and column reduced with columns in order of e.g. non-increasing degree. If

r

is the (generic) rank of H

(z)

^{, then G}

(z)

^is

m

r

whereas C

(z)

^is

r

p

. In the noncooperative scenario C

(z)

^has

no particular structure. In the cooperative scenario however, all entries in a particular row of C

(z)

have the same degree and the degrees of the rows are non-decreasing (the degree profile of the rows in C

(z)

is complementary to the degree profile of the columns in G

(z)

^).

If

r

m

^,

1

^{, then}

²

_v

is blindly identifiable from

S

^yy

(z)

^and

G

(z)

is blindly identifiable from the signal/noise subspaces of

S

yy

(z)

up to a postmultiplication factor L

(z)

that is block lower triangular with block sizes according to the multiplici- ties of the degrees of the columns of G

(z)

^{and L}

(z)

^{is also}

polynomial with the degree of block

(i;j)

being the difference between the degrees of block

i

^{and block}

j

of the columns of G

(z)

[5]. So in particular, the diagonal blocks of L

(z)

^{are con-}

stant. Also, L^,1

(z)

has the same polynomial structure as L

(z)

^.

In the cooperative case, L^,1

(z)

^C

(z)

has the same polynomial structure as C

(z)

^{. If}

r

m

^,

1

, there are essentially no restric- tions on the number of inputs

p

for identifiability. If

r = m

^,

identifiability of

_v

²becomes an issue and there’s no longer a point in considering a factorization of H

(z)

for its identifica-

(2)

tion.

Finally, let us note that TX pulse shape filters can be incor- porated in T

(z)

^or

S

^aa

(z)

and that oversampling at the RX also leads to an increase in the number of RX channels. Also, the formulation of complex quantities as a superposition of real quantities may lead to an extra MIMO dimension. In the next two sections we investigate channel identifiability with diago- nal or full prefiltering T

(z)

^.

III. NONCOOPERATIVE/DIAGONALPREFILTERING

In general, we would like to handle the reducible channel case. The rank

r

can be identified from

S

^yy

(z)

^{. If}

r

m

^,

1

, then we can denoise the SOS and identify the factor G

(z)

from the subspaces. G

(z)

is unique up to a factor L

(z)

^{. For}

whichever G

(z)

in this equivalence class, it remains to identify C

(z)

^{in H}

(z) =

^G

(z)

^C

(z)

^from

S(z) =

^G^#

(z)(S

yy

(z)

^,

²

_v I m )

^G^#^y

(z) =

^C

(z)S

aa

(z)

^C^y

(z)

(1) where G^#

(z)

is a MMSE ZF equalizer for G

(z)

^{: G}^#

(z)

^G

(z) = I r

^{. For}

r = m

, the problem is similar to the one in (1) with C

(z)

replaced by H

(z)

(apart from the

²

_v

identification issue which will be discussed below). The value of the rank

r

² ^f

1;2;:::;min(m;p)

^gis unpredictible in general. For a certain rank

r

, subsets of

r

^,

1

columns of C

(z)

could be identified jointly from

S(z)

using certain

S

^aa

(z)

and under certain conditions on C

(z)

(or subsets of

r

columns under more strin- gent conditions on C

(z)

). So to be general,

S

^aa

(z)

^should

be such that it allows identifiability for the worst case of

r

^,

which is

r = 1

. In that case, each column of C

(z)

^{needs to}

be identified separately. On the other hand, since in the case

r = 1

each column of C

(z)

is a scalar FIR transfer function, only its minimum-phase equivalent is identifiable. So a column would be truly identifiable only if it is minimum-phase.

To avoid having zeros would require to impose

r

2

^{. In any}

case, to be fully general, it is desirable to have

S

^aa

(z)

^{such that}

it allows identifiability of each column of C

(z)

separately. So the MIMO problem gets converted into a set of disconnected SIMO (

r > 1

) or SISO (

r = 1

) problems. This will allow identifiability of each column up to a constant phase factor of the form

e ^j

if the column has no maximum-phase zeros (which is quite possible if

r

2

but highly unlikely for

r = 1

^{). An-}

other issue is the degree of C

j (z)

^{, column}

j

^{of C}

(z)

^{. We have}

H

j (z) =

^G

(z)

^C

j (z)

. The degree of C

j (z)

is unpredictible and can be up to

N j

^,

1

, the degree of the corresponding col- umn H

j (z)

^{of H}

(z)

. For identifiability, we need to consider the worst case and hence we shall assume that the degree of C

j (z)

^is

N j

^,

1

. Of course, the

N j

themselves may be unpredictible and in practice need to be replaced by an upper bound.

We now consider two approaches for identification, leading to two classes of solutions for

S

aa

(z)

^.

Frequency domain approach: The idea here is to introduce zeros into the diagonal elements of T

(z)

^{or hence}

S

^aa

(z)

^such

that all other elements other than diagonal element

j

^share

N j

zeros

T

jj (z) =

^Y

^p

i

⁼¹

;

⁶⁼

j N

i

Y

k

⁼¹

(1

^,

z _i;k z

^,1

)

⁽²⁾

This allows identifiability of C

j (z)

^from

S(z)

up to a phase since

S(z j;k ) =

^C

j (z j;k )S a

j

a

j

(z j;k )

^C^y

_j (z j;k ) ; k = 1;:::;N j

(3)

where

S a

j

a

j

(z) =

²

_b T jj (z)T _jj

^y

(z)

^{. If all}

N j

are equal (to

N

¹), then we can choose equispaced zeros on the unit cir- cle:

Q

N k

⁼¹i

(1

^,

z i;k z

^,1

) = 1

^,

e ^j

ⁱ

z

^,

^N

¹. If we furthermore choose overall equispacing by taking

i = (i

^,

1)2=pN

¹^,

then

T ii (z) =

^1,^1,

_e

_ji

^z

^,

_z

^pN^,¹N¹. In [4] very similar work appears in which the degree of

S

aa

(z)

(in the case of equal

N _j

) is at least

pN

¹(compared to

(p

^,

1)N

¹here), but the discussion in [4] is limited to the case

m > p = 2

. Note that here we can easily allow

p > m

even (more inputs than outputs!). Remark also that by introducing a number of zeros in T

(z)

, we can furthermore identify an equal number of noise parameters (such as

²

_v

for instance when

r = m

^{). Non-FIR}

S

aa

(z)

can be considered also. For instance we can consider the case in which the

S a

j

a

j

(e ^j

²

^f )

have at least

N j

disjoint expansion coefficients in some orthogonal basis. An extreme example of this would be

S a

j

a

j

(e ^j

²

^f )

that are bandlimited with the

p

bands being non-overlapping.

Time domain approach: The idea here is to introduce delay in the prefilter so that the correlations of each C

j (z)

^appear

separately in certain portions of the correlation sequence of

S(z)

. This can be obtained for instance with T

jj (z) = 1

^,

j z

^,

^d

^j

; d j =

^X

^j

^,1

i

⁼¹

N i :

⁽⁴⁾

Identification can be done with a correlation sequence peeling approach that starts with the last column C

p (z)

of which the (1-sided) correlation sequence appears in an isolated fashion in the last

N p

correlations of

S(z)

. Identification of C

p (z)

^from

its correlation sequence can be done up to a phase factor

e ^j

^p

(and up to the phase of zeros if C

p (z)

has zeros). We can then subtract

S a

j

a

j

(z)

^C

p (z)

^C^y

_p (z)

(which does not require C

p (z)

but only its correlation sequence) from

S(z)

which will then reveal the correlation sequence of C

p

^,1

(z)

in its last

N p

^,1 correlations, etc. The degree of

S

^aa

(z)

is in this case the degree

d p

^of

S a

p

a

p

(z)

which, in the case of all equal

N j

^{, is again}

(p

^,

1)N

¹, which leads to a degree of

pN

¹^,

1

^for

S(z)

^or

hence

pN

¹correlations. Such a degree for

S

^aa

(z)

is not only sufficient but also necessary since when

r = 1

, there are

pN

¹

parameters to be identified for which indeed at least

pN

¹^cor-

relations are needed. Note that in the temporal approach, increasing all the delays

d j

with an amount

D

allows furthermore the (straightforward) identification of MA(

D

^,

1

) noise (e.g.

D = 1

for white noise with arbitrary spatial correlation).

In practice, with estimated correlations, the correlation peeling approach leads to increasing estimation errors as the columns of C

(z)

get processed. This error increase can be avoided by doubling the delay separation between sources, which may furthermore lead to simpler algorithms (e.g. SIMO subspace fitting with asymmetric covariance matrices). Time domain approaches also appear in [3].

IV. COOPERATIVE/SPATIAL-MULTIPLEXING

PREFILTERING

Non-cooperative approaches can of course also be applied in the cooperative scenario, so diagonal prefiltering can be used for spatial multiplexing. However, this leads to at least a unknown phase per TX antenna and hence requires either differ- ential encoding or training symbols per TX antenna. By ap- plying full prefiltering, such that

S

^aa

(z)

is not blockdiagonal

(3)

in which case it is said to be fully diverse, the channel may possibly be identified up to a global phase factor only. Since better identifiability results in this case, better estimation may possibly be another consequence. We consider here linear precoding by time-invariant MIMO prefiltering. In [6], a block precoding approach is considered.

We can work with the eigen or LDU decompositions of

S

^aa

(z)

^. To begin with, consider the eigendecomposition Saa

(z) =

^V

(z)

^D

(z)

^V^y

(z)

^{where V}

(z)

is paraunitary (i.e.

V

y

(

^z

)

^V

(

^z

) =

^I) and contains the eigenvectors as columns, and D

(z)

is diagonal with the diagonal elements, the eigenval- ues, being valid scalar spectra.

A paraunitary matrix V

(z)

is said to be full diverse, if

PV

(z)

^V^y

(1)

^P

^T

cannot be made block diagonal for any permutation^P.

Theorem 1: : An irreducible FIR MIMO channel is blindly identifiable up to a phase factor per user if

S

^aa

(z)

has distinct eigen value functions, and up to one global phase factor if its eigenvector matrix is fully diverse.

It may perhaps be more practical to work with the LDU (Lower triangular-Diagonal-Upper triangular) decomposition Saa

(z) =

^L

(z)

^D

(

^z

)

^L^y

(

^z

)

^{where L}

(z)

is lower triangular with unit diagonal and the non-zero off-diagonal elements be- ing unconstrained transfer functions, and D

(z)

is diagonal with the diagonal elements being valid scalar spectra. The rela- tion between the LDU decomposition and the prefilter T

(z)

is immediate if T

(z)

is of the form T

(z) =

^L

(z)(z)

^where

(z)

is diagonal. An example of such a T

(z)

that allows irreducible channel identification up to one global phase factor is

(z) = I _p

^{and T}

(z) =

^L

(z) = I _p + D z

^e ^,1^where

D

^e^{has only}

non-zero elements on the first subdiagonal and those elements are all different constants.

Stationary precoding can be generalized to cyclostationary precoding via periodically timevariant prefiltering. By stack- ing

q

consecutive symbol period quantities y

k

^{, v}

k

^{, a}

k

^{, b}

k

^{, we} obtain Y

k

^{, V}

k

^{, A}

k

^{, B}

k

. We can then introduce a

pq

^LTI

MIMO prefilter T

(z)

^{such that}

Y

k

^,^V

k = (I _q

^H

(q))

^A

_k = (I _q

^H

(q))

^T

(q)

^B

_k :

⁽⁵⁾

Cyclostationary prefiltering introduces more information, hence should allow improved estimation (and possibly avoid stationary noise).

V. GAUSSIAN ML SEMIBLINDCHANNELIDENTIFICATION

For Gaussian ML, which will allow to exploit the SOS, we model the unknown symbols as uncorrelated Gaussian vari- ables whereas the known symbols b

k

lead to a non-zero mean.

By neglecting the non-stationarity due to the known symbols, the Gaussian likelihood function can be written in the frequency domain:

H

[M lndet(S

yy

(z))+

(

^y

(z)

^,^H

(z)

^T

(z)

^b

K (z))

^y

S

yy^,1

(z)(

^y

(z)

^,^H

(z)

^T

(z)

^b

K (z))]

(6) where

H

is short for ¹

2

j

H

dzz

^{and y}

(z)

^{, b}

K (z)

^{denote the}

z

transforms of the signal of

M

^{samples y}

_k

and the known sym- bols b

k

. The gradient of this criterion is the same as the gradient of the following sum of two subcriteria. The first subcriterion is

I

(

^y

(z)

^,^H

(z)

^T

(z)

^b

K (z))

^y

S

^,1yy

(z)(

^y

(z)

^,^H

(z)

^T

(z)

^b

K (z))

(7)

which is a weighted LS criterion (quadratic in H

(z)

^{) with the}

training information. The second subcriterion is tr

I

f

S

yy^,1

(z) S

^e^yy

(z)S

yy^,1

(z) S

^e^yy

(z)

^g ⁽⁸⁾

where

S

^e^yy

(z) = S

^yy

(z)

^,

S

^b^yy

(z)

^,

S

^b^yy

(z) = _M

¹^y

(z)

^y

(z)

^y

(periodogram), and the gradient is taken by considering

S

yy^,1

(z)

as constant. This second criterion is one of weighted spectrum matching and expresses the blind information. By taking the gradient of the sum of the two subcriteria, we combine training and blind information in an optimal fashion (com- pare to the CRB expression for GML). Asymptotically we can replace

S

yy^,1

(z)

by a consistent estimate

S

^b^,1yy

(z)

such as the periodogram. Also, we can replace the periodogram by an AR model which matches the covariance sequence estimate ap- pearing implicitely in the periodogram (or asymptotically by a consistent AR model) such that P

(z) S

^b^yy

(z)

^P^y

(z) = I

^so

that

S

^b^,1yy

(z) =

^P^y

(z)

^P

(z)

^{, where P}

(z)

is the MIMO prediction error filter in which a square-root of the prediction error covariance matrix inverse has been absorbed. The blind subcriterion then becomes^I

P

(z)S

^yy

(z)

^P^y

(z)

^,

I m

²

F

⁽⁹⁾ which is of fourth order in H

(z)

. One solution consists of inter- preting H¹

(z)

^{and H}²

(z)

ⁱⁿ

S

^yy

(z) =

^H¹

(z)S

^aa

(z)

^H²

(z) +

²

_v I m

as different quantities and performing alternating opti- mizations between them (and

_v

²^).

VI. BLINDGML CHANNELIDENTIFICATION FOR AFLAT

CHANNEL

Here we shall focus on the blind identification part for a frequency flat channel H

=

^{G C with}

r < m

. We can take G

= V

^S, an orthonormal matrix spanning the signal subspace.

We shall estimate first

V

^Sand then C.

Identification of the signal subspace

V

^S^:

We can alternatively estimate the noise subspace

V

^N^{. Ide-}

ally, R

L (I L

V

^N

) = 0

^{where R}

L

is the denoised covariance matrix of

L

symbol periods of y

k

. We shall estimate

V

^N ^using

a weighted LS criterion

V

^N^:

V

^N^H

min V

^N⁼

I

^fm

vect(

^,r ^R^b

L (I L

V

^N

))

^g

^H

^W^f

vect(

^R^b

L (I L

V

^N

))

^g

(10) where R^b

L

is the sample covariance matrix, R^b

L =

^R

L +

Re

L

^. The optimal weighting is W

=

^E

vect(

^R^b

L (I L

V

^N

))

^gf

vect(

^R^b

L (I L

V

^N

))

^g

^H

^{. With}^R^b

L

^{based on}

M

^sam-

ples, we get E^f

vect(

^R^e

_L )

^gf

vect(

^R^e

_L )

^g

^H = _M

¹^R

_TL

^R

_L

^{. This}

allows us to work out the WLS criterion (10) to become

V

^N^:

V

^N^H

min V

^N⁼

I

m^,r

tr

^f

V

^N

^H

^ryy

(0)V

^N^g ⁽¹¹⁾

where ryy

(0)

is the estimated correlation matrix of y

k

^{at lag}

0

. The solution is clearly given by the noise subspace of the matrix in the middle, so that

V

^S becomes its signal subspace.

Identification of C:

By equalizing

V

^S^{, we get r}

(i) = V _S ^H

^ryy

(i)V S =

C raa

(i)

^C

^H

. We introduce a normalization so that r^,1

=

²

(0)

^{C r}¹aa

⁼

²

(0)

^raa

(i)

^r

^H=

aa²

(0)

^C

^H

^r^,

^H=

²

(0) =

^r

(i)

(12)

(4)

for

i = 0;1;:::;L

, and where r

(i) =

^r^,1

⁼

²

(0)

^r

(i)

^r^,

^H=

²

(0)

and raa

(i) =

^r^,1aa

⁼

²

(0)

^raa

(i)

^r^,aa

^H=

²

(0)

^{. From}

i = 0

^{, we ob-}

serve that r^,1

=

²

(0)

^{C r}¹aa

⁼

²

(0) = Q

for some matrix

Q

^with

orthonormal rows, or hence C

=

^r¹

⁼

²

(0)Q

^r^,1aa

⁼

²

(0)

^{. To find}

estimate

Q

, we shall assume

r = p

^{so that}

Q

is square and unitary, and we shall solve (12), which becomes

Q

raa

(i)Q ^H =

r

(i)

^{, for}

i = i;:::;L

in a least-squares sense:

Q

^:^tr^f

min Q

^H

Q

^g=

r L

X

i

⁼¹

k

Q

^raa

(i)

^,^r

(i)Q

^k²

_F

⁽¹³⁾

The solution of this problem involves an eigendecomposition and is unique in general up to a phase factor.

VII. PRECODER OPTIMIZATION

We focus here on the flat channel case, and we study the optimization of the precoder to maximize the available capacity of the system, which is also the mutual information when the channel is estimated I

S

^{a a}

(

^y

;

^b^j^H^b

)

^.

The ergodic capacity of an AWGN channel, when the channel knowledge is absent at the transmitter and perfect at the receiver is given by:

C(S

^{a a}

) =

^EI

_S

^{a a}

(

^y

;

^b^j^H

)

=

^E^H

lndet(I +

¹_v²^H

S

^{a a}

(z)

^H

^H )

⁽¹⁴⁾

Where

S

^aa

(z) =

^P

^L _i

^raa

(i)z

^,

ⁱ

, the expectation E is here w.r.t. the distribution of the channel. As in [7], we as- sume the entries H

i;j

of the channel to be mutually inde- pendent, identically distributed zero mean complex Gaus- sian variables (Rayleigh flat fading MIMO channel model).

Telatar has shown [7] that for such a channel model, the optimization of the capacity subject to the TX power constraint

H

tr

(S

^{a a}

(z))

N tx

²

_b

leads to the requirement of a white (and Gaussian) vector transmission signal

S

^{a a}

(z) =

²

_b I

^.

For a block of length

M

(sufficiently large for the frequency domain expressions to be sufficiently accurate), and using a precoding

S

^{a a}

(z) =

²

_b (I + S(z))

⁽

<< 1

^,

S(z) =

^P

^L _i S i z

^,

ⁱ

^and^H ^tr

(S(z)) =

^tr

(S

⁰

) = 0

^for

power constraint), the deviation of the available capacity for the whole block is:

= MC(

²

_b I)

^,

M

^{E I}

S

^aa

(

^y

;

^b^j^H^b

)

=M(C( _b

²

I)

^,

C(S

^{a a}

))

| {z }

1

+M(C(S

^{a a}

)

^,^{E I}

S

^aa

(

^y

;

^b^j^H^b

))

| {z }

2

¹ represents the deviation (decrease) of the capacity due to(15) the precoding, whereas

²represents the deviation caused by channel estimation errors.

Let

=

²^b²_v ^and

R = I +

^{H H}

^H

^{, then:}

¹

=

^,

M

^E^H

lndet(I + R

^,1^H

S(z)

^H

=

^,

M

^E^H

[tr

^f

R

^,1^H

S(z)

^H

^g

,

²

tr

^f

R

^,1^H

S(z)

^H

^H R

^,1^H

S(z)

^H

^g

+ O(

³

)]

,

Mtr

^f^EH

^H R

^,1^H

S

⁰^g

+M

²

²^E

tr

^f^H^H

^H R

^,1^H

S(z)

^H

^H R

^,1^H

S(z)

^g

(16) We will show below that E H

H R

^,1H is a multiple of the unity matrix

I

. In fact, for every permutation matrix

P

^and

unitary diagonal matrix

D = diag(e ^j

¹

;:::;e ^j

^p

)

^{, H}⁰

=

H

PD

has the same distribution as H, then E H

H R

^,1^H

= D

P ^H

^{E H}

^H R

^,1^H

PD

. By averaging over the set of permu- tations P and phases

[0;2] ^p

^{, we get:}

E H

H R

^,1^H

=

⁽²¹

⁾p

Z

1 p!

X

P

²^P

D

P ^H

^EH

^H R

^,1^H

PD d

¹

:::d p

= ^tr

^{E HHR}

_p

^,1^H

I

(17) The first term in

¹^is^,

M ^tr

^{EHH R}

_p

^,1^H

trS

⁰

= 0

^.

¹

=M

²

²^P

_i;j

^E

tr

^f^H

z

^,(

ⁱ

⁺

^j

⁾^H

^H R

^,1^H

S i

^H

H R

^,1^H

S j

^g

=M

²

²^P

_i

^E

tr

^f^H^H

^H R

^,1^H

S i

^H

H R

^,1^H

S _Hi

^g

=M

²^P

_i vect ^H (S i )

^W

vect(S i ):

(18) where W

=

²^E

(

^H

^H R

^,1^H

) ^T

(

^H

^H R

^,1^H

)

^,^P

_i

is done over all index values

i

^{for which}

S i

⁶

= 0

^and

^H

^H R

^,1^H

= (I + (

^H

)

^,1

)

^,1then for high SNR i.e

>> 1

^{: W}

I

^.

As shown in [8], the deviation of the capacity for a small channel estimation error is

²

= ME

^H^e²^. ^{To mini-}

mize

²the channel estimator should be the MMSE estimator.

The MMSE estimator doesn’t achieve in general the Cramer- Rao Bound (

CRB

) but a good measure of the deviation is

²

= Mtr

^E

CRB = Mtr

^E

J _HH

^# ^{, where}

J HH

is the Fis- cher Information Matrix.

Subspace fitting gives an estimation of the channel H^b⁰ from the estimated correlation at lag 0 (see Section VI.) ryy

(0) =

^H

S

⁰^H

^H + I

, which leads asymptotically to H⁰

=

H

S

⁰¹

⁼

²^Q

^H

^. The identification of the unitary matrix Q is done from the remaining correlation lag, H^0#ryy

(i)

^H^0#

^H =

Q

S

⁰^,1

⁼

²

S i S

⁰^,

^H=

²^Q

^H

; as a result we can assume without loss of generality that

S

⁰

= I

^{and hence}

S

⁰

= 0

^{. The Mean}

Square Error of the subspace estimator H⁰is of order

M

^,1^.

H

=

^H⁰^Q

=

^H^b⁰

Q

^b

|{z}

=

Hb

+

^H^b⁰^Q^e

|{z}

/

g

⁽

⁾

=

^p

M

+

^H^e⁰^Q

|{z}

/1

=

^p

M

(19)

where

g()

is an increasing function of

^,1. For large

M

^and

small

we can neglect the last term, and assume thatH^b⁰

=

^H⁰^.

The source of the channel error is then caused by the estimation error on Q. If X

= (

^x¹

;

^x²

;:::;

^x

n

p

)

is a real parameterization of the unitary matrix, i.e Q

=

^Q

(

^X

)

^{, then:}

J

HH = @ @

^X^h

JJ XX

@

^X

@

^h

T

(20) where h

= [Re(vect(

^H

)) ^T ;Im(vect(

^H

)) ^T ] ^T

^and

JJ XX =

,EY^jH⁰

@

^ln

p

⁽^Y^j^X

;

^H⁰⁾

@

^X

@

^ln

p

⁽^Y^j^X

;

^H⁰⁾

@

^X

T

. An asymptotic expression of

JJ XX

^{for large}

M

^{is then :}

JJ XX (i;j) = M

^I

tr

^f

S

yy^,1

(z)@S

^yy

(z)

@

^x

_i S

^,1yy

(z)@S

^yy

(z)

@

^x

_j

^g

(21) By proceeding as for

¹, we can show that:

(5)

JJ XX (i;j) = M

^f

²^X

k vect ^H (S k )W _Ji;j vect(S k )+O(

³

)

^g

(22) where

W _Ji;j =

^@

^Q

_@

xi^Q

H

R

^0,

^T

R

^0,1

^@

^Q

_@

xj^Q

and

R

⁰

= I + (

^H⁰

^H

^H⁰

)

^,1. We can note that for high SNR i.e

>> 1

^:

R

⁰

I

^. The decrease is now

²

=

^,2

^E

tr

^J

_M

^HH²^#

=

^,2

^E

tr

^JJ _M

^XX²^,1

@

^X

@

^h

T @

^X

@

^h

,1

. The minimization of the capacity lost

can be performed by evaluating the expectation, which is quite complex, and then optimizing the resulting cost function. An alternative approach is to consider a close lower bound of

that is easier to calculate and to minimize. From the solution for

S(z)

^,

we can then find the solution for the precoder T

(z)

^{. For a}

precoder T

(z) = (1

^,

²

)

¹

⁼

²

I +

^P

_k M _F

^,1

F _k z

^,

k

^{, where}

M F = (

^P

_i F i F _i ^H )

¹

⁼

² ^verifies

M F M _F ^H =

^P

_i F i F _i ^H

^and

F

⁰

= 0

, the power spectrum is:

S

^{a a}

(z)= _b

²^f

I + (1

^,

²

)

¹

⁼

²^P

_k

^,

M _F

^,1

F k + F

^,

^H _k M _F

^,

^H

z

^,

^k +

²^,

M _F

^,1

F(z)F

^y

(z)M _F

^,

^H

^,

I

^g

(23) This spectrum verifies the power constraint as

S

⁰

= 0

^{. The}

first order approximation is:

S(z) =

^P

_k

^,

M _F

^,1

F k + F

^,

^H _k M _F

^,

^H

z

^,

^k

^.

Example of optimization for

p = 2

^:

We introduce the following parameterization of Q: X

= (;'; )

²

[0;]

[0;2]

², that allows to determine Q up to a global phase factor:

Q

(;'; ) =

^h

e ^j cos e ^j sin

,

e

^,

^j sin e

^,

^j cos

i

(24) We consider here the case of high SNR, i.e

>> 1

^{, in which}

case

¹

= 2M

²^P

^L

¹^jj

S i

^jj²^and

R

^0,

^T

R

^0,1

= I

^{. To}

optimize

^{, we fix}

¹ ^{by fixing}^P

^L

¹^jj

S i

^jj²

= 1

^{, we op-}

timize then

² under this last constraint, and given the precoder solution, optimize finally the overall decrease

^with

respect to the scale factor

. We will skip below several details of the derivation due to lack of place. First we establish the following equalities:

@

^X

@

^h

#

= _@ ^@

_XT^hT

= ^@ _@

^QT_XT^v

_@ ^@

_Q^hT_v^{, where}

Q

v = [Re(vect(

^Q

)) ^T ;Im(vect(

^Q

)) ^T ] ^T

^.

Considering now the independence between H⁰and

Q

^we

can write:

²

=

^,2

EX

tr

^JJ _M

^XX²^,1

^@ _@

^QT_XT^v EH⁰

@

^hT

@

^Q_v

@

^hT

@

^Q_v

T

@

^QT_v

@

^XT

T

=

^,2

^E

^jj

^H _p

^jj²

tr

EX

8

<

:

JJ M

XX²

@

^QT_v

@

^XT

@

^QT_v

@

^XT

T

^!^,1⁹⁼

; ,1

=

^,2

^E

^jj

^H _p

^jj²

C(S)

(25) where EX (resp. EH⁰) denotes expectation with respect to X (resp. H⁰), and to obtain the second expression we used arguments similar to the ones used to prove that E H

H R

^,1^H

= ^tr

^{E HHR}

_p

^,1^H

I

and applied here to H⁰ (as- sumed to have the same distribution as H) in order to prove that: EH⁰

@

^hT

@

^Q_v

@

^hT

@

^Q_v

T

= ^E

^jj

^H _p

^jj²

I

. The main difficulty is to

minimize

C(S)

, which is possible directly but at high numer- ical cost. To avoid it, let’s first note that as result of the convex- ity character of the function

f(x) = 1=x

^over^R⁺^:

C(S)

C

⁰

(S) = tr

8

<

:

EX

JJ M

XX²

@

^QT_v

@

^XT

@

^QT_v

@

^XT

T

^!^,1⁹⁼

; ,1

, then for

S opt

^{optimum of}

C

⁰^,

C(S opt )

C opt

C _opt

⁰ ^.

If

C

⁽

S

opt^),

C

⁰_opt

C

⁽

S

opt⁾

<< 1

then we can conclude that

S opt

^minimizes

²^. In our case

C

⁰

(S) =

¹⁴

_m

⁵⁺1 ^m^m¹² +

m

²^, where

m

¹

=

^P

^L _i

⁼¹^jj

S i (1;1)

^,

S i (2;2)

^jj² ^and

m

²

=

P

L i

⁼¹^jj

S i (1;2)

^jj²

+

^jj

S i (2;1)

^jj², the optimal solution (

C _opt

⁰

= 1:125

) verifies:

S i (2;2) =

^,

S i (1;1); i = 1;:::;L

^,

m

²

=

²³^{, and}^P

^L _i

⁼¹^jj

S i (1;1)

^jj²

+

^jj

S i (1;1)

^jj²

=

1

3

. One choice of a causal precoder with the smallest order L=1 that verifies these optimality conditions is:

T

opt (z) = (1

^,

²

)

¹

⁼

²

I +

^A

z

^,1

;

^A

= 1

^p

3

"

1

p

2

1

^,^p¹2

#

(26) In the first order of

this leads to

S opt (z) = (1

^,

²

)

¹

⁼

²^A

(z

^,1

+ z) +

^,²

²

I

^A

(z

^,1

+ z)

. By numeri- cal evaluation we get:

C

⁽

S

opt^),

C

⁰_opt

C

⁽

S

opt⁾

=

¹

^:

^375,1¹

_:

³⁷⁵

^:

¹²⁵

= 0:18

^,

hence

S opt

is close to optimal. The optimization of

^gives

=

ⁿ

⁽

^E

^jj

^H

^jj²²

_Mp

⁾

^C

⁽

^S

^opt⁾^o¹

⁼

⁴, usually the normalization is

E

^jj

H

^jj²

p = 1

^{, then}

= 0:91

ⁿ

_M

^o^,1

⁼

⁴ ^and

_MC

⁽

²_b

_I

⁾

=

1

:

⁶⁶⁽

M

⁾¹⁼²

MC

⁽

_b²

I

⁾ ^/ ^ln¹

p

M

. The same ratio in the case of the ex- clusive use of a training sequence gives

MC

⁽

_b²

I

⁾^/

M _{TS pm}

^,1^ln

^.

To achieve the same performance, the length of the training sequence needs to be of the order

q

M m

which can become very important for large MIMO systems.

REFERENCES

[1] A. Medles, D.T.M. Slock and E. De Carvalho “Linear prediction based semi-blind estimation of MIMO FIR channels”. In Proc. Third Workshop on Signal Processing Advances in Wireless Communications (SPAWC’01), pp. 58-61, Taipei, Taiwan, March 2001.

[2] G. Leus, P. Vandaele, and M. Moonen. “Deterministic Blind Modulation- Induced Source Separation for Digital Wireless Communications”. IEEE Trans. on Signal Processing, SP-49(1):219–227, Jan. 2001.

[3] J. Xavier, V.A.N. Barroso and J.M.F Moura. “Closed-Form Correlative Coding(^CF^C2)Blind Identification of MIMO Channels: Isometry Fit- ting to Second Order Statistics”. IEEE Trans. Signal Processing, SP- 49(5):1073–1086, May 2001.

[4] Y. Hua and Y. Xiang. “The BID for Blind Equalization of FIR MIMO Channels”. In Proc. ICASSP Conf., Salt Lake City, USA, May 2001.

[5] A. Gorokhov and Ph. Loubaton. “Subspace based techniques for second- order blind separation of convolutive mixtures with temporally correlated sources”. IEEE Trans. Circuits and Systems, 44:813–820, Sept. 1997.

[6] H. Hat´es and F. Hlawatsch. “Blind Equalization of MIMO Channels Us- ing Deterministic Precoding”. In Proc. ICASSP Conf., Salt Lake City, USA, May 2001.

[7] I.E. Telatar. “Capacity of Multi-antenna Gaussian Channels”. Eur. Trans.

Telecom., vol. 10, no. 6, pp. 585-595, Nov/Dec 1999.

[8] A. Medles and D.T.M. Slock. “Semiblind channel estimation for MIMO spatial multiplexing systems”. In Proc. 35th Asilomar Conference on Sig- nals, Systems and Computers, Pacific Grove, CA, USA, Nov. 2001.

Linear precoding for spatial multiplexing MIMO systems: blind channel estimation aspects