On space-time coding for quasi-static multiple-antenna channels

(1)

Multiple-Antenna Channels

Giuseppe Caire and Giulio Colavolpe

Mobile Communications Group Universita diParma

Institut EURECOM Dipartimentodi Ingegneriadell'Informazione

2229Route des Cretes Parco Areadelle Scienze 181/A

06560 Valbonne, FRANCE 43100 Parma - ITALY

Tel: +33 (0)4 93002904 Tel: +39 0521 905744

Fax: +33 (0)4 93002627 Fax: +39 0521 905758

E-mail: [email protected] Email: [email protected]

March 7, 2001

Abstract

Weproposeanewspace-timecodingschemeforthequasi-staticmultiple-antenna

channelwithperfect channelstate informationat thereceiverand nochannelstate

information at the transmitter. The new scheme includes both trellis space-time

codes and layered space-time codes as special cases. When the number of trans-

mitantennas is notlarger thanthe numberof receive antennas ourscheme can be

eÆcientlydecodedbyminimummean-squareerror(MMSE)decision-feedbackinter-

ference cancellation coupledwithViterbi decodingthroughthe useof per-survivor

processing (PSP). We discussthe code design for the new scheme, and show that

nding codes with optimal diversity is much easier than for conventional trellis

space-time codes. We providean upperboundtotheword-errorrateof ourscheme

whichisbothaccurateandeasytoevaluate. Then,wendupperandlowerbounds

to the information outage probability with discrete i.i.d. inputs (as opposed to

Gaussianinputs,asinmostpreviousworks)and weshowthattheMMSEfront-end

yields a large advantage over the whitened matched lter front-end when coupled

with the decision-feedback PSP-based decoder. Our scheme yields a largeperfor-

mance gain with respect to coded V-BLAST of similar complexity, and it can be

easilycoupledwith therecentlyproposed linear-dispersion precoding to handlethe

caseofanumberofreceiveantennassmallerthanthenumberoftransmitantennas.

(2)

1 Introduction

Transmission schemes based on multiple antennas have attracted much attention in the

recentyearsasaviablesolutiontoincrease spectraleÆciencyandperformanceofwireless

channels.

Roughly speaking, works on multiple antennas can be classied depending on the

assumptions on the channel state information (CSI) available at the transmitter and at

the receiver. Theergodiccapacity[1]of afrequencynon-selectivechannelwith ttransmit

and r receive antennasand independent Rayleighfading, notransmitter CSIand perfect

receiverCSIhasbeencalculatedbyTelatarin[2]. Inthiscase,thecapacityforhigh-SNRis

givenbyC =minfr;tglog

2

SNR+O(1)bit/channeluse. Theergodiccapacitywithperfect

CSIboth atthetransmitterand atthereceiverhasthe samehigh-SNRbehavior[3]. The

informationoutageprobability[1](relatedtothe non-ergodic,or\outage"capacity)with

no transmitterCSI and perfect receiver CSIhas been investigated numerically in[2]and

in [4]. Transmit strategies minimizing the information outage probability with perfect

CSI both at the transmitter and atthe receiver have been considered in[5].

Other works assume perfect receiver CSI and partial transmitter CSI [6], frequency-

selectivefadingandacombinationofmultipleantennasand OFDM[7], andlarge-system

limits of physical scattering models, without any assumption of independent Rayleigh

fading, based on large randommatrix theory [8].

Amorerealisticmodelfortime-varyingfadingchannelshasbeenintroducedbyHochwald

and Marzetta in [9]. They considered a block-fading channel constant for T consecutive

channel uses and independent from block to block, where both transmitter and receiver

havenoCSI. Tse and Zheng[10] have shown thatthe high-SNRcapacity ofsuch channel

is C = k(1 k=T)log

2

SNR+O(1), where k = minfr;t;T=2g. They show also that

capacity is maximizedby using nomore than k transmit antennas. In particular, having

t >r antennasdoesnot improvecapacity. In [10,11] itisshownthat thesame high-SNR

capacity behaviorcan beachieved by a\naive"schemethat allocatesk inputdimensions

forchannelestimation(e.g.,by explicitlysendingtrainingsymbols), andbyamismatched

receiver that treats the training-basedestimated channel as if itwas the actual channel.

When the code block length N is much larger than maxfr;tg, and the channel co-

herence interval is T N, one codeword spans a single channel realization. Then, we

are in the presence of a compound channel, i.e., a collection of channels indexed by the

dierentrealizations of the rt channelmatrix. In this case, the (non-ergodic) capacity

and information outage probability of the channel with or without CSI at the receiver

coincide, since the channel matrix process is strongly singular [1].

In practice, the assumption of perfect CSI at the receiver holds approximately when

the channel varies very slowly with respect to the duration of a codeword (quasi-static

assumption). Thisis aquiterealisticassumption inseveral situationswherethe mobility

ofwirelessterminalsislimitedorabsent(e.g.,indoorwirelesslocal-areanetworks,wireless

local loops). On the contrary, the assumption of perfect CSI at the transmitter holds

only if adelay-freeerror-free feedback link fromreceivertotransmitter exists, or if time-

(3)

signal in the reverse direction. Motivated by the above considerations, we conclude that

assumingperfect CSIatthe receiverand noCSIatthe transmitterisreasonable andcan

beappliedinseveral practicalsettingsforwhichT N maxfr;tg and wherefeedback

ortime-divisionduplexingcannotbeexploited. Theresultsbasedonthisassumptioncan

be eectively approached in the high-SNR region by the naive scheme based on explicit

training, which is actually the way the large majority of existing wireless systems work

today. Without further questioning about the validity of this model, we adopt it as our

starting point.

Codingschemesforthequasi-staticmultiple-antennachannelwithperfectreceiverCSI

have been proposed inseveral works(see for example [13, 14,15, 16,17,18, 19]). In this

paper we propose a new scheme nicknamed \wrapped" space-time coding that includes

boththetrellisspace-timecodesof[13] andthelayeredspace-timecodesof[14]asspecial

cases. Forthe casert,ourschemecanbeeÆcientlydecodedbyminimummean-square

error (MMSE) decision-feedback interference cancellationcoupled withViterbi decoding,

through the use of per-survivor processing (PSP) [20]. We discuss the code design for

the new scheme, and show that nding codes with optimal diversity is actually much

easier than for conventional trellis space-time codes. We show many exampleswhere the

maximumpossiblediversity isachieved bywell-known trelliscodes. Weprovideanupper

boundtotheword-errorrateofourschemewhichisbothveryclosetosimulationsandeasy

toevaluate(atleastforgeometricallyuniformcodes). Then,inordertounderstandbetter

the behavior of the word-error rate, we nd upper and lower bounds to the information

outage probability with discretei.i.d. inputs. We show that the MMSE front-end yields

a large advantage over the whitened matchedlter (WMF) front-end if coupledwith the

decision-feedback PSP-based decoder. Finally, we provide numerical examples showing

the accuracy of our analytical bound and the eectiveness of the proposed space-time

coding scheme. We compare our scheme with V-BLAST [21], which is also based on

decision-feedback interference cancellationbut itdoesnot exploitPSP,and weshowthat

the latter suers severely from error propagation in the feedback decisions while the

proposedscheme doesnot. Also,weconsiderthe caser <t andwe showthat ourscheme

can workasouter codingwhereinnercoding(or \precoding")isobtained by therecently

proposed linear dispersioncodes [19].

The paper isorganized asfollows: inSection2wereview the basic trellisand layered

space-time coding schemes and we introduce the new \wrapped" scheme. In Section 3

we describe the details of the PSP-based decoding scheme. In Section 4 we discuss the

code design for our scheme and in Section 5 we present the semi-analyticbound on the

word-error rate. In Section 6 we present results on the outage probability with discrete

inputs and in Section7 we showsome numericalresults,the comparison with V-BLAST

and the use oflinear-dispersion precoding. Conclusions are summarizedinSection 8and

some side results are collected inAppendices A and B for the sake of completeness.

(4)

2 Coding for the Quasi-Static Multiple-Antenna Chan-

nel

The multiple-input multiple-output (MIMO) system with t transmitting (Tx) and r re-

ceiving (Rx) antennas considered inthis paper isdened by [13, 22, 4,2]

y

n

= p

Hx

n +z

n

; n=1;:::;N (1)

wherex

n 2X

t

isthevectorofmodulationsymbolstransmittedinparallelattimenbythe

Tx antennas, X C denotes a complex modulation signal set with unit average energy,

z

n 2 C

r

is the noise vector i.i.d. N

C

(0;I), y

n 2 C

r

is the corresponding vector of

received signalsamplesattheoutputof theRxantennas,H2C rt

isthe channelmatrix,

and isthe SNRperTxantenna. Weassumethat thechannelmatrixisnormalizedsuch

that 1

rt

traceE[HH H

] =1, so that the average received SNR per Rx antenna is given by

t.

As anticipated in the introduction, we consider the case where H is random but

constant over N maxft;rg channel uses, and we assume that the receiver knows H

perfectly,while the transmitter has noknowledge of H.

A space-time code (STC) for the above channel is a set S X tN

of tN complex

matrices (codewords). Codeword matricesX =[x

1

;:::;x

N

] are transmitted by columns,

in N consecutive channel uses. The STC spectral eÆciency is given by = 1

N log

2 jSj

bit/channeluse. By denition,the informationbit-energy over noise power spectralden-

sity ratio isgiven by E

b

=N

0

=t=.

2.1 Trellis Space-Time Codes

In [13], Tarokh et al. found criteria to design STC. They considered the pairwise-

error probability (PEP) with maximum-likelihood (ML) decoding and, in the case of

Rayleigh/RicianfadingcoeÆcients,theyshowed thatthePEPP(X!X 0

)averagedwith

respect to His upperbounded as

P(X!X 0

)K r

where K is a factor that depends on the codeword dierence matrix D = X 0

X and

on the channel matrix statistics, but it is independent of , and where is the rank

of D. Driven by the above bound, they indicated as the most important criterion for

constructingSTC themaximizationof theminimumrankoveralldistinctX;X 0

2S. We

shall refer tothis minimum rank asthe code rank-diversity.

ForlargeN,STCcanbeconstructedfrommultidimensionaltrelliscodes(M-TCM)[23]

with trellis termination. Examples of such schemes are given in [13, 16, 17,18]. Namely,

consider aM-TCM code CoverXof rateR =b=tbit/symbol,whereeachtrellis stepcor-

responds tob information (input)bits and to t code (output) symbols, and the subcode

of all c2 C leavinga given trellis state s

0

and merging intoa given trellis state s

N after

(5)

c as tN matrices, i.e., by transmitting the t symbols produced by the trellis encoder

at each trellis step in parallel on the t Tx antennas. The resulting spectral eÆciency is

given by b bit/channeluse.

1

ThediÆcultyinconstructingthesecodes isthatthe rank-diversity ishardtoevaluate

and it is not easily related to the algebraic properties of the underlying M-TCM code.

In [22], for the class of binary and quaternary trellis codes over Z

2

and Z

4

mapped onto

BPSK and QPSK,respectively,a condition onthe underlyingalgebraiccodes referred to

as the binary-rank criterionis shown to imply the rank-diversity of the resulting STC S

and it is used to construct STCs with full rank-diversity (i.e., with = t). The binary-

rank criterion is much easier to check than the rank-diversity and yields some explicit

general algebraicconstructions.

The maximum-likelihood (ML) decoder for trellis STC can be implemented by the

Viterbi Algorithm (VA) applied on the trellis of the underlyingM-TCM code. However,

for xed code rate R and rank-diversity the decoder complexity grows exponentially

with the number of Tx antennas t. In fact, from Lemma 3.3.2 in [13] we know that the

trellis complexity is lower bounded by 2 tR( 1)

.

Because of the diÆculty in code design (with the exception of constructions given

in[22])andbecauseofdecoding complexity,trellis STCarepracticallyrestrictedtosmall

t.

2.2 Layered Space-Time Codes

In [14], Foschini proposed a STC scheme suited to handle a very large number of Tx

and Rx antennas. In this scheme, M codes C (1)

;:::;C (M)

over X produce independently

codewords c (1)

;:::;c (M)

of length N 0

=td, where d1and M are given integers. These

codewords are diagonallyinterleaved (or\layered"), 2

asshown inFig.1,inorder toform

the tN codeword matrix (with N =d(M +t 1))

X=L

d (c

(1)

;:::;c (M)

)

whereL

d

indicatesthe\layered"diagonalinterleaverwithinterleavingdelayd(seeFig.1).

A reduced-complexity suboptimal decoder for this scheme is obtained by a linear

front-end followed by decision-feedback interference cancellation [14]. The linear front-

end, dened by a matrix F2C rt

,produces the sequence of received vectors

v

n

=F H

y

n

; n=1;:::;N

Then, eachcodeword c (m)

is decoded by taking as observable the sequence of samples

r (m)

`

=v

j;n p

t

X

k=j+1 b

j;k b x

k;n

; `=1;:::;N 0

(2)

1

Forsimplicity, weignoretherate decrease dueto trellistermination sinceit becomes negligiblefor

N much largerthantheTCMencodermemory.

2

In[21]anotherlayeredschemebasedonstandardrectangular(or\vertical")interleavingisconsidered,

(6)

where the one-to-oneindex mapping(m;`)$(j;n) is induced by the interleaverL

d ,v

j;n

is the j-th element ofv

n ,b

j;k

is the (j;k)-th elementof the matrix B=F H

H and xb

k;n is

a decision on the (k;n)-th symbol of X. From Fig. 1, we see that the elements x

k;n for

k =j+1;:::;t correspond toeitherzeros(forwhichnodecisionisneeded) ortosymbols

ofcodewordsc (m

0

)

withindexm 0

<m. Therefore, bydecodingthecodewordsinthe order

m =1;:::;M, the decisions needed in (2)are provided by earlierdecoded codewords.

2.3 \Wrapped" Space-Time Codes

In the original layered construction of [14], the length of each component codeword is

N 0

=dt. Because of practical hardware complexity limitationst cannot be a very large

number. Thisimpliesthat inordertohavelong codewords, the interleavingdelay dmust

belarge. Ifinterleavingdelay is anissue,the layered scheme isforced towork withshort

componentcodeblocklengthN 0

. Thismightposeaseriousproblemforusingtrelliscodes

with a large number of states. In fact, the code memory might not be negligible with

respect to N 0

thusyielding a non-negligiblerate loss due totrellis termination.

For this reason, we propose a scheme which keeps the the simplicity of decision-

feedbackdecodingwhileallows forarbitrarilylongcomponentcodewords andsmallinter-

leavingdelay. Interestingly,trellisSTCand layeredSTCare foundtobespecial cases. In

the proposed scheme, a single code C over X produces a codeword c of length N 00

. This

codewordisdiagonallyinterleavedinordertoformthetN codewordmatrixX=D

d (c),

with N =N 00

=t+(t 1)d. The diagonalinterleaver D

d

isdened by

x

j;n

=

c

`j(n)

if 1`

j

(n)N 00

0 otherwise

(3)

for 1j t and 1nN, where

`

j

(n)=[n 1 (j 1)d]t+j: (4)

The codeword matrix X is lled by wrapping the codeword c around its diagonals, as

illustrated by Fig. 2. Hence, this STC scheme shall be referred toas wrapped space-time

code(WSTC).Inthisway,theinterleavingdelaydbecomesafreeparameter,independent

of the componentcodeword block lengthN 00

.

Asa limitingcase,the interleavingdelaymay be alsod =0,i.e., avertical interleaver

may be used. For consistence with the case d>0, where code symbols with lower index

takethe lowerpositions ineach columnof the codeword matrix X (see Fig. 2),for d=0

we assume that the codeword matrix X is lled with the elements of the codeword c as

in Fig. 3. In this case, index `

j

(n) in(3) becomes

`

j

(n)=(n 1)t+t j +1: (5)

Remark 1 If C is a M-TCM code of rate R = b=t, then S = D

0

(C) is a trellis STC. If

C = C (1)

C (M)

(i.e., C is given by the Cartesian product of codes C (1)

;:::;C (M)

),

then S=D

d

(C) =L

d (C

(1)

;:::;C (M)

) isa layered STC. Hence, WSTC is ageneralization

(7)

Remark 2 Because of the lower and upper triangles of zero symbols in the codeword

matrix dened by (3), there is an inherent rate loss of (t 1)d=N. This is negligible if

N td. Moreover, if the transmissionof a long sequence of codewords is envisaged, the

codeword matricescan beconcatenated inordertolltheleadingand tailingtrianglesof

zeros, so that norate loss isincurred.

3 Decoding of Wrapped Space-Time Codes

Inthe casewhereCisatrellis code,weproposeareduced-complexitysuboptimaldecoder

obtainedbycombiningdecision-feedbackinterferencecancellationwiththeVA,inanalogy

with the delayed decision-feedback detection (DDFD) approach for ISI channels with

coding (see [25, 26, 27, 28] and references therein). The decoder takesas observable the

sequence of samples

r

`

=v

j;n p

t

X

k=j+1 b

j;k b x

k;n

; `=1;:::;N 00

(6)

wherev

j;n andb

j;k

aredenedasin(2)andwhere1j tand1nN aretheunique

integersfor which `

j

(n)=`. From the index mapping(4)(or (5)for d=0), we see that

the elements x

k;n

for k =j +1;:::;t correspond to either zeros (forwhich nodecision is

needed) ortosymbolsofcwithindex` 0

` td+1(`

0

` 1ford=0). Thesedecisions

are found in the survivor history according to per-survivor processing (PSP) [20]. As an

example, consider the case d >0. The delay for decision xb

k;n

necessary to compute the

observable for symbol c

`j(n)

=x

j;n

is given by (k j)(td 1). If(k j)(td 1) is larger

than the VA decoding delay (typically 5 or 6 times the code constraint length [29]), the

corresponding decisions are reliably obtained from the VA output. If (k j)(td 1) is

not large enough, the correspondingdecisions can be obtained fromthe survivors ending

ineach state ofthe VA.In this way, ifthe correctpath isamongthe survivors, atleast a

subset of the paths are extended by the VA with the correct feedback decisions.

Choice of the linear front-end lter. For the sake of simplicity, the decoder treats

the sequence r

`

asif itwas producedby the virtual scalar-inputadditive-noise channel

r

`

= p

j c

` +

`

; `=1;:::;N 00

(7)

where

`

is assumed i.i.d. N

C

(0;1) and, from(4) or(5), we have that

j =

t j` 1j

t

for d=0

j` 1j

t

+1 for d>0

(jj

t

denotes amodulo t operation). The SNR

j

of the above channel isgiven by

j

=

jb

j;j j

2

jf

j j

2

+ P

j 1

jb

j;k j

2

(8)

wheref

j

isthe j-thcolumnofthe front-endlter F. Becauseofdiagonalinterleaving,the

codeword ciscyclicallyinterleaved overt virtual additivewhite Gaussiannoise(AWGN)

channels with SNRs

1

;:::;

t

, so that exactly N 00

=t symbols are assigned to each chan-

nel j. Notice that (8)is the true SNR of channel j if decisions in (6) are correct, i.e., if

the contributionof past symbols is canceled exactly. Moreover, the noise samples

` are

not Gaussianand notindependent,ingeneral. However, providedthat theseassumptions

hold, this scheme decomposes the MIMO channel(1) into t parallel channels with cyclic

interleaving,as illustrated inFig. 4.

Giventheanalogybetweenthisschemeanddecision-feedbackequalizationofISIchan-

nels, standard choices for the front-end lter matrix F are also inspired by equaliza-

tion [30]. IfHhas rankt, aninformation-losslessfront-end isgiven by the WMFF=Q,

whereH=QBisthe\QR" decomposition[31] ofthechannelmatrix H,whereQ2C rt

has orthonormalcolumnsandB2C tt

isuppertriangular. Inthiscase, the j-thchannel

SNR is given by

j

=jb

j;j j

2

and the additivenoise isexactly Gaussiani.i.d. (subject to

the assumption of perfect feedback decisions). Another information-lossless front-end is

the unbiased MMSElter whose j-thcolumnf

j

isthe solutionof the SNR maximization

problem

maximize

j

subject to jf

j j

2

+ P

j 1

k=1 jb

j;k j

2

=1

and it is given explicitly by

f

j

= h

I+ P

j 1

k=1 h

k h

H

k i

1

h

j

r

h H

j h

I+ P

j 1

k=1 h

k h

H

k i

1

h

j

(9)

where h

j

denotes the j-thcolumnof H. In this case, the j-th channel SNRis given by

j

=h H

j

"

I+ j 1

X

k=1 h

k h

H

k

#

1

h

j

(10)

and the additive noise is neither Gaussian nor i.i.d. (even assuming perfect feedback

decisions).

IfHhasrankless thant,theWMFisnot denedandtheMMSElterisinformation-

lossy. Subject tomild conditions onthe statisticsof H, the probability that Hhas rank

less than t when r t is zero. Therefore, the above schemes can be practically applied

whenever r t. In the rest of this paper we restrict our treatment to this case, and we

briey address the case r <t inSection 7.

(9)

4 Code Design for the WSTC

Assuming that the parallel channel model with cyclic interleaving (7) holds, the PEP

between twocodewords c;c 0

2C for given channelSNRs

1

;:::;

t

is given by

P(c!c 0

j

1

;:::;

t )=Q

0

@ v

u

t

2 t

X

j=1

j w

j 1

A

(11)

where Q(x)

= R

1

x 1

p

2 e

z 2

=2

dz is the Gaussian tail function and where we dene the

squared Euclidean weight (SEW) w

j as

w

j

= 1

4 N

X

n=1 jx

0

j;n x

j;n j

2

(12)

(the correspondence between symbols of c and c 0

and symbols of X = D

d

(c) and X 0

=

D

d (c

0

) is given by (3)).

A sensible criterionfor the design of the component code C is to maximize the code

block-diversity Æ, dened by

Æ= min

c;c 0

2C:c 0

6=c

jfj 2f1;:::;tg : w

j

6=0gj (13)

that is,tomaximizethe minimumnumberof non-zerorows inthe matrixdierenceD =

X 0

X for each pair of distinct codewords matrices X;X 0

2D

d

(C). The block-diversity

criterion has been investigated in [32, 33, 34] for the design of trellis codes for cyclic

interleaving and/or periodic puncturing. The relationship between the rank-diversity of

a WSTC and the block-diversity of itscomponent code isgiven by the following:

Proposition 1 Consider a code C over X of rate R bit/symbol and block-diversity Æ.

Then, the rank-diversity of the corresponding WSTC S=D

d

(C) satises:

Æ1+

t

1

R

log

2 jXj

: (14)

Moreover, there exist d for which =Æ.

Proof. Consider a pair of distinct codewords c;c 0

2 C, the corresponding matrices

X = D

d

(c) and X 0

= D

d (c

0

) and the dierence matrix D = X 0

X. The rank of D

cannot be larger than the number of non-zero rows of D, therefore Æ. The WSTC S

can be seen as a block code of length t over the extended alphabet X N

, where each row

of X is a symbol. The block-diversity Æ is then the minimum Hamming distance of this

block code. Byapplying the Singleton bound [35], weobtain

NtR N(t Æ+1)

(10)

which impliesthe second inequality in(14).

3

In order to prove the second statement of the proposition, consider a distinct pair of

codewords c;c 0

2C and write their dierencec 0

c by columns,as a tN 00

=t array e

D.

Ifany such e

D has rankÆ,then the statement issatisedford=0. Otherwise, foreach

d 1 the dierence matrix D is obtained by appending to the right of each row of e

D a

tailof (t 1)d zeros and by shifting tothe righteach j-throwby (j 1)dpositions. Let

f1;:::;tgdenotethe set oftheindexes ofnon-zero rowsof e

Dand foreachrowj 2

let `

j and r

j

denote the number of leadingand tailing zeros. Now, let

d=maxf1;minfd

1

;d

2

gg (15)

where we dene

d

1

=max

j;k 2

k<j

`

j

`

k

k j

and

d

2

=max

j;k 2

k<j

r

k r

j

k j

:

By construction, itis immediate to check that the resultingD is either upper triangular

or lower triangular and has rank equal to jj. By taking d to be the maximum of (15)

over alldistinct c;c 0

2Cwe obtain that S=D

d

(C) has rank-diversity Æ.

Remark 3 From Theorem 3.3.1 of [13], we know that for any STC over X with t Tx

antennas and spectral eÆciency =tRthe rank-diversity satises

1+

t

1

R

log

2 jXj

: (16)

Since this isthe sameupperboundonblock-diversitygiven inProposition1,weget that

the wrapping construction incurs no loss of optimalityin terms of rank-diversity (for an

appropriate choice of the delay d). As a matter of fact, while it is diÆcult to construct

codeswithrank-diversityequaltotheupperbound(16),itisveryeasytondtrelliscodes

for which the upper bound (14) on Æ is met with equality, for several coding rates and

values of t. Examples of these codes are tabulated in [32, 34]. Therefore, the wrapping

construction is apowerfultooltoconstruct STC with maximumrank-diversity.

Remark 4 From Lemma 3.3.1 of [13] we know that a trellis STC with rank-diversity

must have constraint length L . 4

This constraint does not apply to WSTC. For

example,thebinary4-stateconvolutionalcode(CC)ofrate1=4withgenerators(5;7;7;7)

(octal notation [30]) has constraint length 3, but the corresponding WSTC for t = 4

antennas with interleaver delay d 2 achieves =4. This fact is explained by noticing

3

Thisupperboundontheblock-diversityhasbeenfoundin[36]and independentlyin [33].

4

Following[37],wedene theconstraintlengthof atrelliscodeasL =

max

+1,where

max is the

(11)

Table 1: Block-diversity for rate 1=2 binary codes tabulated in [30] mapped onto

BPSK/QPSK. Bold numbers denote diversity achieving the bound (14). Code gener-

ators are expressed in octal notation[30].

BPSK QPSK

States Generators t=2 t =4 t=8 t=2 t=4 t=8

4 (5;7) 2 3 4 2 3 3

8 (15;17) 2 3 4 2 3 4

16 (23;35) 2 3 5 2 3 4

32 (53;75) 2 3 5 2 3 4

64 (133;171) 2 3 5 2 3 5

that the diagonal interleaver expands the state space of the overall interleaved code. On

the other hand, the PSP-based VA decoder proposed for WTSCs works on the trellis

of the underlying code C, i.e., it ignores the state space expansion.

5

Therefore, it is

not a priori clear if WSTCs, even if optimal from the rank-diversity point of view, are

going topay alargepenaltywhen PSP-basedVA decodingisusedinsteadof optimalML

decoding. In next sections we shall see that the penalty incurred by the MMSE front-

end and suÆciently large interleaving delay d is practically negligible,while the penalty

incurred by the WMF front-end and any d can be very large.

Remark 5 In [38], a computational eÆcient trellis-based algorithm for computing the

block-diversity of trellis codes with cyclic interleaving is given. In Section 5, we give

another computationally eÆcient method for calculating the block-diversity. By using

this method, we computed the block-diversity for the binary codes of rate 1/4 and 1/2

tabulated in [30] and mapped onto BPSK and QPSK, for dierent values of t. Some

of these results are reported in Tables 1 and 2. We observe that several codes achieve

maximum block-diversity and therefore are good candidates for WSTC.

Example 1 Fig. 5 shows the word-error rate (WER) vs. E

b

=N

0

for WTSCs obtained

from the binaryCC with generators (5;7;7;7)mapped ontoBPSKand transmitted over

a t =r =4 channel, with d=0;1;2. The channel matrix has i.i.d. elements N

C (0;1)

(independent Rayleigh fading). Each transmitted codeword corresponds to128 informa-

tion bits. Decodingisperformedby thePSP-based VAworkingontheunderlying4-state

trellis,withMMSEfront-end. Asdincreases,theslopeoftheWERcurvebecomessteeper

and steeper. In fact, it can be checked that for d = 0 the resulting WSTC has = 2,

for d = 1 has = 3 and for d = 2 has = 4. The solid curve denoted as \QUB" is

5

Again, we stress the analogy of the problem at hand with the case of trellis coding overa nite-

memory ISI channel, where theoptimal ML decoderrequires a numberof states generallylarger than

(12)

Table 2: Block-diversity for rate 1=4 binary codes tabulated in [30] mapped onto

BPSK/QPSK. Bold numbers denote diversity achieving the bound (14). Code gener-

ators are expressed in octal notation[30].

BPSK QPSK

States Generators t=4 t =8 t=4 t=8

4 (5;7;7;7) 4 5 3 5

8 (13;15;15;17) 4 6 4 6

16 (25;27;33;37) 4 7 4 6

32 (53;67;71;75) 4 7 4 7

64 (135;135;147;163) 4 6 3 5

an analytical Quasi-Upper Bound developed in Section 5 assuming the parallel channel

model induced by perfect decision-feedback.

For the sake of comparison, we show the WER performance of the trellis STC or,

equivalently,WSTCwithd=0,obtainedbyformattingthe16-statebinaryCCwithgen-

erators (25;27;33;37) mapped onto BPSK. This code is shown toachieve rank-diversity

= 4 in [22]. As expected, the slope of the error curve for this code is the same of the

4-state WSTC with d=2. We alsoshow the QUB for the 16-statetrellis STC assuming

the parallel channel model induced by perfect decision-feedback with MMSE front-end.

The fact that the simulated ML curve gets asimptotically very close to the QUB makes

usconjecture that thePSP-based VA withMMSEfront-end paysalmostnopenaltywith

respect to optimal ML decoding. An information-theoretic explanation of this fact will

beprovided in Section6.

5 Word-Error Rate Analysis

In this Section, we derive aunion upper bound on the average WER of acode C over X

with block length N 00

cyclicallyinterleaved over t parallel AWGN channels with random

(but constantovereachcodeword)SNRs

1

;:::;

t

. Given the parallelchanneldecompo-

sition (7), this method provides also a true upper bound for the WSTC S= D

d

(C) with

WMFfront-end,subject tothe assumptionof perfect feedback decisions. Forthe MMSE

front-end,theparallelAWGNchannelmodeldoesnot holdexactlyeven assumingperfect

feedback decisions, since the noise is neither Gaussian nor independent. Nevertheless,

numericalresultsshowthat the methodprovidesalways anupperbound alsoforMMSE.

For this reason, inthe followingit willbereferred toas Quasi-Upper Bound (QUB).

For simplicity, we assume that C is geometrically uniform [39], so that we can take

(13)

c 0

j

1

;:::;

t

) given in(11). Byusing the union bound weget

P

w (ej

1

;:::;

t )

X

c 0

6=c

P(c!c 0

j

1

;:::;

t

) (17)

We are mainlyinterested inthe case whereC is atrellis code with trellis termination. A

pairwise error event fc ! c 0

g is said to be simple if the trellis paths corresponding to

codewords c and c 0

split at a certain step 1 n

1

N

00

, merge at step 1 n

2

N

00

,

coincide for all steps n n

1

and n n

2

and remain separated for n

1

< n < n

2 . If a

pairwise error event is not simple,is said to be composite. Wehave the following:

Lemma 1 For any arbitrary

1

;:::;

t

, the RHS of (17) remains an upper bound on the

conditional WER if the sum is restricted to simple error events.

Proof. We have to prove that if fc ! c 0

g is not simple, then it can be eliminated

from the union bound without changing the inequality relation. Consider the dierence

sequence d 0

=c 0

c (represented as acolumn vector), and let

=diag(

p

1

;:::; p

t

; p

1

;:::; p

t

;:::;:::; p

1

;:::; p

t )

beaN 00

N 00

diagonalmatrix containingthe channelamplitudes p

1

;:::; p

t

repeated

periodically N 00

=t times (by denition, N 00

is an integer multiple of t). Assume that

fc!c 0

g iscomposite. Then,thereexisttwocodewords c

1 and c

2

suchthatd 0

=d

1 +d

2

and (d

1 )

H

(d

2

)=0 forall , where d

1

=c

1

c and d

2

=c

2

c. This is because the

path corresponding to a composite event can be always be formed by the concatenation

of two paths, which dier from the reference path on disjoint supports. Therefore, the

non-zero elements in d

1

and d

2

are in dierent positions and the vectors d

1

and d

2

are orthogonal for any diagonal matrix . Let v be the received signal corresponding to

the transmission of c,i.e., v=c+,where is the AWGN vector. The pairwise error

regions corresponding to the events fc ! c 0

g;fc ! c

1

g and fc ! c

2

g are given by the

inequalities

E 0

= fv: 2Refv H

d 0

g+jd 0

j 2

0g

E

1

= fv: 2Refv H

d

1

g+jd

1 j

2

0g

E

2

= fv: 2Refv H

d

2

g+jd

2 j

2

0g (18)

The error event fc!c 0

gcan be removed fromthe union bound if E 0

E

1 [E

2

. In order

to showthis, assume that v2E 0

\(E

1 [E

2 )

c

. Then,

2Refv H

d

1

g+jd

1 j

2

0

2Refv H

d

2

g+jd

2 j

2

0

By summing the above two inequalities and by recalling the orthogonality property of

d

1

and d

2

weget

H 0 0 2

(14)

i.e., v2= E 0

, whichcontradicts the assumption.

Next, we nd a method to enumerate all simple error events. We can represent the

code C over a super-trellis, obtained by lumping together s trellis steps of the original

trellis, so that each step of the super-trellis corresponds to s consecutive steps of the

original trellis,and each step of the supertrellis has t outputsymbols. Forexample, if C

isatrelliscodeofrateb=pbit/symbol,i.e.,onetransitionoftheoriginaltrelliscorresponds

top outputsymbols,then s=t=p(where for simplicitywe assumethat pdivides t). The

length of the terminated super-trellisis N 00

=t steps.

LetA (L)

w1;:::;wt

denote the numberof simple error events of lengthL in the super-trellis

(the lengthL ismeasured insuper-trellissteps) withSEWs w

1

;:::;w

t

startingatstep 1.

Then, the total number of simple error events of length L and SEWs w

1

;:::;w

t

is equal

to (N 00

=t L+1)A (L)

w1;:::;wt

. By restricting the sum in (17) tosimple error events and by

using (11) we have

P

w (ej

1

;:::;

t

)

N 00

=t

X

L=1 (N

00

=t L+1) X

w

1

;:::;w

t A

(L)

w

1

;:::;wt Q

0

@ v

u

t

2 t

X

j=1

j w

j 1

A

N

00

t X

w

1

;:::;wt 1

X

L=1 A

(L)

w1;:::;wt

!

Q 0

@ v

u

t

2 t

X

j=1

j w

j 1

A

(a)

= N

00

t Z

=2

0 T

e

1

=sin 2

;:::;e

t

=sin 2

d (19)

where we dene the code multivariate weightenumerator function[40, 41]

T(W

1

;:::;W

t )=

X

w

1

;:::;wt 1

X

L=1 A

(L)

w1;:::;wt t

Y

j=1 W

w

j

(20)

and where in (a) of (19) we used the preferred integral expression of the Gaussian tail

function [42]

Q(x)= 1

Z

=2

0

exp

x 2

2sin 2

d

The multivariate weight enumerator function counts all the simple error events starting

at step 1 and ending after L steps of the supertrellis, for allpossible length L= 1;2;:::

(noticethatinordertomakethecomputationeasierweextendthesumalsotolengthsL>

N 00

=t). A detailedexampleof the computationof T(W

1

;:::;W

t

)is given inAppendix A.

In order to obtain the average WER, where expectation is done with respect to the

jointstatisticsofthe channel SNRs

1

;:::;

t

(they neednot be independent),wecannot

average term-by-term the weight enumerator. In fact, the parallel channels with cyclic

interleaving and random SNRs belongs to the class of block-fading channels, for which

the union bound averagedwith respect tothe fadingstatisticsmay be very loose oreven

(15)

we followthe approach of [33] and obtain the nal bound onthe averageWER as

P

w

(e)E

"

min (

1;

N 00

t Z

=2

0 T

e 1=sin

2

;:::;e t=sin

2

d )#

(21)

where the expectation is with respect to the joint statistics of

1

;:::;

t

. For small t

and if the joint pdf of

1

;:::;

t

is known, the above expectation can be calculated by

numericalintegrationmethods(asin[33]). Iftislargeorifthejointstatisticsof

1

;:::;

t

is not known explicitly,the bound (21) can be evaluated by Monte Carlo averaging over

a large number of realizations of the channel SNRs. In particular, if (21) is used as an

upper bound on the WER of the wrapped scheme S= D

d

(C), the Monte Carlo average

is obtained by generating a large number of channel matrices H and by calculating the

corresponding SNRseither viathe QR decomposition(inthe case of WMFfront-end) or

by using (10) (in the case of MMSE front-end). In this way, the method can be applied

to any arbitrary channel statistics (for example, H can be generated by a ray-tracing

software in orderto modela particular scattering environment [3]).

Remark 6 The integral with respect to can be eÆciently computed by using Gauss-

Chebychev quadrature rules. In fact, by letting g(sin 2

)=T

e 1=sin

2

;:::;e t=sin

2

we can write

1

Z

=2

0

g(sin 2

)d =

1

2 Z

1

1 g(x

2

) dx

p

1 x 2

1

2n n 1

X

i=0 g

cos 2

2i+1

2n

nodd

= 1

n

(n 1)=2 1

X

i=0 g

cos 2

2i+1

2n

(22)

where the last equality follows by noticing that g(0) = 0. Then, for a n-nodes quadra-

ture rule only (n 1)=2 evaluations of the integrand function are needed. Numerical

experimentsin Section7showthat a very good accuracyis obtained alreadyfor n=7.

Remark 7 In [25, 26, 43], an analysis of the bit-error probability of DDFD based on

union-boundingtechniques is presented. There, the main problem is to characterize the

eect that wrong decisions in the survivors have on the current branch metric: clearly,

wrong symbols onthe survivorterminating ina given state aect the decisionmetricsof

all paths stemmingfromthat state.

We can provide an intuitive explanation of why this problembasically disappears in

the analysis of the WER of PSP-based VA decoding of WSTCs with suÆciently large

interleaving delay d. Consider the situationdepicted in Fig. 6 and the decision-feedback

interference cancellation given in(6). Thereare two types of possible wrongdecisions in

(16)

error event, occurred to the left of the survivors merge point (see Fig. 6). Decisions bx

k;n

in (6) have delay (k j)(td 1) for k = j +1;:::;t. If the survivors merge always at

delay smallerthan td 1, onlyerrors due topast error events aect the metric updating

and the WEP of the PSP-based VA is equivalent to a genie-aided decoder with perfect

feedback decisions. In fact,if xb

k;n 6=x

k;n

a codeword error already occurred for both the

PSP-based and for the genie-aided decoders, while if xb

k;n

= x

k;n

the two decoders have

the same branch metric for the (j;n)-th symbol. In other words, the error propagation

due tonon-perfect feedback decisionshas noinuence onthe WER,provided that td 1

is large enoughin orderto havea very high probabilityof path merge atdelay <td 1.

Notice that for large t, a large interleavingdelay d isnot needed to meetthis condition.

On the contrary, in the case of DDFD for ISI errors are mostly due to unmerged

survivors, because of the time-causality of the ISI channel. For this reason the WER

performance of WSTCs with PSP-based VA decoding can be predicted very well by the

QUB (21) while neglecting feedback decision errors in DDFD yields overly optimistic

results [25, 26,43].

A method for evaluating the block-diversity. Asimplealgebraicmethodtocalcu-

late the block-diversity of a code C with periodicinterleaving over t parallel channels is

given by the following:

Lemma 2 The block-diversity of a code C cyclically interleaved over t channels is equal

toÆ ifandonly ifT(W

1

;:::;W

t

)=0forallvectors (W

1

;:::;W

t

)2f0;1g t

with Hamming

weightlessthanÆ andthereexistsonevector(W

1

;:::;W

t

)2f0;1g t

withHammingweight

Æ for which T(W

1

;:::;W

t )>0.

Proof. Consider the parallel channel model with cyclic interleaving (7) and Fig. 4.

Let some of the SNRs

j

be equal to 0 and others to +1. If all codewords of C are

distinguishableaftertransmissionoverthisparallelon-ochannelsthenP

w (ej

1

;:::;

t )=

0. Otherwise, P

w (ej

1

;:::;

t

) > 0. Following the same steps yielding (19), we obtain a

Cherno union bound [29] onthe WEPas

P

w (ej

1

;:::;

t )

N 00

2t T e

1

;:::;e t

(23)

The Chernounionboundisasymptoticallytightforlarge SNR.Then,we concludethat

the function T e 1

;:::;e t

is zero if P

w (ej

1

;:::;

t

)= 0 and obviously it is positive

if P

w (ej

1

;:::;

t

) > 0. Finally, we notice that letting

j

be 0 or 1 is equivalent to let

W

j

equal to1 or0, respectively.

6 Mutual Information and Outage Probability

Under the quasi-static regime, the best possible achievable WER of the MIMO channel

(17)

dened by

P

out

()=Pr(I

X

(H)) (24)

where I

X

(H) denotes the instantaneous mutual information between the input x

n , uni-

formly distributed overX t

,and the outputy

n

for a given channelmatrix H, given by

I

X

(H)=tlog

2

jXj E

x;z (

log

2

"

X

x2X t

e 2

p

Re

f z

H

H(x 0

x)

g (x

0

x) H

H H

H(x 0

x)

# )

(25)

(E

x;z

fgdenotes expectationwith respect tox and zN

C

(0;I)).

ThecomputationofI

X

(H)foragivenniteanddiscretesignalsetXisaverydemand-

ing task for large t. In fact, (25) does not admit a simple closed form and requires the

evaluationofat-dimensionalcomplexintegral. Moreover, the integrandfunctionrequires

the evaluationofthe quadraticform(x 0

x) H

H H

H(x 0

x) inthe jXj 2t

possiblevaluesof

the dierencevector x 0

x. Hence, ndingupperand lowerboundstoI

X

(H)isessential

for evaluating the limitperformance of STC under the quasi-static regime.

GiventheanalogybetweenthisproblemandtheISIchannelwithi.i.d. discreteinputs,

we shall make use of the bounds derived in [44]. A simple upper bound on I

X

(H) is

obtained byassumingGaussiani.i.d. inputs(GI)withthe sameaverageinputconstraint.

We have

I

X

(H)I

G

(H)=log

2

det I+HH H

(26)

Otherupperand lowerboundscan bederived by consideringthe parallelchannels(7)for

appropriate choice of the SNRs

j

. In general, the instantaneous mutual informationof

the t AWGN parallel channels withinput independent and uniform overX is given by

I

P:ch:

(

1

;:::;

t

)=tlog

2 jXj

t

X

j=1 E

x;

(

log

2 X

x 0

2X e

2 p

j Ref

(x 0

x)g

j jx

0

xj 2

!)

(27)

where N (0;1). The matched-lter bound (MFB) [44] is obtained by neglecting the

mutualinterference of the t transmit antennas. We have

I

X

(H)I mfb

X

(H)=I

P:ch:

jh

1 j

2

;:::;jh

t j

2

(28)

A class of lower bounds can be obtained from the chain-rule of mutual information as

(18)

follows [44]. Forgiven H and any rt matrix Ffor whichv=F H

ywe can write

I

X

(H) = I(x

1

;:::;x

t

;y)

= t

X

j=1 I(x

j

;yjx

j+1

;:::;x

t )

(a)

t

X

j=1 I(x

j

;vjx

j+1

;:::;x

t )

(b)

t

X

j=1 I(x

j

;v

j jx

j+1

;:::;x

t )

= t

X

j=1 I x

j

; v

j p

t

X

k=j+1 b

j;k x

k

!

(29)

where the coeÆcients b

j;k

are dened as in (2) and (6). The inequality (a) holds with

equality if F is information lossless. In particular, for F given by the WMF or by the

MMSE lter dened in Section 2.3 equality holds (recall that we assume r t). The

inequality (b) holds with equality if F is the MMSE lter and the inputs x

1

;:::;x

t are

Gaussian i.i.d. [45, 46].

6

The j-th term in the sum of the last lineof (29) is the mutual

informationof the j-thparallel channelin (7). IfFis the WMF, we get the lowerbound

I

X

(H)I wmf

X

(H)=I

P:ch:

jb

1;1 j

2

;:::;jb

t;t j

2

(30)

IfFistheMMSElter,thenoisein(7)isnotGaussiansincethex

j

'sarediscreteand(27)

is not directly applicable. However, experimentalresults and some analytical arguments

provided in[47]and inanasymptoticformin[48] showthat theresidualinterferenceplus

noise at the output of the MMSE lter is very close to Gaussian, even if the interfering

symbolshaveadiscretedistribution. In[44],experimentalevidenceshowsthatby making

a Gaussian approximation (GA) of the MMSE lter output (referred to as MMSE-GA

in the following) the resulting mutual information is a lower bound to the true mutual

information (in [44] this is referred to as conjectured lower bound). Motivated by these

arguments, we can write I

X (H)

I mmse

X

(H)=I

P:ch:

(

1

;:::;

t

) where

j

is given in(10)

and

means that inequalityis conjectured.

Upper and lower bounds on I

X

(H) yield lower and upper bounds on P

out

(), respec-

tively. Figs.7,8and9showacomparisonofthevariousoutageprobabilityboundsversus

E

b

=N

0

fort =r =4,BPSKmodulationandindependent Rayleighfading. Thecurvesare

6

Thiscanbeprovedinapurelylinear-algebraicwaybyshowingthedeterminantdecomposition

det I+HH H

= t

Y

j=1 (1+

j )

where isgivenby(10).

(19)

obtained by MonteCarlo simulation overseveral independentgenerations of the channel

matrix H. Some pointsfor the true outage probability are alsoshown forcomparison.

For small spectral eÆciency (Fig. 7 and 8) the MMSE-GA (quasi-)upper bound is

very close to the true outage probability,while as increases (Fig. 9)it becomes looser.

Remarkably, in the cases considered here the MFB provides a lower bound tighter than

theGI.ThegapbetweenGIandBPSKincreaseswithand,forallvaluesof,theWMF

suers froma large degradationwith respect to MMSE-GA.

At this point, some qualitativeconsiderations are inorder:

1. In the derivation of (30) and of the MMSE-GA (quasi-)upper bound the perfect

feedback decisions are not assumed, but they are a consequence of the chain-rule

of mutual information (29). Hence, these bounds yield also the performance limit

achievable by the WMF or MMSE front-end with perfect decision-feedback. In

particular, since the PSP-based VA decoding is very robust to feedback decision

errors,weexpect thattheseoutageprobabilityboundspredictwelltheperformance

limitsof WSTCs with PSP-based VA decoding.

2. In the rangeof wherethe MMSE-GAisclose tothe true outageprobability (e.g.,

in Fig. 7 and 8), we expect that good WSTCs designed with respect to the block-

diversity criterion and decoded by PSP-based VA decoding and MMSE front-end

perform closetogoodSTCdesigned withrespecttotherank-diversitycriterionand

decodedby MLdecoding. On thecontrary,weexpectthat PSP-basedVA decoding

suers from a performance penalty with respect to ML decoding over the range of

where the MMSE-GA is far from the true outage probability (e.g., in Fig. 9).

This explain why in Example 1 the QUB assuming MMSE front-end and perfect

feedback decisions is very close to the simulated ML performance for trellis STCs

of rate 1=4.

3. In terms of outage probability, the WMF front-end with perfect feedback decisions

suers from a large gap with respect to the MMSE front-end. On the contrary, it

is well-known that for any xed H of rank t, the two front-ends (assuming perfect

feedback decisions) are capacitywise equivalent as ! 1 (see [44] and references

therein). This apparentparadoxcan be explainedby noticingthat theconvergence

I wmf

X

(H) ! I mmse

X

(H) as ! 1 is non-uniform over the ensemble of random

H. Therefore, P(I wmf

X

(H) ) does not converge to P(I mmse

X

(H) ) even if

I wmf

X

(H)!I mmse

X

(H)for any given H.

Fortheabovereason,weexpectthat alsoforpracticalcodesthe gapbetweenWMF

andMMSEdoesnotvanishas !1. Curiously,several previousworksonlayered

STCs [14, 21] reported results for the WMF front-end, referred to as the \nulling

and canceling" approach. Our results show that it is actually very dangerous to

use WMFrather than MMSE with decision-feedback interference cancellation in a

(20)

Example 2 In this example we showthat the MMSE and the WMFfront-ends have in

facttwooppositebehaviorswithrespecttotheinterleavingdelayd: whiletheperformance

with MMSE front-end improves as the interleaving delay d increases (making feedback

decisions more and more reliable) the performance with WMF front-end improves by

makingdsmall(non-perfectinterference cancellation),providedthattheresultingWSTC

preserves its rank-diversity as d decreases. Fig. 10 and 11 show WER vs. E

b

=N

0

for the

WSTC obtained from the CC with generators (23;35) mapped onto BPSK, in the same

conditions of Example 1,for d=0;1;2;3 withPSP-based VA decoding with MMSEand

WMF front-ends, respectively. In Appendix B we show that this code has rank-diversity

equal to 3 for any d 0, i.e., always equal to the maximum possible rank-diversity for

t = 4 Tx antennas and binary coding of rate R = 1=2. For d =0 the WSTC reduces to

a trellis STC, where the underlying trellis code is given by the equivalent rate 2=4 CC

obtained by lumping two trellis steps of the original rate 1=2 CC. Therefore, exact ML

decodingispossible. WithMMSEfront-end(Fig.10),thePSP-baseddecoderapproaches

exact MLasd increases. On thecontrary, withWMFfront-end (Fig.11),the PSP-based

decoder suers from anincreasing performance gap with respect toexact ML.

This eect can be explained by noticing that the WMF front-end is capacitywise

suboptimal in the presence of perfect feedback decisions. On the contrary, the MMSE

front-endis optimalwith perfect feedback decisions.

7

Finally, this eect is clearly visible only for WSTCs for which =Æ for all d 0. If

increases with d, the performance withWMF front-end isthe result of two contrasting

eects, and itsbehavior isdiÆcult topredict.

7 Performance Examples

InthisSection, theWERperformanceofvariousWSTCsareassessed forquasi-staticfad-

ingchannelswith independent Rayleighfading. The unionboundonthe WERdescribed

in Section 5 is employed (curves denoted by \QUB") and the results are compared with

computersimulationofthefullPSP-basedVAdecoder(curvesdenoted by\SIM").Inour

simulations, each codeword corresponds to 128 information symbols. As a consequence,

the codeword length is variable, depending onthe code rate.

Results for known convolutional codes. Figs. 12,13 and 14show the QUB onthe

WER forsomeWSTC basedonbinary CCs ofrate 1=2and 1=4 mapped ontoBPSKand

QPSK, with t =r=4 and t=r =8. Some pointsobtained by simulating the full PSP-

based VA decoder (with interleaving delay d = 2) is shown for the sake of comparison.

Also, we included the outageprobability with GI and the MMSE-GA conjectured upper

boundwithdiscrete-inputs. Thedierenceinthe slopeofthe WERcurvesforthevarious

codes reects the dierent block-diversitiesinTables1and 2. Remarkably,thereisstilla

7

Strictlyspeaking,thisistrueonlyforGaussian inputs,butasarguedbefore fordiscreteinputsand

(21)

consistent gap (ranging from1.5 to4 dB atWER =10 3

)between these simpleo-the-

shelf codes and the outage curve. This calls for the design of good component codes for

the WSTC scheme.

ComparisonwithV-BLAST. In[21],asimpliedspace-timedecision-feedbackdetec-

tion scheme nicknamed \V-BLAST"is presented. This scheme is equivalent to aWSTC

with d = 0, but decision-feedback interference cancellation is obtained by feeding back

symbol-by-symboldecisions withoutper-survivorprocessing. Since the orderof decisions

is not dictated by the trellis time-ordering of the underlyingcode, decisionsare made in

anorderthat dependsonH,inordertolimiterror propagationinthefeedback. Namely,

the columns of H are permuted so that \QR" decomposition of the permuted matrix

yields WMF channel \gains" such that min

j=1;:::;t jb

j;j j

2

is maximized. Remarkably, this

orderingcan be calculatedby a simple\gready"algorithmwhich maximizesateachstep

j the SNR of the j-th parallelchannel, for j =t;t 1;:::;1 [21]. In the case of MMSE

front-end, the same algorithm yields a sequence of SNRs

j

for which min

j=1;:::;t

j is

maximized asymptotically, for large . As in classical decision feedback-equalization, if

the V-BLAST detector is concatenated with a decoder, hard decisions at the detector

decisionpointare madeonlyfor decision-feedback purpose, but soft values are fedtothe

decoder.

Fig. 15 compares the WER vs. E

b

=N

0

of a WSTC with either MMSE and WMF

front-end obtained from the binary CC with generators (23;35) mapped onto QPSK,

with t = r = 4 antennas with the scheme obtained by concatenating the same trellis

code with the V-BLAST detector. Simulations of the PSP-based VA decoder for the

WSTC (obtained for d = 2) are in perfect agreement with the corresponding QUB.

The V-BLAST performance was obtained by simulation. For comparison, we show also

the WER resulting from a genie-aided V-BLAST detector with ideal feedback decisions

(curves denoted by \Id. F."), which is very similar tothe WER achieved by the WSTC

without genie. This shows that the large performance degradation of V-BLAST with

respect to the WSTC is due to error propagation in the decision feedback, and that

the PSP-based decoder is very eectivein preventing such propagation while the special

detectionorderingofV-BLASTisnot. Moreover,thecomplexityofthe\greedy"detection

orderingalgorithmis larger than the extra complexity of the VA due toPSP, for larget.

Then, WSTC with PSP-based VA decoding isnot only more eective,but mightbealso

simpler than V-BLAST.

Fig.16shows analogousresultsforWSTCandV-BLASTschemesbasedonthebinary

CC with generators (23,35) mapped onto BPSK, with t =r =8 antennas. We see that

also for V-BLAST the advantage of the MMSE versus the WMF front-end can be very

large.

Handling the case r <t via LD precoding. When r <t the low-complexity PSP-

based decoding scheme cannot be applied. Recently, Hochwald and Hassibi proposed

(22)

modulationsymbolsand map them ontothe complex tT matrix signals

S= Q

X

q=1 (x

q C

q +x

q D

q )

whereC

q

andD

q

arecomplextT matricesdeningtheLDcode. Then,Sistransmitted

column-by-columnoverT channeluses. TheresultingspectraleÆciencyis= Q

T

R ,where

R is the rate of the outer code. We can think of LD coding as a precoder which shapes

the original tr complex MIMO channel into a virtual 2Q2rT real MIMO channel.

As long asQ and T are chosen such that Q rT, WSTC with PSP-based decoding can

beapplied asan outercoding/decoding scheme to the LD-precoded channel.

As an example, Fig. 17 shows the WER vs. E

b

=N

0

of a coding scheme obtained by

concatenating the WSTC obtained from the binary CC with generators (23,35) mapped

onto QPSKwith aLD precoder designed in[19] forthe t =4, r=1channel,with Q=4

and T = 4.

8

The resulting spectral eÆciency is = 1 bit/channel use. For the sake of

comparison, we show also the outage curve with GI on the original 41 channel (curve

denoted by \No prec."), the outage curve with GI of the LD-precoded channel, and the

MFBand MMSE-GAoutagecurveboundswith QPSKinputsandLD-precoded channel.

Wenoticethatthe gapbetweenthe GIoutageofthe original41channelandthe WER

of theconcatenated WSTCwithLD-precodingisalmostentirelyduetotheLD precoder,

since the simulated WER with PSP-based VA decoding is less than 1.5 dB away from

the outage ofthe LD-precoded channel,while thegap between LD-precoded and original

channels is about 5 dB for WER10 4

.

The LD precoders in [19], included the one used in this example, where designed in

order to maximizethe average mutual informationat a given SNR, and not tominimize

the outage probability for a given spectral eÆciency. We believe that a better WER

performance for the LD-precoded channel can be achieved by explicitly designing the

precoder inorder to minimizethe outage probability.

8 Conclusions

A new scheme, nicknamed \wrapped" STC was proposed. This scheme generalizes both

trellisandlayeredSTCsanditissuitedtoalargenumberofantennasandlowcomplexity

decoding, based on MMSE decision-feedback coupled with PSP-based Viterbi Decoding.

We showed that any trellis codes with maximal block-diversity can be turned into a

WSTC with maximal rank-diversity, with the advantage that block-diversity is easy to

check and maximal block-diversity is easilyachieved by several well-known trellis codes.

Wealsoshowedvianumericalexperimentsandinformation-theoreticalargumentsthatthe

MMSEdecision-feedbackreceivercoupledwiththePSP-basedVA,providingveryreliable

decisions, performsveryclosetooptimalMLdecoding. On thecontrary,asimilarscheme

withWMFinsteadofMMSEfront-endisverysuboptimalinthequasi-stationaryregime.

8

(23)

We provided a simple and rather accurate (quasi-)upper bound on the performance of

WSTCs and an eÆcientmethodfor calculating the block-diversity of trelliscodes, based

ona multivariateweightenumerator. Finally,weprovided some upperand lowerbounds

to the informationoutage probabilitywith discrete i.i.d. inputs.

Performance examples of the proposed scheme constructed from well-known binary

linear convolutional codes were provided. WSTCs compare very favorably with respect

to coded V-BLAST of similar complexity. In particular, we showed that V-BLAST is

pronetoerrorpropagationinthe feedback even ifthedetectionorderingalgorithmof[21]

isused, whilethe PSP-based decoder isalmostnot aectedby errorpropagationeven for

very smallinterleaving delay.

In the case r < t, where the decision-feedback scheme cannot be used directly, our

scheme is naturally suited to work as outer code where the inner code (pre-coder) is a

linear dispersion code as studiedin [19].

The gap between the WER of WSTCs obtained from simple o-the-shelf codes and

the information outage probability indicates that some work needs still to be done in

designing goodcomponent codes for the WSTC scheme. This is especially true for high-

rate trelliscodes, where thedesign formaximalblock-diversityhas beenrarely addressed

(see [32]).

Appendices

A Calculating the Multivariate Weight Enumerator

Function

For the sake of completeness, we give here an example of calculation of T(W

1

;:::;W

t ).

Furtherdetailsandexamplescanbefound in[40,41]. Consider therate1/2binary linear

CCwith4statesandgenerators(5;7)andlett=4. Fig.18showsthetrellisoftheoriginal

code and thesuper-trellisobtained by lumpingtogether s=2steps ofthe originaltrellis,

sothatt =4outputsymbolscorrespondtoeachtransitionofthesuper-trellis(noticethat

the super-trellishas the samenumberofstatesof theoriginaltrellisbutmoretransitions,

and in generalmay have paralleltransitions even if the originaltrellis has not).

Consider the (0;0) transition, with code symbols (c

1

;c

2

;c

3

;c

4

) = (1;1;1;1) (we as-

sume BPSK modulation with the mapping : f0;1g ! f1g). Any transition of the

super-trellis with code symbols(c 0

1

;c 0

2

;c 0

3

;c 0

4

) is labeled by the monomial Q

t

j=1 W

jc 0

j c

j j

2

=4

j

(transitions between non-connected states are labeled by the zero monomial). Then, we

dene the state variablesV

1

;:::;V

3

corresponding to states 1;:::;3,and we split state 0

intotwo states associated to the input variable X and to the output variable Y respec-

tively. The corresponding state equationis given by

2

6

4 Y

V

1

V

2

V 3

7

5

= 2

6

4

1 W

2 W

3 W

4

W

1 W

2

W

1 W

3 W

4

W

3 W

4

W

2

W

1 W

2 W

3 W

4

W

1

W

1 W

2 W

4 W

1 W

3

W

4

W

2 W

3

W W W W W W W W

3

7

5 2

6

4 X

V

1

V

2

V 3

7

5

(A1)

(24)

anditisobtainedbylettingeachstate/outputvariableequaltothesum overallincoming

transitions ofthe product ofthe transitionweightby the state/inputvariable originating

the transition. The multivariateweight enumerator function isnallygiven by

T(W

1

;W

2

;W

3

;W

4

)=Y=X 1

wherethe term 1eliminatesthecontributionofthecorrectpath,withEuclidean weight

zero. This can be obtained by eliminatingV

1

;V

2

;V

3

from the system of linear equations

(A1).

More ingeneral, by following the above recipe we can always put the state equations

in the 22block form

2

6

4 Y

V

1

.

V

S 1 3

7

5

=

D C

B A

2

6

4 X

V

1

.

V

S 1 3

7

5

(A2)

(S denotes the numberof states),whereD;C;B andA are 11,1(S 1),(S 1)1

and(S 1)(S 1)polynomialmatricesinthevariablesW

1

;:::;W

t

,respectively. Then,

T(W

1

;:::;W

t

)=C(I A) 1

B+D 1 (A3)

In order to compute T(W

1

;:::;W

t

) at W

j

= e

j

=sin 2

, for j = 1;:::;t, for calculating

the union bound(21), itiscomputationallymore convenienttosubstitute the arguments

inside the matrices A;B;Cand D and then evaluate(A3) numerically.

B Rank-diversity of the WSTCs build from code (25,35)

for t = 4

In this appendix we show that the WSTCs build from the binary CC with generators

(23;35) mapped onto BPSKfor transmission over t =4 antennas have all rank-diversity

= Æ = 3, which is the maximum possible (see (14)). Interestingly, this proof shows

through an example that checking the rank-diversity of a trellis STC is a quite involved

task even ina simple binary case.

Proof. We start by proving the statement for d = 0. By lumpingtogether two trellis

stepsoftheoriginaltrellisweobtainanequivalentrate2=416-statescodewhosegenerator

matrix is given by

G(D)=

1+D 2

1+D+D 2

D 1

D 2

D 1+D

2

1+D+D 2

where D is the delay operator. Let b

1 (D)=

P

1

i=0 b

1i D

i

and b

2 (D) =

P

1

i=0 b

2i D

i

denote