Multiple-Antenna Channels
Giuseppe Caire and Giulio Colavolpe
Mobile Communications Group Universita diParma
Institut EURECOM Dipartimentodi Ingegneriadell'Informazione
2229Route des Cretes Parco Areadelle Scienze 181/A
06560 Valbonne, FRANCE 43100 Parma - ITALY
Tel: +33 (0)4 93002904 Tel: +39 0521 905744
Fax: +33 (0)4 93002627 Fax: +39 0521 905758
E-mail: [email protected] Email: [email protected]
March 7, 2001
Abstract
Weproposeanewspace-timecodingschemeforthequasi-staticmultiple-antenna
channelwithperfect channelstate informationat thereceiverand nochannelstate
information at the transmitter. The new scheme includes both trellis space-time
codes and layered space-time codes as special cases. When the number of trans-
mitantennas is notlarger thanthe numberof receive antennas ourscheme can be
eÆcientlydecodedbyminimummean-squareerror(MMSE)decision-feedbackinter-
ference cancellation coupledwithViterbi decodingthroughthe useof per-survivor
processing (PSP). We discussthe code design for the new scheme, and show that
nding codes with optimal diversity is much easier than for conventional trellis
space-time codes. We providean upperboundtotheword-errorrateof ourscheme
whichisbothaccurateandeasytoevaluate. Then,wendupperandlowerbounds
to the information outage probability with discrete i.i.d. inputs (as opposed to
Gaussianinputs,asinmostpreviousworks)and weshowthattheMMSEfront-end
yields a large advantage over the whitened matched lter front-end when coupled
with the decision-feedback PSP-based decoder. Our scheme yields a largeperfor-
mance gain with respect to coded V-BLAST of similar complexity, and it can be
easilycoupledwith therecentlyproposed linear-dispersion precoding to handlethe
caseofanumberofreceiveantennassmallerthanthenumberoftransmitantennas.
1 Introduction
Transmission schemes based on multiple antennas have attracted much attention in the
recentyearsasaviablesolutiontoincrease spectraleÆciencyandperformanceofwireless
channels.
Roughly speaking, works on multiple antennas can be classied depending on the
assumptions on the channel state information (CSI) available at the transmitter and at
the receiver. Theergodiccapacity[1]of afrequencynon-selectivechannelwith ttransmit
and r receive antennasand independent Rayleighfading, notransmitter CSIand perfect
receiverCSIhasbeencalculatedbyTelatarin[2]. Inthiscase,thecapacityforhigh-SNRis
givenbyC =minfr;tglog
2
SNR+O(1)bit/channeluse. Theergodiccapacitywithperfect
CSIboth atthetransmitterand atthereceiverhasthe samehigh-SNRbehavior[3]. The
informationoutageprobability[1](relatedtothe non-ergodic,or\outage"capacity)with
no transmitterCSI and perfect receiver CSIhas been investigated numerically in[2]and
in [4]. Transmit strategies minimizing the information outage probability with perfect
CSI both at the transmitter and atthe receiver have been considered in[5].
Other works assume perfect receiver CSI and partial transmitter CSI [6], frequency-
selectivefadingandacombinationofmultipleantennasand OFDM[7], andlarge-system
limits of physical scattering models, without any assumption of independent Rayleigh
fading, based on large randommatrix theory [8].
Amorerealisticmodelfortime-varyingfadingchannelshasbeenintroducedbyHochwald
and Marzetta in [9]. They considered a block-fading channel constant for T consecutive
channel uses and independent from block to block, where both transmitter and receiver
havenoCSI. Tse and Zheng[10] have shown thatthe high-SNRcapacity ofsuch channel
is C = k(1 k=T)log
2
SNR+O(1), where k = minfr;t;T=2g. They show also that
capacity is maximizedby using nomore than k transmit antennas. In particular, having
t >r antennasdoesnot improvecapacity. In [10,11] itisshownthat thesame high-SNR
capacity behaviorcan beachieved by a\naive"schemethat allocatesk inputdimensions
forchannelestimation(e.g.,by explicitlysendingtrainingsymbols), andbyamismatched
receiver that treats the training-basedestimated channel as if itwas the actual channel.
When the code block length N is much larger than maxfr;tg, and the channel co-
herence interval is T N, one codeword spans a single channel realization. Then, we
are in the presence of a compound channel, i.e., a collection of channels indexed by the
dierentrealizations of the rt channelmatrix. In this case, the (non-ergodic) capacity
and information outage probability of the channel with or without CSI at the receiver
coincide, since the channel matrix process is strongly singular [1].
In practice, the assumption of perfect CSI at the receiver holds approximately when
the channel varies very slowly with respect to the duration of a codeword (quasi-static
assumption). Thisis aquiterealisticassumption inseveral situationswherethe mobility
ofwirelessterminalsislimitedorabsent(e.g.,indoorwirelesslocal-areanetworks,wireless
local loops). On the contrary, the assumption of perfect CSI at the transmitter holds
only if adelay-freeerror-free feedback link fromreceivertotransmitter exists, or if time-
signal in the reverse direction. Motivated by the above considerations, we conclude that
assumingperfect CSIatthe receiverand noCSIatthe transmitterisreasonable andcan
beappliedinseveral practicalsettingsforwhichT N maxfr;tg and wherefeedback
ortime-divisionduplexingcannotbeexploited. Theresultsbasedonthisassumptioncan
be eectively approached in the high-SNR region by the naive scheme based on explicit
training, which is actually the way the large majority of existing wireless systems work
today. Without further questioning about the validity of this model, we adopt it as our
starting point.
Codingschemesforthequasi-staticmultiple-antennachannelwithperfectreceiverCSI
have been proposed inseveral works(see for example [13, 14,15, 16,17,18, 19]). In this
paper we propose a new scheme nicknamed \wrapped" space-time coding that includes
boththetrellisspace-timecodesof[13] andthelayeredspace-timecodesof[14]asspecial
cases. Forthe casert,ourschemecanbeeÆcientlydecodedbyminimummean-square
error (MMSE) decision-feedback interference cancellationcoupled withViterbi decoding,
through the use of per-survivor processing (PSP) [20]. We discuss the code design for
the new scheme, and show that nding codes with optimal diversity is actually much
easier than for conventional trellis space-time codes. We show many exampleswhere the
maximumpossiblediversity isachieved bywell-known trelliscodes. Weprovideanupper
boundtotheword-errorrateofourschemewhichisbothveryclosetosimulationsandeasy
toevaluate(atleastforgeometricallyuniformcodes). Then,inordertounderstandbetter
the behavior of the word-error rate, we nd upper and lower bounds to the information
outage probability with discretei.i.d. inputs. We show that the MMSE front-end yields
a large advantage over the whitened matchedlter (WMF) front-end if coupledwith the
decision-feedback PSP-based decoder. Finally, we provide numerical examples showing
the accuracy of our analytical bound and the eectiveness of the proposed space-time
coding scheme. We compare our scheme with V-BLAST [21], which is also based on
decision-feedback interference cancellationbut itdoesnot exploitPSP,and weshowthat
the latter suers severely from error propagation in the feedback decisions while the
proposedscheme doesnot. Also,weconsiderthe caser <t andwe showthat ourscheme
can workasouter codingwhereinnercoding(or \precoding")isobtained by therecently
proposed linear dispersioncodes [19].
The paper isorganized asfollows: inSection2wereview the basic trellisand layered
space-time coding schemes and we introduce the new \wrapped" scheme. In Section 3
we describe the details of the PSP-based decoding scheme. In Section 4 we discuss the
code design for our scheme and in Section 5 we present the semi-analyticbound on the
word-error rate. In Section 6 we present results on the outage probability with discrete
inputs and in Section7 we showsome numericalresults,the comparison with V-BLAST
and the use oflinear-dispersion precoding. Conclusions are summarizedinSection 8and
some side results are collected inAppendices A and B for the sake of completeness.
2 Coding for the Quasi-Static Multiple-Antenna Chan-
nel
The multiple-input multiple-output (MIMO) system with t transmitting (Tx) and r re-
ceiving (Rx) antennas considered inthis paper isdened by [13, 22, 4,2]
y
n
= p
Hx
n +z
n
; n=1;:::;N (1)
wherex
n 2X
t
isthevectorofmodulationsymbolstransmittedinparallelattimenbythe
Tx antennas, X C denotes a complex modulation signal set with unit average energy,
z
n 2 C
r
is the noise vector i.i.d. N
C
(0;I), y
n 2 C
r
is the corresponding vector of
received signalsamplesattheoutputof theRxantennas,H2C rt
isthe channelmatrix,
and isthe SNRperTxantenna. Weassumethat thechannelmatrixisnormalizedsuch
that 1
rt
traceE[HH H
] =1, so that the average received SNR per Rx antenna is given by
t.
As anticipated in the introduction, we consider the case where H is random but
constant over N maxft;rg channel uses, and we assume that the receiver knows H
perfectly,while the transmitter has noknowledge of H.
A space-time code (STC) for the above channel is a set S X tN
of tN complex
matrices (codewords). Codeword matricesX =[x
1
;:::;x
N
] are transmitted by columns,
in N consecutive channel uses. The STC spectral eÆciency is given by = 1
N log
2 jSj
bit/channeluse. By denition,the informationbit-energy over noise power spectralden-
sity ratio isgiven by E
b
=N
0
=t=.
2.1 Trellis Space-Time Codes
In [13], Tarokh et al. found criteria to design STC. They considered the pairwise-
error probability (PEP) with maximum-likelihood (ML) decoding and, in the case of
Rayleigh/RicianfadingcoeÆcients,theyshowed thatthePEPP(X!X 0
)averagedwith
respect to His upperbounded as
P(X!X 0
)K r
where K is a factor that depends on the codeword dierence matrix D = X 0
X and
on the channel matrix statistics, but it is independent of , and where is the rank
of D. Driven by the above bound, they indicated as the most important criterion for
constructingSTC themaximizationof theminimumrankoveralldistinctX;X 0
2S. We
shall refer tothis minimum rank asthe code rank-diversity.
ForlargeN,STCcanbeconstructedfrommultidimensionaltrelliscodes(M-TCM)[23]
with trellis termination. Examples of such schemes are given in [13, 16, 17,18]. Namely,
consider aM-TCM code CoverXof rateR =b=tbit/symbol,whereeachtrellis stepcor-
responds tob information (input)bits and to t code (output) symbols, and the subcode
of all c2 C leavinga given trellis state s
0
and merging intoa given trellis state s
N after
c as tN matrices, i.e., by transmitting the t symbols produced by the trellis encoder
at each trellis step in parallel on the t Tx antennas. The resulting spectral eÆciency is
given by b bit/channeluse.
1
ThediÆcultyinconstructingthesecodes isthatthe rank-diversity ishardtoevaluate
and it is not easily related to the algebraic properties of the underlying M-TCM code.
In [22], for the class of binary and quaternary trellis codes over Z
2
and Z
4
mapped onto
BPSK and QPSK,respectively,a condition onthe underlyingalgebraiccodes referred to
as the binary-rank criterionis shown to imply the rank-diversity of the resulting STC S
and it is used to construct STCs with full rank-diversity (i.e., with = t). The binary-
rank criterion is much easier to check than the rank-diversity and yields some explicit
general algebraicconstructions.
The maximum-likelihood (ML) decoder for trellis STC can be implemented by the
Viterbi Algorithm (VA) applied on the trellis of the underlyingM-TCM code. However,
for xed code rate R and rank-diversity the decoder complexity grows exponentially
with the number of Tx antennas t. In fact, from Lemma 3.3.2 in [13] we know that the
trellis complexity is lower bounded by 2 tR( 1)
.
Because of the diÆculty in code design (with the exception of constructions given
in[22])andbecauseofdecoding complexity,trellis STCarepracticallyrestrictedtosmall
t.
2.2 Layered Space-Time Codes
In [14], Foschini proposed a STC scheme suited to handle a very large number of Tx
and Rx antennas. In this scheme, M codes C (1)
;:::;C (M)
over X produce independently
codewords c (1)
;:::;c (M)
of length N 0
=td, where d1and M are given integers. These
codewords are diagonallyinterleaved (or\layered"), 2
asshown inFig.1,inorder toform
the tN codeword matrix (with N =d(M +t 1))
X=L
d (c
(1)
;:::;c (M)
)
whereL
d
indicatesthe\layered"diagonalinterleaverwithinterleavingdelayd(seeFig.1).
A reduced-complexity suboptimal decoder for this scheme is obtained by a linear
front-end followed by decision-feedback interference cancellation [14]. The linear front-
end, dened by a matrix F2C rt
,produces the sequence of received vectors
v
n
=F H
y
n
; n=1;:::;N
Then, eachcodeword c (m)
is decoded by taking as observable the sequence of samples
r (m)
`
=v
j;n p
t
X
k=j+1 b
j;k b x
k;n
; `=1;:::;N 0
(2)
1
Forsimplicity, weignoretherate decrease dueto trellistermination sinceit becomes negligiblefor
N much largerthantheTCMencodermemory.
2
In[21]anotherlayeredschemebasedonstandardrectangular(or\vertical")interleavingisconsidered,
where the one-to-oneindex mapping(m;`)$(j;n) is induced by the interleaverL
d ,v
j;n
is the j-th element ofv
n ,b
j;k
is the (j;k)-th elementof the matrix B=F H
H and xb
k;n is
a decision on the (k;n)-th symbol of X. From Fig. 1, we see that the elements x
k;n for
k =j+1;:::;t correspond toeitherzeros(forwhichnodecisionisneeded) ortosymbols
ofcodewordsc (m
0
)
withindexm 0
<m. Therefore, bydecodingthecodewordsinthe order
m =1;:::;M, the decisions needed in (2)are provided by earlierdecoded codewords.
2.3 \Wrapped" Space-Time Codes
In the original layered construction of [14], the length of each component codeword is
N 0
=dt. Because of practical hardware complexity limitationst cannot be a very large
number. Thisimpliesthat inordertohavelong codewords, the interleavingdelay dmust
belarge. Ifinterleavingdelay is anissue,the layered scheme isforced towork withshort
componentcodeblocklengthN 0
. Thismightposeaseriousproblemforusingtrelliscodes
with a large number of states. In fact, the code memory might not be negligible with
respect to N 0
thusyielding a non-negligiblerate loss due totrellis termination.
For this reason, we propose a scheme which keeps the the simplicity of decision-
feedbackdecodingwhileallows forarbitrarilylongcomponentcodewords andsmallinter-
leavingdelay. Interestingly,trellisSTCand layeredSTCare foundtobespecial cases. In
the proposed scheme, a single code C over X produces a codeword c of length N 00
. This
codewordisdiagonallyinterleavedinordertoformthetN codewordmatrixX=D
d (c),
with N =N 00
=t+(t 1)d. The diagonalinterleaver D
d
isdened by
x
j;n
=
c
`j(n)
if 1`
j
(n)N 00
0 otherwise
(3)
for 1j t and 1nN, where
`
j
(n)=[n 1 (j 1)d]t+j: (4)
The codeword matrix X is lled by wrapping the codeword c around its diagonals, as
illustrated by Fig. 2. Hence, this STC scheme shall be referred toas wrapped space-time
code(WSTC).Inthisway,theinterleavingdelaydbecomesafreeparameter,independent
of the componentcodeword block lengthN 00
.
Asa limitingcase,the interleavingdelaymay be alsod =0,i.e., avertical interleaver
may be used. For consistence with the case d>0, where code symbols with lower index
takethe lowerpositions ineach columnof the codeword matrix X (see Fig. 2),for d=0
we assume that the codeword matrix X is lled with the elements of the codeword c as
in Fig. 3. In this case, index `
j
(n) in(3) becomes
`
j
(n)=(n 1)t+t j +1: (5)
Remark 1 If C is a M-TCM code of rate R = b=t, then S = D
0
(C) is a trellis STC. If
C = C (1)
C (M)
(i.e., C is given by the Cartesian product of codes C (1)
;:::;C (M)
),
then S=D
d
(C) =L
d (C
(1)
;:::;C (M)
) isa layered STC. Hence, WSTC is ageneralization
Remark 2 Because of the lower and upper triangles of zero symbols in the codeword
matrix dened by (3), there is an inherent rate loss of (t 1)d=N. This is negligible if
N td. Moreover, if the transmissionof a long sequence of codewords is envisaged, the
codeword matricescan beconcatenated inordertolltheleadingand tailingtrianglesof
zeros, so that norate loss isincurred.
3 Decoding of Wrapped Space-Time Codes
Inthe casewhereCisatrellis code,weproposeareduced-complexitysuboptimaldecoder
obtainedbycombiningdecision-feedbackinterferencecancellationwiththeVA,inanalogy
with the delayed decision-feedback detection (DDFD) approach for ISI channels with
coding (see [25, 26, 27, 28] and references therein). The decoder takesas observable the
sequence of samples
r
`
=v
j;n p
t
X
k=j+1 b
j;k b x
k;n
; `=1;:::;N 00
(6)
wherev
j;n andb
j;k
aredenedasin(2)andwhere1j tand1nN aretheunique
integersfor which `
j
(n)=`. From the index mapping(4)(or (5)for d=0), we see that
the elements x
k;n
for k =j +1;:::;t correspond to either zeros (forwhich nodecision is
needed) ortosymbolsofcwithindex` 0
` td+1(`
0
` 1ford=0). Thesedecisions
are found in the survivor history according to per-survivor processing (PSP) [20]. As an
example, consider the case d >0. The delay for decision xb
k;n
necessary to compute the
observable for symbol c
`j(n)
=x
j;n
is given by (k j)(td 1). If(k j)(td 1) is larger
than the VA decoding delay (typically 5 or 6 times the code constraint length [29]), the
corresponding decisions are reliably obtained from the VA output. If (k j)(td 1) is
not large enough, the correspondingdecisions can be obtained fromthe survivors ending
ineach state ofthe VA.In this way, ifthe correctpath isamongthe survivors, atleast a
subset of the paths are extended by the VA with the correct feedback decisions.
Choice of the linear front-end lter. For the sake of simplicity, the decoder treats
the sequence r
`
asif itwas producedby the virtual scalar-inputadditive-noise channel
r
`
= p
j c
` +
`
; `=1;:::;N 00
(7)
where
`
is assumed i.i.d. N
C
(0;1) and, from(4) or(5), we have that
j =
t j` 1j
t
for d=0
j` 1j
t
+1 for d>0
(jj
t
denotes amodulo t operation). The SNR
j
of the above channel isgiven by
j
=
jb
j;j j
2
jf
j j
2
+ P
j 1
jb
j;k j
2
(8)
wheref
j
isthe j-thcolumnofthe front-endlter F. Becauseofdiagonalinterleaving,the
codeword ciscyclicallyinterleaved overt virtual additivewhite Gaussiannoise(AWGN)
channels with SNRs
1
;:::;
t
, so that exactly N 00
=t symbols are assigned to each chan-
nel j. Notice that (8)is the true SNR of channel j if decisions in (6) are correct, i.e., if
the contributionof past symbols is canceled exactly. Moreover, the noise samples
` are
not Gaussianand notindependent,ingeneral. However, providedthat theseassumptions
hold, this scheme decomposes the MIMO channel(1) into t parallel channels with cyclic
interleaving,as illustrated inFig. 4.
Giventheanalogybetweenthisschemeanddecision-feedbackequalizationofISIchan-
nels, standard choices for the front-end lter matrix F are also inspired by equaliza-
tion [30]. IfHhas rankt, aninformation-losslessfront-end isgiven by the WMFF=Q,
whereH=QBisthe\QR" decomposition[31] ofthechannelmatrix H,whereQ2C rt
has orthonormalcolumnsandB2C tt
isuppertriangular. Inthiscase, the j-thchannel
SNR is given by
j
=jb
j;j j
2
and the additivenoise isexactly Gaussiani.i.d. (subject to
the assumption of perfect feedback decisions). Another information-lossless front-end is
the unbiased MMSElter whose j-thcolumnf
j
isthe solutionof the SNR maximization
problem
maximize
j
subject to jf
j j
2
+ P
j 1
k=1 jb
j;k j
2
=1
and it is given explicitly by
f
j
= h
I+ P
j 1
k=1 h
k h
H
k i
1
h
j
r
h H
j h
I+ P
j 1
k=1 h
k h
H
k i
1
h
j
(9)
where h
j
denotes the j-thcolumnof H. In this case, the j-th channel SNRis given by
j
=h H
j
"
I+ j 1
X
k=1 h
k h
H
k
#
1
h
j
(10)
and the additive noise is neither Gaussian nor i.i.d. (even assuming perfect feedback
decisions).
IfHhasrankless thant,theWMFisnot denedandtheMMSElterisinformation-
lossy. Subject tomild conditions onthe statisticsof H, the probability that Hhas rank
less than t when r t is zero. Therefore, the above schemes can be practically applied
whenever r t. In the rest of this paper we restrict our treatment to this case, and we
briey address the case r <t inSection 7.
4 Code Design for the WSTC
Assuming that the parallel channel model with cyclic interleaving (7) holds, the PEP
between twocodewords c;c 0
2C for given channelSNRs
1
;:::;
t
is given by
P(c!c 0
j
1
;:::;
t )=Q
0
@ v
u
u
t
2 t
X
j=1
j w
j 1
A
(11)
where Q(x)
= R
1
x 1
p
2 e
z 2
=2
dz is the Gaussian tail function and where we dene the
squared Euclidean weight (SEW) w
j as
w
j
= 1
4 N
X
n=1 jx
0
j;n x
j;n j
2
(12)
(the correspondence between symbols of c and c 0
and symbols of X = D
d
(c) and X 0
=
D
d (c
0
) is given by (3)).
A sensible criterionfor the design of the component code C is to maximize the code
block-diversity Æ, dened by
Æ= min
c;c 0
2C:c 0
6=c
jfj 2f1;:::;tg : w
j
6=0gj (13)
that is,tomaximizethe minimumnumberof non-zerorows inthe matrixdierenceD =
X 0
X for each pair of distinct codewords matrices X;X 0
2D
d
(C). The block-diversity
criterion has been investigated in [32, 33, 34] for the design of trellis codes for cyclic
interleaving and/or periodic puncturing. The relationship between the rank-diversity of
a WSTC and the block-diversity of itscomponent code isgiven by the following:
Proposition 1 Consider a code C over X of rate R bit/symbol and block-diversity Æ.
Then, the rank-diversity of the corresponding WSTC S=D
d
(C) satises:
Æ1+
t
1
R
log
2 jXj
: (14)
Moreover, there exist d for which =Æ.
Proof. Consider a pair of distinct codewords c;c 0
2 C, the corresponding matrices
X = D
d
(c) and X 0
= D
d (c
0
) and the dierence matrix D = X 0
X. The rank of D
cannot be larger than the number of non-zero rows of D, therefore Æ. The WSTC S
can be seen as a block code of length t over the extended alphabet X N
, where each row
of X is a symbol. The block-diversity Æ is then the minimum Hamming distance of this
block code. Byapplying the Singleton bound [35], weobtain
NtR N(t Æ+1)
which impliesthe second inequality in(14).
3
In order to prove the second statement of the proposition, consider a distinct pair of
codewords c;c 0
2C and write their dierencec 0
c by columns,as a tN 00
=t array e
D.
Ifany such e
D has rankÆ,then the statement issatisedford=0. Otherwise, foreach
d 1 the dierence matrix D is obtained by appending to the right of each row of e
D a
tailof (t 1)d zeros and by shifting tothe righteach j-throwby (j 1)dpositions. Let
f1;:::;tgdenotethe set oftheindexes ofnon-zero rowsof e
Dand foreachrowj 2
let `
j and r
j
denote the number of leadingand tailing zeros. Now, let
d=maxf1;minfd
1
;d
2
gg (15)
where we dene
d
1
=max
j;k 2
k<j
`
j
`
k
k j
and
d
2
=max
j;k 2
k<j
r
k r
j
k j
:
By construction, itis immediate to check that the resultingD is either upper triangular
or lower triangular and has rank equal to jj. By taking d to be the maximum of (15)
over alldistinct c;c 0
2Cwe obtain that S=D
d
(C) has rank-diversity Æ.
Remark 3 From Theorem 3.3.1 of [13], we know that for any STC over X with t Tx
antennas and spectral eÆciency =tRthe rank-diversity satises
1+
t
1
R
log
2 jXj
: (16)
Since this isthe sameupperboundonblock-diversitygiven inProposition1,weget that
the wrapping construction incurs no loss of optimalityin terms of rank-diversity (for an
appropriate choice of the delay d). As a matter of fact, while it is diÆcult to construct
codeswithrank-diversityequaltotheupperbound(16),itisveryeasytondtrelliscodes
for which the upper bound (14) on Æ is met with equality, for several coding rates and
values of t. Examples of these codes are tabulated in [32, 34]. Therefore, the wrapping
construction is apowerfultooltoconstruct STC with maximumrank-diversity.
Remark 4 From Lemma 3.3.1 of [13] we know that a trellis STC with rank-diversity
must have constraint length L . 4
This constraint does not apply to WSTC. For
example,thebinary4-stateconvolutionalcode(CC)ofrate1=4withgenerators(5;7;7;7)
(octal notation [30]) has constraint length 3, but the corresponding WSTC for t = 4
antennas with interleaver delay d 2 achieves =4. This fact is explained by noticing
3
Thisupperboundontheblock-diversityhasbeenfoundin[36]and independentlyin [33].
4
Following[37],wedene theconstraintlengthof atrelliscodeasL =
max
+1,where
max is the
Table 1: Block-diversity for rate 1=2 binary codes tabulated in [30] mapped onto
BPSK/QPSK. Bold numbers denote diversity achieving the bound (14). Code gener-
ators are expressed in octal notation[30].
BPSK QPSK
States Generators t=2 t =4 t=8 t=2 t=4 t=8
4 (5;7) 2 3 4 2 3 3
8 (15;17) 2 3 4 2 3 4
16 (23;35) 2 3 5 2 3 4
32 (53;75) 2 3 5 2 3 4
64 (133;171) 2 3 5 2 3 5
that the diagonal interleaver expands the state space of the overall interleaved code. On
the other hand, the PSP-based VA decoder proposed for WTSCs works on the trellis
of the underlying code C, i.e., it ignores the state space expansion.
5
Therefore, it is
not a priori clear if WSTCs, even if optimal from the rank-diversity point of view, are
going topay alargepenaltywhen PSP-basedVA decodingisusedinsteadof optimalML
decoding. In next sections we shall see that the penalty incurred by the MMSE front-
end and suÆciently large interleaving delay d is practically negligible,while the penalty
incurred by the WMF front-end and any d can be very large.
Remark 5 In [38], a computational eÆcient trellis-based algorithm for computing the
block-diversity of trellis codes with cyclic interleaving is given. In Section 5, we give
another computationally eÆcient method for calculating the block-diversity. By using
this method, we computed the block-diversity for the binary codes of rate 1/4 and 1/2
tabulated in [30] and mapped onto BPSK and QPSK, for dierent values of t. Some
of these results are reported in Tables 1 and 2. We observe that several codes achieve
maximum block-diversity and therefore are good candidates for WSTC.
Example 1 Fig. 5 shows the word-error rate (WER) vs. E
b
=N
0
for WTSCs obtained
from the binaryCC with generators (5;7;7;7)mapped ontoBPSKand transmitted over
a t =r =4 channel, with d=0;1;2. The channel matrix has i.i.d. elements N
C (0;1)
(independent Rayleigh fading). Each transmitted codeword corresponds to128 informa-
tion bits. Decodingisperformedby thePSP-based VAworkingontheunderlying4-state
trellis,withMMSEfront-end. Asdincreases,theslopeoftheWERcurvebecomessteeper
and steeper. In fact, it can be checked that for d = 0 the resulting WSTC has = 2,
for d = 1 has = 3 and for d = 2 has = 4. The solid curve denoted as \QUB" is
5
Again, we stress the analogy of the problem at hand with the case of trellis coding overa nite-
memory ISI channel, where theoptimal ML decoderrequires a numberof states generallylarger than
Table 2: Block-diversity for rate 1=4 binary codes tabulated in [30] mapped onto
BPSK/QPSK. Bold numbers denote diversity achieving the bound (14). Code gener-
ators are expressed in octal notation[30].
BPSK QPSK
States Generators t=4 t =8 t=4 t=8
4 (5;7;7;7) 4 5 3 5
8 (13;15;15;17) 4 6 4 6
16 (25;27;33;37) 4 7 4 6
32 (53;67;71;75) 4 7 4 7
64 (135;135;147;163) 4 6 3 5
an analytical Quasi-Upper Bound developed in Section 5 assuming the parallel channel
model induced by perfect decision-feedback.
For the sake of comparison, we show the WER performance of the trellis STC or,
equivalently,WSTCwithd=0,obtainedbyformattingthe16-statebinaryCCwithgen-
erators (25;27;33;37) mapped onto BPSK. This code is shown toachieve rank-diversity
= 4 in [22]. As expected, the slope of the error curve for this code is the same of the
4-state WSTC with d=2. We alsoshow the QUB for the 16-statetrellis STC assuming
the parallel channel model induced by perfect decision-feedback with MMSE front-end.
The fact that the simulated ML curve gets asimptotically very close to the QUB makes
usconjecture that thePSP-based VA withMMSEfront-end paysalmostnopenaltywith
respect to optimal ML decoding. An information-theoretic explanation of this fact will
beprovided in Section6.
5 Word-Error Rate Analysis
In this Section, we derive aunion upper bound on the average WER of acode C over X
with block length N 00
cyclicallyinterleaved over t parallel AWGN channels with random
(but constantovereachcodeword)SNRs
1
;:::;
t
. Given the parallelchanneldecompo-
sition (7), this method provides also a true upper bound for the WSTC S= D
d
(C) with
WMFfront-end,subject tothe assumptionof perfect feedback decisions. Forthe MMSE
front-end,theparallelAWGNchannelmodeldoesnot holdexactlyeven assumingperfect
feedback decisions, since the noise is neither Gaussian nor independent. Nevertheless,
numericalresultsshowthat the methodprovidesalways anupperbound alsoforMMSE.
For this reason, inthe followingit willbereferred toas Quasi-Upper Bound (QUB).
For simplicity, we assume that C is geometrically uniform [39], so that we can take
c 0
j
1
;:::;
t
) given in(11). Byusing the union bound weget
P
w (ej
1
;:::;
t )
X
c 0
6=c
P(c!c 0
j
1
;:::;
t
) (17)
We are mainlyinterested inthe case whereC is atrellis code with trellis termination. A
pairwise error event fc ! c 0
g is said to be simple if the trellis paths corresponding to
codewords c and c 0
split at a certain step 1 n
1
N
00
, merge at step 1 n
2
N
00
,
coincide for all steps n n
1
and n n
2
and remain separated for n
1
< n < n
2 . If a
pairwise error event is not simple,is said to be composite. Wehave the following:
Lemma 1 For any arbitrary
1
;:::;
t
, the RHS of (17) remains an upper bound on the
conditional WER if the sum is restricted to simple error events.
Proof. We have to prove that if fc ! c 0
g is not simple, then it can be eliminated
from the union bound without changing the inequality relation. Consider the dierence
sequence d 0
=c 0
c (represented as acolumn vector), and let
=diag(
p
1
;:::; p
t
; p
1
;:::; p
t
;:::;:::; p
1
;:::; p
t )
beaN 00
N 00
diagonalmatrix containingthe channelamplitudes p
1
;:::; p
t
repeated
periodically N 00
=t times (by denition, N 00
is an integer multiple of t). Assume that
fc!c 0
g iscomposite. Then,thereexisttwocodewords c
1 and c
2
suchthatd 0
=d
1 +d
2
and (d
1 )
H
(d
2
)=0 forall , where d
1
=c
1
c and d
2
=c
2
c. This is because the
path corresponding to a composite event can be always be formed by the concatenation
of two paths, which dier from the reference path on disjoint supports. Therefore, the
non-zero elements in d
1
and d
2
are in dierent positions and the vectors d
1
and d
2
are orthogonal for any diagonal matrix . Let v be the received signal corresponding to
the transmission of c,i.e., v=c+,where is the AWGN vector. The pairwise error
regions corresponding to the events fc ! c 0
g;fc ! c
1
g and fc ! c
2
g are given by the
inequalities
E 0
= fv: 2Refv H
d 0
g+jd 0
j 2
0g
E
1
= fv: 2Refv H
d
1
g+jd
1 j
2
0g
E
2
= fv: 2Refv H
d
2
g+jd
2 j
2
0g (18)
The error event fc!c 0
gcan be removed fromthe union bound if E 0
E
1 [E
2
. In order
to showthis, assume that v2E 0
\(E
1 [E
2 )
c
. Then,
2Refv H
d
1
g+jd
1 j
2
0
2Refv H
d
2
g+jd
2 j
2
0
By summing the above two inequalities and by recalling the orthogonality property of
d
1
and d
2
weget
H 0 0 2
i.e., v2= E 0
, whichcontradicts the assumption.
Next, we nd a method to enumerate all simple error events. We can represent the
code C over a super-trellis, obtained by lumping together s trellis steps of the original
trellis, so that each step of the super-trellis corresponds to s consecutive steps of the
original trellis,and each step of the supertrellis has t outputsymbols. Forexample, if C
isatrelliscodeofrateb=pbit/symbol,i.e.,onetransitionoftheoriginaltrelliscorresponds
top outputsymbols,then s=t=p(where for simplicitywe assumethat pdivides t). The
length of the terminated super-trellisis N 00
=t steps.
LetA (L)
w1;:::;wt
denote the numberof simple error events of lengthL in the super-trellis
(the lengthL ismeasured insuper-trellissteps) withSEWs w
1
;:::;w
t
startingatstep 1.
Then, the total number of simple error events of length L and SEWs w
1
;:::;w
t
is equal
to (N 00
=t L+1)A (L)
w1;:::;wt
. By restricting the sum in (17) tosimple error events and by
using (11) we have
P
w (ej
1
;:::;
t
)
N 00
=t
X
L=1 (N
00
=t L+1) X
w
1
;:::;w
t A
(L)
w
1
;:::;wt Q
0
@ v
u
u
t
2 t
X
j=1
j w
j 1
A
N
00
t X
w
1
;:::;wt 1
X
L=1 A
(L)
w1;:::;wt
!
Q 0
@ v
u
u
t
2 t
X
j=1
j w
j 1
A
(a)
= N
00
t Z
=2
0 T
e
1
=sin 2
;:::;e
t
=sin 2
d (19)
where we dene the code multivariate weightenumerator function[40, 41]
T(W
1
;:::;W
t )=
X
w
1
;:::;wt 1
X
L=1 A
(L)
w1;:::;wt t
Y
j=1 W
w
j
j
(20)
and where in (a) of (19) we used the preferred integral expression of the Gaussian tail
function [42]
Q(x)= 1
Z
=2
0
exp
x 2
2sin 2
d
The multivariate weight enumerator function counts all the simple error events starting
at step 1 and ending after L steps of the supertrellis, for allpossible length L= 1;2;:::
(noticethatinordertomakethecomputationeasierweextendthesumalsotolengthsL>
N 00
=t). A detailedexampleof the computationof T(W
1
;:::;W
t
)is given inAppendix A.
In order to obtain the average WER, where expectation is done with respect to the
jointstatisticsofthe channel SNRs
1
;:::;
t
(they neednot be independent),wecannot
average term-by-term the weight enumerator. In fact, the parallel channels with cyclic
interleaving and random SNRs belongs to the class of block-fading channels, for which
the union bound averagedwith respect tothe fadingstatisticsmay be very loose oreven
we followthe approach of [33] and obtain the nal bound onthe averageWER as
P
w
(e)E
"
min (
1;
N 00
t Z
=2
0 T
e 1=sin
2
;:::;e t=sin
2
d )#
(21)
where the expectation is with respect to the joint statistics of
1
;:::;
t
. For small t
and if the joint pdf of
1
;:::;
t
is known, the above expectation can be calculated by
numericalintegrationmethods(asin[33]). Iftislargeorifthejointstatisticsof
1
;:::;
t
is not known explicitly,the bound (21) can be evaluated by Monte Carlo averaging over
a large number of realizations of the channel SNRs. In particular, if (21) is used as an
upper bound on the WER of the wrapped scheme S= D
d
(C), the Monte Carlo average
is obtained by generating a large number of channel matrices H and by calculating the
corresponding SNRseither viathe QR decomposition(inthe case of WMFfront-end) or
by using (10) (in the case of MMSE front-end). In this way, the method can be applied
to any arbitrary channel statistics (for example, H can be generated by a ray-tracing
software in orderto modela particular scattering environment [3]).
Remark 6 The integral with respect to can be eÆciently computed by using Gauss-
Chebychev quadrature rules. In fact, by letting g(sin 2
)=T
e 1=sin
2
;:::;e t=sin
2
we can write
1
Z
=2
0
g(sin 2
)d =
1
2 Z
1
1 g(x
2
) dx
p
1 x 2
1
2n n 1
X
i=0 g
cos 2
2i+1
2n
nodd
= 1
n
(n 1)=2 1
X
i=0 g
cos 2
2i+1
2n
(22)
where the last equality follows by noticing that g(0) = 0. Then, for a n-nodes quadra-
ture rule only (n 1)=2 evaluations of the integrand function are needed. Numerical
experimentsin Section7showthat a very good accuracyis obtained alreadyfor n=7.
Remark 7 In [25, 26, 43], an analysis of the bit-error probability of DDFD based on
union-boundingtechniques is presented. There, the main problem is to characterize the
eect that wrong decisions in the survivors have on the current branch metric: clearly,
wrong symbols onthe survivorterminating ina given state aect the decisionmetricsof
all paths stemmingfromthat state.
We can provide an intuitive explanation of why this problembasically disappears in
the analysis of the WER of PSP-based VA decoding of WSTCs with suÆciently large
interleaving delay d. Consider the situationdepicted in Fig. 6 and the decision-feedback
interference cancellation given in(6). Thereare two types of possible wrongdecisions in
error event, occurred to the left of the survivors merge point (see Fig. 6). Decisions bx
k;n
in (6) have delay (k j)(td 1) for k = j +1;:::;t. If the survivors merge always at
delay smallerthan td 1, onlyerrors due topast error events aect the metric updating
and the WEP of the PSP-based VA is equivalent to a genie-aided decoder with perfect
feedback decisions. In fact,if xb
k;n 6=x
k;n
a codeword error already occurred for both the
PSP-based and for the genie-aided decoders, while if xb
k;n
= x
k;n
the two decoders have
the same branch metric for the (j;n)-th symbol. In other words, the error propagation
due tonon-perfect feedback decisionshas noinuence onthe WER,provided that td 1
is large enoughin orderto havea very high probabilityof path merge atdelay <td 1.
Notice that for large t, a large interleavingdelay d isnot needed to meetthis condition.
On the contrary, in the case of DDFD for ISI errors are mostly due to unmerged
survivors, because of the time-causality of the ISI channel. For this reason the WER
performance of WSTCs with PSP-based VA decoding can be predicted very well by the
QUB (21) while neglecting feedback decision errors in DDFD yields overly optimistic
results [25, 26,43].
A method for evaluating the block-diversity. Asimplealgebraicmethodtocalcu-
late the block-diversity of a code C with periodicinterleaving over t parallel channels is
given by the following:
Lemma 2 The block-diversity of a code C cyclically interleaved over t channels is equal
toÆ ifandonly ifT(W
1
;:::;W
t
)=0forallvectors (W
1
;:::;W
t
)2f0;1g t
with Hamming
weightlessthanÆ andthereexistsonevector(W
1
;:::;W
t
)2f0;1g t
withHammingweight
Æ for which T(W
1
;:::;W
t )>0.
Proof. Consider the parallel channel model with cyclic interleaving (7) and Fig. 4.
Let some of the SNRs
j
be equal to 0 and others to +1. If all codewords of C are
distinguishableaftertransmissionoverthisparallelon-ochannelsthenP
w (ej
1
;:::;
t )=
0. Otherwise, P
w (ej
1
;:::;
t
) > 0. Following the same steps yielding (19), we obtain a
Cherno union bound [29] onthe WEPas
P
w (ej
1
;:::;
t )
N 00
2t T e
1
;:::;e t
(23)
The Chernounionboundisasymptoticallytightforlarge SNR.Then,we concludethat
the function T e 1
;:::;e t
is zero if P
w (ej
1
;:::;
t
)= 0 and obviously it is positive
if P
w (ej
1
;:::;
t
) > 0. Finally, we notice that letting
j
be 0 or 1 is equivalent to let
W
j
equal to1 or0, respectively.
6 Mutual Information and Outage Probability
Under the quasi-static regime, the best possible achievable WER of the MIMO channel
dened by
P
out
()=Pr(I
X
(H)) (24)
where I
X
(H) denotes the instantaneous mutual information between the input x
n , uni-
formly distributed overX t
,and the outputy
n
for a given channelmatrix H, given by
I
X
(H)=tlog
2
jXj E
x;z (
log
2
"
X
x2X t
e 2
p
Re
f z
H
H(x 0
x)
g (x
0
x) H
H H
H(x 0
x)
# )
(25)
(E
x;z
fgdenotes expectationwith respect tox and zN
C
(0;I)).
ThecomputationofI
X
(H)foragivenniteanddiscretesignalsetXisaverydemand-
ing task for large t. In fact, (25) does not admit a simple closed form and requires the
evaluationofat-dimensionalcomplexintegral. Moreover, the integrandfunctionrequires
the evaluationofthe quadraticform(x 0
x) H
H H
H(x 0
x) inthe jXj 2t
possiblevaluesof
the dierencevector x 0
x. Hence, ndingupperand lowerboundstoI
X
(H)isessential
for evaluating the limitperformance of STC under the quasi-static regime.
GiventheanalogybetweenthisproblemandtheISIchannelwithi.i.d. discreteinputs,
we shall make use of the bounds derived in [44]. A simple upper bound on I
X
(H) is
obtained byassumingGaussiani.i.d. inputs(GI)withthe sameaverageinputconstraint.
We have
I
X
(H)I
G
(H)=log
2
det I+HH H
(26)
Otherupperand lowerboundscan bederived by consideringthe parallelchannels(7)for
appropriate choice of the SNRs
j
. In general, the instantaneous mutual informationof
the t AWGN parallel channels withinput independent and uniform overX is given by
I
P:ch:
(
1
;:::;
t
)=tlog
2 jXj
t
X
j=1 E
x;
(
log
2 X
x 0
2X e
2 p
j Ref
(x 0
x)g
j jx
0
xj 2
!)
(27)
where N (0;1). The matched-lter bound (MFB) [44] is obtained by neglecting the
mutualinterference of the t transmit antennas. We have
I
X
(H)I mfb
X
(H)=I
P:ch:
jh
1 j
2
;:::;jh
t j
2
(28)
A class of lower bounds can be obtained from the chain-rule of mutual information as
follows [44]. Forgiven H and any rt matrix Ffor whichv=F H
ywe can write
I
X
(H) = I(x
1
;:::;x
t
;y)
= t
X
j=1 I(x
j
;yjx
j+1
;:::;x
t )
(a)
t
X
j=1 I(x
j
;vjx
j+1
;:::;x
t )
(b)
t
X
j=1 I(x
j
;v
j jx
j+1
;:::;x
t )
= t
X
j=1 I x
j
; v
j p
t
X
k=j+1 b
j;k x
k
!
(29)
where the coeÆcients b
j;k
are dened as in (2) and (6). The inequality (a) holds with
equality if F is information lossless. In particular, for F given by the WMF or by the
MMSE lter dened in Section 2.3 equality holds (recall that we assume r t). The
inequality (b) holds with equality if F is the MMSE lter and the inputs x
1
;:::;x
t are
Gaussian i.i.d. [45, 46].
6
The j-th term in the sum of the last lineof (29) is the mutual
informationof the j-thparallel channelin (7). IfFis the WMF, we get the lowerbound
I
X
(H)I wmf
X
(H)=I
P:ch:
jb
1;1 j
2
;:::;jb
t;t j
2
(30)
IfFistheMMSElter,thenoisein(7)isnotGaussiansincethex
j
'sarediscreteand(27)
is not directly applicable. However, experimentalresults and some analytical arguments
provided in[47]and inanasymptoticformin[48] showthat theresidualinterferenceplus
noise at the output of the MMSE lter is very close to Gaussian, even if the interfering
symbolshaveadiscretedistribution. In[44],experimentalevidenceshowsthatby making
a Gaussian approximation (GA) of the MMSE lter output (referred to as MMSE-GA
in the following) the resulting mutual information is a lower bound to the true mutual
information (in [44] this is referred to as conjectured lower bound). Motivated by these
arguments, we can write I
X (H)
I mmse
X
(H)=I
P:ch:
(
1
;:::;
t
) where
j
is given in(10)
and
means that inequalityis conjectured.
Upper and lower bounds on I
X
(H) yield lower and upper bounds on P
out
(), respec-
tively. Figs.7,8and9showacomparisonofthevariousoutageprobabilityboundsversus
E
b
=N
0
fort =r =4,BPSKmodulationandindependent Rayleighfading. Thecurvesare
6
Thiscanbeprovedinapurelylinear-algebraicwaybyshowingthedeterminantdecomposition
det I+HH H
= t
Y
j=1 (1+
j )
where isgivenby(10).
obtained by MonteCarlo simulation overseveral independentgenerations of the channel
matrix H. Some pointsfor the true outage probability are alsoshown forcomparison.
For small spectral eÆciency (Fig. 7 and 8) the MMSE-GA (quasi-)upper bound is
very close to the true outage probability,while as increases (Fig. 9)it becomes looser.
Remarkably, in the cases considered here the MFB provides a lower bound tighter than
theGI.ThegapbetweenGIandBPSKincreaseswithand,forallvaluesof,theWMF
suers froma large degradationwith respect to MMSE-GA.
At this point, some qualitativeconsiderations are inorder:
1. In the derivation of (30) and of the MMSE-GA (quasi-)upper bound the perfect
feedback decisions are not assumed, but they are a consequence of the chain-rule
of mutual information (29). Hence, these bounds yield also the performance limit
achievable by the WMF or MMSE front-end with perfect decision-feedback. In
particular, since the PSP-based VA decoding is very robust to feedback decision
errors,weexpect thattheseoutageprobabilityboundspredictwelltheperformance
limitsof WSTCs with PSP-based VA decoding.
2. In the rangeof wherethe MMSE-GAisclose tothe true outageprobability (e.g.,
in Fig. 7 and 8), we expect that good WSTCs designed with respect to the block-
diversity criterion and decoded by PSP-based VA decoding and MMSE front-end
perform closetogoodSTCdesigned withrespecttotherank-diversitycriterionand
decodedby MLdecoding. On thecontrary,weexpectthat PSP-basedVA decoding
suers from a performance penalty with respect to ML decoding over the range of
where the MMSE-GA is far from the true outage probability (e.g., in Fig. 9).
This explain why in Example 1 the QUB assuming MMSE front-end and perfect
feedback decisions is very close to the simulated ML performance for trellis STCs
of rate 1=4.
3. In terms of outage probability, the WMF front-end with perfect feedback decisions
suers from a large gap with respect to the MMSE front-end. On the contrary, it
is well-known that for any xed H of rank t, the two front-ends (assuming perfect
feedback decisions) are capacitywise equivalent as ! 1 (see [44] and references
therein). This apparentparadoxcan be explainedby noticingthat theconvergence
I wmf
X
(H) ! I mmse
X
(H) as ! 1 is non-uniform over the ensemble of random
H. Therefore, P(I wmf
X
(H) ) does not converge to P(I mmse
X
(H) ) even if
I wmf
X
(H)!I mmse
X
(H)for any given H.
Fortheabovereason,weexpectthat alsoforpracticalcodesthe gapbetweenWMF
andMMSEdoesnotvanishas !1. Curiously,several previousworksonlayered
STCs [14, 21] reported results for the WMF front-end, referred to as the \nulling
and canceling" approach. Our results show that it is actually very dangerous to
use WMFrather than MMSE with decision-feedback interference cancellation in a
Example 2 In this example we showthat the MMSE and the WMFfront-ends have in
facttwooppositebehaviorswithrespecttotheinterleavingdelayd: whiletheperformance
with MMSE front-end improves as the interleaving delay d increases (making feedback
decisions more and more reliable) the performance with WMF front-end improves by
makingdsmall(non-perfectinterference cancellation),providedthattheresultingWSTC
preserves its rank-diversity as d decreases. Fig. 10 and 11 show WER vs. E
b
=N
0
for the
WSTC obtained from the CC with generators (23;35) mapped onto BPSK, in the same
conditions of Example 1,for d=0;1;2;3 withPSP-based VA decoding with MMSEand
WMF front-ends, respectively. In Appendix B we show that this code has rank-diversity
equal to 3 for any d 0, i.e., always equal to the maximum possible rank-diversity for
t = 4 Tx antennas and binary coding of rate R = 1=2. For d =0 the WSTC reduces to
a trellis STC, where the underlying trellis code is given by the equivalent rate 2=4 CC
obtained by lumping two trellis steps of the original rate 1=2 CC. Therefore, exact ML
decodingispossible. WithMMSEfront-end(Fig.10),thePSP-baseddecoderapproaches
exact MLasd increases. On thecontrary, withWMFfront-end (Fig.11),the PSP-based
decoder suers from anincreasing performance gap with respect toexact ML.
This eect can be explained by noticing that the WMF front-end is capacitywise
suboptimal in the presence of perfect feedback decisions. On the contrary, the MMSE
front-endis optimalwith perfect feedback decisions.
7
Finally, this eect is clearly visible only for WSTCs for which =Æ for all d 0. If
increases with d, the performance withWMF front-end isthe result of two contrasting
eects, and itsbehavior isdiÆcult topredict.
7 Performance Examples
InthisSection, theWERperformanceofvariousWSTCsareassessed forquasi-staticfad-
ingchannelswith independent Rayleighfading. The unionboundonthe WERdescribed
in Section 5 is employed (curves denoted by \QUB") and the results are compared with
computersimulationofthefullPSP-basedVAdecoder(curvesdenoted by\SIM").Inour
simulations, each codeword corresponds to 128 information symbols. As a consequence,
the codeword length is variable, depending onthe code rate.
Results for known convolutional codes. Figs. 12,13 and 14show the QUB onthe
WER forsomeWSTC basedonbinary CCs ofrate 1=2and 1=4 mapped ontoBPSKand
QPSK, with t =r=4 and t=r =8. Some pointsobtained by simulating the full PSP-
based VA decoder (with interleaving delay d = 2) is shown for the sake of comparison.
Also, we included the outageprobability with GI and the MMSE-GA conjectured upper
boundwithdiscrete-inputs. Thedierenceinthe slopeofthe WERcurvesforthevarious
codes reects the dierent block-diversitiesinTables1and 2. Remarkably,thereisstilla
7
Strictlyspeaking,thisistrueonlyforGaussian inputs,butasarguedbefore fordiscreteinputsand
consistent gap (ranging from1.5 to4 dB atWER =10 3
)between these simpleo-the-
shelf codes and the outage curve. This calls for the design of good component codes for
the WSTC scheme.
ComparisonwithV-BLAST. In[21],asimpliedspace-timedecision-feedbackdetec-
tion scheme nicknamed \V-BLAST"is presented. This scheme is equivalent to aWSTC
with d = 0, but decision-feedback interference cancellation is obtained by feeding back
symbol-by-symboldecisions withoutper-survivorprocessing. Since the orderof decisions
is not dictated by the trellis time-ordering of the underlyingcode, decisionsare made in
anorderthat dependsonH,inordertolimiterror propagationinthefeedback. Namely,
the columns of H are permuted so that \QR" decomposition of the permuted matrix
yields WMF channel \gains" such that min
j=1;:::;t jb
j;j j
2
is maximized. Remarkably, this
orderingcan be calculatedby a simple\gready"algorithmwhich maximizesateachstep
j the SNR of the j-th parallelchannel, for j =t;t 1;:::;1 [21]. In the case of MMSE
front-end, the same algorithm yields a sequence of SNRs
j
for which min
j=1;:::;t
j is
maximized asymptotically, for large . As in classical decision feedback-equalization, if
the V-BLAST detector is concatenated with a decoder, hard decisions at the detector
decisionpointare madeonlyfor decision-feedback purpose, but soft values are fedtothe
decoder.
Fig. 15 compares the WER vs. E
b
=N
0
of a WSTC with either MMSE and WMF
front-end obtained from the binary CC with generators (23;35) mapped onto QPSK,
with t = r = 4 antennas with the scheme obtained by concatenating the same trellis
code with the V-BLAST detector. Simulations of the PSP-based VA decoder for the
WSTC (obtained for d = 2) are in perfect agreement with the corresponding QUB.
The V-BLAST performance was obtained by simulation. For comparison, we show also
the WER resulting from a genie-aided V-BLAST detector with ideal feedback decisions
(curves denoted by \Id. F."), which is very similar tothe WER achieved by the WSTC
without genie. This shows that the large performance degradation of V-BLAST with
respect to the WSTC is due to error propagation in the decision feedback, and that
the PSP-based decoder is very eectivein preventing such propagation while the special
detectionorderingofV-BLASTisnot. Moreover,thecomplexityofthe\greedy"detection
orderingalgorithmis larger than the extra complexity of the VA due toPSP, for larget.
Then, WSTC with PSP-based VA decoding isnot only more eective,but mightbealso
simpler than V-BLAST.
Fig.16shows analogousresultsforWSTCandV-BLASTschemesbasedonthebinary
CC with generators (23,35) mapped onto BPSK, with t =r =8 antennas. We see that
also for V-BLAST the advantage of the MMSE versus the WMF front-end can be very
large.
Handling the case r <t via LD precoding. When r <t the low-complexity PSP-
based decoding scheme cannot be applied. Recently, Hochwald and Hassibi proposed
modulationsymbolsand map them ontothe complex tT matrix signals
S= Q
X
q=1 (x
q C
q +x
q D
q )
whereC
q
andD
q
arecomplextT matricesdeningtheLDcode. Then,Sistransmitted
column-by-columnoverT channeluses. TheresultingspectraleÆciencyis= Q
T
R ,where
R is the rate of the outer code. We can think of LD coding as a precoder which shapes
the original tr complex MIMO channel into a virtual 2Q2rT real MIMO channel.
As long asQ and T are chosen such that Q rT, WSTC with PSP-based decoding can
beapplied asan outercoding/decoding scheme to the LD-precoded channel.
As an example, Fig. 17 shows the WER vs. E
b
=N
0
of a coding scheme obtained by
concatenating the WSTC obtained from the binary CC with generators (23,35) mapped
onto QPSKwith aLD precoder designed in[19] forthe t =4, r=1channel,with Q=4
and T = 4.
8
The resulting spectral eÆciency is = 1 bit/channel use. For the sake of
comparison, we show also the outage curve with GI on the original 41 channel (curve
denoted by \No prec."), the outage curve with GI of the LD-precoded channel, and the
MFBand MMSE-GAoutagecurveboundswith QPSKinputsandLD-precoded channel.
Wenoticethatthe gapbetweenthe GIoutageofthe original41channelandthe WER
of theconcatenated WSTCwithLD-precodingisalmostentirelyduetotheLD precoder,
since the simulated WER with PSP-based VA decoding is less than 1.5 dB away from
the outage ofthe LD-precoded channel,while thegap between LD-precoded and original
channels is about 5 dB for WER10 4
.
The LD precoders in [19], included the one used in this example, where designed in
order to maximizethe average mutual informationat a given SNR, and not tominimize
the outage probability for a given spectral eÆciency. We believe that a better WER
performance for the LD-precoded channel can be achieved by explicitly designing the
precoder inorder to minimizethe outage probability.
8 Conclusions
A new scheme, nicknamed \wrapped" STC was proposed. This scheme generalizes both
trellisandlayeredSTCsanditissuitedtoalargenumberofantennasandlowcomplexity
decoding, based on MMSE decision-feedback coupled with PSP-based Viterbi Decoding.
We showed that any trellis codes with maximal block-diversity can be turned into a
WSTC with maximal rank-diversity, with the advantage that block-diversity is easy to
check and maximal block-diversity is easilyachieved by several well-known trellis codes.
Wealsoshowedvianumericalexperimentsandinformation-theoreticalargumentsthatthe
MMSEdecision-feedbackreceivercoupledwiththePSP-basedVA,providingveryreliable
decisions, performsveryclosetooptimalMLdecoding. On thecontrary,asimilarscheme
withWMFinsteadofMMSEfront-endisverysuboptimalinthequasi-stationaryregime.
8
We provided a simple and rather accurate (quasi-)upper bound on the performance of
WSTCs and an eÆcientmethodfor calculating the block-diversity of trelliscodes, based
ona multivariateweightenumerator. Finally,weprovided some upperand lowerbounds
to the informationoutage probabilitywith discrete i.i.d. inputs.
Performance examples of the proposed scheme constructed from well-known binary
linear convolutional codes were provided. WSTCs compare very favorably with respect
to coded V-BLAST of similar complexity. In particular, we showed that V-BLAST is
pronetoerrorpropagationinthe feedback even ifthedetectionorderingalgorithmof[21]
isused, whilethe PSP-based decoder isalmostnot aectedby errorpropagationeven for
very smallinterleaving delay.
In the case r < t, where the decision-feedback scheme cannot be used directly, our
scheme is naturally suited to work as outer code where the inner code (pre-coder) is a
linear dispersion code as studiedin [19].
The gap between the WER of WSTCs obtained from simple o-the-shelf codes and
the information outage probability indicates that some work needs still to be done in
designing goodcomponent codes for the WSTC scheme. This is especially true for high-
rate trelliscodes, where thedesign formaximalblock-diversityhas beenrarely addressed
(see [32]).
Appendices
A Calculating the Multivariate Weight Enumerator
Function
For the sake of completeness, we give here an example of calculation of T(W
1
;:::;W
t ).
Furtherdetailsandexamplescanbefound in[40,41]. Consider therate1/2binary linear
CCwith4statesandgenerators(5;7)andlett=4. Fig.18showsthetrellisoftheoriginal
code and thesuper-trellisobtained by lumpingtogether s=2steps ofthe originaltrellis,
sothatt =4outputsymbolscorrespondtoeachtransitionofthesuper-trellis(noticethat
the super-trellishas the samenumberofstatesof theoriginaltrellisbutmoretransitions,
and in generalmay have paralleltransitions even if the originaltrellis has not).
Consider the (0;0) transition, with code symbols (c
1
;c
2
;c
3
;c
4
) = (1;1;1;1) (we as-
sume BPSK modulation with the mapping : f0;1g ! f1g). Any transition of the
super-trellis with code symbols(c 0
1
;c 0
2
;c 0
3
;c 0
4
) is labeled by the monomial Q
t
j=1 W
jc 0
j c
j j
2
=4
j
(transitions between non-connected states are labeled by the zero monomial). Then, we
dene the state variablesV
1
;:::;V
3
corresponding to states 1;:::;3,and we split state 0
intotwo states associated to the input variable X and to the output variable Y respec-
tively. The corresponding state equationis given by
2
6
6
4 Y
V
1
V
2
V 3
7
7
5
= 2
6
6
4
1 W
2 W
3 W
4
W
1 W
2
W
1 W
3 W
4
W
3 W
4
W
2
W
1 W
2 W
3 W
4
W
1
W
1 W
2 W
4 W
1 W
3
W
4
W
2 W
3
W W W W W W W W
3
7
7
5 2
6
6
4 X
V
1
V
2
V 3
7
7
5
(A1)
anditisobtainedbylettingeachstate/outputvariableequaltothesum overallincoming
transitions ofthe product ofthe transitionweightby the state/inputvariable originating
the transition. The multivariateweight enumerator function isnallygiven by
T(W
1
;W
2
;W
3
;W
4
)=Y=X 1
wherethe term 1eliminatesthecontributionofthecorrectpath,withEuclidean weight
zero. This can be obtained by eliminatingV
1
;V
2
;V
3
from the system of linear equations
(A1).
More ingeneral, by following the above recipe we can always put the state equations
in the 22block form
2
6
6
6
4 Y
V
1
.
.
.
V
S 1 3
7
7
7
5
=
D C
B A
2
6
6
6
4 X
V
1
.
.
.
V
S 1 3
7
7
7
5
(A2)
(S denotes the numberof states),whereD;C;B andA are 11,1(S 1),(S 1)1
and(S 1)(S 1)polynomialmatricesinthevariablesW
1
;:::;W
t
,respectively. Then,
T(W
1
;:::;W
t
)=C(I A) 1
B+D 1 (A3)
In order to compute T(W
1
;:::;W
t
) at W
j
= e
j
=sin 2
, for j = 1;:::;t, for calculating
the union bound(21), itiscomputationallymore convenienttosubstitute the arguments
inside the matrices A;B;Cand D and then evaluate(A3) numerically.
B Rank-diversity of the WSTCs build from code (25,35)
for t = 4
In this appendix we show that the WSTCs build from the binary CC with generators
(23;35) mapped onto BPSKfor transmission over t =4 antennas have all rank-diversity
= Æ = 3, which is the maximum possible (see (14)). Interestingly, this proof shows
through an example that checking the rank-diversity of a trellis STC is a quite involved
task even ina simple binary case.
Proof. We start by proving the statement for d = 0. By lumpingtogether two trellis
stepsoftheoriginaltrellisweobtainanequivalentrate2=416-statescodewhosegenerator
matrix is given by
G(D)=
1+D 2
1+D+D 2
D 1
D 2
D 1+D
2
1+D+D 2
where D is the delay operator. Let b
1 (D)=
P
1
i=0 b
1i D
i
and b
2 (D) =
P
1
i=0 b
2i D
i
denote