Structured multiuser channel estimation for block-synchronous DS/CDMA

(1)

Block-Synchronous DS/CDMA

Giuseppe Caire

Mobile Communications Group

Institut EURECOM

2229 Route des Cretes

06560 Valbonne, France

Tel: +33 (0)4 93002904, Fax: +33 (0)4 93002627

E-mail: [email protected]

Urbashi Mitra

Department of Electrical Engineering

The Ohio State University

756 Dreese Labs, 2015 Neil Avenue, Columbus, OH 43210

Tel: (614) 292-3902, Fax: (614) 292-7596

Email: [email protected]

August 7, 2000

Submitted to IEEE Trans. on Commun.: July 1999.

Revised: August 2000.

(2)

Abstract

Uplinkchannelestimationforablock-synchronouschip-asynchronousDS/CDMA

systemasproposedforthetime-divisionduplexoptionof3rdgenerationcellularsys-

temsisconsidered. Trainingmidamblesareemployedforjointchannelestimationof

all users. Thestandardunstructured approachbasedon modelingtheeectiveuser

channelsasunknownFIRltersis comparedwithtwo structured methodsthatex-

ploitaprioriknowledgeabouttheuserchannelssuchasthemaximumdelay-spread,

the transmit chip-shaping pulse and the path delays. Since these are usually un-

known,a low-complexity estimator for thepath delays of all users is derived from

a maximum-likelihood approach. For all channel estimators, optimal training se-

quences sets basedon perfect root-of-unity sequencesarefound. Forthese optimal

sets,itisshownthatthereductioninchannelestimationmean-squarederrorofthe

structured estimator versus the unstructured estimator is exactly the ratio of the

number of structured parameters to unstructured parameters. Simulation results

showthatstructuredchannelestimationprovideadvantages upto 4dBintermsof

output signal-to-interference plusnoise ratio with respect to unstructured estima-

tion, forlinearMMSE detection. In contrast, forconventional single-usermatched

ltering, unstructuredestimationprovesto besuÆcientlygood.

Keywords: DS/CDMA, channel estimation, multiuser detection.

(3)

1 Introduction

One of the proposals for UMTS, the European 3rd generation mobile communication

system, is based on hybrid TDMA/CDMA (briey denoted as T-CDMA) [1, 2, 3]. In

this scheme, multiple access is regulated by a TDMA slot structure similar to GSM [4],

and time-divisionduplexing (TDD) is adopted. User signals are direct-sequence spread-

spectrum, with a limited processing gain L (typically L = 16). In contrast to standard

TDMA, in T-CDMA multiple users per cell are allowed to transmit over the same time

slot.

In the uplink (mobile-to-base), users have a coarse commontiming reference and are

abletoaligntheirsignal\blocks"(or\bursts") withtheslotreferenceofthe basestation.

Due to imperfect synchronization and to the dierent multipath channels for each user,

the signal blocks experience some misalignment. This misalignment is compensated for

by inserting guard intervals of appropriate duration. Therefore, the system is block-

synchronous, but cannot beconsidered chip-synchronous.

Provided that the blocks are suÆciently short with respect to the channel coherence

time [5], the user channelscan be considered time-invariantover the block duration. On

theotherhand,duetotheTDMAdynamicuserallocationovertheslotsandtothebursty

nature of transmission, tracking the channels from block to block might be infeasible.

Hence, we shall consider blockwise channel estimation, where the receiver estimates the

user channelsblock-by-block without tracking across dierent blocks.

Weconsider training-basedjointchannel estimationof alluplink usersas in[3]. Each

signal block contains a training sequence of known chips, in a xed nominal position

(typically,inthe middleof the block [1,4]). The base-stationreceivermakesuse ofthese

trainingsequences inorder toestimatethe channel impulseresponses ofall users. Least-

Squares (LS) training-basedchannel estimation iswell-known inthe single-user case (see

[6, 7] and references therein) and has been extended to the block-synchronous multiuser

case in [3]. Most literature considers a chip-matched lter front-end, and chip-ratesam-

pling without an explicit timing reference. Under the assumption that the chip-shaping

transmit pulse is bandlimited with non-zero excess bandwidth [5], chip-rate sampling

does not providesuÆcient statisticsfor data detection unless the correcttiming epoch is

chosen. Noticethat,duetothefrequency-selectivechannelsandchip-asynchronous trans-

mission,chip-ratesamplingisalwayssuboptimal. SuÆcientstatisticsfordetectioncan be

(4)

obtained by low-pass ltering and samplingat a frequency larger than the chip-rate. In

this case, the chip-matched lter front-end is to be avoided, since in general it produces

a colored discrete-time noise sequence. We shall consider an ideal low-pass lter front-

end with bandwidth equal to half of the receiver sampling rate. In this way, the noise

aftersamplingiswhite,andLSisequivalenttomaximum-likelihood(ML)estimation(for

Gaussiannoise)withoutrequiringanynoisewhitening. Moreover,the discrete-timechan-

nel responses are naturally represented as polyphase lters and each \sampling phase"

can beestimated independently of the others withoutloss of optimality.

In standard LS (or ML) training-based channel estimation, the user channels are

modeled as FIR lters with unknown coeÆcients [3], and no a prioriinformation about

the channels is exploited. We refer to this approach as \unstructured" channel estima-

tion. More recently, a priori knowledge of the structure of the users channel impulse

responses has been exploited in order to improve estimation. In some works (see for ex-

ample [8, 9, 10, 11]), user channels are modeled as discrete multipath characterized by

a set of path delays and gains and rectangular chip-shaping pulses are assumed. After

chip-matched ltering, the overall impulse responses are linear combinations of delayed

triangular pulses. Since triangular pulses are piecewise linear, the discrete-time overall

channelsare linearfunctionsbothofthe multipathgainsand ofthedelays,andthis isex-

ploitedfor estimation. Withthisapproach the channelsare represented directly interms

of their \physical" parameters (multipathgains anddelays). Then,a minimalnumberof

unknowns needs to be estimated. On the other hand, these methods do not generalize

easily toarbitrary chip-shaping pulses (typically, root-raised-cosine (RRC) pulses with a

givenroll-ofactor[5]). Anothermethod(see[12,13]and referencestherein) exploitsthe

factthatthe discrete-timeoverall userchannelsarefound tolieinthecolumn-spaceofan

a prioriknown matrixdetermined bythe chip-shaping pulseand by themaximum delay-

spread. This has the advantage of being applicable to general bandlimited chip-shaping

pulses but yields generally more unknowns than the methods based on physical channel

parameterization.

In this paper, we propose two types of \structured" channel estimators which can

be applied to any arbitrary (approximately bandlimited) chip-shaping pulse. Our rst

methodisessentiallythemultiuserversionof[12],andexploitsonlythecoarseinformation

represented by the maximum delay-spread and by the chip-shaping pulse. Our second

methodis based onthe physical channel parameterizationand exploits the knowledgeof

(5)

thepathdelaysofallusers. Inpractice, thisinformationisnotusuallyavailable. Thus, we

proposeatwo-stepapproachwhererstthepathdelaysareexplicitlyestimated,andthen

used inthe structuredchannel estimator. StartingfromaML approach,wederive alow-

complexitypathdelayestimatorsuitedforthe rststepofstructured channelestimation.

For both structured and unstructured channel estimation, we discuss the problemof

training sequence optimizationand we show that the rich family of perfect root-of-unity

sequences (PRUS) [14] provides optimal training sequence sets for all desired lengths.

Remarkably, the same set of sequences is optimal for both structured and unstructured

estimation. Ifoptimalsequences are used,weareable toshowthat theestimation mean-

squared error (MSE) reduction provided by structured over unstructured estimation is

given by the ratio between the number of structured and unstructured parameters. This

result is not generally true for anarbitrary choice of the trainingsequences.

At the base-station, the channel estimates are used to compute the coeÆcients of a

bank of receiving lters, whoseoutput is sampledat the symbolrate and sent toa bank

of single-user decoders [15, 16]. In this case, as discussed extensively in [15, 16, 17, 18],

assuming single-user capacity-achieving Gaussian codes 1

with interleaving over a large

numberofslots,thesystemspectraleÆciencyisgivenby=E[log

2

(1 + SINR)]bit/s/Hz,

where is a proportionality constant that depends on the chip-shaping pulse excess

bandwidth[5]andonthe\channelload"(i.e.,thenumberofusersdividedbythespreading

gain). Ontheotherhand,ifeachslotisindependentlyencodedanddecoded,thecodeword

error rateiscloselyrelatedtothe informationoutageprobability[19] and,inthelimitfor

a large number of symbols per slot, the system spectral eÆciency subject to a required

outage probability is given by = log

2

(1+) where is the largest value for which

Pr(SINR ) . In all cases, the system spectral eÆciency is determined by the

cumulative distribution function (cdf) of the SINR at the output of the linear receivers.

Classicalchoicesforthelinearltersareeitherthesingle-usermatchedlter(SUMF)and

the linear minimum MSE (LMMSE) lter [22]. The SUMF is normally implemented in

current DS/CDMA systems as a rake receiver [5], and the LMMSE one of the solutions

proposed for T-CDMA [2].

Motivatedbytheabovearguments,wecompareourestimatorsbyevaluatingtheSINR

cdf atthe outputof the SUMF and LMMSEreceivers when the lter coeÆcientsare cal-

1

With modern concatenatedcodinganditerativedecodingtechniques[20,21]theGaussian capacity

can becloselyapproachedbypracticalnite-complexitycodes.

(6)

culatedfromthechannelestimates. Simulationsshowthatstructuredchannel estimation

provide advantages up to 4 dB in terms of output SINR with respect to unstructured

estimation, for LMMSE detection. In contrast, for conventional SUMF detection, un-

structured estimation provesto be suÆciently good.

Thispaperisorganizedasfollows. InSection2,thesystemmodelisdened. Section3

presents the maximum-likelihoodunstructured channel estimator and the corresponding

optimal training sequences. In Section 4, we derive the structured channel estimators.

In addition,it is shown that the trainingsequences previously found are optimalalso in

this case. Furthermore, a multipath delay estimator is also derived. Section 5 presents

simulation results and inSection 6we providesome concludingremarks.

Notation. i) v N

C

(v;R) means that the random vector v is complex Gaussian

with circularsymmetry, with mean E[v]=v and covariance E[vv H

] vv H

=R. ii) I

n

denotes the nn identity matrix. When the dimension is clear from the context, the

subscript n may be omitted. iii) Æ

i;j

denotes the Kronecker delta symbol, equal to 1 if

i =j and 0 otherwise. iv) / means \proportionalto". v) Superscripts T

and H

indicate

transpose and Hermitiantranspose, respectively. vi) sinc(x)

=sin(x)=(x). vii) and

? denote Kronecker product [23] and convolution, respectively.

2 Signal model

Fig. 1 shows the block diagram of the complex baseband equivalent system at hand,

with K users. Since blockwise channelestimation is considered, we focus onaparticular

\reference" time slot. The k-th user multipath channel impulse response is assumed to

betime-invariantoneach slotand is expressed by

c

k (t)=

P 1

X

p=0 c

k;p

Æ(t

k;p

) (1)

We donot make any explicit distinction between the delay introducedby non-idealuser

synchronization and the delay introduced by multipath propagation. These eects are

incorporated in the channel impulse response so that the delays

k;p

account also for

synchronization errors. Without loss of generality, we assume that all users have the

same number P of paths: if c

k

(t) has less than P paths, the gains corresponding to the

missing paths are zero. Also, we assume that all users have the same transmit power:

(7)

dierentreceived signal-to-noiseratios(SNR) are taken into accountby multiplyingeach

user channelcoeÆcients by the corresponding amplitude factor.

We assume that the system isdesigned tocope with a given maximum overall delay-

spread (a design parameter known a priori [1]). An initial synchronization procedure

(not taken intoaccount here) allows user k to transmitonly if max

p f

k;p

g , i.e., if it

issynchronized tothebase-stationslottimereferencewith overalldelay-spreadnot larger

than [24]. In general, is much shorter than the slot duration but longer than the

chip interval. Then, the system is block-synchronous but not chip-synchronous.

The DS/CDMA signal transmittedby user k is given by

u

k (t)=

X

m b

k [m]

"

L 1

X

`=0 a

k

[mL+`] (t (mL+`)T

c )

#

(2)

where b

k

[m] is the m-th data symbol, belonging to some unit-energy complex signal al-

phabet (e.g., BPSK, QPSK, 16QAM, etc ...), T

c

is the chip interval, a

k

[n] is the n-th

(complex)chip, Listhe spreadinggain and (t)isthe chip-shapingpulse, commontoall

users. The pulse (t) is obtained by truncatinga bandlimited waveform over aninterval

ofdurationT

c

,wherethe integerissuÆcientlylargeinordertomakethebandwidthof

(t)approximatelyequalto(1 + )=(2T

c

)(with0< 1)andsuchthat R

() (+ t)

d

satises approximately the Nyquist criterion [5]. For example, in the UMTS T-CDMA

system (t) is obtained by truncating aRRCpulse [5]with roll-ofactor =0:22 [1].

Each user transmitsa trainingsequence of known chips inaxed positioninthe slot.

During training no data is transmitted (i.e., the data symbols are all equal to 1) and

the number of training chips is not necessarily equalto a multiple of L. Without lossof

generality, wex the time axis originso that the trainingsequence istransmitted onthe

interval [ (Q 1)T

c

;(M 1)T

c

], forintegersQ and M, where

Q=d=T

c

e+ (3)

is chosen to be not smaller than the duration, expressed in chip intervals, of the overall

impulse responses

g

k

(t)= (t)?c

k

(t) (4)

for allusers.

Thereceived signal,given by the superposition of allusers plus Gaussianbackground

(8)

noise (accounting also forouter-cell interference), can be writtenas

r(t)= K

X

k=1 u

k (t)?c

k

(t)+(t) (5)

where(t)isacomplexcircularly-symmetricGaussianprocesswithautocorrelationfunc-

tion E[(t)(t )

]=N

0 Æ().

The baseband receiver front-end is an ideal low-pass lter with bandwidth W=2 and

amplitude 1=

p

W, whose output is sampled at rate W = N

c

=T

c (N

c

2 is an integer),

withoutanyexplicittimingreference. ThechannelestimatorcollectsN

c

M samplesofthe

received signalover the interval [0;MT

c

]. This are given by

r[nN

c

+`]= K

X

k=1 Q 1

X

m=0 g

k;`

[m]a

k

[n m]+[nN

c

+`] (6)

for n = 0;:::;M 1 and ` = 0;:::;N

c

1, where [j] is the discrete-time low-pass

lterednoisesequence withi.i.d. samplesN

C (0;N

0

),andwherewedenethepolyphase

representation of the discrete-time low-pass lteredoverall channelimpulse response

g

k;`

[m]

= 1

p

W g

k ((mN

c

+`)=W) m=0;:::;Q 1 (7)

for ` =0;:::;N

c

1. Forlater use, we collectthe samples of the `-th samplingphase of

the k-thuser channel inthe vector g

k;`

=(g

k;`

[0];:::;g

k;`

[Q 1]) T

. When Q satises(3),

for all realizations of the users synchronization errors and channel multipath delays the

samples (6)contains only the contributionof known training chips (see Fig. 2).

3 Unstructured ML channel estimation

InthissectionwereviewstandardMLchannelestimationasproposedin[3]andwederive

optimal sets of trainingsequences.

The channel estimatorformsthe vectors r

`

=(r[`];r[N

c

+`];:::;r[(M 1)N

c +`])

T

,

for ` =0;:::;N

c

1. After some standard algebra [3, 6, 7], these can be written in the

compact matrix form

r

`

=Ag

` +

`

(8)

where g

`

= (g T

1;`

;:::;g T

K ;`

) T

is the total user channel vector corresponding to the `-th

sampling phase, = ([`];[N +`];:::;[(M 1)N +`]) T

is a vector of i.i.d. noise

(9)

samples,andA=[A

1

;:::;A

K

]isaMKQblockmatrixcontainingonlytrainingchips,

whose k-thM Q block is given by the convolution(Toeplitz)matrix

A

k

= 2

6

4 a

k

[0] a

k

[ 1] a

k

[ Q+1]

a

k

[1] a

k

[0] a

k

[ Q+2]

.

a

k

[M 1] a

k

[M 2] a

k

[M Q]

3

7

5

(9)

Since the noise in (6) is Gaussian and white, the ML estimation of fg

0

;:::;g

N

c 1

g from

the observationfr

0

;:::;r

N

c 1

gsplitsintotheindividualMLestimationsofg

`

withobser-

vationr

`

, for `=0;:::;N

c

1. Moreover, since

`

has i.i.d. components,ML estimation

is equivalentto the simple LSestimation [25]:

b g

`

=arg min

g jr

`

Agj 2

(10)

whose solution isreadily given by

b g

`

=(A H

A) 1

A H

r

`

(11)

where we assumethat A has full column-rank, i.e., rank(A)=KQ.

Theabovemethodisreferredtoas\unstructured" sinceittreatsg

`

asavectorofKQ

unknowns, withouttaking intoaccountthatthe elementsofg

`

depend onthesame setof

\physical" channel parameters (the delays and channel gains of the channel model(1)).

3.1 Optimal training sequences

The estimation error vector e

`

= b

g

` g

`

= (A H

A) 1

A H

`

is N

C

(0;), where

=

E[e

` e

H

`

] = N

0 (A

H

A) 1

. The resulting normalized estimation MSE of the unstructured

estimatoris given by

2

unstr

= 1

KQN

c Nc 1

X

`=0 E[je

` j

2

]= N

0

KQ

Tr (A H

A) 1

(12)

It is well-known [3, 6, 7], that optimal training sequences minimizing 2

unstr

must satisfy

A H

A / I. This fact has been shown either by using matrix inequalities [6] or by using

a matched lter argument [7]. Here, we provide an alternative simple proof by solving a

(10)

TheestimationMSEisinverselyproportionaltothechip-energy. Therefore,weimpose

thetraceconstraintTr(A H

A)=E,whereEcanbeinterpretedasthetotalenergydevoted

to training. Let f 2

i

: i = 1;:::;KQg denote the eigenvalues of A H

A (all positive, by

construction). Optimalsequence sets are solutions of

(

minimize Tr (A H

A) 1

= P

i 1=

2

i

subject to Tr (A H

A)= P

i

2

i

=E

(13)

ByapplyingLagrangemultipliersweobtainthattheminimumisachievedbytheconstant

eigenvalues 2

i

=E=(KQ). Since A H

A is positive denite, there exists a unitary matrix

Q such that A H

A=E=(KQ)QIQ H

, which yieldsA H

A=E=(KQ)I. Hence, in order to

minimizethe MSE, A H

A must be proportionaltothe identity matrix.

The construction of optimal training sequences satisfying A H

A / I has been inves-

tigated in several papers (see [3, 6, 7, 26, 14, 27] and references therein). Most related

research considers sequences constructed from simple symbol alphabets, like BPSK and

QPSK. Unfortunately, this requirement is too restrictive and optimal sequences cannot

befound for most traininglengths M. In some works (see [3,26]), exhaustive search for

BPSK sequences minimizingthe o-diagonalelementsof A H

A ismade forsmallM,and

heuristicconstruction of goodsequences with smallo-diagonalelements of A H

A iscar-

riedout forlargeM. Otherworksproposethe useofnon-constantenvelopesequences [7].

We observe that requiring simple symbol alphabets like BPSK or QPSK does not

bring a real complexity advantage, especially if modern DSP architectures are used for

implementation. Rather,aconstantenvelopeallowsmoreeÆcientutilizationofthetrans-

mitter power amplier[4], and has a major impact onthe battery life of the mobile ter-

minals. In this paper, we require the chips to belong to a N-th root-of-unity alphabet

A

N

= fe j2i=N

: i = 0;:::;N 1g, for some integer N. This choice is enough to obtain

optimal training sequences for any desired training length M, while preserving the con-

stant envelope.

2

In particular, an optimal set of training sequences can be derived from

a single PRUS [14]:

Denition: PerfectRoot-of-UnitySequences. Thesequencex=(x

0

;:::;x

M 1 )2

2

Forthesakeofprecision,wehavetosaythatevenifthechipshaveconstantmagnitude,thetransmit

signal after the chip-shaping lter is not constant-envelope any longer. Then, for a given non-linear

amplier,theremightbesequencesthatperformbetterthanothers(e.g.,avoidingphasetransitionsof

betweenconsecutivesymbols). Thistypeof\ne-tuning"optimizationisbeyondthescopeofthispaper,

(11)

C M

is a PRUS if x

m 2 A

N

, for all m = 0;:::;M 1 and some integer N, and if its

periodic autocorrelation satises

x (n)=

M 1

X

m=0 x

m x

[m n mod M]

=MÆ

n;0

Explicit and simple constructions of PRUSs of any length are provided in [14, 27].

Let x be a PRUS of length M KQ. Then, the k-th user training sequence (a

k

[ Q+

1];:::;a

k

[M 1])is obtained from xas

a

k [m]=

p

E

c x

[m (k 1)Q mod M]

(14)

forallk =1;:::;K andm= Q+1;:::;M 1,whereE

c

isthetransmitchip-energy. By

using (14) into (9), itis easy tocheck that the columns of A are distinct cyclic shifts of

the same PRUS x. Byconstruction, we obtain A H

A =ME

c

I, as desired. The resulting

minimum estimation error isgiven by

2

unstr

= L

M

E

s

N

0

1

(15)

where E

s

=LE

c

is the nominalaverage transmit energy persymbol.

From the implementation point of view, the proposed channel estimation scheme is

extremely simple. Since the columns of A are cyclic shifts of x, the vector A H

r

` can

be seen as the result of the cyclic convolution of x with r

`

. Then, it can be computed

eÆciently in the frequency domain, by using FFT. In particular, the discrete Fourier

transform of x can be precomputed and stored in the receiver. Moreover, no matrix

storage or matrix-vector product are needed since (A H

A) 1

/ I. Finally, the gb

` 's for

` =0;:::;N

c

1can becomputed in parallel,by N

c

identical processors.

4 Structured channel estimation

In this section we propose estimation methods that exploit some a priori information

of the \structure" of the channel. Our rst method (referred to as Type I structured

estimation) exploits coarse a priori knowledge about the channel responses represented

by themaximum delay-spreadandby thechip-shapingpulse (t). Our secondmethod

(referred to asType II structured estimation)exploits ner a prioriknowledge about the

channel responses, represented by the sets of delays f

k;p

: p = 0;:::;P 1g for all

(12)

4.1 Channel structure

By using the factthat (t) isbandlimited 3

and by using (1)and (4)in (7)we can write

explicitly

g

k;`

[m]= Nc 1

X

i=0

(i=W)ec

k [mN

c

+` i] (16)

where we dene the low-pass lteredand sampled channel response

e c

k [j]

= P 1

X

p=0 c

k;p

p

W

sinc (j

k;p

W) (17)

Let e

c

k

=(ec

k

[0];:::;ec

k

[D 1]) T

bethevectorofsignicantchannelresponsesamples,where

D is a suitableinteger known a priori(it depends on), let c

k

=(c

k;0

;:::;c

k;P 1 )

T

and

dene the QD convolution matrix

`

with (i;j)-th element

[

` ]

i;j

=

iN

c

+` j

W

(18)

for i= 0;:::;Q 1 and j = 0;:::;D 1, and the DP interpolation matrix

k with

(j;p)-th element

[

k ]

j;p

=sinc(j

k;p W)=

p

W (19)

Then, (16) and (17) can bewritten inmatrix form as

g

k;`

=

` ec

k

e

c

k

=

k c

k

(20)

respectively. We dene ec

= (ec T

1

;:::;ec T

K )

T

, c

=(c T

1

;:::;c T

K )

T

, the block-diagonalmatrix

=diag(

1

;:::;

K

) of dimension KDKP and the Kroneckerproduct matrix

`

=

I

K

`

of dimension KQKD. We can writeec=c and, by recalling the denition

of g

`

, we get g

`

=

`

ec. Finally, we stack the dierent sampling phases g

`

into a single

vector g

=(g T

0

;:::;g T

Nc 1 )

T

toobtain

g= ec=c (21)

where we dene the KQN

c

KD block matrix

= 2

6

4 0

.

Nc 1 3

7

5

3

Weneglectthedistortionduetotruncation,sincepracticalsystemsaredesignedtokeepthisdistortion

(13)

and where we dene the KQN

c

KP matrix

= . Equations (21) dene the a

priori structure of the channel impulse responses. The matrix is determined by the

chip-shaping pulse (t) and by the maximum delay-spread , therefore it is known by

the receiver. Thematrixisdeterminedalsobythe pathdelays f

k;p

:p=0;:::;P 1g

for all k = 1;:::;K, and by the number of paths P, which are generally not known a

priori.

4.2 Type I structured estimation

Let r

=(r T

0

;:::;r T

N

c 1

) T

and

=( T

0

;:::; T

N

c 1

) T

. Then,

r=[I

Nc

A]g+ (22)

whereAisdened in(9). From therstequalityof(21) wehavethatthe desiredchannel

vector g lies inthe column-spaceof the a prioriknown matrix . If the null-space of

is non-trivial,this informationcan be exploitedto improvechannelestimation.

We observe that might be ill-conditioned for large D and/or large N

c

, as it is

obtained by oversampling the bandlimited waveform (t). In order to cope with this

possibility,we use singular-valuedecomposition (SVD) [28] and re-parameterizeour esti-

mation problem. We can write

=U 0

S 0

(V 0

) H

(23)

where S 0

is 0

0

diagonal with positivedecreasing diagonalelements (singularvalues),

0

is the rank of , and U 0

, V 0

are rectangular matrices with orthonormal columns

and dimension KQN

c

0

and KD 0

, respectively. Then, g = ec = U 0

d 0

, where

d 0

= S 0

(V 0

) H

e

c. Notice that any component of e

c in the null-space of has no impact

on the observed channel response g. Thus, we can rst project ec onto the orthogonal

complement of the null-space of and then estimate its projection d 0

. The proposed

channel estimation algorithmis asfollows:

1. Obtain the ML estimateof d 0

from the observation r as

b

d = arg min

d

jr [I

N

c

A]U 0

dj 2

= (U

0

) H

[I

Nc (A

H

A)]U 0

1

(U 0

) H

[I

Nc A

H

]r (24)

2. Obtain the Type I structured estimate of g as b

g=U 0

b

d.

(14)

4.3 Type II structured estimation

If thedelays

k;p

and the numberof paths P are known, the receivercan compute and

impose that the channel vector g must lieinitscolumn space(see the second equalityof

(21)). By substitutingg =c into(22) weget the linear model

r=[I

N

c

A]c+ (25)

Also in this case it is convenient to re-parameterize the problem by using the SVD

= U

00

S 00

(V 00

) H

, where S 00

is 00

00

diagonal with positive decreasing diagonal el-

ements (singular values), 00

is the rank of , and U 00

, V 00

are rectangular matrices

with orthonormal columnsand dimension KQN

c

00

and KP 00

, respectively. Then,

g =c=U 00

d 00

,where d 00

=S 00

V 00H

c. Since ingeneralneither the

k;p

's nor the number

ofpathsP are aprioriknown,theseparametersmust alsobeestimatedfromthereceived

signal. We propose a two-step approach where rst an estimate b

k;p

of delays

k;p is ob-

tainedand thentheaboveTypeIIstructuredestimatoriscomputedassuming

k;p

=b

k;p .

More precisely, we have:

1. Obtain an estimate the maximum number of paths per user b

P and of the delays

fb

k;p

:p=0;:::; b

P 1g, for allk =1;:::;K.

2. Based on the estimated delays, compute an estimate b

of and the KQN

c b

00

factor b

U 00

in itsSVD.

3. UndertheassumptionU 00

= b

U 00

,obtaintheMLestimateofd 00

fromtheobservation

r as

b

d = arg min

d

r [I

Nc A]

b

U 00

d

2

=

( b

U 00

) H

[I

Nc (A

H

A)]

b

U 00

1

( b

U 00

) H

[I

Nc A

H

]r (26)

4. Obtain the Type II structured estimate of g asgb= b

U 00

b

d.

Since b

is not known a priori, a real-time SVD is needed every time a new estimateof

the users path delays is available.

In general, since U 00

6=

b

U 00

the Type II estimator is biased. We expect that the

performanceoftheproposedestimatordependscriticallyonthequalityofthepreliminary

delayestimation. Ontheotherhand,forperfectknowledgeofthe 's,TypeIIandType

(15)

I estimators have the same form. Therefore, for perfect delay information an optimal

training sequence set for the Type I is also optimalfor the Type II. In the next section,

we nd optimal training sequence sets for Type I and Type II structured estimators

(assuming perfect knowledge of the delays for the latter) and we postpone the delay

estimation problem toSection4.5.

Remark: near-far resistance of channel estimators. In multiuser detection, a

receiver is said to be near-far resistant for user k if its error probability goes to zero as

N

0

! 0 for any choice of the other users received powers [22]. Analogously, we say that

a channel estimatorfor user k is near-far resistant if its estimation MSE goesto zero as

N

0

! 0 for any choice of the other users received powers. In particular, if a multiuser

channelestimatorcanbeputintheformbg=g+e,wheretheerrorvectorehasmeanzero

and covariance matrix such that Tr() ! 0 as N

0

! 0, then it is near-far resistant.

According to the above denition, the unstructured and structured Type I estimators

are near-far resistant while the structured Type II estimator is near-far resistant if it is

unbiased, i.e., if the delays are perfectlyknown.

4.4 Optimal training sequences.

We let U = U 0

and d = d 0

(resp., U = b

U 00

and d = d 00

) for Type I (resp., Type II)

estimation, and for Type II we assume perfect knowledge of the delays, i.e., b

U 00

= U 00

.

The error vector inthe estimation of d isgiven by e= b

d d, N

C

(0;), where

=E[ee H

]=N

0 U

H

I

N

c (A

H

A)

U

1

The error vector in the estimation of g is given by Ue, N

C

(0;UU H

). The resulting

normalized estimation MSE isgiven by

2

struc

= 1

KQN

c

Tr UU

H

= 1

KQN

c

Tr () (27)

where we used the fact that U H

U = I. Then, the optimal choice of training sequences

should minimize the trace of subject to a constraint on the total training energy. We

already know that this is obtained by / I. Fortunately, since U has orthonormal

columns, we have immediately that if A H

A / I then also / I. We conclude that

optimal sequences for unstructured estimation are also optimal for Type I and Type II

(16)

construction of trainingsequences from a single PRUS of length M KQ, as indicated

in (14), can be successfully appliedhere.

The resultingminimum estimationerror is given by

2

struc

= L

M

E

s

N

0

1

KQN

c

(28)

where = 0

(resp., = 00

) is the rank of (resp., ) for Type I (resp., Type II)

estimation.

Fromanimplementationpointofview,withoptimalsequences,thestructuredchannel

estimator is obtained by b

g / UU H

I

N

c A

H

r, where the proportionality constant is

known. This can be interpreted as the linear post-processing dened by the squared

matrixUU H

oftheunstructured channelestimation

I

Nc A

H

r ofSection3. ForType

Iestimation,UU H

canbeprecomputedandstored,andtheextracomplexityisonlygiven

by the matrix transformation. For Type II estimation, extra complexity is alsoincurred

by explicit delay estimation and by the SVD required to calculateU.

Remark: reduction of the estimation MSE. By comparing (15) with (28), we

see that for optimal sequences, 2

struc

= 2

unstr

= =(KQN

c

), i.e., that the estimation MSE

reduction provided by the knowledge of the a prioristructure of the channel is given by

the ratio between thenumberof structured and unstructured unknown parametersto be

estimated.

The MSE reduction provided by Type I structured estimation can be coarsely eval-

uated by the following intuitive argument. If the channel delay-spread is considerably

larger than the chip-shaping pulse duration, we get that Q < D N

c

Q. The Q D

matrices

k;`

dened in (18) for ` =0;:::;N

c

1 are all obtained by samplingthe ban-

dlimitedpulse (t). Each of thesematriceshas full row-rankQ. However, since the rows

of

k;`

for` >0 areobtained by sampling (t `=W)atrate W (largerthan the Nyquist

rate),fortheNyquistsamplingtheorem[5]theyalllieinthelinearspacegeneratedbythe

rows of

k;0

. This is not exactly true because of truncation, however we expect that the

numberofdominantsingularvaluesoftheN

c

QDcombinedmatrix[ T

k;0

;:::; T

k;N

c 1

] T

is Q, so that the rank 0

of is KQ. This yields a MSE reduction factor of about

1=N

c

. Since N

c

2,the improvement inthe estimation MSEis about 3 dB orlarger.

Fig.3shows the squaredsingularvaluesof[ T

k;0

;:::; T

k;Nc 1 ]

T

forN

c

=2and4,Q=

32and RRCchip-shapingpulse with roll-o =0:22 truncated over asupportof=12

(17)

values. In the results of Section 5 we use a threshold criterion for rank determination.

Namely, the rank is estimated as the largest index for which the corresponding squared

singular value is not smaller than 10 3

times the sum of all squared singular values. In

the example ofFig. 3this criterionyieldsanestimated rankequalto 33,inagreementto

what anticipatedby the above intuitive argument.

The MSE reduction provided by Type II structured estimation (with perfect delay

knowledge) is easilyevaluated by noticing that the rank of is not larger than the sum

ofalluserpaths,i.e., 00

KP. Then,thereductionfactorprovidedbyTypeIIestimation

is at least P=(N

c

Q). Forchannels with a smallnumber of dominant paths much smaller

thanthedelay-spread(expressedinchips),suchasinmostETSImodels[29],thepotential

gain ofTypeIIestimation islarge. However, thisdependscriticallyonthechannelmodel

considered and onthe ability of estimating the delays of the dominant pathsof allusers.

4.5 Delay estimation

In this sectionwepropose apathdelayestimatorderived fromthe MLestimation of

=

f

k;p

: k = 1;:::;K; p =0;:::;P 1g fromthe observation bg, given by the estimated

discrete-timelow-pass lteredchannelimpulseresponsesprovided eitherby unstructured

or by Type I structured estimation. This method can be used in the preliminary delay

estimation of the Type II structured channel estimatordescribed inSection4.3.

Consider the preliminarychannel estimategb obtained either by the unstructured es-

timator of Section3 or by the Type I structured estimator. For both estimators we can

write

b

g =g+e=c+e (29)

where e N

C

(0;),independent of the vector of channel gains c. We assume Rayleigh

fading, so that c is also N

C

(0;), where the covariance matrix denes the average

energy for each path (delay-intensity prole [5])) and the correlation between the path

gains. Forgiven delay vector, isxed and bg isconditionallyN

C (0;R

g

()),where

the conditionalcovariancematrix is given by

R

g

()=() () H

+

(18)

Withthe abovestatisticalmodel, the MLestimateof based onthe observation bg is

given by

b

=arg max

b

g H

R 1

g (

)b

g logDet(R

g

()) (30)

This requiresthe maximizationof thelog-likelihoodfunctionoveraKP-dimensionalreal

space. Moreover, the log-likelihoodfunction depends on via a matrix inversion.

Inordertodecreasecomplexity,wemakesomesimplicationsand assumptions. First,

we assume a white error vector (i.e., = 2

e

I). This assumption holds exactly for the

unstructured estimator of Section 3 and approximately for the structured Type I esti-

mator if an optimal training sequence set is used. Next, we assume that the channel

gains for dierent users are mutually independent and that each user channel obeys the

uncorrelated-scattering model[5], so that E[c

k;p c

k 0

;p 0]=

2

k;p Æ

k;k 0Æ

p;p 0

. Then, aftersuitable

reordering ofthe componentsof b

g andby exploitingthe block-matrixstructure of,the

ML joint delayestimator(30) reduces toK independent MLestimators for the

k 's(the

delays of user k),given by

b

k

=arg max

k n

b g

(k)

H

R 1

k (

k )bg

(k)

logDet(R

k (

k ))

o

(31)

where

b g

(k)

=(bg

k;0

[0];:::;bg

k;N

c 1

[0];bg

k;0

[1];:::;bg

k;N

c 1

[1];:::;bg

k;0

[Q 1];:::;bg

k;N

c 1

[Q 1]) T

where

R

k (

k )=

P 1

X

p=0

2

k;p k;p

H

k;p +

2

e

I (32)

and where thevector

k;p

contains the samplesofthe chip-shapingpulse delayed by

k;p .

Namely,the i-th elementof

k;p

is given by

[

k;p ]

i

= (i=W

k;p )=

p

W (33)

for i=0;:::;QN

c 1.

The solution of (31) still requires maximizationover a P-dimensional real space. In

order toobtainanadditionalsimplication,weassumethat the delays

k

are suÆciently

separated, i.e., that min

p6=p 0

j

k;p

k;p 0j

>T

c

. Since the support of (t)has limitedsize

T

c

, the vectors

k;p and

k;p 0

have disjointsupport, i.e., their non-zero components are

indierentpositions. Therefore, H

0

=Æ

p;p 0

(recallthat, because ofthe bandlimited

(19)

assumption, j

k;p j

2

= 1

W P

QN

c 1

i=0

j (i=W

k;p )j

2

' R

j (t)j 2

dt = 1). Under this \sep-

arated delay" assumption, by applying the matrix inversion lemma [25] to (32) we can

write

R 1

k (

k )=

1

2

e

"

I P 1

X

p=0

2

k;p

2

e +

2

k;p k;p

H

k;p

#

(34)

Also, because of the disjoint support, R

k (

k

) is block diagonal with P blocks of size

N

c

N

c

with the form

2

k;p vv

H

+ 2

e

I (35)

where v contains the N

c

non-zero samples of (t). Since in general PN

c

QN

c ,

R

k (

k

) contains alsosome other diagonalblocks with diagonal elements allequal to 2

e .

Hence, Det(R

k (

k

)) isgiven by

Det(R

k (

k

)) =

2QNc

e

P 1

Y

p=0 Det

2

k;p

2

e vv

H

+I

=

2QNc

e

P 1

Y

p=0

2

k;p

2

e +1

(36)

and it is independent of

k

. By neglecting the terms independent of

k

, the following

approximated MLestimator isobtained:

b

k

=arg max

k P 1

X

p=0

2

k;p

2

e +

2

k;p

H

k;p b g

(k)

2

(37)

Wehavenotescaped aP dimensionalmaximizationandtypicallyindependentmaximiza-

tion of each term in (37) yieldsb

k;p

=b,for all p=0;:::;P 1,where bislocatedwith

high probability in the vicinity of the maximum peak of the sampled observed channel

response b

g (k)

. Also, we observe that in practice, both the delay-intensity prole and P

are unknown. However, we propose an approximated algorithm which requires only a

sequence of P one-dimensionalmaximizations.

AssumeP known. First,deneadelay discretizationstep, suchthat T

c

= =N

is an integer multiple of N

c

. Then, for all j = 0;:::;QN

1, dene the vectors v

j of

length QN

c

with i-th component

[v

j ]

i

= (i=W j)=

p

W

for i = 0;:::;QN

c

1. Clearly, v

j

=

k;p

if j =

k;p

, for some j, while if

k;p

is not

(20)

vectorw

0

=gb (k)

andtheset ofdelayindexesS

0

=f0;:::;QN

1g. Forp=0;:::;P 1,

repeat the following steps:

1. Estimate the p-th delay asb

k;p

= b

j

p

, where

b

j

p

=arg max

j2Sp

v H

j w

p

2

(38)

2. Eliminate the eect of the p-th delay from the observed channel impulse response

as

w

p+1

=w

p v

H

b

j

p w

p

jv

b

j

p j

2

!

v

b

jp

(39)

3. Update the delay index set as

S

p+1

=S

p f

b

j

p N

+1;:::; b

j

p +N

1g (40)

Remark: algorithminterpretation. Ifthe\separateddelay"assumptionholds,we

couldmaximize(37)bysearchingoverallpossiblecombinationsofsupportsfS

0

;:::;S

P 1 g

of size kT

c

, such that S

p

[0;QT

c

] and S

p

\S

p 0

= ;, and by maximizing each term

j H

k;p b g

(k)

j 2

with respect to

k;p 2 S

p

. While the \separated delay" assumption is instru-

mental for reducing the likelihood function to a manageable form, it is not generally

satised. We argue that as the delays are more and more separated, the true likelihood

function is closer and closer to the approximation (37). Obviously, if some paths are

removed from the channel response, the separation between the delays of the remaining

paths is increased and the likelihoodfunction for these delays is better approximated by

(37). Then, it is meaningfulto look for the delays in order, starting fromthe path with

the largest gain, and remove it at each p-th iteration the path corresponding to the p-th

estimated delay by projecting the observation vector ontothe orthogonalcomplementof

v

b

jp

. Allinnerproductsv H

j w

p

in(38)canbecomputedbyconvolvingw

p

withapolyphase

version of the chip-matched lter (t), sampledat frequency N

=T

c

. Notice that N

can

bemuchhigherthanN

c

,sothatthetimingresolutionofthedelayestimatorisnotlimited

by the receiversamplingfrequency. Foreachp, weestimate

k;p

asthe multipleof for

whichthe outputof the polyphasechip-matchedlter has maximumsquared magnitude.

Becauseofnoiseandnitetimingresolution,thecancellationofthep-thpatheect isnot

perfect. If thep-th pathgain ismuchlarger thanthe remainingpath gains,the energyof

(21)

the p-th pathafterimperfectcancellationmightbestilllargerthanthe energy ofremain-

ingpaths. Thispreventsthedetection ofotherpaths. Weobservethat,forincorrectpath

cancellation, the delay estimator for

k;p+1

typically nds a non-existent delay (outlier)

veryclose toone of the previously estimateddelays b

k;0

;:::;b

k;p

. In orderto prevent this

eect, in(40) weeliminatefromthe set of alloweddelay indicesthe indicesdieringfrom

b

j

p

by less than N

. In this way, we forcethe delay estimatorto nd delays separated by

at least one chip interval. Notice that alsoa conventional rake receiver cannot individu-

ally track delays separated by less than one chip. In fact, since the channel is convolved

with (t),whichhasbandwidthonlyslightlylargerthan1=(2T

c

),individualestimationof

delays separated by less than one chip inessence violatesthe Nyquist samplingtheorem.

In practice, P is not known a priori. We can either x an arbitrary value for the

maximumnumber b

P ofpathsperuser(inthiscase,wemaymisssignicantcomponentsfor

someusers whilewewastecomplexityforotherusers)orwecanintroduceastoppingrule

inthedelaysearch. Namely,afterremovingthep-thestimatedpath,ifjw

p+1 j

2

=jw

0 j

2

,

delay (p+1)is searched, otherwise the algorithmis terminatedand the number of paths

for user k is set to p. The threshold should be designed according to the estimation

MSE of the preliminary estimate bg, which is known a priori. Basically, we should stop

the delaysearch whenw

p+1

containsonlythe estimationnoise,plus someextra noisedue

to imperfect cancellationof the alreadyestimated paths.

5 Performance with linear detectors

In order to investigate the impact of channel estimation on the receiver performance,

we consider simple FIR (or \one-shot") linear receivers [2, 22]. After low-pass ltering

and sampling at rate W, the samples of the received signal are input to a shift-register

of length M

1 +M

2

+1, for suitable integers M

1

and M

2

. For each m-th symbol the

register contains the window of samples [mLN

c M

1

;mLN

c +M

2

] (referred to as the

receiver \processing window") centered around the m-th symbol interval. The content

of the receiver processingwindowis used toproduce soft estimates of the users symbols,

which are sent to the bank of single-user decoders. Since the channel delay-spread can

be larger than the symbol interval, the processing window should span more than one

symbolinterval,i.e., M +M +1LN [30,31]. Aftersome straightforwardbut tedious

(22)

algebra (see [32, 30] for the details), we can write the content of the receiver processing

windowcorresponding tothe m-thsymbolas the vector

r[m] = K

X

k=1 B

2

X

n= B1 s

k

[m;n]b

k

[m n]+[m] (41)

where [m]N

C (0;N

0

I)and wherethe signalvectorss

k

[m;n] are given by the convolu-

tionofthesegmentofthek-thchipsequence correspondingtothe(m n)-thsymbolwith

the k-thdiscrete-timeoverall channel response. Forperiodicspreading, s

k

[m;n] doesnot

dependonm [33]. Thevectorchannelmodel(41)takesintoaccountboth multiple-access

and inter-symbol interference (MAI and ISI) [32]. In fact, each user contributes to r[m]

with B

1 +B

2

+1 symbols. The integersB

1

and B

2

are relatedtotheprocessing windows

boundaries M

1

and M

2

by [30]

B

1

=

M

1 +N

c

(L+Q 1) 1

N

c L

B

2

=

M

2

N

c L

(42)

The receivermakes use ofthe channelestimates obtained by one of the proposed estima-

tors and computes the signal vectors bs

k

[m;n] by convolving the estimated k-th channel

response with the appropriatesegment of the k-th chip sequence. In the case of periodic

spreading, bs

k

[m;n] can be computed once and used for the whole block. For aperiodic

spreading s

k

[m;n] must be computed for every symbol intervalin the block.

Without loss of generality, we focus on the linear detection of symbol b

1

[m]. A soft

estimateof b

1

[m] isobtained by the linear lteringoperationz

1

[m]=h

1 [m]

H

r[m], where

h

1

[m] is the vector of lter coeÆcients for the m-th symbol of user 1. The SINR at the

lter output isgiven by [15]

SINR

1 [m]=

jh

1 [m]

H

s

1 [m;0]j

2

N

0 jh

1 [m]j

2

+ P

K

k=1 P

B

2

n= B

1 jh

1 [m]

H

s

k [m;n]j

2

jh

1 [m]

H

s

1 [m;0]j

2

(43)

Here, we consider SUMF and LMMSE ltering [22]. Approximations of the SUMF and

of the LMMSElters are obtained fromthe vectorsbs

k

[m;n] as

b

h

1;sumf

[m] = bs

1 [m;0]

b

h

1;mmse

[m] = b

R 1

r [m]bs

1

[m;0] (44)

(23)

where

b

R

r [m]=

K

X

k=1 B2

X

n= B

1 bs

k

[m;n]bs

k [m;n]

H

+N

0 I

is the covariance matrix of r[m] assumingbs

k

[m;n] =s

k [m;n].

We consider a system with K = 8 users, spreading gain L = 16, RRC chip-shaping

pulse with roll-o = 0:22, truncated over =12 chips, and maximum channel delay-

spreadof=20T

c

. Withthisdelay-spread,user blocksmaybemisalignedbymorethan

onesymbolandISIisnotatallnegligible,incontrasttowhatisnormallyassumedinmost

DS/CDMAliterature. ThemaximumchannellengthisQ=20+12=32chips. Wechoose

traininglengthM =256, i.e.,the minimumlength necessary forestimating 8channelsof

length 32. For each block and for each user k, a random channel impulse response c

k (t)

is generated according to the model (1), where the number of paths P is random and

uniformly distributed over the integers f1;:::;6g, the delays

k;p

are independently and

uniformlydistributed on [0;] and the gains c

k;p

are independent N

C (0;

2

k;p

), where

2

k;p

=A

k exp

k;p

min;k

max;k

min;k

(

min;k and

max;k

denotetheminimumandthemaximumdelaysofuserk). Theamplitude

factor A

k

is chosen such that P

P 1

p=0

2

k;p

=E

k

,where E

k

=N

0

isthe received average SNR

of user k.

AsillustratedintheIntroduction, thesystem spectraleÆciency (assumingsingle-user

decoding) is determined by the SINR cdf. We use Monte Carlo simulation to evaluate

the cdfF

sinr

()=Pr(SINR

1

[m]) over5000independent slots. Figs. 4and 5showthe

SINR cdf for the SUMF receiver in the case of slowand fast power control, respectively.

In the case of slowpowercontrol, E

k

=N

0

is set to10 dBfor allusers withoutany further

normalization of the path gains. Then, the instantaneous user received power uctuates

considerably from block to block, because of the uncompensated Rayleighfading. In the

case of fast power control, E

k

=N

0

is set as before but the channel gains are normalized

such that P

P 1

p=0 jc

k;p j

2

=E

k

. This could be obtainedin practice by a TDDsystem where

each user measures the signal powerreceived on the downlink and exploits reciprocity in

order to compensate instantaneously for the Rayleigh fading block by block. The SINR

cdf yieldsthe block outageprobability,dened asthe probabilitythat the SINRis below

a xed threshold . The horizontal dashed line corresponds to F

sinr

() = 10 1

. For

1

(24)

unstructured estimator, about 0.6 dB for the Type I structured estimator and 0.2 dB

for the Type II structured estimator(the latter employs the delay estimatordescribed in

Section4.5, with axednumberofpaths peruser b

P =4). Forcomparison,the SINR cdf

for Type II estimation with perfect knowledge of the delays is shown. The loss between

perfect and imperfect delay knowledge isabout 0.1dB.

Figs. 6 and 7 show analogous results for the LMMSE receiver. In this case, for a

10 1

outage probability, non-ideal channel knowledge incurs about a 4.5 dB loss for the

unstructured estimator,about 2.6dB for the TypeI structured estimatorand 1.0dB for

the Type II structured estimator. The losscaused by imperfect delay knowledge for the

Type II estimatorisabout 0.4 dB.

Finally,Figs. 8and9showresults forthe LMMSEreceiverassumingimbalanceduser

powers. Namely, we assume that E

k

=N

0

=10dB for users k = 1;:::;4 and E

k

=N

0

=20

dB for users k =5;:::;8. Slow and fast power controlare considered, respectively. The

imbalanced-power case ismotivated here by the possibilityof accommodatingusers with

dierent service requirements. In this case, the performance of all estimators is almost

unchanged with respect to the case of imbalanced powers, with the dierence that the

degradation suered from imperfect delay knowledge in Type II estimation is slightly

increased (about 0.6 dB). Performance for the SUMF in the imbalanced-power scenario

isnot shown since theresultingSINR isverylow, and thereislimitedutility inusingthe

SUMF forlow-powerusers in the presence of high-power users [22].

Some comments are in order: 1) the estimators presented in this paper are almost

insensitivetouser powerimbalance(although strictly-speakingTypeII estimation isnot

near-far resistant, because of the bias introduced by non-ideal delay estimation). This

is due to the fact that, since the training matrix A has full column-rank, the channels

are estimatedinmutuallyorthogonalsubspaces, sothatusers withhigher received power

do not create larger interference for the estimation of lowerpower users. In comparison,

conventional rake-based channelestimation isnot robust to largeuser power dierences.

2) The slight degradation of Type II estimation with respect to the case of balanced

powers can be explained by the fact that only a xed number of paths b

P per user are

takenintoaccount. Ifauser hasmorethan b

P paths, theeect ofthesepathsisnot taken

into account in the approximated LMMSE lter (44). For high-power interfering users,

the eect of neglected paths is more evident. However, the degradation shown in our

simulations is very small, and itcould be reduced further by allowing avariable number

(25)

of estimated paths peruser as explainedat the end of Section 4.5.

3)Conventional unstructured estimationprovesto be suÆciently goodfor SUMF de-

tection. On the contrary, for LMMSE detection is makes sense to consider improved

channel estimation such as the structured Type I and Type II schemes proposed here.

In fact, it is well-known that in a multipath environment channel estimation errors have

a much larger impact on LMMSE than on SUMF, especially for large SNR (see [34]).

Intuitively, we can explain this fact by considering that the LMMSE for large SNR is

similar to the decorrelator [22], which projects the received signal onto the orthogonal

complement of the subspace spanned by interference. Non-perfect knowledge of the in-

terference subspace caused by estimation errors yields a large SINR degradation. On

the other hand, the SUMF is insensitive to the knowledge of the interference subspace,

since it treats interference as white noise, and SINR degradation is caused only by the

non-perfectknowledgeofthe usefulsignal. Notice thatthiseect cannotbeobserved just

by consideringthe estimation MSEas performance measure.

6 Conclusions

In this paper we have considered training sequence-based joint channel estimation for a

block-synchronous chip-asynchronous system motivated by the T-CDMA/TDD proposal

of3rdgenerationsystems. Wereviewedtheunstructuredchannelestimatorcurrentlypro-

posed for T-CDMA/TDD and we derived new structured channel estimators, exploiting

variouslevelsofa prioriinformation. Theproposedstructured channelestimatorscan be

applied to any arbitrary (approximately bandlimited) chip-shaping pulse, as opposed to

most methodsthat assume rectangular pulses.

The structured channel estimator that exploits the ne structure of the multipath

channels, i.e., the number of paths and their delays, requires accurate delay estimation.

Then, startingfrom the ML criterion, we derived a low-complexity delay estimatorthat

provesto be suited forstructured channelestimation.

Forallmethods, we found optimalsets of training sequences, existing for any desired

training length. The same sequence set is optimal for both unstructured and structured

estimation. If these sequences are adopted in standardization, it will be up to the de-

signer of the base-station equipment to implement the lower complexity unstructured

estimation orthe higher complexity structured estimation, according to the desired per-

(26)

formance/complexity trade-o.

Wecompared the proposed channel estimatorsby lookingatthe outputSINR ofsim-

ple linear detectors like the SUMF and the LMMSE receivers. Our simulations show

that whileunstructured estimationmightbesuÆciently goodforthe SUMF,itslosswith

respect to ideal channel knowledge might be too large in the case of LMMSE. On the

contrary, the performance lossofstructured estimation issmall alsowithLMMSE detec-

tion. Even though not considered here, we expect that alsoparallelor serial interference

cancellationschemes [22] are sensitive tochannelestimation errors,since imperfectchan-

nel knowledge prevents complete interference cancellation. Then, the higher complexity

of structured channelestimators proposed inthis papermightbe motivated andjustied

for receivers employingmultiuser detection.

Acknowledgements

Theauthorswouldliketoacknowledgethefruitfulandfriendlydiscussionswithprofessors

DirkSlock and Pierre Humblet.

References

[1] 3GPP {TS25.105V3.1.0,\3GPP-TSG-RAN-WG4;UTRA(BS)TDD;RadioTrans-

mission and Reception," ETSI (Availableat www.etsi.org/umts), Dec. 1999.

[2] A. Klein, G. Kawas Kaleh, P. Baier,\Zero forcing and minimum mean-square error

equalizationformultiuserdetectionincode-divisionmultiple-accesschannels,"IEEE

Trans. on Vehic. Tech., Vol. 45,No. 2,pp. 276 {287, May 1996.

[3] B. Steiner,P.Jung,\Optimumandsuboptimum channelestimationfortheuplinkof

CDMA mobile radio systems with joint detection," European Trans. on Commun.,

Vol.5, No. 1,pp. 39 { 49,Jan.{Feb., 1994.

[4] T. Rappaport, Wireless Communications.EnglewoodClis,NJ:Prentice-Hall, 1996.

[5] J. G.Proakis, Digital Communications.New York,NY: McGrawHill, 3rd ed., 1995.

[6] S. Crozier, D. Falconer,S. Mahmoud, \Least sum of squared errors(LSSE) channel

(27)

[7] A. Clark, Z. Zhu, J. Joshi, \Fast start-up channel estimation," IEE Proc.- F, vol.

131, pp. 375 { 382, July1984.

[8] S.Bensley,B.Aazhang,\Subspace-based channelestimationforCode DivisionMul-

tiple Accesscommunicationsystems," IEEETrans.onCommun., Vol.44,No.8,pp.

1009 { 1020, Aug.1998.

[9] S. Bensley, B. Aazhang, \Maximum-Likelihood synchronization of a single user for

Code DivisionMultipleAccess communicationsystems," IEEETrans.onCommun.,

Vol.46, No. 3,pp. 392 { 399, March 1996.

[10] E. Strom, S. Parkvall, S. Miller, B. Ottersten, \Propagation delay estimation in

asynchronousDirect-SequenceCode-DivisionMultiple-Accesssystems,"IEEETrans.

on Commun., Vol. 44,No. 1,pp. 84 {93, Jan. 1996.

[11] E.Ertin,U.Mitra,S.Siwamogsatham,\Maximum-Likelihoodbasedmultipathchan-

nel estimation for code-division multiple-access systems," submitted to the IEEE

Trans. on Commun.,June 1998.

[12] B. C. Ng,M. Cedervall,A. Paulraj, \Astructured channel estimation for maximum

likelihoodsequence detection," IEEE Commun. Letters,pp. 52{ 55,March 1997.

[13] B. C. Ng, D. Gesbert, A. Paulraj, \A semi-blind approach to structured channel

equalization," Proc. of IEEE Intern. Conf. on Commun. (ICC) 1997, pp. 3385 {

3388, Atlanta,7 {11 June, 1998.

[14] W.Mow,Sequencedesignforspreadspectrum,TheChineseUniversityPress,Chinese

University of Hong Kong, Shatin,Hong Kong,1995.

[15] D. Tse and S. V.Hanly, \Linearmultiuserreceivers: Eective interference, eective

bandwidth andcapacity,"IEEE Trans.Inform. Theory, Vol.45,No.2, pp.641{657,

March 1999.

[16] S. Verduand S.Shamai(Shitz), \Spectral eÆciency of CDMA with randomspread-

ing," IEEE Trans.Inform. Theory,Vol.45, No.2, pp. 622{640,March 1999.

[17] S. Verdu and S. Shamai (Shitz), \The eect of frequency-at fading on the spec-

(28)

Also: \Capacity of CDMA fadingchannels," 1999 IEEE Inform. Theory Workshop,

Metsovo, Greece,June 27-July 1,1999.

[18] E. Biglieri, G. Caire, G. Taricco and E. Viterbo, \How fading aects CDMA: an

asymptotic analysis with linear receivers," to appear on IEEE Journ. Sel. Areas on

Commun. (wireless series),2000.

[19] E.Biglieri,J.ProakisandS.Shamai(Shitz),\Fadingchannels: Information-theoretic

and communications aspects," IEEE Trans. on Inform. Theory, Vol. 44, No. 6, pp.

2619{2692, Oct. 1998.

[20] C. Berrou and A. Glavieux, \Near optimum error-correcting coding and decoding:

Turbo codes," IEEE Trans.on Commun., Vol. 44,No. 10,Oct. 1996.

[21] T.RichardsonandR.Urbanke,\Thecapacityoflow-densityparitycheckcodesunder

message passing decoding," submitted toIEEE Trans. on Inform. Theory,1998.

[22] S. Verdu, Multiuser Detection. New York, NY: Cambridge University Press, 1998.

[23] P. Lancaster and M. Tismenetsky, The Theory of Matrices, 2nd Edition, New York:

Academic Press, 1985.

[24] 3GGP { TS 25.224 V3.1.0,\3GPP-TSG-RAN-WG1; Physical Layer Procedures

(TDD)," ETSI (Available atwww.etsi.org/umts), Dec. 1999.

[25] S. Kay, Fundamentals of statistical signal processing: Estimation theory. New York:

Prentice-Hall, 1993.

[26] S. Grant, J. Cavers, \Multiuser channel estimation for detection of cochannel sig-

nals," Proceedings of 1999 IEEE Intern. Conf. on Commun. ICC '99, Vancouver,

BC, Canada, 6 { 10June, 1999.

[27] J. Ng,K.BenLetaief, R.Murch,\Complexoptimalsequences with constantmagni-

tude for fast channel estimation initialization," IEEE Trans. on Commun., Vol. 46,

No. 3, pp. 305 { 308, March1998.

[28] G. Golub, C. Van Loan, Matrix computations. The John Hopkins University Press,

2nd ed., 1989.

(29)

[29] COST207 ManagementCommittee,\COST 207: Digitallandmobileradio commu-

nications (nal report)," Commission of the European Communities, 1989.

[30] G. Caire, \Two-stage non-data aided adaptive linear receivers for DS/CDMA," to

appear on IEEE Trans. on Commun., Oct. 2000.

[31] R. De Gaudenzi, F. Giannetti, M. Luise, \Design of a low-complexity adaptive

interference-mitigatingdetectorforDS/SSreceiversinCDMAradionetworks,"IEEE

Trans. Commun., Vol. 46,pp. 125 {134, Jan. 1998.

[32] X. Wang and V. Poor, \Blind equalization and multiuser detection in dispersive

CDMA channels," IEEE Trans. onCommun., Vol.46,No.1,pp. 91{103,Jan. 1998.

[33] A. J. Viterbi, CDMA { Principles of Spread Spectrum Communication. Reading,

MA: Addison-Wesley, 1995.

[34] J. Evans and D. Tse, \Large system performance of linear multiuser receivers in

multipath fading channels," to appear on IEEE Trans. on Inform. Theory, 2000.

Also presented at Intern. Sympos. on Inform. Theory ISIT 2000, Sorrento, Italy,

June 2000.

(30)

+

Ψ (t) c (t)

User 1

1 Ψ ^(t) c (t)

User 2

2 Ψ ^(t) c (t)

User K

K

Σ

ν ^(t)

LPF

Sampling at rate W

Figure1: Baseband equivalent representation of a DS/CDMA system.

(31)

Transmitted user signal block

Data Training Data

Q-1 chips

M chips

Total delay-spread: Q chips User delay

Convolution with the channel

Received user signal block

Data Samples for channel estimation Data

Q-1 chips

M chips

Guard

Figure 2: Signal block structure beforeand after convolution with the channel of length

at most Q chips.

(32)

-6 -5 -4 -3 -2 -1 0 1

0 20 40 60 80 100 120 140

Singular values (log-scale)

Singular value index Q=32, RRC pulses, α=0.22

N _c =2 N _c =4

Figure 3: Normalizedsquared singularvaluesfor the chip-shapingpulse convolutionma-

trix, for Q=32, N

c

=2;4, and RRC pulse with roll-o=0:22.

(33)

0 0.2 0.4 0.6 0.8 1

-15 -10 -5 0 5 10

SINR cdf

SINR (dB)

K=8, L=16, M=256, slow power control, SUMF Ideal

Unstr Str.Type I Str.Type II,perf.

Str.Type II,est.

Figure 4: SINR cdf with slow power control and SUMF receiver, with dierent channel

information: Idealknowledge(Ideal),Unstructuredestimation(Unstr.),TypeIstructured

estimation (Str.Type I), Type II structured estimation with perfect delays (Str.Type II,

perf.) and with estimated delays ,(Str. Type II, est.).

(34)

0 0.2 0.4 0.6 0.8 1

-2 -1 0 1 2 3 4 5 6 7

SINR cdf

SINR (dB)

K=8, L=16, M=256, fast power control, SUMF Ideal

Unstr Str.Type I Str.Type II,perf.

Str.Type II,est.

Figure 5: SINR cdf with fast power control and SUMF receiver, with dierent channel

information: Idealknowledge(Ideal),Unstructuredestimation(Unstr.),TypeIstructured

estimation (Str.Type I), Type II structured estimation with perfect delays (Str.Type II,

perf.) and with estimated delays ,(Str. Type II, est.).

Structured multiuser channel estimation for block-synchronous DS/CDMA

+

Ψ (t) c (t)

User 1

1

Ψ (t) c (t)

User 2

2

Ψ (t) c (t)

User K

K

Σ

ν (t)

LPF

Sampling at rate W

Transmitted user signal block

Data Training Data

Q-1 chips

M chips

Total delay-spread: Q chips User delay

Convolution with the channel

Received user signal block

Data Samples for channel estimation Data

Q-1 chips

M chips

Guard

-6 -5 -4 -3 -2 -1 0 1

0 20 40 60 80 100 120 140

Singular values (log-scale)

Singular value index Q=32, RRC pulses, α=0.22

N c =2 N c =4

0 0.2 0.4 0.6 0.8 1

-15 -10 -5 0 5 10

SINR cdf

SINR (dB)

K=8, L=16, M=256, slow power control, SUMF Ideal

Unstr Str.Type I Str.Type II,perf.

Str.Type II,est.

0 0.2 0.4 0.6 0.8 1

-2 -1 0 1 2 3 4 5 6 7

SINR cdf

SINR (dB)

K=8, L=16, M=256, fast power control, SUMF Ideal

Unstr Str.Type I Str.Type II,perf.

Str.Type II,est.

Ψ ^(t) c (t)

Ψ ^(t) c (t)

ν ^(t)

N _c =2 N _c =4