A Coordinate System for Gaussian Networks


The MIT Faculty has made this article openly available.


Citation

Abbe, Emmanuel, and Lizhong Zheng. “A Coordinate System for Gaussian Networks.” IEEE Transactions on Information Theory 58.2 (2012): 721–733.

As Published

http://dx.doi.org/10.1109/tit.2011.2169536

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Version

Author's final manuscript

Citable link

http://hdl.handle.net/1721.1/73671

Terms of Use

Creative Commons Attribution-Noncommercial-Share Alike 3.0


A Coordinate System for Gaussian Networks

Emmanuel A. Abbe

IPG, EPFL Lausanne, 1015 CH Email: [email protected]

Lizhong Zheng

RLE, MIT Cambridge, MA 02139 Email: [email protected]

Abstract—This paper studies network information theory problems where the external noise is Gaussian distributed. In particular, the Gaussian broadcast channel with coherent fading and the Gaussian interference channel are investigated. It is shown that in these problems, non-Gaussian code ensembles can achieve higher rates than the Gaussian ones. It is also shown that the strong Shamai-Laroia conjecture on the Gaussian ISI channel does not hold. In order to analyze non-Gaussian code ensembles over Gaussian networks, a geometrical tool using the Hermite polynomials is proposed. This tool provides a coordinate system to analyze a class of non-Gaussian input distributions that are invariant over Gaussian networks.

I. INTRODUCTION

Let a memoryless additive white Gaussian noise (AWGN) channel be described by $Y = X + Z$, where $Z \sim \mathcal{N}(0, v)$ is independent of $X$. If the input is subject to an average power constraint $\mathbb{E}X^2 \le p$, the input distribution maximizing the mutual information is Gaussian. This is because, under a second moment constraint, the Gaussian distribution maximizes the entropy, hence

$$\arg\max_{X:\ \mathbb{E}X^2 = p} h(X + Z) \sim \mathcal{N}(0, p). \tag{1}$$

On the other hand, if we use a Gaussian input distribution, i.e., $X \sim \mathcal{N}(0, p)$, the worst noise that can occur, i.e., the noise minimizing the mutual information among noises with bounded second moment, is again Gaussian distributed. This can be shown by using the entropy power inequality (EPI), cf. [11], which reduces in this setting to

$$\arg\min_{Z:\ h(Z) = \frac{1}{2}\log 2\pi e v} h(X + Z) \sim \mathcal{N}(0, v) \tag{2}$$

and implies

$$\arg\min_{Z:\ \mathbb{E}Z^2 = v} h(X + Z) - h(Z) \sim \mathcal{N}(0, v). \tag{3}$$

Hence, in the single-user setting, when optimizing the mutual information as above, a Gaussian input is the best input for a Gaussian noise and a Gaussian noise is the worst noise for a Gaussian input. This provides a game equilibrium between user and nature, as defined in [7], p. 263. With these results, many problems in information theory dealing with Gaussian noise can be solved. However, in Gaussian networks, that is, in multi-user information theory problems where the external noise is Gaussian distributed, several new phenomena make the search for the optimal input ensemble more complex. Apart from some specific cases of Gaussian networks, we still

do not know how interference should be treated in general. Let us consider two users interfering on each other in addition to suffering from Gaussian external noise and say that the receivers treat interference as noise. Then, if the first user has drawn its code from a Gaussian ensemble, the second user faces a frustration phenomenon: using a Gaussian ensemble maximizes its mutual information but minimizes the mutual information of the first user. It is an open problem to find the optimal input distributions for this problem. This is one illustration of the complications appearing in the network setting. Another example is regarding the treatment of the fading. Over a single-user AWGN channel, whether the fading is deterministic or random, but known at the receiver, does not affect the optimal input distribution. From (1), it is clear that maximizing I(X; X + Z) or I(X; HX + Z|H) under an average power constraint is achieved by a Gaussian input. However, the situation is different if we consider a Gaussian broadcast channel (BC). When there is a deterministic fading, using (1) and (3), the optimal input distribution can be shown to be Gaussian. However, it has been an open problem to show whether Gaussian inputs are optimal or not for a Gaussian BC with a random fading known at the receiver, even if the fading is such that it is a degraded BC.

A reason for these open questions in the network information theoretic framework is that Gaussian ensembles are roughly the only ensembles that can be analyzed over Gaussian networks, as non-Gaussian ensembles have left most problems in an intractable form. In this paper, a novel technique is developed to analyze a class of non-Gaussian input distributions over Gaussian noise channels. This technique is efficient to analyze the competitive situations occurring in the network problems described below. It allows in particular to find certain non-Gaussian ensembles that outperform Gaussian ones on a Gaussian BC with coherent fading channel and on a two user interference channel, and it allows to disprove the strong Shamai-Laroia conjecture on the Gaussian intersymbol interference channel. This tool provides a new insight on Gaussian networks and confirms that non-Gaussian ensembles do have a role to play in these networks. We now introduce in more detail the notion of competitive situations.

A. Competitive Situations

1) Fading Broadcast Channel: Consider a degraded Gaussian BC with coherent memoryless fading, where the fading is the same for both receivers, i.e.,

$$Y_1 = HX + Z_1, \qquad Y_2 = HX + Z_2,$$

but $Z_1 \sim \mathcal{N}(0, v_1)$ and $Z_2 \sim \mathcal{N}(0, v_2)$, with $v_1 < v_2$. The input $X$ is subject to a power constraint denoted by $p$. Because the fading is coherent, each receiver also knows the realization of $H$ at each channel use. The fading and the noises are memoryless (iid) processes. Since this is a degraded broadcast channel, the capacity region is given by all rate pairs

$$\big(I(X; Y_1 | U, H),\ I(U; Y_2 | H)\big)$$

with $U - X - (Y_1, Y_2)$. The optimal input distributions, i.e., the distributions of $(U, X)$ achieving the capacity region boundary, are given by the following optimization, where $\mu \in \mathbb{R}$,

$$\arg\max_{\substack{(U,X):\ U - X - (Y_1, Y_2) \\ \mathbb{E}X^2 \le p}} I(X; Y_1 | U, H) + \mu I(U; Y_2 | H). \tag{4}$$

Note that the objective function in the above maximization is given by

h(Y1|U, H) − h(Z1) + µh(Y2|H) − µh(Y2|U, H).

Now, each term in this expression is individually maximized by a Gaussian distribution for U and X, but these terms are combined with different signs, so there is a competitive situation and the maximizer is not obvious. When µ ≤ 1, one can show that Gaussian distributions are optimal. Also, if H is compactly supported, and if v is small enough as to make the support of H and 1/vH non-overlapping, the optimal distribution of (U, X) is jointly Gaussian (cf. [13]). However, in general the optimal distribution is unknown. We do not know if this is because we need more theorems, or if, with fading, non-Gaussian codes can actually perform better than the Gaussian ones.

2) Interference Channel: We consider the symmetric memoryless interference channel (IC) with two users and white Gaussian noise. The average power is denoted by $p$, the interference coefficients by $a$, and the respective noise by $Z_1$ and $Z_2$ (independent standard Gaussian). We define the following expression

$$m\,S_{a,p}(X_1^m, X_2^m) = I(X_1^m; X_1^m + aX_2^m + Z_1^m) + I(X_2^m; X_2^m + aX_1^m + Z_2^m) \tag{5}$$

$$= h(X_1^m + aX_2^m + Z_1^m) - h(aX_2^m + Z_1^m) + h(X_2^m + aX_1^m + Z_2^m) - h(aX_1^m + Z_2^m),$$

where $X_1^m$ and $X_2^m$ are independent random vectors of dimension $m$ with a covariance having a trace bounded by $mp$, and $Z_i^m$, $i = 1, 2$, are iid standard Gaussian. For any dimension $m$ and any distributions of $X_1^m$ and $X_2^m$, $S_{a,p}(X_1^m, X_2^m)$ is a lower bound to the sum-capacity. Moreover, it is tight by taking $m$ arbitrarily large and $X_1^m$ and $X_2^m$ maximizing (5).

Now, a similar competitive situation as for the fading broadcast problem takes place: Gaussian distributions maximize each entropy term, but these terms are combined with different signs. Would we then prefer to take $X_1$ and $X_2$ Gaussian or not? This should depend on the value of $a$. If $a = 0$, we have two parallel AWGN channels with no interference, and Gaussian inputs are optimal. We can then expect that this might still hold for small values of $a$. It has been proved recently in [2], [8], [10], that the sum-capacity is achieved by treating interference as noise and with iid Gaussian inputs, as long as $pa^3 + a - 1/2 \le 0$. Hence, in this regime, the iid Gaussian distribution maximizes (5) for any $m$. But if $a$ is above that threshold and below 1, the problem is open.

Let us now review the notion of “treating interference as noise”. For each user, we say that the decoder is treating interference as noise, if it does not require the knowledge of the other user’s code book. However, we allow such decoders to have the knowledge of the distribution, under which the other user’s code book may be drawn. This is for example necessary to construct a sum-capacity achieving code in [2], [8], [10], where the decoder of each user treats interference as noise but uses the fact that the other user’s code book is drawn from an iid Gaussian distribution. But, if we allow this distribution to be of arbitrarily large dimension $m$ in our definition of treating interference as noise, we can get a misleading definition. Indeed, no matter what $a$ is, if we take $m$ large enough and a distribution of $X_1^m$, $X_2^m$ maximizing (5),

we can achieve rates arbitrarily close to the sum-capacity, yet, formally treating interference as noise. The problem is that the maximizing distributions in (5) may not be iid for an arbitrary a, and knowing it at the receiver can be as much information as knowing the other user’s code book (for example, if the distribution is the uniform distribution over a code book of small error probability). Hence, one has to be careful when taking m large. In this paper, we will only work with situations that are not ambiguous with respect to our definition of treating interference as noise. It is indeed an interesting problem to discuss what kind of m-dimensional distributions would capture the meaning of treating interference as noise that we want. This also points out that studying the maximizers of (5) relates to studying the concept of treating interference as noise or information. Since for any chosen distributions of the inputs we can achieve (5), the maximizers of (5) must have a different structure when a grows. For a small enough, iid Gaussian are maximizing distributions, but for a ≥ 1, since we do not want to treat interference as noise, the maximizing distributions must have a “heavy structure”, whose characterization requires as much information as giving the entire code book. This underlines that an encoder can be drawn from a distribution which does not maximize (5) for any value of m, but yet, a decoder may exist in order to have a capacity achieving code. This happens if a ≥ 1, iid Gaussian inputs will achieve the sum-capacity if the receiver decodes the message of both users (one can show that the problem is equivalent to having two MAC’s). However, if a ≥ 1, the iid Gaussian distribution does not maximize Sa,p(X1, X2) (for the dimension 1, hence for

arbitrary dimensions).


In any case, if the Gaussian distribution does not maximize (5) for the dimension 1, it means that iid Gaussian inputs and treating interference as noise is not capacity achieving, since a code which treats interference as noise and whose encoder is drawn from a distribution can be capacity achieving only if the encoder is drawn from a distribution maximizing (5). Hence, understanding better how to resolve the competitive situation of optimizing (5) is an important problem for the interference channel.

B. ISI channel and strong Shamai-Laroia Conjecture

Conjecture 1: Let $h, p, v \in \mathbb{R}_+$, $X_1^G \sim \mathcal{N}(0, p)$ and $Z \sim \mathcal{N}(0, v)$ (independent of $X_1^G$). For all $X, X_1$ i.i.d. with mean 0 and variance $p$, we have

$$I(X; X + hX_1^G + Z) \le I(X; X + hX_1 + Z). \tag{6}$$

This conjecture has been brought to our attention by Shlomo Shamai (Shitz), who referred to the strong conjecture for a slightly more general statement, where an arbitrary memory for the interference term is allowed, i.e., where $\sum_{i=1}^{n} h_i X_i$ stands for $hX_1$. The strong conjecture then claims that picking all $X_i$’s Gaussian gives a minimizer. However, we will show that even for the memory one case, the conjecture does not hold. The weak conjecture, also referred to as the Shamai-Laroia conjecture, corresponds to a specific choice of the $h_i$’s, which arises when using an MMSE decision feedback equalizer on a Gaussian noise ISI channel, cf. [9]. This conjecture is investigated in a work in progress.

There are many other examples in network information theory where such competitive situations occur. Our goal in this paper is to explore the degree of freedom provided by non-Gaussian input distributions. We show that the neighborhood of Gaussian distributions can be parametrized in a specific way, as to simplify greatly the computations arising in competitive situations. We will be able to precisely quantify how much a certain amount of non-Gaussianness, which we will characterize by means of the Hermite polynomials, affects or helps us in maximizing the competitive entropic functional of previously mentioned problems.

II. PROBLEM STATEMENT

A. Fading BC

For the fading BC problem described in I-A1, we want to determine if/when the distribution of (U, X) maximizing (4) is Gaussian or not.

B. IC

For the interference channel problem described in I-A2, we know from [2], [8], [10] that treating interference as noise and using iid Gaussian inputs is optimal when $pa^3 + a - 1/2 \le 0$.

We question when this coding scheme is no longer optimal. More generally, we want to analyze the maximizers of (5).

We distinguish the implication of such a threshold in both the synchronized and asynchronized users setting, as there will be an interesting distinction between these two cases. We recall how the synch and asynch settings are defined here. In the synch setting, each user of the IC sends their code words of a common block length n simultaneously, i.e., at time 1, they both send the first component of their code word, at time 2 the second component, etc. In the asynch setting, each user is still using code words of the same block length n, however, there might be a shift between the time at which the first and second users start sending their code words. We denote this shift by τ, and assume w.l.o.g. that 0 ≤ τ ≤ n. In the totally asynch setting, we assume that τ is drawn uniformly at random within {0, . . . , n}. We may also distinguish the case where τ is known at the receiver but not at the transmitter from the case where τ is known at neither. Note that if iid input distributions are used to draw the code books, and interference is treated as noise, whether the users are synch or asynch does not affect the rate achievability (see footnote 2). However, if the users want to time-share over the channel uses, such as to fully avoid their interference, they will need synchronization.

Definition 1: Time sharing over a block length $n$ (assumed to be even) with Gaussian inputs refers to using $X_1$ Gaussian with covariance $2P\hat{I}_{n/2}$ and $X_2$ Gaussian with covariance $2P\hat{I}^c_{n/2}$, where $\hat{I}_{n/2}$ is a diagonal matrix with $n/2$ 1’s and 0’s, and $\hat{I}^c_{n/2}$ flips the 1’s and 0’s on the diagonal.

C. ISI Channel and Strong Shamai-Laroia Conjecture

We want to determine whether Conjecture 1 holds or not.

D. General Problem

Our more general goal is to understand better the problem posed by any competitive situations. For this purpose, we formulate the following mathematical problem.

We start by changing the notation and rewrite (1) and (3) as

$$\arg\max_{f:\ m_2(f) = p} h(f \star g_v) = g_p \tag{7}$$

$$\arg\min_{f:\ m_2(f) = p} h(f \star g_v) - h(f) = g_p \tag{8}$$

where $g_p$ denotes the Gaussian density with zero mean and variance $p$, and the functions $f$ are density functions on $\mathbb{R}$, i.e., positive functions integrating to 1, and having a well-defined entropy and second moment $m_2(f) = \int_{\mathbb{R}} x^2 f(x)\,dx$.

We consider the local geometry by looking at densities of the form

$$f_\varepsilon(x) = g_p(x)(1 + \varepsilon L(x)), \quad x \in \mathbb{R}, \tag{9}$$

where $L : \mathbb{R} \to \mathbb{R}$ satisfies

$$\inf_{x \in \mathbb{R}} L(x) > -\infty \tag{10}$$

$$\int_{\mathbb{R}} L(x) g_p(x)\,dx = 0. \tag{11}$$

Footnote 2: hence, (5) with an iid distribution for $X_1$ and $X_2$ can still be defined for

With these two constraints on $L$, $f_\varepsilon$ is a valid density for $\varepsilon$ sufficiently small. It is a perturbed Gaussian density, in a “direction” $L$. Observe that

$$m_1(f_\varepsilon) = 0 \iff M_1(L) = \int_{\mathbb{R}} x L(x) g_p(x)\,dx = 0 \tag{12}$$

$$m_2(f_\varepsilon) = p \iff M_2(L) = \int_{\mathbb{R}} x^2 L(x) g_p(x)\,dx = 0. \tag{13}$$

We are now interested in analyzing how these perturbations affect the output distributions through an AWGN channel. Note that, if the input distribution is a Gaussian $g_p$ perturbed in the direction $L$, the output is a Gaussian $g_{p+v}$ perturbed in the direction $\frac{(g_p L) \star g_v}{g_{p+v}}$, since

$$f_\varepsilon \star g_v = g_{p+v}\Big(1 + \varepsilon \frac{(g_p L) \star g_v}{g_{p+v}}\Big).$$

Convention: $g_p L \star g_v$ refers to $(g_p L) \star g_v$, i.e., the multiplicative operator precedes the convolution one.

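For simplicity, let us assume in the following that the function $L$ is a polynomial satisfying (10), (11).

Lemma 1: We have

$$D(f_\varepsilon \| g_p) = \frac{1}{2}\varepsilon^2 \|L\|^2_{g_p} + o(\varepsilon^2)$$

$$D(f_\varepsilon \star g_v \| g_p \star g_v) = \frac{1}{2}\varepsilon^2 \Big\|\frac{g_p L \star g_v}{g_{p+v}}\Big\|^2_{g_{p+v}} + o(\varepsilon^2),$$

where

$$\|L\|^2_{g_p} = \int_{\mathbb{R}} L^2(x) g_p(x)\,dx.$$

Moreover, note that for any density $f$, if $m_1(f) = a$ and $m_2(f) = p + a^2$, we have

$$h(f) = h(g_{a,p}) - D(f \| g_{a,p}). \tag{14}$$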
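The second-order expansion of Lemma 1 can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: it picks the direction $L = H_4^{[p]}$ (which is bounded below, as (10) requires), uses a plain Riemann sum on a truncated grid, and the function names are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, var):
    """Density g_var of N(0, var) evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def lemma1_check(eps, p=1.0, half_width=12.0, n=24001):
    """Return (numerical D(f_eps || g_p), prediction eps^2 / 2 * ||L||^2_{g_p})."""
    x = np.linspace(-half_width * np.sqrt(p), half_width * np.sqrt(p), n)
    dx = x[1] - x[0]
    g = gaussian_pdf(x, p)
    y = x / np.sqrt(p)
    # Assumed direction: L = H_4^{[p]} = (y^4 - 6 y^2 + 3) / sqrt(4!), bounded below.
    L = (y**4 - 6.0 * y**2 + 3.0) / np.sqrt(24.0)
    f = g * (1.0 + eps * L)                      # perturbed density, as in (9)
    divergence = float(np.sum(f * np.log(f / g)) * dx)
    norm_sq = float(np.sum(L**2 * g) * dx)       # ||L||^2_{g_p}, close to 1 here
    return divergence, 0.5 * eps**2 * norm_sq


for eps in (0.01, 0.005, 0.0025):
    exact, predicted = lemma1_check(eps)
    print(f"eps={eps}: D={exact:.4e}, eps^2/2 * ||L||^2={predicted:.4e}")
```

The two columns should agree up to a relative error that shrinks with $\varepsilon$, consistent with the $o(\varepsilon^2)$ remainder.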

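Hence, the extremal entropic results of (7) and (8) are locally expressed as

$$\arg\min_{L:\ M_2(L) = 0} \Big\|\frac{g_p L \star g_v}{g_{p+v}}\Big\|^2_{g_{p+v}} = 0 \tag{15}$$

$$\arg\max_{L:\ M_2(L) = 0} \Big\|\frac{g_p L \star g_v}{g_{p+v}}\Big\|^2_{g_{p+v}} - \|L\|^2_{g_p} = 0, \tag{16}$$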

where 0 denotes here the zero function. If (15) is obvious, (16) requires a proof which will be done in section V. Let us define the following mapping,

$$\Gamma^{(+)} : L \in L_2(g_p) \mapsto \frac{g_p L \star g_v}{g_{p+v}} \in L_2(g_{p+v}; \mathbb{R}), \tag{17}$$

where $L_2(g_p)$ denotes the space of real functions having a finite $\|\cdot\|_{g_p}$ norm. This linear mapping gives, for a given perturbed direction $L$ of a Gaussian input $g_p$, the resulting perturbed direction of the output through additive Gaussian noise $g_v$. The norm of each direction in their respective spaces, i.e., in $L_2(g_p)$ and $L_2(g_{p+v})$, gives how far from the Gaussian distribution these perturbations are (up to a scaling factor). Note that if $L$ satisfies (10)-(11), so does $\Gamma^{(+)}L$ for the measure $g_{p+v}$. The result in (16) (worst noise case) tells us that this mapping is a contraction, but for our goal, what would be helpful is a spectral analysis of this operator, to allow more quantitative results than the extreme-case results of (15) and (16).

In order to do so, one can express $\Gamma^{(+)}$ as an operator defined and valued in the same space, namely $L_2$ with the Lebesgue measure $\lambda$, which is done by inserting the Gaussian measure in the operator argument. We then proceed to a singular function/value analysis. Formally, let $K = L\sqrt{g_p}$, which gives $\|K\|_\lambda = \|L\|_{g_p}$, and let

$$\Lambda : K \in L_2(\lambda) \mapsto \frac{\sqrt{g_p}\,K \star g_v}{\sqrt{g_{p+v}}} \in L_2(\lambda) \tag{18}$$

which gives $\|\Gamma^{(+)}L\|_{g_{p+v}} = \|\Lambda K\|_\lambda$. Denoting by $\Lambda^*$ the adjoint operator of $\Lambda$, we want to find the singular functions of $\Lambda$, i.e., the eigenfunctions $K$ of $\Lambda^*\Lambda$:

$$\Lambda^*\Lambda K = \gamma K.$$

III. RESULTS

A. General Result: Local Geometry and Hermite Coordinates

The following theorem gives the singular functions and values of the operator $\Lambda$ defined in the previous section.

Theorem 1: $\Lambda^*\Lambda K = \gamma K$, $K \ne 0$ holds for each pair

$$(K, \gamma) \in \Big\{\Big(\sqrt{g_p}\,H_k^{[p]},\ \Big(\frac{p}{p+v}\Big)^k\Big)\Big\}_{k \ge 0},$$

where

$$H_k^{[p]}(x) = \frac{1}{\sqrt{k!}} H_k(x/\sqrt{p}), \qquad H_k(x) = (-1)^k e^{x^2/2} \frac{d^k}{dx^k} e^{-x^2/2}, \quad k \ge 0,\ x \in \mathbb{R}.$$

The polynomials $H_k^{[p]}$ are the normalized Hermite polynomials (for a Gaussian distribution having variance $p$) and $\sqrt{g_p}\,H_k^{[p]}$ are called the Hermite functions. For any $p > 0$, $\{H_k^{[p]}\}_{k \ge 0}$ is an orthonormal basis of $L_2(g_p)$; this can be found in [12].
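As a quick sanity check of the orthonormality statement, the following sketch evaluates the inner products $\int H_j^{[p]} H_k^{[p]} g_p$ by Gauss-Hermite quadrature. It relies on NumPy's `numpy.polynomial.hermite_e` module (probabilists' Hermite polynomials); the helper names and the choice of $p$ are illustrative assumptions.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_normalized(k, x, p):
    """H_k^{[p]}(x) = He_k(x / sqrt(p)) / sqrt(k!), He_k being the probabilists' Hermite polynomial."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return He.hermeval(np.asarray(x) / math.sqrt(p), coeffs) / math.sqrt(math.factorial(k))


def gram_matrix(p, kmax, quad_order=40):
    """Inner products <H_j^{[p]}, H_k^{[p]}> under g_p, for 0 <= j, k <= kmax."""
    nodes, weights = He.hermegauss(quad_order)    # quadrature for the weight exp(-y^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)  # rescale to the N(0, 1) density
    x = math.sqrt(p) * nodes                      # sqrt(p) * Y is distributed as N(0, p)
    H = np.array([hermite_normalized(k, x, p) for k in range(kmax + 1)])
    return (H * weights) @ H.T


print(np.round(gram_matrix(p=2.5, kmax=4), 10))
```

The printed matrix should be the identity up to rounding, whatever the value of $p$.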

One can check that $H_1$, respectively $H_2$, perturbs a Gaussian distribution into another Gaussian distribution, with a different first moment, respectively second moment. For $k \ge 3$, the $H_k$ perturbations do not modify the first two moments and move away from Gaussian distributions. Since $H_0^{[p]} = 1$, the orthogonality property implies that $H_k^{[p]}$ satisfies (11) for any $k > 0$. However, it is formally only for even values of $k$ that (10) is verified (although we will see in section V that essentially any $k$ can be considered in our problems). The following result contains the property of Hermite polynomials mostly used in our problems, and expresses Theorem 1 with the Gaussian measures.

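Theorem 2:

$$\Gamma^{(+)} H_k^{[p]} = \frac{g_p H_k^{[p]} \star g_v}{g_{p+v}} = \Big(\frac{p}{p+v}\Big)^{k/2} H_k^{[p+v]}, \tag{19}$$

$$\Gamma^{(-)} H_k^{[p+v]} = H_k^{[p+v]} \star g_v = \Big(\frac{p}{p+v}\Big)^{k/2} H_k^{[p]}. \tag{20}$$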

The last theorem implies Theorem 1, since

$$\Gamma^{(-)}\Gamma^{(+)} L = \gamma L \iff \Lambda^*\Lambda K = \gamma K$$

for $K = L\sqrt{g_p}$.
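Relation (20) lends itself to a direct numerical check, since $(H \star g_v)(x) = \mathbb{E}\,H(x + \sqrt{v}\,Y)$ with $Y \sim \mathcal{N}(0, 1)$. The sketch below evaluates that expectation by Gauss-Hermite quadrature; the helper names and the values of $p$ and $v$ are assumptions chosen for illustration only.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_normalized(k, x, p):
    """H_k^{[p]}(x) = He_k(x / sqrt(p)) / sqrt(k!)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return He.hermeval(np.asarray(x) / math.sqrt(p), coeffs) / math.sqrt(math.factorial(k))


def smooth_by_noise(k, x, p, v, quad_order=40):
    """(H_k^{[p+v]} * g_v)(x), computed as E[H_k^{[p+v]}(x + sqrt(v) Y)] with Y ~ N(0, 1)."""
    nodes, weights = He.hermegauss(quad_order)
    weights = weights / math.sqrt(2.0 * math.pi)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    shifted = x[:, None] + math.sqrt(v) * nodes[None, :]
    return hermite_normalized(k, shifted, p + v) @ weights


p, v = 1.5, 0.7                                   # illustrative values
x = np.linspace(-3.0, 3.0, 7)
for k in range(5):
    lhs = smooth_by_noise(k, x, p, v)
    rhs = (p / (p + v)) ** (k / 2) * hermite_normalized(k, x, p)
    print(k, float(np.max(np.abs(lhs - rhs))))
```

Each printed discrepancy should be at the level of floating-point rounding, which is what (20) predicts for polynomial integrands handled exactly by the quadrature.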

Comment: the results that we have just derived are related to properties of the Ornstein-Uhlenbeck process.

Summary: In words, we just saw that $H_k$ is an eigenfunction of the input/output perturbation operator $\Gamma^{(+)}$, in the sense that $\Gamma^{(+)} H_k^{[p]} = \big(\frac{p}{p+v}\big)^{k/2} H_k^{[p+v]}$. Hence, over an additive Gaussian noise channel $g_v$, if we perturb the input $g_p$ in the direction $H_k^{[p]}$ by an amount $\varepsilon$, we will perturb the output $g_{p+v}$ in the direction $H_k^{[p+v]}$ by an amount $\big(\frac{p}{p+v}\big)^{k/2}\varepsilon$. Such a perturbation in $H_k$ implies that the output entropy is reduced (compared to not perturbing) by $\big(\frac{p}{p+v}\big)^{k}\frac{\varepsilon^2}{2}$ (if $k \ge 3$).

B. Fading BC Result

The following result states that the capacity region of a degraded fading BC with Gaussian noise is not achieved by a Gaussian superposition code in general.

Theorem 3: Let

$$Y_1 = HX + Z_1, \qquad Y_2 = HX + Z_2$$

with $X$ such that $\mathbb{E}X^2 \le p$, $Z_1 \sim \mathcal{N}(0, v)$, $0 < v < 1$, $Z_2 \sim \mathcal{N}(0, 1)$ and $H, X, Z_1, Z_2$ mutually independent. There exists a fading distribution and a value of $v$ for which the capacity achieving input distribution is non-Gaussian. More precisely, let $U$ be any auxiliary random variable, with $U - X - (Y_1, Y_2)$. Then, there exist $p$, $v$, a distribution of $H$ and $\mu$ such that

$$(U, X) \mapsto I(X; Y_1 | U, H) + \mu I(U; Y_2 | H) \tag{21}$$

is maximized by a non jointly Gaussian distribution.

In the proof, we present a counter-example to Gaussian being optimal for H binary. In order to defeat Gaussian distributions, we construct input distributions using the Hermite coordinates. The proof also gives a condition on the fading distribution and the noise variance v for which a non-Gaussian distribution strictly improves on the Gaussian one.

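C. IC Result

Definition 2: Let

$$F_k(a, p) = \lim_{\delta \searrow 0} \lim_{\varepsilon \searrow 0} \frac{2}{\varepsilon^2}\big[S_{a,p}(X_1, X_2) - S_{a,p}(X_1^G, X_2^G)\big],$$

where $X_1^G, X_2^G$ are iid $\sim g_p$, $X_1 \sim g_p(1 + \varepsilon \tilde{H}_k)$ and $X_2 \sim g_p(1 - \varepsilon \tilde{H}_k)$, with $\tilde{H}_k$ defined in (22) below (as explained in section IV, $\tilde{H}_k$ is a formal modification of $H_k$ to ensure the positivity of the perturbed densities).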

In other words, $F_k(a, p)$ represents the gain (positive or negative) of using $X_1$ perturbed along $H_k$ and $X_2$ perturbed along $-H_k$ with respect to using Gaussian distributions. Note that the distributions we chose for $X_1$ and $X_2$ are not the most general ones, as we could have chosen arbitrary directions spanned by the Hermite basis to perturb the Gaussian densities. However, as explained in the proof of Theorem 4, this choice is sufficient for our purpose.

Theorem 4: We have for $k \ge 2$

$$F_k(a, p) = \Big(\frac{a^2}{pa^2 + 1}\Big)^k - \frac{(a^k - 1)^2}{(pa^2 + p + 1)^k}.$$

For any fixed $p$, the function $F_k(\cdot, p)$ has a unique positive root, below which it is negative and above which it is positive.

Theorem 5: Treating interference as noise with iid Gaussian inputs does not achieve the sum-capacity of the symmetric IC (synch or asynch) and is outperformed by $X_1 \sim g_p(1 + \varepsilon \tilde{H}_3)$ and $X_2 \sim g_p(1 - \varepsilon \tilde{H}_3)$, if $F_3(a, p) > 0$.

This theorem is a direct consequence of Theorem 4.

Proposition 1: For the symmetric synch IC, time sharing improves on treating interference as noise with iid Gaussian distribution if $F_2(a, p) > 0$.

We now introduce the following definition.

Definition 3: Blind time sharing over a block length n (assumed to be even) between two users, refers to sending non-zero power symbols only at the instances marked with a 1 in (1, 0, 1, 0, 1, 0, . . . 1, 0) for the first user, and zero power symbols only at the instances marked with a 1 in (1, 1, . . . , 1, 0, 0, . . . , 0) for the second user.

Proposition 2: For the symmetric totally asynch IC, if the receivers (but not transmitters) know the asynchronization delay, blind time sharing improves on treating interference as noise with iid Gaussian distributions if $B_2(a, p) > 0$, where

$$B_2(a, p) = \frac{1}{4}\Big(\log(1 + 2p) + \log\Big(1 + \frac{2p}{1 + 2a^2 p}\Big)\Big) - \log\Big(1 + \frac{p}{1 + a^2 p}\Big).$$

If the receivers do not know the asynchronization delay, blind time sharing cannot improve on treating interference as noise with iid Gaussian distributions if B2(a, p) ≤ 0.

How to read these results: We have four thresholds to keep track of:

• $T_1(p)$ is when $pa^3 + a - \frac{1}{2} = 0$. If $a \le T_1(p)$, we know from [2], [8], [10] that iid Gaussian inputs and treating interference as noise is sum-capacity achieving.

• $T_2(p)$ is when $F_2(a, p) = 0$. If $a > T_2(p)$, we know from Prop. 1 that, if synchronization is permitted, time sharing improves on treating interference as noise with iid Gaussian inputs. This regime matches with the so-called moderate regime defined in [5].

• $T_3(p)$ is when $F_3(a, p) = 0$. If $a > T_3(p)$, we know from Theorem 5 that treating interference as noise with iid non-Gaussian distributions (opposites in $H_3$) improves on the iid Gaussian ones.

• $T_4(p)$ is when $B_2(a, p) = 0$. If $a > T_4(p)$, we know from Prop. 2 that, even if the users are totally asynchronized, but if the receivers know the asynchronization delay, blind time sharing improves on treating interference as noise with iid Gaussian inputs. If the receivers do not know the delay, the threshold can only appear for larger values of $a$.

The question is now how these thresholds are ranked. It turns out that $0 < T_1(p) < T_2(p) < T_3(p) < T_4(p)$. And if $p = 1$, the above inequality reads as $0.424 < 0.605 < 0.680 < 1.031$. This implies the following for a decoder that treats interference as noise. Since $T_2(p) < T_3(p)$, it is first better to time share than to use non-Gaussian distributions along $H_3$. But this is useful only if time-sharing is permitted, i.e., for the synch IC. However, for the asynchronized IC, since $T_3(p) < T_4(p)$, we are better off using the non-Gaussian distributions along $H_3$ before a Gaussian input scheme, even with blind time-sharing, and even if the receiver could know the delay. We notice that there is still a gap between $T_1(p)$ and $T_2(p)$, and we cannot say if, in this range, iid Gaussian inputs are still optimal, or if another class of non-Gaussian inputs (far away from Gaussians) can outperform them. In [4], another technique (which is related to ours but not equivalent) is used to find regimes where non-Gaussian inputs can improve on Gaussian ones on the same problem that we consider here. The threshold found in [4] is equal to 0.925 for $p = 1$, which is looser than the value of 0.680 found here.
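The first three thresholds quoted above for $p = 1$ can be reproduced with a few lines of code. The sketch below assumes the closed form of $F_k(a, p)$ as stated in Theorem 4 and the condition $pa^3 + a - 1/2 = 0$ for $T_1(p)$; the bisection helper and the function names are hypothetical, and $T_4(p)$ is deliberately left out.

```python
def F(k, a, p):
    """Closed form of F_k(a, p) as stated in Theorem 4 (taken here as an assumption)."""
    return (a**2 / (p * a**2 + 1.0)) ** k - (a**k - 1.0) ** 2 / (p * a**2 + p + 1.0) ** k


def bisect_root(fn, lo, hi, tol=1e-12):
    """Plain bisection; fn(lo) and fn(hi) are assumed to have opposite signs."""
    f_lo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def thresholds(p):
    """Return (T1, T2, T3) for a given power p."""
    t1 = bisect_root(lambda a: p * a**3 + a - 0.5, 0.0, 1.0)
    t2 = bisect_root(lambda a: F(2, a, p), 1e-6, 1.0)   # F_k(1, p) > 0, so the root lies below 1
    t3 = bisect_root(lambda a: F(3, a, p), 1e-6, 1.0)
    return t1, t2, t3


print([round(t, 3) for t in thresholds(1.0)])
```

For $p = 1$ the three roots should round to the values 0.424, 0.605 and 0.680 given in the text.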

Finally, the following interesting and curious fact has also been noticed. In Theorem 4, we require $k \ge 2$. Nevertheless, if we plug $k = 1$ in the right hand side of Theorem 4 and ask for this expression to be positive, we precisely get $pa^3 + a - \frac{1}{2} > 0$, i.e., the complement range delimited by $T_1(p)$. However, the right hand side of Theorem 4 for $k = 1$ is not equal to $F_1(a, p)$ (this is explained in more detail in the proof of Theorem 4). Indeed, it would not make sense that moving along $H_1$, which changes the mean with a fixed second moment within Gaussians, would allow us to improve on the iid Gaussian scheme. Yet, getting to the exact same condition, when working on the problem of improving on the iid Gaussian scheme, seems to be a strange coincidence.

D. Strong Shamai-Laroia Conjecture

We show in Section V that conjecture 1 does not hold. We provide counter-examples to the conjecture, pointing out that the range of h for which the conjecture does not hold increases with SNR.

IV. HERMITE CODING: FORMALITIES

The Hermite polynomial corresponding to $k = 0$ is $H_0^{[p]} = 1$ and is clearly not a valid direction as it violates (11). Using the orthogonality property of the Hermite basis and since $H_0^{[p]} = 1$, we conclude that $H_k^{[p]}$ satisfies (11) for any $k > 0$. However, it is only for $k$ even that $H_k^{[p]}$ satisfies (10). On the other hand, for any $\delta > 0$, we have that $H_k^{[p]} + \delta H_{4k}^{[p]}$ satisfies (10), whether $k$ is even or not (we chose $4k$ instead of $2k$ for reasons that will become clear later). Now, if we consider the direction $-H_k^{[p]}$, (10) is satisfied for neither even nor odd $k$. But again, for any $\delta > 0$, we have that $-H_k^{[p]} + \delta H_{4k}^{[p]}$ satisfies (10). Hence, in order to ensure (10), we will often work in the proofs with $\pm H_k^{[p]} + \delta H_{4k}^{[p]}$, although it will essentially allow us to reach the performance achieved by any $\pm H_k^{[p]}$ (odd or even), since we will then take $\delta$ arbitrarily small and use continuity arguments.

Convention: We drop the variance superscript in the Hermite terms whenever a Gaussian density with specified variance is perturbed, i.e., the density $g_p(1 + \varepsilon H_k)$ always denotes $g_p(1 + \varepsilon H_k^{[p]})$, and $g_p H_k$ always denotes $g_p H_k^{[p]}$, no matter what $p$ is. The same treatment is applied to $\|\cdot\|_{g_p}$ and $\|\cdot\|$. Now, in order to evaluate the entropy of a perturbation, i.e., $h(g_p(1 + \varepsilon L))$, we can express it as the entropy $h(g_p)$ minus the divergence gap, as in (14), and then use Lemma 1 for the approximation. But this is correct only if $g_p(1 + \varepsilon L)$ has the same first two moments as $g_p$. Hence, if $L$ contains only $H_k$’s with $k \ge 3$, the previous argument can be used. But if $L$ contains $H_1$ and/or $H_2$ terms, the situation can be different.

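The next lemma describes this.

Lemma 2: Let $\delta, p > 0$ and

$$b\tilde{H}_k^{[p]} = \begin{cases} b(H_k^{[p]} + \delta H_{4k}^{[p]}), & \text{if } b \ge 0, \\ b(H_k^{[p]} - \delta H_{4k}^{[p]}), & \text{if } b < 0. \end{cases} \tag{22}$$

We have for any $\alpha_k \in \mathbb{R}$, $k \ge 1$, $\varepsilon > 0$,

$$h\Big(g_p\Big(1 + \varepsilon \sum_{k \ge 1} \alpha_k \tilde{H}_k\Big)\Big) = h(g_p) - D\Big(g_p\Big(1 + \varepsilon \sum_{k \ge 1} \alpha_k \tilde{H}_k\Big) \Big\| g_p\Big) + \frac{\varepsilon \alpha_2}{\sqrt{2}}.$$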

Finally, when we convolve two perturbed Gaussian distributions, we get

$$g_a(1 + \varepsilon H_j) \star g_b(1 + \varepsilon H_k) = g_{a+b} + \varepsilon[g_a H_j \star g_b + g_a \star g_b H_k] + \varepsilon^2 g_a H_j \star g_b H_k.$$

We already know from Theorem 2 how to describe the terms in $\varepsilon$; what we still need is to describe the term in $\varepsilon^2$. We have the following.

Lemma 3: We have

$$g_a H_k^{[a]} \star g_b H_l^{[b]} = C\, g_{a+b} H_{k+l}^{[a+b]},$$

where $C$ is a constant depending only on $a$, $b$, $k$ and $l$. In particular if $k = l = 1$, we have $C = \frac{\sqrt{2ab}}{a+b}$.
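The case $k = l = 1$ of Lemma 3 can be probed numerically by convolving $g_a H_1^{[a]}$ with $g_b H_1^{[b]}$ on a grid and fitting the constant in front of $g_{a+b} H_2^{[a+b]}$. This is only a sketch: the grid size, the least-squares fit and the function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_pdf(x, var):
    """Density g_var of N(0, var) evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def lemma3_constant_k1_l1(a, b, half_width=15.0, n=6001):
    """Fit C in g_a H_1 * g_b H_1 = C g_{a+b} H_2; return (C, max residual of the fit)."""
    # Symmetric grid with an odd number of points, so mode="same" stays aligned with x.
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    left = gaussian_pdf(x, a) * x / np.sqrt(a)       # g_a H_1^{[a]}
    right = gaussian_pdf(x, b) * x / np.sqrt(b)      # g_b H_1^{[b]}
    conv = np.convolve(left, right, mode="same") * dx
    s = a + b
    target = gaussian_pdf(x, s) * (x**2 / s - 1.0) / np.sqrt(2.0)   # g_{a+b} H_2^{[a+b]}
    C = float(np.dot(conv, target) / np.dot(target, target))
    residual = float(np.max(np.abs(conv - C * target)))
    return C, residual


a, b = 1.0, 2.0                                       # illustrative variances
C, residual = lemma3_constant_k1_l1(a, b)
print(C, np.sqrt(2.0 * a * b) / (a + b), residual)
```

The fitted constant should match $\sqrt{2ab}/(a+b)$ and the residual should be negligible, confirming that the convolution is a pure multiple of $g_{a+b} H_2^{[a+b]}$ in this case.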

V. PROOFS

We start by reviewing the proof of (16), as it brings interesting facts. We then prove the main result.

Proof of (16):

We first assume that $f_\varepsilon$ has zero mean and variance $p$. Using the Hermite basis, we express $L$ as $L = \sum_{k \ge 3} \alpha_k H_k^{[p]}$ ($L$ must have a finite $\|\cdot\|_{g_p}$ norm, to make sense of the original expressions). Using (19), we can then express (16) as

$$\sum_{k \ge 3} \alpha_k^2 \Big(\frac{p}{p+v}\Big)^k - \sum_{k \ge 3} \alpha_k^2 \tag{23}$$

which is clearly non-positive. Hence, we have proved that

$$\Big\|\frac{g_p L \star g_v}{g_{p+v}}\Big\|^2_{g_{p+v}} \le \|L\|^2_{g_p} \tag{24}$$

and (16) is maximized by taking $L = 0$. Note that we can get tighter bounds than the one in the previous inequality; indeed the tightest, holding for $H_3$, is given by

$$\Big\|\frac{g_p L \star g_v}{g_{p+v}}\Big\|^2_{g_{p+v}} \le \Big(\frac{p}{p+v}\Big)^3 \|L\|^2_{g_p} \tag{25}$$

(this clearly holds if written as a series like in (23)). Hence, locally the contraction property can be tightened, and locally, we have stronger EPI’s, or worst noise case. Namely, if $\nu \ge \big(\frac{p}{p+v}\big)^3$, we have

$$\arg\min_{f:\ m_1(f) = 0,\ m_2(f) = p} h(f \star g_v) - \nu h(f) = g_p \tag{26}$$

and if $\nu < \big(\frac{p}{p+v}\big)^3$, $g_p$ is outperformed by non-Gaussian distributions. Now, if we consider the constraint $m_2(f) \le p$, which, in particular, allows to consider $m_1(f) > 0$ and $m_2(f) = p$, we get that if $\nu \ge \frac{p}{p+v}$,

$$\arg\min_{f:\ m_2(f) \le p} h(f \star g_v) - \nu h(f) = g_p \tag{27}$$

and if $\nu < \frac{p}{p+v}$, $g_p$ is outperformed by $g_{p-\delta}$ for some $\delta > 0$. It would then be interesting to study if these tighter results hold in a greater generality than for the local setting.

Proof of Theorem 2: We want to show

$$H_k^{[p+v]} \star g_v = \Big(\frac{p}{p+v}\Big)^{k/2} H_k^{[p]}, \qquad g_p H_k^{[p]} \star g_v = \Big(\frac{p}{p+v}\Big)^{k/2} g_{p+v} H_k^{[p+v]},$$

which is proved by an induction on $k$, using the following properties (Appell sequence and recurrence relation) of Hermite polynomials:

$$\frac{\partial}{\partial x} H_{k+1}^{[p]}(x) = \sqrt{\frac{k+1}{p}}\, H_k^{[p]}(x), \qquad \frac{\partial}{\partial x}\big[g_p(x) H_k^{[p]}(x)\big] = -\sqrt{\frac{k+1}{p}}\, g_p(x) H_{k+1}^{[p]}(x).$$

Proof of Theorem 3:

We refer to (21) as the mu-rate. Let us first consider Gaussian codes, i.e., when $(U, X)$ is jointly Gaussian, and see what mu-rate they can achieve. Without loss of generality, we can assume that $X = U + V$, with $U$ and $V$ independent and Gaussian, with respective variances $Q$ and $R$ satisfying $P = Q + R$. Then, (21) becomes

$$\frac{1}{2} E \log\left(1 + \frac{RH^2}{v}\right) + \mu\, \frac{1}{2} E \log \frac{1 + PH^2}{1 + RH^2}. \qquad (28)$$

Now, we pick a $\mu$ and look for the optimal power $R$ that must be allocated to $V$ in order to maximize the above expression. We are interested in cases for which the optimal $R$ is not at the boundary but at an extremum of (28), and if the maximum is unique, the optimal $R$ is found by the first derivative check, which gives $E \frac{H^2}{v + RH^2} = \mu E \frac{H^2}{1 + RH^2}$. Since we will look for $\mu$, $v$, with $R > 0$, the previous condition can be written as

$$E \frac{RH^2}{v + RH^2} = \mu E \frac{RH^2}{1 + RH^2}. \qquad (29)$$

We now check if we can improve on (28) by moving away from the optimal jointly Gaussian $(U, X)$. There are several ways to perturb $(U, X)$; we consider first the following case. We keep $U$ and $V$ independent, but perturb them away from Gaussian distributions in the following way:

$$p_{U_\varepsilon}(u) = g_Q(u)\left(1 + \varepsilon\left(H_3^{[Q]}(u) + \delta H_4\right)\right) \qquad (30)$$

$$p_{V_\varepsilon}(v) = g_R(v)\left(1 - \varepsilon\left(H_3^{[R]}(v) - \delta H_4\right)\right) \qquad (31)$$

with $\varepsilon, \delta > 0$ small enough. Note that these are valid density functions and that they preserve the first two moments of $U$ and $V$. The reason why we add $\delta H_4$ is to ensure that (13) is satisfied, but we will see that for our purpose, this can essentially be neglected. Then, using Lemma 2, the new distribution of $X$ is given by

$$p_X(x) = g_P(x)\left(1 + \varepsilon \left(\frac{Q}{P}\right)^{3/2} H_3^{[P]} - \varepsilon \left(\frac{R}{P}\right)^{3/2} H_3^{[P]}\right) + f(\delta)$$

where $f(\delta) = \delta\, g_P(x)\, \varepsilon \left( \left(\frac{Q}{P}\right)^{4/2} H_4^{[P]} + \left(\frac{R}{P}\right)^{4/2} H_4^{[P]} \right)$, which tends to zero when $\delta$ tends to zero. Now, by picking $P = 2R$, we have

$$p_X(x) = g_P(x) + f(\delta). \qquad (32)$$

Hence, by taking δ arbitrarily small, the distribution of X is arbitrarily close to the Gaussian distribution with variance P . We now want to evaluate how these Hermite perturbations perform, given that we want to maximize (21), i.e.,

h(Y1|U, H) − h(Z1) + µh(Y2|H) − µh(Y2|U, H). (33)

We wonder if, by moving away from Gaussian distributions, the gain achieved for the term $-h(Y_2|U, H)$ is higher than the loss suffered from the other terms. Using Theorem 2, Lemma 1 and Lemma 2, we are able to precisely measure this and we get

$$h(Y_1|U = u, H = h) = h\left(g_{hu,\, v+Rh^2}\left(1 - \varepsilon \left(\frac{Rh^2}{v+Rh^2}\right)^{3/2} H_3^{[hu,\, v+Rh^2]}\right)\right) = \frac{1}{2}\log 2\pi e (v + Rh^2) - \frac{\varepsilon^2}{2}\left(\frac{Rh^2}{v+Rh^2}\right)^3 + o(\varepsilon^2) + o(\delta),$$

$$h(Y_2|U = u, H = h) = \frac{1}{2}\log 2\pi e (1 + Rh^2) - \frac{\varepsilon^2}{2}\left(\frac{Rh^2}{1+Rh^2}\right)^3 + o(\varepsilon^2) + o(\delta),$$

and because of (32)

$$h(Y_2|H = h) = \frac{1}{2}\log 2\pi e (1 + Ph^2) + o(\varepsilon^2) + o(\delta).$$

Therefore, collecting all terms, we find that for $U_\varepsilon$ and $V_\varepsilon$ defined in (30) and (31), expression (33) reduces to

$$I_G - \frac{\varepsilon^2}{2} E\left(\frac{RH^2}{v+RH^2}\right)^3 + \mu\, \frac{\varepsilon^2}{2} E\left(\frac{RH^2}{1+RH^2}\right)^3 + o(\varepsilon^2) + o(\delta) \qquad (34)$$

where $I_G$ is equal to (28) (and is the mu-rate obtained with Gaussian inputs). Hence, if for some distribution of $H$ and some $v$, we have that

$$\mu E\left(\frac{RH^2}{1+RH^2}\right)^k - E\left(\frac{RH^2}{v+RH^2}\right)^k > 0, \qquad (35)$$

when $k = 3$ and $R$ is optimal for $\mu$, we can take $\varepsilon$ and $\delta$ small enough in order to make (34) strictly larger than $I_G$. We have shown how, if verified, inequality (35) leads to counter-examples of the Gaussian optimality, but with similar expansions, we would also get counter-examples if (35) holds for any power $k$ instead of 3, as long as $k \geq 3$. Let us summarize what we obtained: Let $R$ be optimal for $\mu$, which means that (29) holds if there is only one maximum (not at the border). Then, non-Gaussian codes along Hermite's strictly outperform Gaussian codes if, for some $k \geq 3$, (35) holds. If the maximum is unique, this becomes

$$\frac{E\,T(v)^k}{E\,T(1)^k} < \frac{E\,T(v)}{E\,T(1)} \qquad \text{where } T(v) = \frac{RH^2}{v + RH^2}.$$

So we want the Jensen gap of T (v) for the power k to be small enough compared to the Jensen gap of T (1).

We now give an example of a fading distribution for which the above conditions can be verified. Let H be binary, taking values 1 and 10 with probability half and let v = 1/4. Let µ = 5/4, then for any values of P , the maximizer of (28) is at R = 0.62043154, cf. Figure 1, which corresponds in this case to the unique value of R for which (29) is satisfied. Hence if P is larger than this value of R, there is a corresponding fading BC for which the best Gaussian code splits the power on U and V with R = 0.62043154 to achieve the best mu-rate with µ = 5/4. To fit the counter-examples with the choice of Hermite perturbations made previously, we pick P = 2R. Finally, for these values of µ and R, (35) can be verified for k = 8, cf. Figure 2, and the corresponding Hermite code (along H8) strictly outperforms any Gaussian codes.
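The numbers in this example can be reproduced with a short Python sketch. It solves the stationarity condition (29) by bisection for the binary fading distribution above and then evaluates the left-hand side of (35). The function names, the search window $[0.3, 1]$ for the bisection and the iteration count are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Fading example from the text: H in {1, 10}, each with probability 1/2.
H_SQUARED = np.array([1.0, 100.0])
PROB = np.array([0.5, 0.5])
MU = 5.0 / 4.0
V = 1.0 / 4.0

def T(R, noise):
    """T(noise) = R H^2 / (noise + R H^2), one entry per value of H."""
    return R * H_SQUARED / (noise + R * H_SQUARED)

def gap_29(R):
    """Left side minus right side of (29)."""
    return PROB @ T(R, V) - MU * (PROB @ T(R, 1.0))

def solve_29(lo=0.3, hi=1.0, iters=100):
    """Bisection on (29); [lo, hi] is an assumed window with a sign change."""
    f_lo = gap_29(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = gap_29(mid)
        if f_lo * f_mid > 0:
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lhs_35(R, k):
    """Left side of (35): mu E[T(1)^k] - E[T(v)^k]."""
    return MU * (PROB @ T(R, 1.0) ** k) - PROB @ T(R, V) ** k

R_OPT = solve_29()
print(R_OPT, lhs_35(R_OPT, 3), lhs_35(R_OPT, 8))
```

Comparing the sign of `lhs_35` at $k = 3$ and $k = 8$ shows why the example relies on the direction $H_8$ rather than $H_3$.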

Fig. 1. Gaussian mu-rate, i.e., expression (28), plotted as a function of $R$ for $\mu = 5/4$, $v = 1/4$, $P = 1.24086308$ and $H$ binary $\{1; 10\}$. Maximum at $R = 0.62043154$.

Fig. 2. LHS of (35) as a function of $R$, for $\mu = 5/4$, $v = 1/4$, $k = 8$ and $H$ binary $\{1; 10\}$, positive at $R = 0.62043154$.

Note that we can consider other non-Gaussian encoders, such as when $U$ and $V$ are independent with $U$ Gaussian and $V$ non-Gaussian along Hermite's. Then, we get the following condition. If for $k \geq 3$ and $R$ optimal for $\mu$, we have

$$E\left(\frac{RH^2}{v+RH^2}\right)^k \qquad (36)$$

$$< \mu\left[ E\left(\frac{RH^2}{1+RH^2}\right)^k - \left(\frac{R}{P}\right)^k E\left(\frac{PH^2}{1+PH^2}\right)^k \right], \qquad (37)$$

then Gaussian encoders are not optimal. Notice that the previous inequality is stronger than the one in (35) for fixed values of the parameters. Yet, it can still be verified for valid values of the parameters, and there are also codes with $U$ Gaussian and $V$ non-Gaussian that outperform Gaussian codes for some degraded fading BCs.


Proof of Theorem 4:

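Let $\varepsilon, \delta > 0$ and let $X_1$ and $X_2$ be respectively distributed as $g_p(1 + \varepsilon[H_k + \delta H_{4k}])$ and $g_p(1 - \varepsilon[H_k - \delta H_{4k}])$, where $k \neq 1, 2$. We have

$$I(X_1; X_1 + aX_2 + Z_1) = h(X_1 + aX_2 + Z_1) - h(aX_2 + Z_1)$$

$$= I(X_1^G; X_1^G + aX_2^G + Z_1) - D(X_1 + aX_2 + Z_1 \| X_1^G + aX_2^G + Z_1) + D(aX_2 + Z_1 \| aX_2^G + Z_1),$$

where $X_k^G$ are independent Gaussian 0-mean and $p$-variance random variables. Hence, we need to evaluate the contribution of each divergence appearing in the previous expression, in order to know if the perturbations are improving on the Gaussian distributions. Let us first analyze $h(X_1 + aX_2 + Z_1)$. The density of $X_1 + aX_2 + Z_1$ is given by

$$g_p(1 + \varepsilon[H_k + \delta H_{4k}]) \star g_{a^2p}(1 - \varepsilon[H_k - \delta H_{4k}]) \star g_1, \qquad (38)$$

which, from Theorem 2, is equal to

$$g_{p+a^2p+1}\left(1 + \varepsilon\left\{ \left[ \left(\frac{p}{p+a^2p+1}\right)^{k/2} H_k + \delta \left(\frac{p}{p+a^2p+1}\right)^{2k} H_{4k} \right] - \left[ \left(\frac{a^2p}{p+a^2p+1}\right)^{k/2} H_k - \delta \left(\frac{a^2p}{p+a^2p+1}\right)^{2k} H_{4k} \right] - \varepsilon L \right\}\right)$$

where

$$L = \frac{g_p[H_k + \delta H_{4k}] \star g_{a^2p}[H_k - \delta H_{4k}] \star g_1}{g_{p+a^2p+1}}.$$

Note that each direction in each line of the bracket $\{\cdot\}$ above, including $L$, satisfies (10) and (11). Using Lemma 3, we have

$$L = \frac{g_p[H_k + \delta H_{4k}] \star g_{a^2p+1}\left[ \left(\frac{a^2p}{a^2p+1}\right)^{k/2} H_k - \left(\frac{a^2p}{a^2p+1}\right)^{2k} \delta H_{4k} \right]}{g_{p+a^2p+1}} = C_1 H_{2k}^{[p+a^2p+1]} + C_2 H_{5k}^{[p+a^2p+1]} + C_3 H_{8k}^{[p+a^2p+1]}, \qquad (39)$$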

where $C_1$, $C_2$, $C_3$ are constants. Therefore, the density of $X_1 + aX_2 + Z_1$ is a Gaussian $g_{p+a^2p+1}$ perturbed along the direction $H_k$ in the order $\varepsilon$ and several $H_l$ with $l \geq 2k$ in the order $\varepsilon^2$ (and other directions but that have a $\delta$ order). So we can use Lemma 2 and write

$$h(X_1 + aX_2 + Z_1) = h(X_1^G + aX_2^G + Z_1) - D(X_1 + aX_2 + Z_1 \| X_1^G + aX_2^G + Z_1).$$

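Using Lemma 1, we have

$$D(X_1 + aX_2 + Z_1 \| X_1^G + aX_2^G + Z_1) \doteq \frac{\varepsilon^2}{2} \left\| \left[ \left(\frac{p}{p+a^2p+1}\right)^{k/2} H_k + \delta \left(\frac{p}{p+a^2p+1}\right)^{2k} H_{4k} \right] - \left[ \left(\frac{a^2p}{p+a^2p+1}\right)^{k/2} H_k - \delta \left(\frac{a^2p}{p+a^2p+1}\right)^{2k} H_{4k} \right] \right\|^2$$

$$= \frac{\varepsilon^2}{2}(1 - a^k)^2 \left(\frac{p}{p+a^2p+1}\right)^k + \frac{\varepsilon^2}{2} o(\delta).$$

Hence

$$h(X_1 + aX_2 + Z_1) = h(X_1^G + aX_2^G + Z_1) - \frac{\varepsilon^2}{2}(1 - a^k)^2 \left(\frac{p}{p+a^2p+1}\right)^k + \frac{\varepsilon^2}{2} o(\delta).$$

Similarly, we get

$$D(aX_2 + Z_1 \| aX_2^G + Z_1) \doteq \frac{\varepsilon^2}{2}\left(\frac{a^2p}{a^2p+1}\right)^k + \frac{\varepsilon^2}{2} o(\delta)$$

and

$$I(X_1; X_1 + aX_2 + Z_1) \doteq I(X_1^G; X_1^G + aX_2^G + Z_1) + \frac{\varepsilon^2}{2}\left[ \left(\frac{a^2p}{a^2p+1}\right)^k - (1 - a^k)^2 \left(\frac{p}{p+a^2p+1}\right)^k \right] + \frac{\varepsilon^2}{2} o(\delta).$$

Finally, we have $I(X_2; X_2 + aX_1 + Z_2) = I(X_1; X_1 + aX_2 + Z_1)$ and

$$I(X_1; X_1 + aX_2 + Z_1) + I(X_2; X_2 + aX_1 + Z_2) \doteq I(X_1^G; X_1^G + aX_2^G + Z_1) + I(X_2^G; X_2^G + aX_1^G + Z_2) + \varepsilon^2\left[ \left(\frac{a^2p}{a^2p+1}\right)^k - (1 - a^k)^2 \left(\frac{p}{p+a^2p+1}\right)^k \right] + \varepsilon^2 o(\delta).$$

Hence, if for some $k \geq 3$ we have

$$\left(\frac{a^2}{a^2p+1}\right)^k - \frac{(a^k - 1)^2}{(p + a^2p + 1)^k} > 0 \qquad (40)$$

we can improve on the iid Gaussian distributions $g_p$ by using the respective Hermite perturbations.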
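Condition (40) is easy to explore numerically. The following Python sketch evaluates its left-hand side and scans a grid of interference coefficients for the smallest value at which it becomes positive. The function names, the grid and the sample values of $a$, $p$ and $k$ are illustrative choices, and the scan is only a rough way of locating the threshold.

```python
def hermite_gain(a, p, k):
    """Left side of (40) for interference coefficient a, power p, order k."""
    first = (a**2 / (a**2 * p + 1.0)) ** k
    second = (a**k - 1.0) ** 2 / (p + a**2 * p + 1.0) ** k
    return first - second

def smallest_improving_a(p, k, step=0.001, a_max=2.0):
    """Smallest grid point a in (0, a_max] where (40) holds, or None."""
    n_steps = int(round(a_max / step))
    for i in range(1, n_steps + 1):
        a = i * step
        if hermite_gain(a, p, k) > 0:
            return a
    return None

print(hermite_gain(1.0, 1.0, 3), hermite_gain(0.1, 1.0, 3))
print(smallest_improving_a(1.0, 3))
```

At $a = 1$ the second term of (40) vanishes, so the condition always holds there, whereas for small $a$ the first term is negligible and the condition fails.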

Now, we could have started with $X_1$ and $X_2$ distributed as $g_p(1 + \varepsilon b_k \tilde{H}_k)$ and $g_p(1 + \varepsilon c_k \tilde{H}_k)$, where $\tilde{H}_k$ is defined in (22). With similar expansions, we would then get that we can improve on the Gaussian distributions if for some $b_k$, $c_k$ and $k \neq 1, 2$ we have

$$\left[ \left(\frac{a^2p}{a^2p+1}\right)^k - \frac{(a^2p)^k + p^k}{(p + a^2p + 1)^k} \right](b_k^2 + c_k^2) - 4\left(\frac{ap}{p + a^2p + 1}\right)^k b_k c_k > 0.$$

But the quadratic function

$$(b, c) \in \mathbb{R}^2 \mapsto \gamma(b^2 + c^2) - 2\delta bc,$$

with $\delta > 0$, can be made positive if and only if $\gamma + \delta > 0$, and is made so by taking $b_k = -c_k$. Hence, the initial choice we made about $X_1$ and $X_2$ is optimal. Moreover, note that for this distribution of $X_1$ and $X_2$, we could have actually chosen $k = 2$ as well. Because, even if Lemma 2 tells us that we must use correction terms, these correction terms will cancel out when we consider the sum-rate, since $b_k = -c_k$ and since the correction is in $\varepsilon$. There is however another problem when using $k = 2$, which is that $g_p(1 + \varepsilon H_2)$ has a larger second

2, we can compensate this excess on the first channel use with the second channel use, and because of the symmetry, we can achieve the desired rate. But this is allowed only with synchronization. We could also have used perturbations that are mixtures of Hermite's, such as $g_p(1 + \varepsilon \sum_k b_k H_k)$. We would then get mixtures of previous equations as our condition. But in the current problem this will not be helpful. Finally, perturbing iid Gaussian inputs in an independent but non i.d. way, i.e., perturbing different components in different Hermite directions, cannot improve on our scheme, from previous arguments. The only option which is not investigated here (but in a work in progress) is to perturb iid Gaussian inputs in a non-independent manner. Finally, if we work with $k = 1$, the proof sees the following modification. In (39), we now have a term in $H_2$. However, even if this term is in the order $\varepsilon^2$, we can no longer neglect it, since from Lemma 2, an $\varepsilon^2 H_2$ term in the direction comes out as an $\frac{\varepsilon^2}{\sqrt{2}}$ term in the entropy. Hence, we do not get the above condition for $k = 1$, but the one obtained by replacing $(a^k - 1)^2$ with $(a^2 + 1)$, and the condition for positivity can never be fulfilled.

Proof of Proposition 1:

From Theorem 4, we know that when treating interference as noise and when $F_2(a, p) > 0$, it is better to use encoders drawn from the 2 dimensional distributions $X_1^2$ and $X_2^2$, where $(X_1)_1 \sim g_p(1 + \varepsilon \tilde{H}_2)$, $(X_2)_1 \sim g_p(1 - \varepsilon \tilde{H}_2)$, $(X_1)_2 \sim g_p(1 - \varepsilon \tilde{H}_2)$ and $(X_2)_2 \sim g_p(1 + \varepsilon \tilde{H}_2)$, as opposed to using Gaussian distributions. But perturbations in $H_2$ are changing the second moment of the input distribution. Hence, this scheme is mimicking a time-sharing in our local setting. Moreover, a direct computation also allows to show that, constraining each user to use Gaussian inputs of arbitrary block length $n$, with arbitrary covariances having a trace bounded by $nP$, the optimal covariances are $pI_n$ if $F_2(a, p) \leq 0$, and otherwise, are given by a time-sharing scheme (cf. Definition 1 for the definition of a Gaussian time-sharing scheme and covariance matrices).

Proof of Proposition 2:

Note that when using blind time-sharing, no matter what the delay in the asynchronization of each user is, the users are interfering in $n/4$ channel uses and have each a non-interfering channel in $n/4$ channel uses (the remaining $n/4$ channel uses are not used by any user). Hence, if the receivers have the knowledge of the asynchronization delay, the following sum-rate can be achieved:

$$\frac{1}{4}\left(\log(1 + 2p) + \log\left(1 + \frac{2p}{1 + 2a^2p}\right)\right).$$

And if the delay is unknown to the receivers, the previous sum-rate can surely not be improved on.
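To get a feel for this sum-rate, the following Python sketch evaluates it next to the sum-rate of iid Gaussian inputs of power $p$ with interference treated as noise. The second expression, $\log(1 + p/(1 + a^2p))$, is the standard formula for that scheme and is stated here as an assumption, since it is not written out in this proof. The function names and sample values are illustrative, and rates are in nats.

```python
import math

def blind_time_sharing_sum_rate(a, p):
    """Sum-rate of blind time-sharing from the proof of Proposition 2."""
    clean = math.log(1.0 + 2.0 * p)                       # non-interfering uses
    mixed = math.log(1.0 + 2.0 * p / (1.0 + 2.0 * a * a * p))  # interfering uses
    return 0.25 * (clean + mixed)

def gaussian_tin_sum_rate(a, p):
    """Assumed baseline: iid Gaussian inputs, interference treated as noise."""
    return math.log(1.0 + p / (1.0 + a * a * p))

for a in (0.0, 0.5, 1.0):
    print(a, blind_time_sharing_sum_rate(a, 10.0), gaussian_tin_sum_rate(a, 10.0))
```

For $a = 0$ the two terms inside the blind time-sharing expression coincide, so it reduces to $\frac{1}{2}\log(1 + 2p)$.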

Disproof of Conjecture 1:

This proof uses similar steps as previous proofs. Using Lemma (14), we express

$$I(X; X + hX_1^G + Z) \leq I(X; X + hX_1 + Z)$$

as

$$-D(X + hX_1^G + Z \| X^G + hX_1^G + Z) \leq -D(X + hX_1 + Z \| X^G + hX_1^G + Z) + D(hX_1 + Z \| hX_1^G + Z). \qquad (41)$$

We then pick $X, X_1 \sim g_p(1 + \varepsilon \tilde{H}_k)$ and assume that $k$ is even for now. We then have

$$X + hX_1^G + Z \sim g_p(1 + \varepsilon H_k) \star g_{ph^2+v} = g_{p+ph^2+v}\left(1 + \varepsilon\left(\frac{p}{p + ph^2 + v}\right)^{k/2} H_k\right)$$

and

$$D(X + hX_1^G + Z \| X^G + hX_1^G + Z) \doteq \frac{\varepsilon^2}{2}\left(\frac{p}{p + ph^2 + v}\right)^k.$$

Similarly,

$$X + hX_1 + Z \sim g_p(1 + \varepsilon H_k) \star g_{ph^2}(1 + \varepsilon H_k) \star g_v = g_p(1 + \varepsilon H_k) \star g_{ph^2+v}\left(1 + \varepsilon\left(\frac{ph^2}{ph^2 + v}\right)^{k/2} H_k\right)$$

$$\doteq g_{p+ph^2+v}\left(1 + \varepsilon\left(\frac{p}{p + ph^2 + v}\right)^{k/2} H_k + \varepsilon\left(\frac{ph^2}{p + ph^2 + v}\right)^{k/2} H_k\right)$$

and

$$D(X + hX_1 + Z \| X^G + hX_1^G + Z) \doteq \frac{\varepsilon^2}{2}\left(\frac{p}{p + ph^2 + v}\right)^k (1 + h^k)^2.$$

Finally,

$$hX_1 + Z \sim g_{ph^2}(1 + \varepsilon H_k) \star g_v = g_{h^2p+v}\left(1 + \varepsilon\left(\frac{ph^2}{ph^2 + v}\right)^{k/2} H_k\right)$$

and

$$D(hX_1 + Z \| hX_1^G + Z) \doteq \frac{\varepsilon^2}{2}\left(\frac{ph^2}{ph^2 + v}\right)^k.$$

Therefore, (41) is given by

$$-\left(\frac{p}{p + ph^2 + v}\right)^k \leq -\left(\frac{p}{p + ph^2 + v}\right)^k (1 + h^k)^2 + \left(\frac{ph^2}{ph^2 + v}\right)^k + o(1)$$

and if

$$\left(\frac{p}{p + ph^2 + v}\right)^k (1 + h^k)^2 - \left(\frac{p}{p + ph^2 + v}\right)^k - \left(\frac{ph^2}{ph^2 + v}\right)^k > 0 \qquad (42)$$

for some $k$ even and greater than 4, we have a counter-example to the strong conjecture. Note that, using the same trick as in previous proofs, that is, perturbing along $\tilde{H}_k$ instead of $H_k$, we get that if (42) holds for any $k \geq 3$, we have a counter-example to the strong conjecture. Defining $u := v/p = 1/\mathrm{SNR}$, (42) is equivalent to

$$G(h, u, k) := \frac{(1 + h^k)^2}{(1 + h^2 + u)^k} - \frac{1}{(1 + h^2 + u)^k} - \left(\frac{h^2}{h^2 + u}\right)^k > 0. \qquad (43)$$

As shown in Figure 3, this can indeed happen. An interesting observation is that the range where (43) holds is broader when $u$ is larger, i.e., when SNR is smaller. Indeed, when $u = v = 0$, which corresponds to dropping the additive noise $Z$, we do not get a counter-example to the conjecture. But in the presence of Gaussian noise, the conjecture does not hold for some distributions of $X$, $X_1$. The conjecture had been numerically checked with binary inputs at low SNR, and in this regime, it could not be disproved. With the hint described above, we checked numerically the conjecture with binary inputs at high SNR, and there we found counter-examples.

Fig. 3.
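The function $G(h, u, k)$ in (43) is simple enough to evaluate directly. The following Python sketch does so at a few points. The sample values of $h$, $u$ and $k$ are illustrative choices and are not the ones plotted in Figure 3.

```python
def G(h, u, k):
    """Left side of (43), with u = v / p the inverse SNR."""
    s = 1.0 + h**2 + u
    return ((1.0 + h**k) ** 2 - 1.0) / s**k - (h**2 / (h**2 + u)) ** k

# Fix h = 1 and k = 4, and vary the inverse SNR u.
for u in (0.0, 1.0, 3.0):
    print(u, G(1.0, u, 4))
```

With $h = 1$ and $k = 4$, the expression is negative at $u = 0$ and positive at $u = 3$, in line with the remark that (43) is easier to satisfy when $u$ is larger.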

VI. DISCUSSION

We have developed a technique to analyze codes drawn from non-Gaussian ensembles using the Hermite polynomials. While the performance of non-Gaussian inputs is usually hard to analyze, we showed how, with this tool, it reduces to the analysis of analytic power series. This allowed us to show that Gaussian inputs are in general not optimal for degraded fading Gaussian BC, although they might still be optimal for many fading distributions. For the IC problem, we found that in the asynchronous setting and when treating interference as noise, using non-Gaussian code ensembles ($H_3$ perturbations) can strictly improve on using Gaussian ones when the interference coefficient is above a given threshold, which significantly improves on the existing threshold (cf. [4]). We have also recovered the threshold of the moderate regime by using $H_2$ perturbations in the synchronous setting, showing that this global threshold is reflected in our local setting. We also met mysteriously in our local setting the other global threshold found in [2], [8], [10], below which treating interference as noise with iid Gaussian inputs is optimal. It is worth noting that these two global thresholds (moderate regime and noisy interference) are recovered with our tool from a common analytic function. We hope to understand this better with a work in progress.

The Hermite technique provides not only counter-examples to the optimality of Gaussian inputs but also gives insight on the competitive situations in Gaussian network problems. For example, in the fading BC problem, the Hermite technique gives a condition on the kind of fading distributions and degradedness (values of $v$) for which non-Gaussian inputs must be used. It also points out that the perturbations in $H_3$ are most effective when carried out in an opposite manner for the two users, so as to make the distribution of $X$ close to Gaussian.

Finally, in a different context, local results could be "lifted" to corresponding global results in [1]. There, the localization is made with respect to the channels and not the input distribution, yet it would be interesting to compare the local with the global behavior for the current problem too. The fact that we have observed some global results locally, as mentioned previously, gives hope for possible local to global extensions. A work in progress aims to use our tool beyond the local setting, in particular, by analyzing all sub-Gaussian distributions. Moreover, there are interesting connections between the results developed in this paper and the properties of the Ornstein-Uhlenbeck process. Indeed, some of these properties have already been used in [3] to solve the long standing entropy monotonicity conjecture, and we are currently investigating these relations more closely.

ACKNOWLEDGMENT

The authors would like to thank Tie Liu and Shlomo Shamai for pointing out problems relevant to the application of the proposed tool, Daniel Stroock for discussions on Hermite polynomials, and Emre Telatar for stimulating discussions and helpful comments.

REFERENCES

[1] E. Abbe and L. Zheng, "Linear universal decoding: a local to global geometric approach", Submitted IEEE Trans. Inform. Theory, 2008.

[2] V.S. Annapureddy and V.V. Veeravalli, "Gaussian Interference Network: sum Capacity in the Low Interference Regime and New Outer Bounds on the Capacity Region", Submitted IEEE Trans. Inform. Theory, 2008.

[3] S. Artstein, K. Ball, F. Barthe and A. Naor, "Solution of Shannon's Problem on the Monotonicity of Entropy", Journal of the American Mathematical Society 17, 975-982 (2004).

[4] E. Calvo, J. Fonollosa, and J. Vidal, "The Totally Asynchronous Interference Channel with Single-User Receivers", ISIT 2009, Seoul.

[5] M. Costa, "On the Gaussian interference channel", IEEE Trans. Inform. Theory, vol. IT-31, pp. 607-615, Sept. 1985.

[6] T. M. Cover, "Comments on broadcast channel", IEEE Trans. on Inform. Theory, vol. 44, n. 6, pp. 2524-2530, Oct. 1998.

[7] T. M. Cover and J. A. Thomas, "Elements of Information Theory", John Wiley & Sons, New York, NY, 1991.

[8] A. S. Motahari, A. K. Khandani, "Capacity Bounds for the Gaussian Interference Channel", CoRR abs/0801.1306: (2008).

[9] S. Shamai (Shitz) and R. Laroia, "The intersymbol interference channel: Lower bounds on capacity and channel precoding loss", IEEE Trans. Inform. Theory, vol. 42, pp. 1388-1404, Sept. 1996.

[10] X. Shang, G. Kramer, B. Chen, "Outer bound and noisy-interference sum-rate capacity for symmetric Gaussian interference channels", CISS 2008: 385-389.

[11] A.J. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon", Information and Control, 2: 101-112, 1959.

[12] D. W. Stroock, "A Concise Introduction to the Theory of Integration", Third Edition, Birkhäuser, 1999.

[13] D. Tuninetti and S. Shamai, "On two-user fading Gaussian broadcast channels with perfect channel state information at the receivers", Int. Symp. on Inform. Theory (ISIT), 2003.