Joint Subcarrier and Power Allocation in NOMA: Optimal and Approximate Algorithms

(1)

HAL Id: hal-02302721

https://hal.archives-ouvertes.fr/hal-02302721v2

Submitted on 18 Mar 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Optimal and Approximate Algorithms

Lou Salaun, Marceau Coupechoux, Chung Shue Chen

To cite this version:

Lou Salaun, Marceau Coupechoux, Chung Shue Chen. Joint Subcarrier and Power Allocation in

NOMA: Optimal and Approximate Algorithms. IEEE Transactions on Signal Processing, Institute

of Electrical and Electronics Engineers, 2020, 68 (1), pp.2215 - 2230. �10.1109/TSP.2020.2982786�.

�hal-02302721v2�

(2)

Joint Subcarrier and Power Allocation in NOMA:

Optimal and Approximate Algorithms

Lou Sala¨un, Student Member, IEEE, Marceau Coupechoux, and Chung Shue Chen, Senior Member, IEEE

Abstract—Non-orthogonal multiple access (NOMA) is a promising technology to increase the spectral efficiency and enable massive connectivity in 5G and future wireless networks. In contrast to orthogonal schemes, such as OFDMA, NOMA multiplexes several users on the same frequency and time resource. Joint subcarrier and power allocation problems (JSPA) in NOMA are NP-hard to solve in general. In this family of problems, we consider the weighted sum-rate (WSR) objective function as it can achieve various tradeoffs between sum-rate performance and user fairness. Because of JSPA’s intractability, a common approach in the literature is to solve separately the power control and subcarrier allocation (also known as user selection) problems, therefore achieving sub-optimal result. In this work, we first improve the computational complexity of existing single-carrier power control and user selection schemes. These improved procedures are then used as basic building blocks to design new algorithms, namely OPT-JSPA, ε-JSPA and GRAD-JSPA. OPT-JSPA computes an optimal solution with lower complexity than current optimal schemes in the literature. It can be used as a benchmark for optimal WSR performance in simulations. However, its pseudo-polynomial time complexity remains impractical for real-world systems with low latency requirements. To further reduce the complexity, we propose a fully polynomial-time approximation scheme called ε-JSPA. Since, no approximation has been studied in the literature, ε-JSPA stands out by allowing to control a tight trade-off between performance guarantee and complexity. Finally, GRAD-JSPA is a heuristic based on gradient descent. Numerical results show that it achieves near-optimal WSR with much lower complexity than existing optimal methods.

I. INTRODUCTION

In multi-carrier multiple access systems, the total frequency bandwidth is divided into subcarriers and assigned to users to optimize the spectrum utilization. Orthogonal multiple access (OMA), such as orthogonal frequency-division multiple access (OFDMA) adopted in 3GPP-LTE and also 5G New Radio Phase 1 standards [1], [2], only serves one user per subcarrier in order to avoid intra-cell interference and have low-complexity signal decoding at the receiver side. OMA is known to be suboptimal in terms of spectral efficiency [3].

The principle of multi-carrier non-orthogonal multiple ac-cess (MC-NOMA) is to multiplex several users on the same subcarrier by performing signals superposition at the transmit-ter side. Successive intransmit-terference cancellation (SIC) is applied at the receiver side to mitigate interference between super-posed signals. MC-NOMA is a promising multiple access

L. Sala¨un and C. S. Chen are with Bell Labs, Nokia Paris-Saclay, 91620 Nozay, France, and also with Lincs, Paris 75013, France (e-mail: lou.salaun@nokia-bell-labs.com; chung shue.chen@nokia-bell-labs.com).

L. Sala¨un and M. Coupechoux are with LTCI, Telecom

ParisTech, University of Paris-Saclay, Paris 75013, France (e-mail:

marceau.coupechoux@telecom-paristech.fr).

technology for 5G and beyond mobile networks as it can achieve higher spectral efficiency than OMA schemes [4], [5]. Careful optimization of the transmit powers is required to control the intra-carrier interference of superposed signals and maximize the achievable data rates. Besides, due to error propagation and decoding complexity concerns in practice [6], subcarrier allocation for each transmission also needs to be optimized. As a consequence, joint subcarrier and power allocation problems (JSPA) in NOMA have received much attention. In this class of problems, weighted sum-rate (WSR) maximization is especially important as it can achieve different tradeoffs between sum-rate performance and user fairness [7]. Two types of power constraints are considered in the lit-erature. On the one hand, cellular power constraint is mostly used in downlink transmissions to represent the total transmit power budget available at the base station (BS). On the other hand, individual power constraint sets a power limit independently for each user. The latter is often considered in uplink scenarios [8], [9], nevertheless it can also be applied to the downlink [10], [11].

It is known that the equal-weight sum-rate and WSR max-imization are both strongly NP-hard if we consider individual power constraints in OFDMA [12] and in MC-NOMA sys-tems [11], [13]. Nevertheless, several algorithms have been developed to perform subcarrier and/or power allocation for MC-NOMA and this type of constraints. Fractional transmit power control (FTPC) is a simple heuristic that allocates a fraction of the total power budget to each user based on their channel conditions [4]. In [8] and [9], heuristic user pairing strategies and iterative resource allocation algorithms are studied for uplink transmissions. A time efficient two-step heuristic is introduced in [10] to solve the problem with equal weights. Reference [11] derives an upper bound on the optimal WSR and proposes a Lagrangian duality and dynamic programming (LDDP) scheme. This scheme achieves near-optimal result, assuming the power budget is divided in J equal parts to be allocated. It mainly serves as benchmark due to its high computational complexity when J is large in practical systems, which may not be suitable for low latency requirements.

If we consider now cellular power constraints without individual power constraints, the equal-weight sum-rate max-imization is polynomial time solvable in OFDMA [14]. To the best of our knowledge, it is unknown whether WSR maximization in MC-NOMA under this type of constraints is polynomial time solvable or NP-hard. Reference [15, Proposi-tion 1] proves that the subcarrier optimizaProposi-tion is NP-hard only in the case of equal power allocation among the users. The

(3)

proposed polynomial-time reduction from the NP-complete 3-dimensional matching (3DM) to the NOMA problem should have shown that all instances of 3DM can be mapped into an instance of the NOMA problem to be complete. Besides, the two-stage dynamic programming (TSDP) proposed in [11] solves it optimally in pseudo-polynomial time depending on J. Therefore, the WSR problem with cellular power constraint is weakly NP-hard at most (in contrast to strongly NP-hard for the individual power constraints as mentioned previously). Only a few papers have developed optimization schemes in this setting, which are either heuristics with no theoretical performance guarantee or algorithms with impractical com-putational complexity. For example, a greedy user selection and heuristic power allocation scheme based on difference-of-convex programming is proposed in [16]. In reference [15], a matching algorithm is developed to perform subcarrier allocation. A minorization-maximization algorithm is used in [17] to compute precoding vectors of a MISO-NOMA system. The authors of [18] employ monotonic optimization to develop an optimal resource allocation policy, which serves as benchmark due to its exponential complexity. The TSDP scheme is also optimal for cellular power constraint scenarios as proven in Theorem 13 of reference [11], but it has high pseudo-polynomial complexity as well.

We note that, to the best of our knowledge, no polynomial-time approximation scheme (PTAS) has been proposed in the literature. Although PTAS is interesting for practical consider-ations of NP-hard problems, as it provides theoretical perfor-mance guarantees with controllable computational complexity. Motivated by this observation, we extend the framework of our previous paper [19] with a fully polynomial-time approxima-tion scheme (FPTAS) for the WSR maximizaapproxima-tion problem with cellular power constraint. In [19], we developed the following algorithms: two basic building blocks SCPC and SCUS which solve respectively the carrier power control and single-carrier user selection problems in polynomial time; and a heuristic JSPA scheme based on projected gradient descent, SCPC and SCUS, denoted here by GRAD-JSPA. Our contri-butions are as follows:

1) We improve SCPC and SCUS by performing precom-putation to avoid repeated operations each time they are executed. This reduces their computational complexity by a factor proportional to the number of users.

2) The above precomputation also speeds up GRAD-JSPA, which now has low and practical computational com-plexity. In addition, numerical results show that GRAD -JSPA achieves near-optimal WSR, as well as significant improvement in performance over OMA.

3) We develop a new optimal algorithm, called OPT-JSPA, suitable for use as a benchmark for optimal WSR per-formance in simulations. We show that OPT-JSPA has lower computational complexity than existing optimal schemes [11], [18].

4) We propose a FPTAS, which is denoted by ε-JSPA. Its design is based on the improved SCPC and SCUS, as well as techniques from the multiple choice knapsack prob-lem [20]. By definition of FPTAS, its performance is within

a factor1_{− ε of the optimal, for any ε > 0. Moreover, it} has polynomial complexity in both the input size and1/ε. Since, no approximation has been studied in the literature, ε-JSPA stands out by allowing to control a tight trade-off between performance guarantee and complexity.

Through the aforementioned points, our aim is to deepen the understanding of JSPA and NOMA resource allocation problems. We develop optimal, approximate and heuristic schemes which are each suitable for systems with different computational capabilities, as well as for performance bench-marking. In addition, we provide mathematical tools to study the WSR maximization problem, which can also be applied to other similar resource allocation problems.

The paper is organized as follows. In Section II, we present the system model and notations. Section III formu-lates the WSR problem. We consider two single-carrier sub-problems in Section IV that were previously solved using SCPC and SCUS in [19]. We propose improved versions of these algorithms, namely i-SCPC and i-SCUS, which perform precomputation to reduce their complexity. Based on these basic building blocks, we develop a low complexity gradient descent based heuristic (GRAD-JSPA), a pseudo-polynomial time optimal algorithm (OPT-JSPA) and a FPTAS with ε-approximation guarantee (ε-JSPA) in Section V. We show in Section VI some numerical results, highlighting our solution’s WSR performance and computational complexity. In Section VII, we discuss about how to generalize our framework to more realistic channel estimation models and multi-antenna systems. Finally, we conclude in Section VIII.

II. SYSTEMMODEL ANDNOTATIONS

We define in this section the system model and notations used throughout the paper. We consider a downlink multi-carrier NOMA system composed of one base station (BS) serving K users. We denote the index set of users by K , {1, . . . , K}, and the set of subcarriers by N , {1, . . . , N}. The total system bandwidthW is divided into N subcarriers of bandwidthWn, for each n∈ N , such that Pn∈NWn= W .

We assume orthogonal frequency division, so that adjacent subcarriers do not interfere each other. Moreover, each sub-carrier n_{∈ N experiences frequency-flat block fading on its} bandwidthWn.

Let pnk denotes the transmit power from the BS to user

k _{∈ K on subcarrier n ∈ N . User k is said to be active on} subcarrier n if pn

k > 0, and inactive otherwise. In addition,

let gn

k be the channel gain between the BS and user k on

subcarriern, and ηn

k be the received noise power. We assume

that the channel gains are perfectly known. We discuss about more realistic models with imperfect channel state information (CSI) in Section VII. For simplicity of notations, we define the normalized noise power as η˜n

k , ηkn/gnk. We denote

by p , (pn

k)k∈K,n∈N the vector of all transmit powers, and

pn_{, (p}n

k)k∈K the vector of transmit powers on subcarriern.

In power domain NOMA, several users are multiplexed on the same subcarrier using superposition coding. A common approach adopted in the literature is to limit the number of superposed signals on each subcarrier to be no more thanM .

(4)

The value of M is meant to characterize practical limitations of SIC due to decoding complexity and error propagation [6]. We represent the set of active users on subcarrier n by Un,{k ∈ K : pnk > 0}. The aforementioned constraint can

then be formulated as _{∀n ∈ N , |U}n| ≤ M, where |·| denotes

the cardinality of a finite set. Each subcarrier is modeled as a multi-user Gaussian broadcast channel [6] and SIC is applied at the receiver side to mitigate intra-band interference.

The SIC decoding order on subcarrier n is usually de-fined as a permutation over the active users on n, i.e., πn:{1, . . . , |Un|} → Un. However, for ease of reading, we

choose to represent it by a permutation over all users_{K, i.e.,} πn:{1, . . . , K} → K. These two definitions are equivalent

in our model since the Shannon capacity (2) does not depend on the inactive users k ∈ K \ Un, for which pnk = 0. For

i _{∈ {1, . . . , K}, π}n(i) returns the i-th decoded user’s index.

Conversely, user k’s decoding order is given by π−1 n (k).

In this work, we consider the optimal decoding order studied in [6, Section 6.2]. It consists of decoding users’ signals from the highest to the lowest normalized noise power:

˜ η_πn_n(1)≥ ˜η n πn(2) ≥ · · · ≥ ˜η n πn(K). (1)

Userπn(i) first decodes the signals of users πn(1) to πn(i−1)

and subtracts them from the superposed signal before decoding its own signal. Interference from users πn(j) for j > i is

treated as noise. The maximum achievable data rate of userk on subcarrier n is given by Shannon capacity:

Rn k(pn) , Wnlog2 1 + gn kpnk PK j=π−1 n (k)+1g n kpnπn(j)+ η n k ! , (a) = Wnlog2 1 + pn k PK j=π−1n (k)+1p n πn(j)+ ˜η n k ! , (2)

where equality (a) is obtained after normalizing by gn k. We

assume perfect SIC, therefore interference from users πn(j)

for j < π−1n (k) is completely removed in (2).

III. PROBLEMFORMULATION

Let w =_{w1, . . . , wK} be a sequence of K positive

weights. The main focus of this work is to solve the following JSPA optimization problem:

maximize p X k∈K wk X n∈N Rnk(pn), subject to C1 : X k∈K X n∈N pnk ≤ Pmax, C2 : X k∈K pnk ≤ Pmaxn , n∈ N , C3 : pnk ≥ 0, k ∈ K, n ∈ N , C4 : _|Un| ≤ M, n ∈ N . (_P)

The objective of _{P is to maximize the system’s WSR. As} discussed in Section I, this objective function has received much attention since its weights w can be chosen to achieve different tradeoffs between sum-rate performance and fair-ness [7]. Note thatC1 represents the cellular power constraint, i.e., a total power budget Pmax at the BS. In C2, we set a

power limit ofPn

max for each subcarriern. This is a common

assumption in multi-carrier systems, e.g., [12], [14]. Constraint C3 ensures that the allocated powers remain non-negative. Due to decoding complexity and error propagation in SIC [6], we restrict the maximum number of multiplexed users per subcarrier toM in C4.

For ease of reading, we summarize some system parameters of a given instance of_{P, for all n ∈ N , as follows:}

In_{= (w,}

K, Wn, (gkn)k∈K, (ηkn)k∈K) .

Let us consider the following change of variables:

∀n ∈ N , xni , (PK j=ipnπn(j), if i∈ {1, . . . , K}, 0, ifi = K + 1. (3) We definex , (xn i)i∈{1,...,K},n∈N andxn, (xni)i∈{1,...,K}.

Lemma 1 (Equivalent problemP0_).

Problem_{P is equivalent to problem P}0 formulated below:

maximize x X n∈N K X i=1 fn i (xni) + A, (P0) subject to C10: X n∈N xn1 ≤ Pmax, C20: xn1 ≤ P n max, n∈ N , C30: xn i ≥ xni+1, i∈ {1, . . . , K}, n ∈ N , C300: xnK+1= 0, n∈ N , C40: _|Un0| ≤ M, n ∈ N ,

where for any i∈ {1, . . . , K} and n ∈ N , we have:

fin(xni) ,      Wnlog2 xn 1 + ˜ηnπn(1) wπn(1) , if i = 1, Wnlog2 (xn i+ ˜η n πn(i))wπn(i) xn i+ ˜ηπn(i−1)n _wπn(i−1) , if i > 1,

and where_Un0 ,{i ∈ {1, . . . , K}: xni > xni+1}. The constant

termA =Pn∈Nwπn(K)log2 1/˜ηn πn(K) is chosen so that_P and _P0 have exactly the same optimal value.

Proof: The idea is to apply the change of variables (3) to problem _{P. Details of the calculation can be found in} Appendix A.

The advantage of this formulation_P0is that it exhibits a sep-arable objective function in both dimensions i∈ {1, . . . , K} and n ∈ N . In other words, it can be written as a sum of functionsfn

i , each only depending on one variablexni.

IV. SINGLE-CARRIEROPTIMIZATION

In this section, we focus on a simpler problem, in which there is a single subcarriern∈ N and a power budget ¯Pn _is

given for this subcarrier:

Fn P¯n= max xn K X i=1 fin(xni) + An, (PSC0 (n)) subject to C20–30: ¯Pn≥ xn 1 ≥ . . . ≥ xnK≥ 0, C40 : _|Un0| ≤ M,

(5)

whereAn_{= w} πn(K)log2 1/˜ηn πn(K) . C20_–30 _{is obtained by}

combining C20, C30 and C300. Fn _P_¯n _{denotes its optimal}

value. Algorithms SCPC and SCUS have been introduced in our previous paper [19] to tackle respectively the single-carrier power control and single-single-carrier user selection sub-problems that arise from_P_SC0 (n). We provide technical details of these algorithms below, and we show how precomputation can further improve their computational complexity. They will be used as basic building blocks in Section V to design efficient algorithms GRAD-JSPA, OPT-JSPA and ε-JSPA, for the joint resource allocation problem.

A. Analysis of the Separable Functions fn i

We introduce auxiliary functions to help us in the analysis of fin and the algorithm design. Forn∈ N , i ∈ {1, . . . , K}

and j _{≤ i, assume that the consecutive variables x}n j, . . . , xni

are all equal to a certain valuex_∈0, ¯Pn_{. We define}_fn j,ias: fj,in(x) , i X l=j fln(x), =      Wnlog2 x + ˜ηn πn(i) w_πn(i) , if j = 1, Wnlog2 (x+˜ηn πn(i))wπn(i) x+˜ηn πn(j−1) _wπn(j−1) , if j > 1. This simplification of notation is relevant for the analysis of SCPC and SCUS in the following subsections. Indeed, if usersj, . . . , i− 1 are not active (i.e., j, . . . , i − 1 /∈ U0

n), then

xn

j = · · · = xni, therefore

Pi

l=jfln can be replaced by fj,in

andxn

j+1, . . . , xni are redundant withxnj. If constraintC40 is

satisfied, up to M users are active on each subcarrier. Thus, evaluating the objective function of _P_SC0 (n) only requires O(M ) operations.

We study the properties offn

j,iin Lemma 2. Note thatfin=

fn

i,i, therefore Lemma 2 also holds for functions fin. Fig. 1

shows the two general forms that can be taken by fn j,i.

Lemma 2 (Properties of fn j,i).

Let n_{∈ N , i ∈ {1, . . . , K}, and j ≤ i, we have:}

• If j = 1 or wπn(i) ≥ wπn(j−1), then f

n

j,i is increasing

and concave on[0,_∞).

• Otherwise when j > 1 and wπn(i) < wπn(j−1), f

n j,i is unimodal. It increases on _−˜ηπn(j−1), c1 and decreases on [c1,∞), where c1= wπn(j−1)η˜πn(i)− wπn(i)η˜πn(j−1) wπn(i)− wπn(j−1) . Besides, fn j,i is concave on −˜ηπn(j−1), c2 and convex on [c2,∞), where c2= √wπn(j−1)η˜πn(i)− √wπn(i)η˜πn(j−1) √wπn(i)− √wπn(j−1) ≥ c1.

Proof: These analytical properties can be obtained by studying the first and second derivatives of fn

j,i. Details can

be found in Appendix B.

We present in Algorithm 1 the pseudocode ARGMAXf

which computes the maximum of fn j,i on

0, ¯Pn _following

the result of Lemma 2. ARGMAXf only requires a constant

number of basic operations, therefore its complexity is O(1).

max atc1 change of convexity atc2 fn j,i, forwπn(i)< wπn(j−1) fn j,i, forwπn(i)≥ wπn(j−1)

Fig. 1. The two general forms of functionsfn

j,i

Algorithm 1 Compute maximum offn j,ion 0, ¯Pn function ARGMAXf j, i,In, ¯Pn 1: a← πn(i) 2: b← πn(j− 1) 3: if j = 1 or wa≥ wb then 4: return ¯Pn 5: else

6: returnmaxn0, minnwbη˜a−waη˜b

wa−wb , ¯P

noo

7: end if end function

B. Single-Carrier Power Control

The single-carrier power control problem _P_{SCP C}0 (n) is equivalent to problem_P_SC0 (n), with the exception that a fixed user selection_U_n0 (or equivalently_Un) is given as input instead

of being an optimization variable. It is defined below:

maximize xn K X i=1 fin(xni) + An, (PSCP C0 (n)) subject to C20–30: ¯Pn ≥ xn 1 ≥ . . . ≥ xnK ≥ 0.

We denote its optimal value byFn

U0 n, ¯Pn

.

Since inactive users k /_{∈ U}n have no contribution on the

data rates, i.e., pn

k = 0 and Rnk = 0, we remove them for

the study of this sub-problem. Without loss of generality, we index the remaining active users on subcarrier n by in ∈ {1n, . . . ,|Un0|n}. For example, if Un0 ={4, 7, 10}, then

1n = 4, 2n = 7 and 3n = 10. For simplicity of notation,

we add an index 0n = 0, which does not correspond to any

user. From the definition of_U_n0, variablesxnl with index from

l = (i_{− 1)}n+ 1 to in are equal, for any i≥ 1. In the above

example, we would have x1 = x2 = x3 = x4 > x5 =

x6 = x7 > x8 = x9 = x10. Thus, the objective function

of _P_{SCP C}0 (n) can be written as:

K X i=1 fin(xni) + An = |U0 n| X i=1 f_(i−1)n _n_+1,i_n xnin + Bn, (4)

whereBn_{= A}n _{if the last active user’s index is}_|U0

n|n= K,

andBn _{= f}n |U0

n|n+1,K(0)+A

n_{otherwise. For}₁_{≤ j ≤ i ≤ K,}

we simplify some notations as follows: ˜

fj,in (Un0,·) , f(jn−1)n+1,in(·) ,

ARGMAXf j, i,˜ In,U_n0, ¯Pn_,

(6)

We reformulate the problem as: maximize xn in |U0 n| X i=1 ˜ fi,in Un0, xnin + Bn, (_P_{SCP C}0 (n)) subject to C20–30: ¯Pn ≥ xn 1n≥ . . . ≥ x n |U_n0_|_n≥ 0.

Algorithm 2 presents the SCPC method proposed in [19]. The idea is to iterate over variablesxn

infori = 1 to|U

0

n|, and

com-pute their optimal value x∗ = ARGMAXf (i, i,˜ In,U_n0, ¯Pn)

at line 3. If the current allocation satisfies constraint C30, then xnin gets value x

∗_{. Otherwise, the algorithm backtracks}

at line 6 and finds the highest index j _{∈ {1, . . . , i − 2}} such that xn jn ≥ ARGMAX ˜ f (j + 1, i,_In_, U0 n, ¯Pn). Then, variables xn (j+1)n, . . . , x n

in are set equal to ARGMAX

˜ f (j + 1, i,_In_,

U0

n, ¯Pn) at line 10. The optimality and complexity

of SCPC are presented in Theorem 3.

Algorithm 2 Single-carrier power control algorithm (SCPC) function SCPC _In,_U0

n, ¯Pn

1: fori = 1 to_|U0 n| do

2: . Compute the optimal of ˜f_i,in

3: x∗← ARGMAXf i, i,˜ In,U_n0, ¯Pn

4: . Modify x∗ if this allocation violates constraintC30

5: j← i − 1 6: while xn jn< x ∗ _and_j _{≥ 1 do} 7: x∗← ARGMAXf j, i,˜ In,U_n0, ¯Pn 8: j← j − 1 9: end while 10: xn_(j+1) n, . . . , x n in← x ∗ 11: end for 12: returnxn 1n, . . . , x n |U0 n|n end function

Theorem 3 (Optimality and complexity of SCPC). Given subcarrier n ∈ N , a set U0

n of M active users and

a power budget ¯Pn_{, algorithm} _{SCPC computes the optimal}

single-carrier power control. Its worst case computational complexity isO M2_.

Proof: We prove this theorem in Appendix C by mathe-matical induction combined with Lemma 2.

In multi-carrier resource allocation schemes, such as GRAD -JSPA and ε--JSPA, it is often required to compute the optimal single-carrier power control and WSR for many different values of power budget ¯Pn. In these cases, running many times SCPC is actually not efficient in terms of computational complexity, since several computations may be repeated. To avoid this, we propose in Algorithm 3 an improved SCPC algorithm (i-SCPC). The idea is to perform precomputation before runtime by calling SCPC(_In_,

U0

n, Pmax) and storing its

resultxn 1n, . . . , x

n |U0

n|n as a global variable (also called lookup

table). Any subsequent evaluation with input _In, _U_n0, ¯Pn_,

where ¯Pn

≤ Pmax, can be obtained as in line 1.

Theorem 4 (Optimality and complexity of i-SCPC). Given subcarrier n ∈ N and a set U0

n of M active users,

the precomputation of i-SCPC has complexity O M2_{. Any}

subsequent evaluation costs O(M ). Hence, for C different

Algorithm 3 Improved SCPC algorithm with precomputation input: _In,_U0 n, Pmax global variable: xn 1n, . . . , x n |U0 n|n initialization: xn 1n, . . . , x n |U0 n|n← SCPC(I n_, U0 n, Pmax) function i-SCPC ¯Pn 1: return min{xn1n, ¯P n }, . . . , min{xn |U_n0_|_n, ¯Pn} end function

power budgets, algorithm i-SCPC computes their respective optimal single-carrier power control with overall complexity O M2_{+ CM}_.

Proof: We show in Appendix D that subsequent evalua-tions of SCPC can be obtained as in line 1 of Algorithm 3.

Remark. Note that SCPC and i-SCPC returns _|Un0| values

xn 1n, . . . , x

n |U0

n|n representing only the active users’ variables.

These values are sufficient to compute the optimal power allocation and WSR of _P_{SCP C}0 (n) as shown in Eqn. (4). If needed, the full vector xn _{can be obtained by the following}

procedure inO(K) operations:

1: for i = 1 to_|Un0| and l = (i − 1)n+ 1 to in do 2: xn_l ← xn_i n 3: end for 4: for l =|U0 n|n+ 1 to K do 5: xn_l ← 0 6: end for

C. Single-Carrier User Selection

Unlike in the previous subsection, we consider here further-more the user selection _U_n0 optimization under multiplexing and SIC constraint C40_{, i.e., we solve} _P0

SC(n). In [19], a

dynamic programming (DP) is proposed to solve _P_SC0 (n). Here, we first develop a similar DP procedure in Algorithm 4 (SCUS). Then, we propose an improved version (i-SCUS) which performs SCUS as precomputation.

The idea of SCUS is to compute recursively the elements of three arraysV , X, U . Let m∈ {0, . . . , M}, j ∈ {1, . . . , K} andi_{∈ {j, . . . , K}, we define V [m, j, i] as the optimal value} of the following problem_P_SC0 [m, j, i]:

V [m, j, i] , max xn K X l=j fn l (xnl), (PSC0 [m, j, i]) subject to C20, C30, C300, C40: _|U_n0_{| ≤ m,} C50: xnj =· · · = xni.

This problem is more restrictive than_P_SC0 (n). The objective function only depends on variablesxn

j, . . . , xnK.C40limits the

number of active users tom. Moreover, variables xn j, . . . , xni

are equal according toC50_.

It is interesting to note thatV [M, 1, 1] is the optimal value of _P_SC0 (n), since the objective function is PK_l=1fn

l (xnl) for

j = 1 and constraint C50 _{becomes trivially true for}_{j = i. Let}

xn

j∗, . . . , xnK∗be the optimal solution achievingV [m, j, i]. We

defineX[m, j, i] , xn

(7)

Algorithm 4 Single-carrier user selection algorithm (SCUS) function SCUS _In, M, ¯Pn

1: . Initialize arrays V , X, U for m = 0 and i = K 2: fori = K to 1 and j = i to 1 do 3: V [0, j, i]← f_j,Kn (0) 4: X [0, j, i]← 0 5: U [0, j, i]← ∅ 6: end for 7: form = 1 to M and j = K to 1 do 8: x∗← ARGMAXf j, K,In, ¯Pn 9: V [m, j, K]← f_j,Kn (x∗) 10: X [m, j, K]← x∗ 11: U [m, j, K]← ∅ 12: end for

13: . Compute V , X, U for m∈ [1, M] and j ≤ i ≤ K − 1 14: fori = K− 1 to 1 and m = 1 to M and j = i to 1 do

15: x∗← ARGMAXf j, i,In, ¯Pn

16: vact← fj,in (x∗) + V [m− 1, i + 1, i + 1]

17: vinact← V [m, j, i + 1]

18: ifvact> vinactandx∗> X [m− 1, i + 1, i + 1] then

19: V [m, j, i]← vact 20: X [m, j, i]← x∗ 21: U [m, j, i]← (m − 1, i + 1, i + 1) 22: else 23: V [m, j, i]← vinact 24: X [m, j, i]← X [m, j, i + 1] 25: U [m, j, i]← (m, j, i + 1) 26: end if 27: end for

28: . Retrieve the optimal solution xn 29: xn1, . . . , xnK ← 0 30: (m, j, i)← (M, 1, 1) 31: repeat 32: xn_j, . . . , xn_i ← X [m, j, i] 33: (m, j, i)← U [m, j, i] 34: until(m, j, i) = ∅ 35: returnxn end function

due to constraint C50_{. The idea of SCUS is to recursively}

compute the elements of V through the following relation:

V [m, j, i] =     

vact, ifvact> vinact

andx∗_{> X [m}_{− 1, i + 1, i + 1],}

vinact, otherwise,

(5) where x∗ _{= A}_RGMAX_{f j, i,}_In_{, ¯}_Pn_{, and} _v

act (resp.vinact)

corresponds to allocation where useri is active (resp. inactive): vact = fj,in (x∗) + V [m− 1, i + 1, i + 1],

vinact= V [m, j, i + 1].

During SCUS’s iterations, the arrayU keeps track of which previous element of V has been used to compute the current value functionV [m, j, i]. This allows us to retrieve the entire optimal vector xn _{at the end of Algorithm 4 (at lines}

28-35) by backtracking from index (M, 1, 1) to ∅, where ∅ is set at initial indices (see lines 5 and 11) to indicate the recursion termination. To sum up,X and U have two different

recurrence relations depending on the cases in Eqn. (5). IfV [m, j, i] = vact, then: X [m, j, i] = x∗, U [m, j, i] = (m_{− 1, i + 1, i + 1).} IfV [m, j, i] = vinact, then: X [m, j, i] = X [m, j, i + 1] , U [m, j, i] = (m, j, i + 1).

Whenm = 0, no user can be active on this subcarrier due to constraintC40. Therefore,V , X, U can be initialized by:

V [0, j, i] = fn j,K(0) ,

X [0, j, i] = 0, U [0, j, i] = ∅.

For simplicity, we also extendV , X and U on the index i = K andj_{≤ K and initialize them as follows:}

V [m, j, K] = fj,Kn (x∗) ,

X [m, j, K] = x∗, U [m, j, K] = ∅.

A detailed analysis is given in Appendix E. Theorem 5 (Optimality and complexity of SCUS). Given a subcarrier n_{∈ N , a power budget ¯}Pn _and _M

≥ 1, algorithm SCUS computes the optimal single-carrier power control and user selection of _P_SC0 (n). Its worst case compu-tational complexity is O M K2_.

Proof: The proof is done by induction based on the principle of dynamic programming. See Appendix E.

We present i-SCUS in Algorithm 5, which performs pre-computation to avoid repeating the DP procedure when mul-tiple evaluations are required. The algorithm precomputes vectorsV , X, U from SCUS(_In_{, M, P}

max) before runtime,

at line 1. Then, in lines 2-5, it retrieves the active users set U0

n and optimal solution xn1, . . . , xnK of each V [M, 1, i], i ∈

{1, . . . , K}, and stores them in collection. Any subsequent evaluation with a lower budget ¯Pn

≤ Pmax, can be obtained

by searching the best allocation among the K possibilities in collection (lines 6-7). Each allocation is truncated as in i-SCPC ¯Pn _{to satisfy budget ¯}_Pn_{. The optimality and}

complexity of Algorithm 5 are given in Theorem 6. Theorem 6 (Optimality and complexity of i-SCUS). Given a subcarrier n∈ N , a power budget ¯Pn _and _M _{≥ 1,}

the precomputation of i-SCUS has complexityO M K2_{. Any}

subsequent evaluation costs O(M K). Hence, for C different power budgets, i-SCUS computes their respective optimal single-carrier power control and user selection_P_SC0 (n) with overall complexityO M K2_{+ CM K}_.

Proof:See Appendix F.

Table I summarizes the complexity of the single-carrier algorithms developed in this section. They will be used as basic building blocks to design JSPA schemes in Section V.

(8)

Algorithm 5 Improved SCUS algorithm with precomputation input: _In, M, Pmax

global variable: collection initialization:

1: GetV, X, U from SCUS(_In_{, M, P} max)

2: fori = 1 to K do

3: Retrieve the active users set _U_n0 of V [M, 1, i] and its optimal solutionxn 1, . . . , xnK 4: Add(U0 n, xn1, . . . , xnK) to collection 5: end for function i-SCUS ¯Pn 6: Get (_U0

n, xn1, . . . , xnK) in collection that maximizes

Fn₍ U0 n, ¯Pn) =P|U 0 n| l=1 f˜l,ln Un0, min{xnln, ¯P n }+ Bn 7: returnmin_{xn 1, ¯Pn}, . . . , min{xnK, ¯Pn} end function TABLE I

SUMMARY OF THE SINGLE-CARRIER RESOURCE ALLOCATION SCHEMES

Algorithm Complexity to performC evaluations

SCPC [19] O CM2

i-SCPC O M2₊_CM

SCUS [19] O CM K2

i-SCUS O M K2₊_{CM K}

V. JOINTSUBCARRIER ANDPOWERALLOCATION

Recall that Fn _P_¯n _{is the optimal value of} _P0

SC(n) with

power budget ¯Pn_{. We have} _Fn _P_¯n₌PK

i=1fin(xni) + An,

where xn

1, . . . , xnK is the output of i-SCUS ¯Pn

. Using this notation, the JSPA problem _P0 can be simplified as:

maximize ¯ P X n∈N Fn P¯n, (_P_{M C}0 ) subject to P¯n∈ F,

where ¯Pn_{, for}_n_{∈ N , are intermediate variables representing}

each subcarrier’s power budget. ¯P , ¯P1_{, . . . , ¯}_PN_denotes

the power budget vector. The feasible set F , { ¯P : X

n∈N

¯ Pn

≤ Pmax and0≤ ¯Pn≤ Pmaxn , n∈ N }

is chosen to satisfyC10 andC20 in_P0.

Problem _P_{M C}0 consists in optimizing the power budget ¯

Pn _{allocated to each subcarrier} _{n. For a given budget ¯}_Pn_,

Fn _P_¯n _{is computed by finding the optimal single-carrier}

user selection and power control using i-SCUS ¯Pn_{. The}

choice of ¯Pn_{affects the single-carrier user selection and power}

control, i.e., variables xn _and

U0

n. The latter influence the

value ofFn_{, which in turn has an impact on the power budget}

optimization. Although variablesxn_and

U0

nare hidden inFn,

they are nevertheless optimized jointly with ¯Pn_{. Indeed, we}

can see that _P_{M C}0 is equivalent to _P0 when replacing Fn _by

its definition in_P_SC0 (n) along with its constraints C20 _to_C40_.

A. Gradient Descent Based Heuristic

GRAD-JSPA is an efficient heuristic based on projected gradient descent. Its principle is to perform a two-stage opti-mization as presented in Fig. 2. The first-stage is a projected gradient descent on ¯P in the search space F. The gradient descent requires to evaluateP_n_∈NFn_{and its gradient at each}

iteration. This task is carried out by i-SCUS in the second-stage. We denote the derivative of Fn _{at ¯}_Pn _by _Fn0 _P¯n_.

Lemma 7 shows how to compute it. As illustrated in Fig. 2, the second-stage is called at each gradient iteration to return Fn _P_¯n _and_Fn0 _P¯n_{to the first-stage, for all}_n

∈ N . Lemma 7 (Derivative ofFn_).

Letxn

1, . . . , xnK be the output of i-SCUS ¯Pn

. The left deriva-tive of Fn _{at ¯}_Pn_{, can be computed as follows:}

Fn0 P¯n= Wnwπn(l) xn l + ˜ηπnn(l) ln(2)= Wnwπn(l) ¯ Pn_{+ ˜}_ηn πn(l) ln(2), wherel is the greatest index such that xn

l = ¯Pn, andln(2) is

the natural logarithm of2.

Proof: To get Fn0, we first prove that Fn0 U0 n, ¯Pn = f1,ln0 P¯n andFn_{( ¯}_Pn_{) = max} U0 n{F n₍_U0 n, ¯Pn)}, where max is

taken over all active users sets incollection of i-SCUS. See Appendix G for the detailed proof of semi-differentiability.

First-stage algorithm: projected gradient descent Follow the gradient ofP_n_∈NFn_{( ¯}_Pn_{) and update} each subcarrier’s power budget ¯Pn_{in the simplex}

F

Second-stage algorithm: i-SCUS Compute the optimal single-carrier power allocation

xn_{under budget ¯}_Pn_{and constraint} |U0 n| ≤ M Input: M, Pmax, Pmaxn in out Output: optimal power allocation

among subcarriers ¯P

For each Fn_evaluation: In_{, M, ¯}_Pn

in out

Fn_{( ¯}_Pn_{), F}n0 ( ¯Pn₎

Fig. 2. Overview of GRAD-JSPA

The pseudocode of GRAD-JSPA is described in Algo-rithm 6. Input ξ corresponds to the error tolerance at ter-mination, as we can see at line 8. The search direction at lines 4-5 is the gradient of P_n_∈NFn _{evaluated at ¯}_{P .}

Since F1_{, . . . , F}N _{are independent, it is equal to the vector}

of F10 _P¯1_{, . . . , F}N0 _P¯N_{. Note that the step size} _{α at}

line 6 can be tuned by backtracking line search or exact line search [21, Section 9.2]. We adopt the latter to perform simulations. The projection of ¯P + α∆ on the simplex F at line 7 can be computed efficiently [21, Section 8.1.1], the details of its implementation are omitted here.

We showed in our previous work [19] that GRAD-JSPA worst case complexity isO log(1/ξ)N M K2_{when SCUS is}

used to evaluate functionsFn_,_n

∈ N . We show in Theorem 8 that the complexity of GRAD-JSPA can be reduced by the use of i-SCUS.

Theorem 8 (Complexity of GRAD-JSPA).

Letξ be the error tolerance at termination. Algorithm GRAD -JSPA has complexityO N M K2_{+ log(1/ξ)N M K}_when

i-SCUS is used to evaluate functions Fn_,_n

∈ N .

Proof: In Appendix G, we prove that the objective function is piece-wise α-strongly concave and β-smooth.

(9)

Algorithm 6 Gradient descent based heuristic (GRAD-JSPA) function GRAD-JSPA (_In₎

n∈N, M, Pmax, Pmaxn , ξ

1: Let ¯P _{← 0 be the starting point}

2: repeat

3: Save the previous vector ¯P0 _{← ¯}_P

4: Determine a search direction∆

5: ∆← F10 P¯1, . . . , FN0 P¯N 6: Choose a step size α

7: Update ¯P ← projection of ¯P + α∆ onF

8: until_{|| ¯}P0− ¯P|| ≤ ξ

9: return ¯P end function

Therefore, the convergence to a local optimum follows from classical convex optimization results [22, Section 2.2.4].

Although i-SCUS (or equivalently SCUS) is optimal, the returned Fn _P_¯n _{is not necessarily concave in ¯}_Pn_{. As a}

consequence, GRAD-JSPA is not guaranteed to converge to a global maximum. Nevertheless, we show by numerical results in Section VI that it achieves near-optimal WSR performance with low complexity.

B. Pseudo-Polynomial Time Optimal Scheme

The JSPA problem as formulated in_P_{M C}0 has real variables ¯

Pn_{on a continuous search space}

F. However, the study of NP-hard optimization problems and their approximation requires parameters and variables to be represented by a bounded number of bits [23], i.e., with bounded precision. This is also a reasonable assumption in practice since MC-NOMA systems are subject to minimum transmit power limitation at the BS and floating-point arithmetic precision of the hardware. As a consequence, we discretize the search space _{F, in the same} way as in [11]. Let δ be the minimum transmit power such that the variables ¯Pn_{can only take value of the form}_l

· δ, for l _{∈ {0, 1, . . . , b}Pmax

δ c}. We denote the number of non-zero

power values asJ =bPmax

δ c. The feasible set then becomes

F0,{ ¯P : X

n∈N

¯

Pn_{≤ P}max and0≤ ¯Pn≤ Pmaxn , n∈ N ,

and ¯Pn= l_{· δ, l ∈ {0, . . . , J}, n ∈ N }.} We rewrite problem_P_{M C}0 with search space_F0 as follows:

maximize y X n∈N J X l=1 cn,lyn,l, (MCKP) subject to X n∈N J X l=1 an,lyn,l≤ Pmax, J X l=1 an,lyn,l≤ Pmaxn , n∈ N , J X l=1 yn,l≤ 1, n ∈ N , yn,l∈ {0, 1}, n ∈ N , l ∈ [1, J] ,

where cn,l = Fn(l· δ) and an,l = l · δ. The discretized

JSPA problem, denoted by MCKP, is known as the multiple choice knapsack problem [20]. It has N disjoint classes

each containing J items to be packed into a knapsack of capacityPmax. Each item has a profitcn,land a weightan,l,

representing respectively the WSR and power consumption of this allocation on subcarriern. The binary variable yn,l takes

value 1 if and only if item l in class n is assigned to the knapsack. The problem consists in assigning at most one item from each class to maximize the sum of their profit. We denote its optimal value by F∗

M CKP.

As mentioned previously, discretizing _P_{M C}0 is necessary due to the bounded precision which arises inherently from the study of algorithms and their implementation in practical sys-tems. Besides, MCKP can be used to approach the continuous solution of_P_{M C}0 with arbitrary precision. Indeed, Theorem 9 shows that the discretization error is upper-bounded by a linear function in δ with a coefficient depending on the system’s parameters.

Theorem 9 (Discretization error betweenF∗ _and_F∗ M CKP).

The gap between the optimal values of the continuous prob-lem_P_{M C}0 and its discretized version MCKP with step sizeδ is upper-bounded by: F∗− FM CKP∗ ≤ δ X n∈N max k∈K ( Wnwπn(k) ¯ Pn∗_{+ ˜}_ηn πn(k) ln(2) ) ,

where ¯Pn∗_{is the optimal power budget of}_P0

M Con subcarrier

n_{∈ N .}

Proof:We derive the proof in Appendix H.

The discrete problem MCKP can be solved optimally by dy-namic programming by weightsstudied in [20, Section 11.5]. Based on this idea, we propose OPT-JSPA (see Algorithm 7) to solve _P_{M C}0 . We first transform _P_{M C}0 to problem MCKP: from line 1 to 5, every item’s profitcn,l is computed using

i-SCUS. Then, we perform dynamic programming by weights at lines 6-7. We summarize the optimality and complexity of OPT-JSPA in Theorem 10. Detailed analysis of the dynamic programming can be found in Appendix I.

Algorithm 7 The pseudo-polynomial time optimal scheme function OPT-JSPA (_In₎

n∈N, M, Pmax, Pmaxn , δ

1: . Compute the parameters of MCKP 2: forn_{∈ N and l ∈ [0, J] do}

3: an,l← l · δ

4: cn,l← Fn(l· δ)

5: end for

6: return optimal allocation from the dynamic

program-7: ming by weights[20] end function

Theorem 10 (Optimality and complexity of OPT-JSPA). Given a minimum transmit power δ, algorithm OPT-JSPA computes the optimal of _P_{M C}0 on the discrete set _F0. Its computational complexity isO N M K2_{+ JN M K + J}2_N_,

which is pseudo-polynomial in J.

Proof:We explain the principle of dynamic programming by weights and derive its complexity in Appendix I.

(10)

TABLE II

COMPARISON OF SOMEJSPASCHEMES PROPOSED IN THIS WORK AND IN THE LITERATURE

Algorithm Performance guarantee Complexity forJ discrete power values

Monotonic optimization with outer

polyblock approximation [18] Optimal Exponential inK and N

TSDP [11] Optimal O J2_{N M K}

OPT-JSPA Optimal O N M K2₊_{JN M K + J}2_N

ε-JSPA FPTAS, i.e., its performance is within a

factor 1− ε of the optimal, for any ε > 0 O

N M K2_+minn_log(J)N2M K ε + N3 ε2 , JN M K +J 2_No

GRAD-JSPA Heuristic O N M K2_{+ log(J)N M K}

OPT-JSPA is said to be pseudo-polynomial since it depends on the total number of power values J, whereas all system’s parameters and variables are encoded inO(log(J)) bits. As a consequence, in practical systems, the contribution ofJ to the computation time is way higher than parameters N , K, M .

C. Fully Polynomial-Time Approximation Scheme

We develop a FPTAS to avoid the pseudo-polynomial com-plexity inJ that is inherent to the optimal schemes OPT-JSPA and TSDP [11]. According to [24], an algorithm is said to be a FPTAS if it outputs a solution within a factor 1_{− ε of the} optimal, for any ε > 0. Moreover, its running time is bounded by a polynomial in both the input size and 1/ε. A FPTAS is the best trade-off one can hope for an NP-hard optimization problem in terms of performance guarantee and complexity, assuming P _{6= NP.}

The proposed FPTAS, called ε-JSPA (see Algorithm 8), is based on dynamic programming with scaled profits. Scaling the profits is a common technique to reduce the number of items computed in MCKP. First, we compute an estimationU of MCKP’s optimal value, such thatU ≥ F∗

M CKP ≥ U/4. We

explain the estimation procedure in Appendix J. Then, instead of computing all JN profit values cn,l, we only consider the

subset Ln of items on each subcarriern such that:

Ln ,{l0≤ J, l ≤

4N

ε − 1: cn,l

0 ≥ lεU

4N > cn,l0−1}. This can be seen as considering only one profit value per interval of the form [(l− 1) · εU/4N, l · εU/4N], for l ∈ {1, . . . , 4N/ε}. Each Ln, for n ∈ N , can be obtained by

multi-key binary search [25]. All function evaluations required by the multi-key binary search are done by i-SCUS.

Finally, we apply the dynamic programming by profits [20, Section 11.5] in lines 5-6. It is known that the optimal solution obtained by dynamic programming by profits considering only items in Ln, differs from FM CKP∗ by at most a factor 1− ε.

The performance of ε-JSPA are summarized in Theorem 11. We provide more details on the estimation U in Appendix J and the dynamic programming by profits in Appendix K. Theorem 11 (Performance and complexity of ε-JSPA). Given a minimum transmit power δ and an approximation factor ε, algorithm ε-JSPA computes an ε-approximation of P0

M C on the discrete set F0.

The algorithm is a FPTAS with asymptotic complexity: ON M K2_+minn_log(J)N2_{M K}

ε +

N3

ε2 , JN M K +J 2_No_.

Proof: We derive this result in Appendix K, using the dynamic programming by profits and the estimation procedure studied in Appendix J.

Algorithm 8 The proposed FPTAS (ε-JSPA) function ε-JSPA (_In)_n_∈N, M, Pmax, Pmaxn , δ, ε

1: Compute an estimationU of F∗ M CKP

2: forn_{∈ N do}

3: Getan,l,cn,l, for l∈ Ln by multi-key binary search

4: end for

5: return ε-approximate allocation from the dynamic

6: programming by profits [20] end function

D. Comparison of JSPA Algorithms

In Table II, we compare the performance and complexity of the proposed algorithms with JSPA schemes in the literature. Reference [18] studied an optimal monotonic optimization framework, which has exponential complexity in K and N . The two-stage dynamic programming algorithm (TSDP) pro-posed by Lei et al. has complexity O J2_{N M K} _according

to [11, Theorem 13]. Both TSDP and the proposed OPT -JSPA are optimal. However, OPT-JSPA has lower complexity than TSDP. Indeed, the right term J2_{N is lower by a factor}

M K, the middle term JN M K by a factor J. The left term N M K2 _{also improves the complexity, since reference [10]}

shows that in practical systemsJ = Θ (min{K, MN}). This result is verified by simulation in Section VI. ε-JSPA is the proposed FTPAS. Its complexity is bounded by a polynomial inN/ε and log(J). If N/ε = O(J), it has lower complexity than OPT-JSPA. Otherwise, if N/ε = Ω(J), its complexity

is asymptotically equivalent to OPT-JSPA’s complexity. This means that for very low error rate ε, the complexity of ε-JSPA tends to that of OPT-JSPA. Finally, GRAD-JSPA is a heuristic. Its performance is evaluated through simulation in the next section. When applied in a discrete setting, the error tolerance or precision ξ is related to δ = 2ξ. Hence, its complexity is proportional to log(J), which is way lower than the optimal schemes with pseudo-polynomial complexity due toJ.

VI. NUMERICALRESULTS

We evaluate the WSR and computational complexity of OPT-JSPA, ε-JSPA and GRAD-JSPA through numerical sim-ulations. We compare them with the optimal benchmark scheme TSDP introduced in [11]. We consider a hexagonal

(11)

TABLE III

SIMULATION PARAMETERS

Parameter Value

Cell radius 1000 m

Min. distance from user to BS 35 m

Carrier frequency 2 GHz

Path loss model 128.1 + 37.6 log10d dB, d is in km

Shadowing Log-normal, 10 dB standard deviation

Fading Rayleigh fading with variance 1

Noise power spectral density −174 dBm/Hz

System bandwidthW 5 MHz

Number of subcarriersN 20

Number of usersK 5 to 60

Total power budgetPmax 10 W

Minimum transmit powerδ 0.01 W

Number of power valuesJ 103

Error toleranceξ 10−4

ParameterM 1 (OMA), 2 and 3 (NOMA)

5 10 15 20 25 30 35 40 45 50 55 60 Number of users K 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 W e ig h te d s u m -r a te (b it /s ) ×107 OPT-JSPA, M = 3 OPT_{-JSPA, M = 2} OPT-JSPA, M = 1 TSDP, M = 3 TSDP, M = 2 TSDP, M = 1

Fig. 3. WSR of the optimal schemes for different number of usersK

cell of diameter1000 meters, with one BS located at its center andK users distributed uniformly at random in the cell. The users’ weights are generated uniformly at random in [0, 1]. The number of users K varies from 5 to 60, and the number of subcarriers is N = 20. We assume a system bandwidth of W = 5 MHz and Wn = W/N for all n. We follow

the radio propagation model of [26], including path loss, shadowing and Rayleigh fading. The minimum transmit power is δ = 0.01 W. The cellular power budget is Pmax = 10 W,

therefore the number of power values isJ = 103_{. Each point}

in the following figures is the average value obtained over 1000 random instances. Only Fig. 6 and 7 represent a single instance. The simulation parameters and channel model are summarized in Table III.

Fig. 3 shows the WSR performance of OPT-JSPA and TSDP, for M = 1, 2 and 3. We only simulate TSDP for K = 5 to 30, due to its high running time complexity. We see that OPT-JSPA and TSDP achieve the same WSR performance, which is consistent with the fact that they are both optimal. Indeed, the optimality of OPT-JSPA is shown in Theorem 10, and the optimality of TSDP has been proven in [11, Theorem 13]. Although both algorithms have the same performance, we will see further on in Fig. 5 that OPT -JSPA has lower computational complexity than TSDP. The performance gain of NOMA with M = 2 (resp. M = 3) over

5 10 15 20 25 30 35 40 45 50 55 60 Number of users K 10−5 10−4 10−3 P e rf o rm a n c e lo s s GRAD-JSPA, M = 1 GRAD_{-JSPA, M = 2} GRAD_{-JSPA, M = 3}

Fig. 4. Performance loss of GRAD-JSPA compared to the optimal WSR

OMA (i.e.,M = 1) is about 8% (resp. 10%), for K = 60. Fig. 4 illustrates the performance loss of GRAD-JSPA compared to the optimal, forM = 1, 2 and 3. The performance loss is defined as:

Optimal WSR_{− G}RAD-JSPA WSR Optimal WSR .

The markers represent the average performance loss, while the upper intervals indicate the 90th percentile. For example, for K = 10 and M = 1, 90% of GRAD-JSPA results have less than9_{× 10}−4 _{of performance loss. We observe that the}

average performance loss is always below 6_{× 10}−4_{. Hence,}

our proposed heuristic GRAD-JSPA achieves near-optimal solutions in these simulation settings. It is also suitable for large systems, since the performance loss decreases as K or M increases.

In Fig. 5, we count the number of basic operations (ad-ditions, multiplications, comparisons) performed by each al-gorithm, which reflects their computational complexity. The term “improved” in the legend represents the complexity of OPT-JSPA and GRAD-JSPA when using i-SCPC and i-SCUS instead of SCPC and SCUS. There is a significant speed up by employing i-SCPC and i-SCUS as basic building blocks. Indeed, for K = 60 and M = 1, 2 or 3, there is a factor of at least 10 between OPT-JSPA and its improved version. Besides, the improved OPT-JSPA outperforms TSDP

in terms of complexity. For instance, OPT-JSPA reduces the complexity by a factor330, for K = 30 and M = 3. Finally, GRAD-JSPA has low complexity, which makes it a good choice for practical implementation.

Fig. 6 and 7 present the WSR and complexity of ε-JSPA versus 4N/ε. We choose such a normalized x-axis, as it is equal to the number of items evaluated in each subcarrier, i.e., |Ln| = 4N/ε. It can be directly compared to J, which is the

total number of items in each subcarrier in the discretized problem MCKP. Here, we simulate a single instance with K = 60 users to show how ε-JSPA behaves as a function of ε. In Fig. 6, we also present its performance guarantee. Recall that the performance guarantee is1− ε times the optimal. As expected, ε-JSPA is always above its performance guarantee. As N/ε increases, the approximation guarantee tends to the

(12)

10 20 30 40 50 60 Number of users K 104 105 106 107 108 109 1010 N u m b e r o f b a s ic o p e ra ti o n s TSDP, M = 3 TSDP, M = 2 TSDP, M = 1 OPT-JSPA, M = 3 OPT-JSPA, M = 2 OPT-JSPA, M = 1 improved OPT-JSPA, M = 3 improved OPT-JSPA, M = 2 improved OPT-JSPA, M = 1 GRAD-JSPA, M = 3 GRAD-JSPA, M = 2 GRAD-JSPA, M = 1 improved GRAD-JSPA, M = 3 improved GRAD-JSPA, M = 2 improved GRAD-JSPA, M = 1

Fig. 5. Number of basic operations performed by each algorithm versusK

optimal. In this instance, the algorithm already achieves a near-optimal solution for 4N/ε = 400, i.e., ε = 0.2. In Fig. 7, we also plot the complexity of the improved OPT-JSPA for comparison. As explained in Section V-C, the complexity increases with N/ε and becomes (asymptotically) equal to that of OPT-JSPA for N/ε = Ω(J). In this regime, there is apparently no benefit of using ε-JSPA, since OPT-JSPA achieves the optimal with the same complexity. Nevertheless, in practice, we can see that even for 4N/ε_{≥ J, ε-JSPA has} less operations than OPT-JSPA. This is because the number of items computed by ε-JSPA increases slowly and smoothly as a function of ε. This behavior is not captured in the asymptotic complexity (big-O notation). This is verified in Fig. 7 for up to 4J = 4000. In summary, ε-JSPA allows us to control the trade-off between WSR and complexity with ε.

VII. DISCUSSION ONPOSSIBLEGENERALIZATIONS

In this work, we assume that the channel gains are perfectly known. Two more realistic models using only partial CSI can be considered instead: imperfect CSI studied in [27], [28], for which the channel gains are given with a known estimation error probability distribution, and second order statistics (SOS) adopted in [29], for which only the distances between users and BS are known. We believe that our framework can be extended to these cases by maximizing the expected WSR depending on stochastic channel gains, while the power con-straints remain unchanged (i.e., non-stochastic). The challenge

0 500 J = 1000 1500 2000 2500 3000 3500 4000 4N/ε 5.2 5.4 5.6 5.8 6.0 6.2 6.4 W e ig h te d s u m -r a te (b it /s ) ×107 M= 3 : M= 2 : M= 1 : ε-JSPA ε-JSPA ε-JSPA Perf. guarantee Perf. guarantee Perf. guarantee

Fig. 6. WSR of ε-JSPA and its guaranteed performance bound versus 4N/ε

0 500 J = 1000 1500 2000 2500 3000 3500 4000 4N/ε 107 N u m b e r o f b a s ic o p e ra ti o n s M= 3 : M= 2 : M= 1 : ε-JSPA ε-JSPA ε-JSPA improved OPT-JSPA improved OPT-JSPA improved OPT-JSPA

Fig. 7. Number of basic operations performed by ε-JSPA versus 4N/ε

would be to characterize such a stochastic objective function in many scenarios. This is a possible future research direction. As multi-antenna technologies are becoming more and more important in 5G and Beyond 5G systems [1], [2], it would be interesting to extend the current work to multi-antenna transmissions. Paper [30] states that MC-NOMA with multiple antennas is a much more complex problem which requires to develop novel low complexity solutions. One may draw inspiration from the work of Sun et al. [31] which generalizes the monotonic optimization framework of [18] to a MC-MISO-NOMA system. A similar approach might be adopted in our framework, which is explained as follows: Since the SIC decoding order in multi-antenna MC-NOMA systems does not only depend on the channel gains but also on the beamforming (BF), the user clustering (UC) and BF have to be jointly optimized to achieve optimal or approximate performance. Hence, the idea would be to extend SCUS to a joint UC and BF optimization scheme. This scheme can then be integrated in OPT-JSPA, ε-JSPA and GRAD-JSPA, while preserving their performance guarantees. Although optimal joint UC and BF remains a difficult open problem, existing heuristics can be adopted instead. For example, the schemes of paper [32] have shown to outperform classical (OMA-based) MIMO systems

(13)

and other multi-antenna NOMA algorithms by simulations. VIII. CONCLUSION

In this work, we investigate the WSR maximization in MC-NOMA with cellular power constraint. We improve the com-plexity of the single-carrier power control (SCPC) and user selection (SCUS) procedures using precomputation. These improved schemes are denoted by i-SCPC and i-SCUS. We develop three algorithms to solve the JSPA problem, based on i-SCPC and i-SCUS. Firstly, OPT-JSPA gives optimal results with lower complexity than current state-of-the-art optimal schemes, i.e., TSDP [11] and monotonic optimization [18]. Secondly, ε-JSPA is a FPTAS. It achieves a controllable and tight trade-off between approximation guarantee and complex-ity. OPT-JSPA and ε-JSPA are both suitable for performance benchmarking. Finally, GRAD-JSPA is a heuristic. We show by simulation that it has near-optimal WSR with low and practical complexity.

REFERENCES

[1] E. Dahlman, S. Parkvall, and J. Skold, 5G NR: The Next Generation

Wireless Access Technology. Academic Press, 2018.

[2] V. W. Wong, R. Schober, D. W. K. Ng, and L.-C. Wang, Key technologies

for 5G wireless systems. Cambridge university press, 2017.

[3] T. M. Cover and J. A. Thomas, Elements of Information Theory. John

Wiley & Sons, 2012.

[4] Y. Saito, A. Benjebbour, Y. Kishiyama, and T. Nakamura, “System-level performance evaluation of downlink non-orthogonal multiple access (NOMA),” in IEEE 24th Int. Symp. on Personal Indoor and Mobile Radio Commun. (PIMRC), 2013, pp. 611–615.

[5] L. Dai, B. Wang, Y. Yuan, S. Han, I. Chih-Lin, and Z. Wang, “Non-orthogonal multiple access for 5G: solutions, challenges, opportunities, and future research trends,” IEEE Commun. Mag., vol. 53, no. 9, pp. 74–81, 2015.

[6] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge university press, 2005.

[7] P. C. Weeraddana, M. Codreanu, M. Latva-aho, A. Ephremides, C. Fis-chione et al., “Weighted sum-rate maximization in wireless networks: a review,” Found. and Trends in Netw., vol. 6, no. 1–2, pp. 1–163, 2012. [8] S. Chen, K. Peng, and H. Jin, “A suboptimal scheme for uplink NOMA in 5G systems,” in Int. Wireless Commun. and Mobile Computing Conf. (IWCMC), 2015, pp. 1429–1434.

[9] M. Al-Imari, P. Xiao, M. A. Imran, and R. Tafazolli, “Uplink non-orthogonal multiple access for 5G wireless networks,” in 11th Int. Symp. on Wireless Commun. Syst. (ISWCS), 2014, pp. 781–785.

[10] Y. Fu, L. Sala¨un, C. W. Sung, and C. S. Chen, “Subcarrier and power allocation for the downlink of multicarrier NOMA systems,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 11 833–11 847, 2018. [11] L. Lei, D. Yuan, C. K. Ho, and S. Sun, “Power and channel allocation

for non-orthogonal multiple access in 5G systems: tractability and computation,” IEEE Trans. Wireless Commun., vol. 15, no. 12, pp. 8580– 8594, 2016.

[12] Y.-F. Liu and Y.-H. Dai, “On the complexity of joint subcarrier and power allocation for multi-user OFDMA systems,” IEEE Trans. Signal Process., vol. 62, no. 3, pp. 583–596, 2014.

[13] L. Sala¨un, C. S. Chen, and M. Coupechoux, “Optimal joint subcarrier and power allocation in NOMA is strongly NP-hard,” in IEEE Int. Conf. Commun. (ICC), 2018.

[14] Y.-F. Liu, “Complexity analysis of joint subcarrier and power allocation for the cellular downlink OFDMA system,” IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 661–664, 2014.

[15] B. Di, L. Song, and Y. Li, “Sub-channel assignment, power allocation, and user scheduling for non-orthogonal multiple access networks,” IEEE Trans. Wireless Commun., vol. 15, no. 11, pp. 7686–7698, 2016. [16] P. Parida and S. S. Das, “Power allocation in OFDM based NOMA

systems: a DC programming approach,” in Globecom Workshops, 2014. [17] M. F. Hanif, Z. Ding, T. Ratnarajah, and G. K. Karagiannidis, “A minorization-maximization method for optimizing sum rate in the down-link of non-orthogonal multiple access systems,” IEEE Trans. Signal Process., vol. 64, no. 1, pp. 76–88, 2016.

[18] Y. Sun, D. W. K. Ng, Z. Ding, and R. Schober, “Optimal joint power and subcarrier allocation for MC-NOMA systems,” in IEEE Global Commun. Conf., 2016.

[19] L. Sala¨un, M. Coupechoux, and C. S. Chen, “Weighted sum-rate maximization in multi-carrier NOMA with cellular power constraint,” in IEEE INFOCOM, 2019, pp. 451–459.

[20] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems. Berlin

Heidelberg: Springer-Verlag, 2004.

[21] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge

university press, 2004.

[22] Y. Nesterov, “Introductory lectures on convex programming volume I: Basic course,” Lecture notes, 1998.

[23] M. R. Garey and D. S. Johnson, Computers and Intractability. W. H.

Freeman New York, 2002, vol. 29.

[24] V. V. Vazirani, Approximation Algorithms. Springer Science & Business Media, 2013.

[25] A. Tarek, “Multi-key binary search and the related performance,” in

Proc. Amer. Conf. on Appl. Math., ser. MATH’08. World Scientific

and Engineering Academy and Society (WSEAS), 2008, pp. 104–109. [26] GreenTouch, Mobile communications WG architecture doc2: Reference

scenarios, May 2013.

[27] Z. Wei, D. W. K. Ng, J. Yuan, and H.-M. Wang, “Optimal resource allocation for power-efficient MC-NOMA with imperfect channel state information,” IEEE Trans. Commun., vol. 65, no. 9, pp. 3944–3961, 2017.

[28] J. Choi, “Joint rate and power allocation for NOMA with statistical CSI,” IEEE Trans. Commun., vol. 65, no. 10, pp. 4519–4528, 2017. [29] Z. Yang, Z. Ding, P. Fan, and G. K. Karagiannidis, “On the

perfor-mance of non-orthogonal multiple access systems with partial channel information,” IEEE Trans. Commun., vol. 64, no. 2, pp. 654–667, 2015. [30] S. R. Islam, M. Zeng, O. A. Dobre, and K.-S. Kwak, “Resource allocation for downlink NOMA systems: Key techniques and open issues,” IEEE Wireless Commun., vol. 25, no. 2, pp. 40–47, 2018. [31] Y. Sun, D. W. K. Ng, and R. Schober, “Optimal resource allocation

for multicarrier MISO-NOMA systems,” in IEEE Int. Conf. Commun. (ICC), 2017.

[32] S. Ali, E. Hossain, and D. I. Kim, “Non-orthogonal multiple access (NOMA) for downlink multiuser MIMO systems: User clustering, beamforming, and power allocation,” IEEE access, vol. 5, pp. 565–577, 2016.

APPENDIX

We first provide in Lemma 12 an important property on the solution maximizingPi_l=1f˜n

l,lsubject toC20–30, fori≤ |Un0|.

This Lemma will be used in Appendices C and G. Lemma 12.

Assume we are given a subcarriern∈ N , a set U0

n of active

users, a power budget ¯Pn_{, and an index} _{i. Let x}n 1n, . . . , x

n in

be the allocation maximizing Pi_l=1f˜n

l,l Un0, xnln , while also satisfying C20_–30_{, i.e., ¯}_Pn ≥ xn 1n ≥ · · · ≥ x n in ≥ 0. xn 1n, . . . , x n

incan be partitioned into sequences of consecutive

terms with the same value. That is, sequences of the form xn qn, . . . , x n q0 n, wherex n qn=· · · = x n q0 n and1≤ q ≤ q 0_{≤ |U}0 n|, q = 1 or xn (q−1)n > x n qn,q 0 ₌_|U0 n| or xnq_n0 < xn(q0+1)n. Any

such sequence satisfies: xnqn=· · · = x n q0 n= ARGMAX ˜ f q, q0,_In,_Un0, ¯Pn . Proof: In this proof, we simplify notation ˜fn

l,l(Un0,·) as ˜ fn l,l(·). Let xnqn, . . . , x n q0

n be a sequence of consecutive terms

with the same value, as defined in Lemma 12. Assume, for the sake of contradiction, thatxn

qn=· · · = x

n

q0_n6= x∗, wherex∗=

ARGMAXf q, q˜ 0,In,U_n0, ¯Pn. Without loss of generality, we

consider the case xn

qn < ARGMAXf q, q˜ 0_,_In_, U0 n, ¯Pn and q > 1. Let yn 1n, . . . , y n

in be an allocation defined as:

ylnn, ( min_{xn (q−1)n, x ∗_{}, if q ≤ l ≤ q}0_, xn ln, otherwise. (6)

(14)

We have the following inequalities: i X l=1 ˜ fl,ln ylnn = X l /∈{q,...,q0_} ˜ fl,ln xnln + ˜fq,qn 0 ynln , (7) > X l /∈{q,...,q0_} ˜ fl,ln xnln + ˜fq,qn 0 xnln = i X l=1 ˜ fl,ln xnln . (8)

Equality (7) comes from the definition in (6). According to Lemma 2, ˜fn

q,q0 is increasing on [0, x∗], which implies

inequality (8). In summary, yn 1n, . . . , y

n

in satisfies C2

0_–30 _by

its definition in (6), and it achieves greater value of Pi_l=1f˜n l,l

thanxn 1n, . . . , x

n

in. This is a contradiction, therefore it must be

that xnqn ≥ ARGMAXf q, q˜ 0_,_In_, Un0, ¯Pn . (9) If q = 1, the same reasoning can be applied by replac-ing min{xn

(q−1)n, x

∗_{} by ¯}_Pn _{in Eqn. (6). We can}

per-form a similar proof by contradiction on the case xn qn >

ARGMAXf q, q˜ 0,In,U_n0, ¯Pnto deduce that:

xn qn ≤ ARGMAX ˜ f q, q0,_In_, Un0, ¯Pn . (10) The desired result follows from (9) and (10).

A. Proof of Lemma 1

The objective of _{P can be written as:} X k∈K wk X n∈N Rnk(pn) = X n∈N X k∈K wkRnk(pn), (b) = X n∈N Wn K X i=1 wπn(i)log2 PK j=ipnπn(j)+ ˜η n πn(i) PK j=i+1pnπn(j)+ ˜η n πn(i) ! , (c) = X n∈N Wn K X i=1 log2   PK j=ipnπn(j)+ ˜η n πn(i) wπn(i) PK j=i+1pnπn(j)+ ˜η n πn(i) w_πn(i)  , (d) = X n∈N Wn  wπn(1)log2   K X j=1 pn πn(j)+ ˜η n πn(1)   + K X i=2 log2   PK j=ipnπn(j)+ ˜η n πn(i) w_πn(i) PK j=ipnπn(j)+ ˜η n πn(i−1) w_πn(i−1)   + wπn(K)log2 1 ˜ ηn πn(K) !# .

Equality (b) comes from the definition in (2). At (c), the weights wπn(i) are put inside the logarithm. Finally, (d) is

obtained by combining the numerator of the i-th term with the denominator of the (i_{− 1)-th term, for i ∈ {2, . . . , K}.}

By applying the change of variables shown in (3), we derive the equivalent problem _P0. The constant term is A = P_n_∈Nwπn(K)log2 1/˜ηn πn(K) . Constraints C10 _and

C20 _{are respectively equivalent to} _{C1 and C2 since x}n 1 =

PK

j=1pnπn(j) =

P

k∈Kpnk, for n ∈ N . Constraints C30

and C300 _{come from} _{C3 and the fact that x}n

i − xni+1 =

pn

πn(i), for any i ∈ {1, . . . , K} and n ∈ N . In the

same way, the active users set in C40 is defined as U0

n,{i ∈ {1, . . . , K}: xni > xni+1}.

B. Proof of Lemma 2

We study the first and second derivatives of fn

j,i, denoted

byfn

j,i0 andfj,in00. Ifj = 1, then we have:

f1,in 0(x) = Wnwπn(i) x + ˜ηn πn(i) ln(2), (11)

which is strictly positive and decreasing for x _{≥ 0. Hence,} fn

1,iis increasing and concave. Forj > 1, the first and second

derivatives are as follows: fj,in0(x) = Wn ln(2) wπn(i) x + ˜ηn πn(i) − wπn(j−1) x + ˜ηn πn(j−1) ! , fj,in00(x) = Wn ln(2)    wπn(j−1) x + ˜ηn πn(j−1) 2 − wπn(i) x + ˜ηn πn(i) 2    . We know thatη˜n πn(j−1)≥ ˜η n

πn(i)by construction of the optimal

decoding order in Eqn. (1). If, in addition, we havewπn(i)≥

wπn(j−1), then f

n

j,i0(x) ≥ 0 and fj,in00(x) ≤ 0 for all x ≥ 0.

We deduce thatfn

j,iis increasing and concave. This proves the

first point of Lemma 2. Now suppose thatwπn(i)< wπn(j−1)

instead. Valuesc1 andc2 defined in Lemma 2 are the unique

roots of the first and second derivatives, i.e.,fn

j,i0(c1) = 0 and

fn

j,i00(c2) = 0. fj,in0is positive on −˜ηπn(j−1), c1

and negative on(c1,∞). This implies that f_j,in is unimodal and has a unique

global maximum atc1 for x > 0. Similarly, fj,in00 is negative

on _−˜ηπn(j−1), c2

and positive on(c2,∞). Therefore, f_j,in is

concave beforec2and convex afterc2. This proves the second

point of Lemma 2.

C. Proof of Theorem 3

The complexity and optimality proofs of SCPC are pre-sented below.

Complexity analysis: At each for loop iterationi, the while loop at line 6 has at most i iterations. Thus, the worst case complexity is proportional toP|Un0|

i=1 i = O |Un0|2

= O M2_.

Optimality analysis: Without loss of generality, we can suppose that the xn

in’s are initialized to zero. We will prove

by induction that at the end of each iteration i at line 10 of Algorithm 2, the following loop invariants are true:

H1(i): Pi_l=1f˜_l,ln is maximized byxn1n, . . . , x

n in,

H2(i): C20–30 is satisfied, i.e., ¯Pn ≥ xn1n≥ · · · ≥ x

n in≥ 0.

Basis: Fori = 1, x∗ _{computed at line 3 is indeed the optimal}

of ˜fn

1,1. The while loop has no effect sincej = 0 < 1, therefore

xn 1n ← x

∗ _{and statements} _H₁_{(1) and H}₂_{(1) are both true.}

Inductive step: Assume thatxn

1n(i−1), . . . , x

n

(i−1)n(i−1) are

the variables verifying H1(i− 1) and H2(i− 1) at iteration

i_{− 1 < K. Let the variables at iteration i be x}n 1n, . . . , x

n in. We

consider two cases: i) We first suppose that:

x∗= ARGMAXf (i, i,˜ In, ¯Pn)≤ xn_(i₋₁₎_n(i− 1). (12)

In this case, Algorithm 2 sets xn in = x

∗ _and _xn ln =

xn

ln(i−1), for all l < i. The induction hypothesis H2(i−

1) states that ¯Pn _{≥ x}n

1n ≥ · · · ≥ x

n