Delay, Memory, and Messaging Tradeoffs in Distributed Service Systems

(1)

Delay, Memory, and Messaging

Tradeoffs in Distributed Service Systems

The MIT Faculty has made this article openly available.

Please share

how this access benefits you. Your story matters.

Citation

Gamarnik, David; Tsitsiklis, John N. and Zubeldia, Martin. “Delay,

Memory, and Messaging Tradeoffs in Distributed Service Systems.”

Proceedings of the 2016 ACM SIGMETRICS International Conference

on Measurement and Modeling of Computer Science, June 14-18

2016, Antibes Juan-les-Pins, France, Association for Computing

Machinery, June 2016. © 2016 The Authors

As Published

http://dx.doi.org/10.1145/2896377.2901478

Publisher

Association for Computing Machinery (ACM)

Version

Author's final manuscript

Citable link

http://hdl.handle.net/1721.1/109061

Terms of Use

Creative Commons Attribution-Noncommercial-Share Alike

(2)

Delay, Memory, and Messaging Tradeoffs in Distributed

Service Systems

David Gamarnik

MIT, ORC 77 Massachusetts Ave. Cambridge, MA 02139

gamarnik@mit.edu

John N. Tsitsiklis

MIT, LIDS 77 Massachusetts Ave. Cambridge, MA 02139

jnt@mit.edu

Martin Zubeldia

MIT, LIDS 77 Massachusetts Ave. Cambridge, MA 02139

zubeldia@mit.edu

ABSTRACT

We consider the following distributed service model: jobs with unit mean, exponentially distributed, and independent processing times arrive as a Poisson process of rate λN , with 0 < λ < 1, and are immediately dispatched to one of several queues associated with N identical servers with unit pro-cessing rate. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers. We study the fundamental resource requirements (mem-ory bits and message exchange rate), in order to drive the expected steady-state queueing delay of a typical job to zero, as N increases. We propose a certain policy and establish (using a fluid limit approach) that it drives the delay to zero when either (i) the message rate grows superlinearly with N , or (ii) the memory grows superlogarithmically with N . Moreover, we show that any policy that has a certain symmetry property, and for which neither condition (i) or (ii) holds, results in an expected queueing delay which is bounded away from zero.

Finally, using the fluid limit approach once more, we show that for any given α > 0 (no matter how small), if the policy only uses a linear message rate αN , the resulting asymp-totic (as N → ∞) expected queueing delay is positive but upper bounded, uniformly over all λ > 1. This is a sig-nificant improvement over the popular “power-of-d-choices” policy, which has a limiting expected delay that grows as log (1/(1 − λ)) when λ ↑ 1.

Categories and Subject Descriptors

G.3 [Probability and Statistics]: Queueing theory, Markov processes; C.2.1 [Network Architecture and Design]: Performance tradeoffs

Keywords

Distributed service, Dispatching policies, Performance trade-offs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

SIGMETRICS ’16, June 14 - 18, 2016, Antibes Juan-Les-Pins, France

c

2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-4266-7/16/06. . . $15.00

DOI:http://dx.doi.org/10.1145/2896377.2901478

1. INTRODUCTION

This paper addresses the tradeoffs between various mea-sures of performance (delay, message rate and memory) in the context of large scale queueing systems. More specif-ically, we consider the supermarket model [11], a service system that involves a stream of incoming jobs that are to be dispatched to the queue associated with one of multiple servers. This setting is the dynamic generalization of the classic balls and bins model [2], where N balls have to be sequentially placed into one of N bins.

There is a variety of ways that this system can be oper-ated, which correspond to different system architectures and policies, with different delay performance. At one extreme, incoming jobs can be sent to a random queue. This policy has no informational requirements but incurs a substantial delay because it does not take advantage of resource pooling. At the other extreme, incoming jobs can be sent to a shortest queue, or to a server with the smallest workload. The latter policies have very good performance (small queueing delay), but rely on a substantial information exchange overhead.

Many intermediate policies have been explored in the liter-ature, which achieve different performance levels while using varying degrees of resources such as messages and memory (e.g., the power-of-d-choices [11, 15] or the PULL algorithm [14]). Such policies have been individually analyzed, and the merits of each have been pointed out. However, a compar-ison is not always possible, because different policies often involve different types of decision making architectures. A more detailed discussion of the relevant literature and the various schemes therein is deferred to Section 4.

The performance-resources tradeoffs have been analyzed in the context of balls into bins models. In particular, the tradeoff between number of messages and maximum load was recently obtained by Lenzen and Wattenhofer [8]. More-over, the tradeoff between the number of bits of memory available and the maximum load was studied in [1, 4]. To the best of our knowledge, the current study is the first to address these tradeoffs for a dynamic model.

1.1 Our contribution

Instead of focusing on yet another policy or decision mak-ing architecture, in this paper we step back and formulate a more fundamental problem. We consider a broad family of decision making architectures and policies, which includes most of those considered in the earlier literature, and work towards characterizing the attainable delay performance, for a given level of resources that are employed.

(3)

focus on the following qualitative criterion: does the aver-age queueing delay go to zero, as the system size increases? Regarding resources, we allow the dispatcher to have some limited memory, where it can store information on the state of the queues. We also allow the exchange of messages, ei-ther from the dispatcher to the servers (queries), and from the servers to the dispatcher; these latter messages can be either responses to queries or spontaneous status updates.

The main question that we study is the following: What is the message rate (total over both directions) and/or memory size requirement that are necessary and sufficient in order to achieve vanishing delay, as the system size grows to infin-ity? We are able to provide a fairly complete answer to this question. Specifically,

a) If the message rate is proportional to the arrival rate and the memory size (in bits) is logarithmic in the number of servers, then every decision making archi-tecture and policy, within a certain broad class of poli-cies, results in a delay that does not vanish as the system size increases. The main constraints that we impose on the policies that we consider are: (i) there is no queueing at the dispatcher, and (ii) policies are symmetric, in a sense to be defined.

b) If either the message rate is superlinear in the arrival rate or the memory size is superlogarithmic in the number of servers, we provide a fairly simple and nat-ural policy that results in a vanishing delay as the sys-tem size increases. Moreover, even if the message rate is proportional to the arrival rate and the memory is logarithmic in the number of servers, this same policy performs better than the alternatives considered ear-lier in the literature: among other properties, it has uniformly upper bounded queueing delay as λ ↑ 1. We analyze the proposed policy by considering a corre-sponding fluid limit. This results in expressions that provide a detailed characterization of the resulting queue lengths in the regime where delays do not vanish (case (b) above). Al-though we do not allow queueing at the dispatcher in our model, the negative result still holds even if we had a cen-tral queue with a finite buffer whose size is uniformly upper bounded independent of N . However, if the buffer size can grow with N , there exists a policy with a linear message rate and no memory that achieves vanishing queueing delay. Whether similar negative results can be established without our symmetry assumption is left as an open problem.

1.2 Outline of the paper

The rest of this paper is organized as follows. In Section 2 we introduce some notation. The model and the main results are presented in Section 3, where we also discuss the implications of the results and compare our policy to the so-called “power-of-d-choices” [11, 15]. In Section 4 we put our results in the context of some popular dispatching policies in the literature. In Sections 5, 6, and 7, we provide the proofs of the main results. Finally, in Section 8 we provide our conclusions and suggestions for future work.

2. NOTATION

We define the following sets: S ,ns ∈ [0, 1]Z+_{: s} 0= 1; si≥ si+1, ∀i ≥ 0 o , S1 , ( s ∈ S : ∞ X i=0 si< ∞ ) , QN, x ∈ [0, 1]Z+_{: x} i= Ki N , for some Ki∈ Z+, ∀ i . We define the weighted `2 norm || · ||w on RZ+by

||x − y||2w, ∞ X i=0 |xi− yi|2 2i .

We also define a partial order on RZ+

+ as follows:

x ≥ y ⇔ xi≥ yi ∀ i ≥ 0,

x > y ⇔ x ≥ y, x 6= y, and xi> yi, if i ≥ 1 and yi> 0.

For a given positive integer N , we define the set of server IDs N , {1, . . . , N }. We let ΠN be the set of probability

distributions over N , and let SN denote the group of

per-mutations over N . We can also define the action of SN over

ΠN by letting, for every σ ∈ SN and p ∈ ΠN,

σ(p)(σ(i)) = p(i), ∀i ∈ N .

In words, the probability of σ(i) under σ(p) is the same as the probability of i under p. Finally, we denote by P(N ) the power set of N and we further extend the action of SN to

ΠP(N ) (the set of probability distributions over the power

set P(N )) so that, for any σ ∈ SN and p ∈ ΠP(N ), and for

all r ∈ N ,

σ(p)({σ(n1), . . . , σ(nr)}) = p({n1, . . . , nr}),

for all {n1, . . . , nr} ∈ P(N ). This means that the

proba-bility of selecting the subset of servers {σ(n1), . . . , σ(nr)}

under σ(p) is the same as the probability of selecting the subset {n1, . . . , nr} under p.

3. MODEL AND MAIN RESULTS

In this section we present our main results. In Section 3.1 we describe the model and our assumptions. In Section 3.2 we introduce and study three different variants of a certain pull-based dispatching policy. We study their fluid limits and state the validity of fluid approximations for the tran-sient and the steady-state regimes. Then, in Section 3.3 we present a unified framework for a broad set of dispatching policies that includes most of the policies studied in previous literature, and present our negative result on the queueing delay for resource constrained policies.

3.1 Modeling assumptions

We consider a system consisting of N parallel servers, where each server has a processing rate equal to 1. Fur-thermore, each server is associated with an infinite capacity, FIFO queue. We use the convention that a job that is being served remains in the queue until its processing is completed. We assume that each server is work conserving: a server is idle if and only if the corresponding queue is empty. Jobs arrive to the system as a single Poisson process of rate λN (for some fixed λ < 1). Job sizes are i.i.d., independent from the arrival process, and exponentially distributed with mean 1. A central controller (dispatcher) is responsible for routing every incoming job to a suitable queue, immediately upon arrival. The dispatcher may have limited information on the state of the queues; it can only rely on a limited amount

(4)

of local memory and on messages that provide partial in-formation about the state of the system. These messages (which are assumed to be instantaneous) can be sent from the servers to the dispatcher at any time, or from the dis-patcher to the server (in the form of queries) at the time of an arrival. Messages from the servers can only contain information about the state of their queue (number of re-maining jobs and total rere-maining workload). Within this context, a system designer has the freedom to choose a mes-saging policy, the rules with which the memory is updated, and the dispatching policy. We are interested in the case where N is very large and where we have some constraints on the number of messages exchanged and on the memory size. The performance metric of interest is the expected de-lay in steady state of a typical job, i.e., the expected time between its arrival and the time at which it starts receiving service.

3.2 Our dispatching policy and its performance

In this section we introduce our policy, and study the properties of three of its variants.

3.2.1 Policy description

Let us fix N . We consider a policy whereby the dis-patcher’s memory is used to run a virtual queue that stores up to C(N ) server IDs (or tokens), so that the memory size is of order C(N ) log2(N ) bits. Since there are only N

dis-tinct servers, we will assume throughout the rest of the paper that C(N ) ≤ N . While a server is idle, it sends messages to the dispatcher as a Poisson process of rate µ(N ) to in-form or remind the dispatcher of its idleness. Whenever the dispatcher receives a message, it adds the ID of the server that sent the message to the virtual queue of tokens that is kept in the memory, unless this ID is already stored or the virtual queue is full, in which case it discards the new message. When a new job arrives, if there is at least one server ID in the virtual queue, the job goes to the queue of a server chosen uniformly at random from the virtual queue and the corresponding token is deleted. If there are no to-kens present, the job is sent to a queue chosen uniformly at random. This policy, called from now on the Resource Con-strained Pull-Based (RCPB) policy or Pull-Based policy for short, is depicted in Figure 1.

Nλ Dispatcher C(N ) Queue of tokens Jobs to empty queues Messages from idle servers

...

N servers

Figure 1: Resource Constrained Pull-Based policy. Jobs are sent to queues associated with idle servers, based on tokens in the virtual queue. If no tokens are present, a queue is chosen at random.

Note that under our policy, no messages are sent from the dispatcher to the servers, which is why it is called a “Pull-Based” policy [14]. We present a summary of our results for the three variants of our policy in Table 1, where we also introduce some mnemonic terms that we will use to refer to these variants. For our purposes, the most relevant subcase of the High Memory variant is when µ ≥ λ

1−λ, which achieves

zero asymptotic delay with superlogarithmic memory and linear message rate.

Variant Memory Idle message rate Delay

High Mem. C(N ) ∈ ω(1) µ(N ) = µ ≥ _1−λλ 0 µ(N ) = µ < λ

1−λ > 0

High Mess. C(N ) = C ≥ 1 µ(N ) ∈ ω(1) 0 Constr. C(N ) = C ≥ 1 µ(N ) = µ > 0 > 0

Table 1: The three variants (regimes) of our policy, and the resulting delays, in the limit as N → ∞.

3.2.2 System state representation

For a fixed N , we can model the system as a continuous time Markov chain using as state the queue length vector, denoted by QNi (t)

N

i=1 ∈ Z N

+, together with the number

of tokens, denoted by MN_{(t) ∈ {0, 1, . . . , C(N )}.}

Further-more, as the system is symmetric with respect to the queues, we can replace the queue length vector by the more conve-nient representation SN(t) = SiN(t) ∞ i=0, where SiN(t) , 1 N N X j=1 1[i,+∞) QNj(t) , i ∈ Z+,

is the fraction of queues with at least i jobs at time t.

3.2.3 Fluid model

We now introduce the fluid model of SN(t), associated with our policy.

Definition 1. (Fluid model) Given an initial condition s0_∈

S, a function s(t) : [0, ∞) → S is said to be a solution to the fluid model (or fluid solution) if

1. s(0) = s0.

2. For all t ≥ 0, s0(t) = 1.

3. For almost all t ∈ [0, ∞), and for every i ≥ 1, si(t) is

differentiable and satisfies ds1(t)

dt =λ[1 − s1(t)P0(t)] − [s1(t) − s2(t)], (1) dsi(t)

dt =λ[si−1(t) − si(t)]P0(t) − [si(t) − si+1(t)], (2) for all i ≥ 2, where P0(t) is given, for the three variants

considered, by:

(i) High Memory: P0(t) =

h

1 −µ[1−s1(t)]

λ

i+ ; (ii) High Message: P0(t) =

h 1 −1−s2(t) λ i+ 1{s1(t)=1}; (iii) Constrained: P0(t) = C P k=0 µ[1−s1(t)] λ k−1 .

(5)

Note that the fluid model does not include a variable as-sociated with the number of tokens. This is because, as we will see, the virtual queue process MN(t) evolves on a much faster time scale than the processes of the queue lengths and does not have a deterministic limit. We thus have a process with two different time scales: on the one hand, the virtual queue evolves on a fast time scale (at least N times faster) and from its perspective the queue process SN_{(t) appears}

static; on the other hand, the queue process SN(t) evolves on a slower time scale; from its perspective, the virtual queue appears to be at equilibrium. This latter property is mani-fested in the drift of the fluid model: P0(t) is the probability

that the virtual queue is empty when the rest of the system is fixed at the state s(t) (and is given by a different expres-sion for each variant).

We now provide some intuition for each of the drift terms in Equations (1) and (2).

(i) λ[1 − P0(t)]: This term corresponds to arrivals to an

empty queue while there are tokens in the virtual queue, taking into account that the virtual queue is nonempty with probability 1 − P0(t), in the limit.

(ii) λ[si−1(t) − si(t)]P0(t): This term corresponds to

ar-rivals to a queue with exactly i − 1 jobs while there are no tokens in the virtual queue. This happens when the virtual queue is empty and when a queue with i − 1 jobs is drawn uniformly at random, which hap-pens with probability P0(t)[si−1(t) − si(t)].

(iii) −[si(t)−si+1(t)]: This term corresponds to departures

from queues with exactly i jobs, which occur at a rate equal to the proportion si(t) − si+1(t) of servers with

exactly i jobs.

(iv) Finally, the expressions for P0(t) are obtained through

an explicit calculation of the steady-state distribution of MN(t) when SN(t) is fixed at the value s(t), while also letting N → ∞.

3.2.4 Properties of the fluid solutions

The existence of fluid solutions is established by show-ing that the limit of every convergent subsequence of sam-ple paths (SN(t)) is a fluid solution. Moreover, the follow-ing theorem states their uniqueness for all initial conditions s0 _{∈ S, and it also characterizes the (unique) equilibrium}

of the fluid model and its asymptotic stability for all ini-tial conditions s0 ∈ S1 _{⊂ S. The variants described below}

correspond to assumptions on memory and message rates described in the 2nd and 3rd columns of Table 1, respec-tively.

Theorem 1. (Uniqueness and stability of fluid so-lutions) A fluid solution, as described in Definition 1, exists and is unique for any initial condition s0_{∈ S. Furthermore,}

within the set S1, the fluid model has a unique equilibrium s∗, given by s∗i = λ[λP

∗

0]i−1, for all i ≥ 1, where P ∗ 0 is given

for the three variants considered by: (i) High Memory: P0∗=

h

1 −µ(1−λ)_λ i+; (ii) High Message: P0∗= 0;

(iii) Constrained: P0∗= _C P k=0 µ(1−λ) λ k−1 .

This equilibrium is asymptotically stable, i.e., lim

t→∞ks(t) − s ∗

k_w= 0, for all initial conditions s0∈ S1

.

The proof for uniqueness and stability is given in Section 5. The proof of the existence is omitted due to space con-straints.

3.2.5 Approximation theorems

The three results in this section justify the use of the fluid model as an approximation to the finite stochastic system. The first theorem states that the evolution of the process SN(t) is almost surely uniformly close, over any finite time horizon [0, T ], to the unique fluid solution s(t).

Theorem 2. (Fluid limit) Fix T > 0. If lim N →∞ S N (0) − s0 w = 0, a.s., for some s0_{∈ S, then}

lim N →∞_0≤t≤Tsup S N (t) − s(t) w= 0, a.s.,

where s(t) is the fluid solution with initial condition s0. The proof of this theorem is omitted due to space con-straints.

If we combine Theorems 2 and 1 we have that the finite system SN_{(t) can be approximated by the equilibrium of the}

fluid model s∗after some time, because SN(t)−−−−→ s(t)N →∞ −−−→ st→∞ ∗,

almost surely. If we interchange the order of the limits over N and t, we obtain the limiting behavior of the invariant distributions πsNof the finite systems as N increases. In the

next proposition and theorem, it is shown that the result is the same, i.e., that

SN(t)−−−→ πt→∞ N s

N →∞

−−−−→ s∗,

in distribution, so that the interchange of limits is justified. We start by showing that the stochastic process for any fixed N is positive recurrent.

Proposition 3. (Positive recurrence for finite N ) For any fixed N , the stochastic process SN_{(t), M}N_(t)

is positive recurrent and therefore has a unique invariant dis-tribution πN.

Proof. In this case it is advantageous to work with the number of jobs in the queues QNi (t)

N

i=1 instead of with

SN_{(t). Then the positive recurrence follows by using the}

simple quadratic Lyapunov function Φ(Q, m) , 1 N N X i=1 Q2i. (3)

We then show that the sequence of invariant measures converges in distribution to the Dirac measure concentrated on the equilibrium of the fluid model.

(6)

Theorem 4. (Interchange of limits) For any N , let πNbe the unique invariant distribution of the stochastic pro-cess SN_{(t), M}N_(t) and let πN s (·) = C(N ) P m=0 πN_{(·, m) be the}

marginal for SN_{. Then,}

lim

N →∞π N

s = δs∗, in distribution.

The proof is given in Section 6.

Putting everything together, we conclude that the fluid model is a very accurate approximation for the stochastic system when N is very large for the both transient regime (Theorems 2 and 1) and for the steady state regime (Theo-rem 4).

3.2.6 Delay

Having shown that we can approximate the stochastic sys-tem by its fluid model for large N , we can analyze the per-formance of the equilibrium of the fluid model and thus ap-proximate the expected delay implied by our policy.

For the N -th system, we define the average delay (more precisely, the waiting time) of a job, in steady state, gener-ically denoted by EWN_{, as the mean time that the job}

spends in queue until its service starts. Then by Little’s law, we have that

E h WN i = 1 λNE "∞ X i=1 N SiN # − 1,

and taking limits as N → ∞ we obtain the limiting queueing delay as E[W ] , lim N →∞E h WNi= 1 λ ∞ X i=1 s∗i ! − 1,

where the interchange of the limit in N , and the expecta-tion and the infinite sum can be justified by applying the Dominated Convergence Theorem using the geometric up-per bound from Lemma 10.

High Memory and High Message variants.

For the High Memory variant with µ ≥ λ

1−λ, as well as

for the High Message variant, the equilibrium is s∗1 = λ, and

s∗i = 0 for all i ≥ 2, which yields the claimed zero queueing

delay (cf. Table 1).

Constrained variant.

For the Constrained variant, even though we obtain a pos-itive delay, our Pull-Based policy has some remarkable prop-erties. First of all, as discussed above, the delay is

E[W ] = 1 λ ∞ X i=1 s∗i − 1 = ∞ X i=1 (λP0∗)i−1− 1 = λP0∗ 1 − λP∗ 0 . (4) Suppose that the message rate of each idle server is µ = µ0_1−λλ for some constant µ0 > 0. Since a server is idle (on average) a fraction 1 − λ of the time, the resulting aver-age messaver-age rate at each server is λµ0. We can rewrite the equilibrium probability as P0∗= " _C X k=0 µ0k #−1 .

Substituting this in Equation (4) we can obtain an upper bound (uniform in λ < 1) for the delay in steady state

E[W ] ≤ " _C X k=1 µ0k #−1 ∀ λ < 1. (5) This implies that if µ0 > 1, the queueing delay decreases exponentially fast with the maximum number of tokens C. More importantly, for a fixed µ0> 0 and C, the delay is in fact bounded as λ ↑ 1. Finally, the delay is monotonically decreasing with µ0and C.

Phase transition of heavy traffic delay.

We have a phase transition between µ0 = 0 (which cor-responds to a random uniform routing) and µ0 > 0. In the first case, we have the usual unbounded heavy traffic delay of the M/M/1 queue. However, when µ0> 0 the delay is up-per bounded uniformly in λ, as shown in Equation (5). This highlights the importance of sending messages while idle and the fact that an intelligent choice of the message budget is to set µ = µ0 λ

1−λ. Note that this choice still results in a

con-stant (λµ0) average message rate at each server, uniformly upper bounded by µ0 for all N and all λ < 1. This is a key qualitative improvement over all other policies in the liter-ature that have a non-vanishing delay; see the discussion below and in Section 4.

Comparison with the power-of-

d

-choices.

The expected steady-state delay for the power-of-d-choices policy was shown in [11, 15] to be

E[WPod] = ∞

X

i=1

λdi −dd−1 _{− 1 ≥ λ}d_,

which means that the delay decreases at best exponentially with d, much like the delay decreases exponentially with C in our scheme. However, increasing d increases the number of messages sent, unlike our policy.

Furthermore, the delay in heavy traffic for the power-of-d-choices is shown in [11] to grow as log 1

1−λ

as λ ↑ 1 for any fixed d. In contrast, the delay of our scheme has a constant upper bound, independent of λ.

In conclusion, our Pull-Based policy provides better ex-pected delay (especially in heavy traffic) using fewer mes-sages: the power-of-d-choices policy requires 2λdN messages per unit of time, while our policy requires µ0λN messages per unit of time and C log2(N ) bits of memory.

3.3 Delay lower bound for resource constrained

policies

In this subsection we present a negative result that shows that resource constraints on the message rate and on the size of the memory result in asymptotically non-vanishing delays, for a broad class of symmetric policies.

3.3.1 Unified framework for policies

In order to provide a negative result for a class of poli-cies, we need to be very precise on the family of policies considered, including details on the way that the memory is operated, and the precise meaning of “symmetry”. We as-sume that in order to decide the destination of each arriving job, the dispatcher relies entirely on queries to a subset of servers regarding the state of their queues, and on a memory

(7)

with CN bits. The memory is updated only when either (i)

the dispatcher receives a message from a server, or (ii) the dispatcher has just dispatched a job.

We define the set of memory states as M ,1, . . . , 2CN .

Furthermore, we define the set of possible states at a given queue as the set of nonnegative sequences Q , RZ+

+ , where a

sequence specifies the remaining workload of each job in the queue, including the one that is being served. As long as a queue has a finite number of jobs, then the queue state is a sequence with a finite number of non-zero entries. We define next a broad class of memory-based dispatching policies.

Definition 2. (Memory-based dispatching policy) For a fixed N , a memory-based dispatching policy ηN has four components:

1. (Process of messages from servers) We have a bounded rate function µN

: Q → R+ which defines a set of

modulated Poisson1 counting processes n

YnN(t)

oN

n=1

.

The rate of YnN(·) at each time t depends on the

cur-rent queue state QN

n(t) and equals µN(QNn(t)).

Be-sides, the functions are uniformly upper bounded, i.e., µN(q) ≤ µN(0) for all q ∈ Q.

2. (Memory update due to a new message from a server) Given a memory state m ∈ M, whenever there is an arrival of a message from server n ∈ N , containing the current queue state q ∈ Q, the memory is updated according to a function

gNpull: M × N × Q → M.

3. (Server sampling and dispatching rule) Given a mem-ory state m ∈ M, whenever a new job arrives to the system, the dispatcher sends queries to a subset of the servers, chosen according to the distribution specified by a function

νN : M → ΠP(N ).

Then, given the set of pairs {(n1, qn1), . . . , (nr, qnr)}

of sampled subset of servers and the state of the corre-sponding queues, the new arrival is dispatched accord-ing to a probability distribution specified by a function

fN : M × B → ΠN,

where B is the subset of P(N × Q) where each element of N appears only once.

4. (Memory update due to a dispatched job) Immediately after a new job is dispatched to some queue, the mem-ory is updated according to a function

gdN: M × B × N → M,

where the last argument of the function gN

d is the

iden-tity of the server to which the job was dispatched.

1_{Although we will use Poisson processes for the analysis, our}

negative result can also be established even if the messages are more general renewal processes (e.g., with deterministic inter-arrival times).

Remark 1. Note that we only consider the memory used to store information between one arrival or message to the next. When counting the memory resources used by a policy, we do not take into account information that is used in zero time (e.g., the responses from the queries at the time of an arrival), or the memory required to evaluate the various functions that describe the policy. If that additional memory was accounted for, then the memory constraints would be more severe, and therefore our negative result would still hold.

The functions gNpull, ν N

, fN, and gNd define a Markov

pro-cess over the state space QN×M, as follows. First of all, the states of the queues progress naturally through time as jobs get processed. Then, there are two sources of events: the exogenous arrivals and the messages that servers send to the dispatcher according to the processesYN

n (·)

N

n=1. When a

message from server n with queue state q arrives to the dis-patcher, the memory is updated according to gN

pull(·, n, q).

When we have an arrival of a new job, there are three steps that occur sequentially but instantaneously: first, an ele-ment of the power set P(N ) (i.e., a subset of N ) is selected according to νN _{and the current memory state; second, the}

job is sent to a queue according to the distribution obtained by fN evaluated on the actual memory state, sampled sub-set of servers, and actual states of the queues associated with them; and third, the memory is updated once more accord-ing to gNd, taking into account the destination of the last

job, as well as the IDs and the queue states of the sampled servers.

Remark 2. Here we are considering policies that send mes-sages to the servers only at the time of an arrival and in one round of communication. This eliminates the possibil-ity of having policies that sequentially poll the servers uni-formly at random until they find an idle one. The reason be-hind this restriction is that, in practice, these queries involve some processing and travel time . Were we to consider a model with > 0, the sequential polling policies would suffer greatly and would incur positive delay. In contrast our pull-based policy would only have to work with slightly delayed information, but the asymptotic vanishing result would still hold, as long as is small enough.

Remark 3. We could expand the set of policies by allow-ing the servers to also send messages whenever there is a service completion. This would result in some additional notation. However, the same negative result can be proved for this somewhat enlarged class of policies as well.

We now introduce a symmetry assumption on the policies. Definition 3. (Symmetric policy) The memory-based dis-patching policy ηN _{is symmetric if for every permutation}

of the servers σ ∈ SN (and associated permutations, also

denoted by σ, over the sets P(N ), ΠN, and ΠP(N )), there

exists some permutation σM of the memory states such that

the following properties hold.

1. (Memory update due to new message from a server) gpullN (σM(m), σ(n), q) = σM gpullN (m, n, q) , for all (m, n, q) ∈ M × N × Q.

(8)

2. (Server sampling and dispatching rule) νN(σM(m)) = σ

νN(m), for all m ∈ M, and

fN(σM(m),{(σ(n1), q1), . . . , (σ(nr), qr)}) =

σfN(m, {(n1, q1), . . . , (nr, qr)})

, for all (m, {(n1, q1), . . . , (nr, qr)}) ∈ M × B.

3. (Memory update due to a dispatched job) gNd(σM(m), {(σ(n1), q1), . . . , (σ(nr), qr)}, σ(n)) = σM gNd(m, {(n1, q1), . . . , (nr, qr)}, n) , for all (m, {(n1, q1), . . . , (nr, qr)}, n) ∈ M × B × N .

3.3.2 Lower bound for queueing delay

We are now ready to state the main result of this section. Theorem 5. Fix C, α > 0. For any symmetric policy ηN with CN = C log2(N ) bits of memory, with a

time-average message rate bounded by αN almost surely, and with a unique steady state distribution, we have that the expected queueing delay is uniformly lower bounded, i.e.,

E h

WNi≥ b(C, α) > 0,

where EWN_{is the expected queueing delay in steady state,}

and b(C, α) is a constant that only depends on C and α. A detailed proof is given in Section 7.

4. COMPARISON WITH OTHER POLICIES

IN THE LITERATURE

In this section we put our results in perspective by review-ing various dispatchreview-ing policies in the literature, focusreview-ing on their queueing delay and on the amount of resources they use.

4.1 Open-loop policies (without messages)

An optimal open-loop policy dispatches arriving jobs to the queues in a round robin fashion [13]. This policy does not require messages but needs log2(N ) bits of memory to

store the ID of the next queue to receive a job. In the limit, each queue behaves similar to an D/M/1 queue, which still has a Ω 1

1−λ

delay (see [13]). This policy is not symmetric.

4.2 Policies based on queue lengths

4.2.1 Join the shortest queue (SQ)

If we only have access to the queue lengths, an optimal policy is to have each incoming job join a shortest queue. This policy is symmetric, achieves vanishing delay, and re-quires no memory, but uses 2λN2 messages per unit of time (N queries and N responses for each arrival, with rate λN ). Alternatively, joining a queue with the least remaining work-load performs just as well.

4.2.2 SQ(

d

) with memory (SQ(

d

,

b

))

An improvement over the power-of-d-choices, proposed by Mitzenmacher et al. in [12], is obtained by using extra mem-ory to store the b least loaded queues sampled at the time of the previous arrival. When a new job arrives, d queues are sampled uniformly at random and the job is sent to a least loaded queue among the d sampled and the b stored queues. This policy is symmetric, needs 2λdN messages per unit of time and Ω(b log2(N )) bits of memory, and has positive

de-lay as expected. It compares unfavorably to our policy for the same reasons as for the power-of-d-choices.

4.2.3 SQ(

d

) for divisible jobs

More recently, Ying et al. [16] considered the case of having an arrival rate of N λ

m(N ) (with m(N ) ∈ ω(1) and

m(N ) ∈ o(N )), where each job can be divided into m(N ) tasks with mean size 1. The dispatcher samples dm(N ) queues and does a water-filling of those queues with the m(N ) tasks. In this case, the number of messages sent per unit of time is 2λdN and it needs no memory.

Even though this was not mentioned in [16], this policy can drive the queueing delay to 0 if d ≥ _1−λ1 . This policy does not fall into our framework because it works with di-visible jobs. We could get a very similar policy by queueing m(N ) jobs in the dispatcher and dispatching them all at the same time, but this needs an unbounded buffer at the dispatcher.

4.3 Pull-based policies

In order to reduce the message rate, Badonnel and Burgess [3], Lu et al. [10], and then Stolyar [14] proposed that mes-sages be sent from a server to the dispatcher whenever the server becomes idle, so that the dispatcher can keep track of the set of idle servers in real time. Then, an arriving job is to be sent to an empty queue (if there is one) or to a queue chosen uniformly at random if all queues are non-empty. This symmetric policy requires at most λN messages per unit of time and exactly N bits of memory (one bit for each server, indicating whether it is empty or not). Stolyar [14] has shown that when N goes to infinity, the expected delay vanishes.

These pull-based policies are very efficient and they were the inspiration for our own policy. In fact, the High Memory variant of our own policy can be viewed as a more memory-efficient version of Stolyar’s PULL policy [14] where instead of requiring N bits of memory, we achieve the desired van-ishing delay with any superlogarithmic memory size.

4.4 Memory, messages, and queueing delay

We now summarize the resource requirements (memory and message rate) and the asymptotic delay of the policies reviewed in this section that fall within our framework, to-gether with our three variants.

Policy Memory (bits) Message rate Delay

RRobin [13] log2(N ) 0 > 0

JSQ 0 2λN2 0

JSQ(d) [11] 0 2dλN > 0

JSQ(d, b) [12] Ω(b log₂(N )) 2dλN > 0

Pull-based [14] N λN 0

High Memory ω(log₂(N )) λN 0

High Message C log2(N ) ω(N ) 0

(9)

Note that there are several policies that achieve 0 queueing delay in the limit, but (as expected) all of them fall into one (or both) of two categories:

a) Those that need ω(N ) messages. For example: JSQ and JLW require Θ(N2_{) messages per unit of time.}

b) Those that need ω(log2(N )) bits of memory. For

exam-ple, the PULL policy [14] requires N bits of memory. This is consistent with our results: the only way to achieve vanishing delay is by having a superlogarithmic memory or a superlinear message rate. Alternatively, we could obtain a vanishing delay with a linear message rate and an un-bounded central buffer [16].

5. PROPERTIES OF THE FLUID MODEL

We divide the proof of Theorem 1 into Lemmas 6 and 7, and Proposition 9 in which we establish uniqueness of solutions, uniqueness of an equilibrium, and asymptotic sta-bility, respectively. The proof of existence is omitted due to space constraints.

5.1 Uniqueness of solutions

Lemma 6. A fluid solution, as described in Definition 1, with initial condition s0∈ S, is unique.

Proof. The drift in the High Memory and Constrained variants is always Lipschitz-continuous, so the uniqueness of solutions is guaranteed by the Picard-Lindel¨off theorem for infinite dimensional spaces [9]. On the other hand, the drift in the High Message variant is only continuous “almost ev-erywhere” so we cannot guarantee the uniqueness of classical (differentiable) solutions. However, we can show the unique-ness of the possibly non-differential solutions of Definition 1. Uniqueness issues only appear when the trajectories hit the closure of the set where the drift is not Lipschitz-continuous, which in this case is the closure of

D = {s ∈ S : s1= 1 and s2> 1 − λ} .

The proof of the uniqueness for the High Message variant consists in concatenating up to three unique solutions, and has three steps:

1. If s0∈ S\D, then we have uniqueness up to the point in which the solution hits D.

2. If s0 ∈ D, we show that the solution stays in D until we have s2 = 1 − λ, implying that it can only escape

D through D\D. This is because ˙s1> 0 if s2> 1 − λ.

Then, the uniqueness of the solution up to this point is guaranteed by the fact that the drift restricted to D is also Lipschitz-continuous.

3. Last, we show that once that it escapes D, it cannot go back in again, thus guaranteeing the uniqueness of the whole solution.

We omit the details due to space constraints.

5.2 Existence and uniqueness of an

equilib-rium

Lemma 7. The fluid model has the unique equilibrium s∗∈ S1 _{defined in Theorem 1.}

Proof. Suppose that s∗ ∈ S1 is an equilibrium. Since s∗∈ S1

, we haveP∞

i=1s ∗

i < ∞ and we can sum the

equilib-rium conditions over all coordinates j ≥ i for all i ≥ 0, and get the result.

5.3 Asymptotic stability of the equilibrium

We will establish asymptotic stability by sandwiching a solution between two solutions that converge to s∗, similar to the argument in [15]. For this purpose, we first establish a monotonicity result.

Lemma 8. Suppose that s1_{and s}2_{are two fluid solutions}

with s1(0) ≥ s2(0). Then s1(t) ≥ s2(t), for all t ≥ 0. The proof of this Lemma consists of checking the drift terms of the fluid model. It is straightforward and is omitted. Now we are ready to prove the convergence result.

Proposition 9. The equilibrium s∗of the fluid model is asymptotically stable for all initial conditions s0 ∈ S1

, i.e., lim

t→∞ks(t) − s ∗

k_w= 0.

Proof. Fix some s0 ∈ S1. We define initial conditions su(0) and sl(0) by letting sui(0) = max {si(0), s ∗ i} , sli(0) = min {si(0), s∗i} . We then have su_{(0) ≥ s}0 _{≥ s}l_{(0), s}u_{(0) ≥ s}∗ ≥ sl_{(0), and}

su(0), sl(0) ∈ S1. Due to Lemma 8 we obtain that su(t) ≥ s(t) ≥ sl(t) and su(t) ≥ s∗≥ sl

(t), for all t ≥ 0. Thus it suf-fices to prove that ksu_{(t) − s}∗

k_wand

sl_{(t) − s}∗

wconverge

to 0 as t → ∞.

At this point it is convenient work with the representation vu(t), vl(t), and v∗, where vi,

∞

P

j=i

sj. Then, the dynamics

of vu_{(t) are of the form}

dv1u(t) dt =λ − s u i(t) = s ∗ 1− s u i(t) dviu(t) dt =λs u i−1(t)P u 0(t) − s u i(t) ∀ i ≥ 2 =λ[sui−1(t)P u 0(t) − s ∗ i−1P ∗ 0] − [s u i(t) − s ∗ i],

where in the last step we subtracted the equilibrium condi-tion 0 = λs∗i−1P0∗− s∗i from the right-hand side.

Now we will prove the coordinate-wise convergence by in-duction on i. Note that, from the definition of vi, we have

vu1(t) ≥ viu(t). Furthermore, we have s u 1(t) ≥ s∗1, so that dv1(t) dt ≤ 0. It follows that v u 1(0) ≥ vui(t) − v u i(0) ≥ −v u i(0)

uniformly in t. Then, for the base case, we have 0 ≤ ∞ Z 0 (su1(τ ) − s ∗ 1) dτ ≤ v u 1(0) < ∞,

which, together with the Lipschitz-continuity of s1, implies

(su 1(τ ) − s

∗

1) → 0 as τ → ∞. Now assume that we have ∞

R

0

(suk(τ ) − s ∗

k) dτ < ∞ for all non-negative integers k ≤ i−1.

Then, −vui(0) ≤ t Z 0 (λ [sui−1(τ )P u 0(τ ) − s ∗ i−1P ∗ 0] − [s u i(τ ) − s ∗ i]) dτ.

(10)

Adding and subtracting λs∗i−1P0u(τ ) inside the integral, we

can check that the former is less than or equal to

t Z 0 ([sui−1(τ ) − s ∗ i−1] + [P0u(τ ) − P ∗ 0] − [sui(τ ) − s ∗ i]) dτ, (6)

where the first term is uniformly upper-bounded in t by the inductive hypothesis, and it can be checked that the second one is also uniformly upper bounded for all variants using the base case. This completes the induction. It is easily checked that this coordinate-wise convergence implies convergence in k · kw, which concludes the proof.

Note that we proved asymptotic stability only for initial conditions s0 ∈ S1 _{⊂ S, which correspond to the initial}

conditions with finite expected queue lengths.

6. INTERCHANGE OF LIMITS

We will prove Theorem 4 in this section, which shows that the sequence of the marginals of the invariant distributions πN

s

∞

N =1 converges in distribution to a measure

concen-trated on the unique equilibrium (in S1) of the fluid model. This, together with Proposition 3, guarantees that the prop-erties derived from the equilibrium of the fluid model s∗, specifically for the queueing delay, are an accurate approx-imation for the steady state of the stochastic system for N large enough. First, we will find a uniform upper bound for EπNV1N in the following lemma.

Lemma 10. Let πN _{be the unique invariant distribution}

of our process, and let V1N = _N1 PNi=1Q N

i . We then have

the uniform upper bounds πNQN1 ≥ k ≤ 1 2 − λ k₂ , ∀ N, ∀ k, and EπN h V1N i ≤ 1 + 2 1 − λ, ∀ N. Proof. Consider the linear Lyapunov function

Ψ(Q, m) = Q1.

The bounded upward jumps allows us to use Theorem 2.3 from [7] to obtain that EπN[Q1] < ∞. Thus, all conditions

in Theorem 1 in [5] are satisfied, yielding the upper bounds πNQN1 ≥ 1 + 2m ≤ 1 2 − λ m+1 , and EπN h QN1 i ≤ 1 + 2 1 − λ.

The first part of the result is obtained by letting m = (k − 1)/2 if k is odd or m = k/2 − 1 if k is even. Finally, using the definition of VN

1 and then symmetry, we have

E h V1N i = 1 N N X i=1 E [Qi] = E [Q1] ,

which concludes the proof.

Now we prove the interchange of limits.

Proof of Theorem 4. Consider the set Z+∪ {∞}

en-dowed with the topology of Alexandroff’s compactification, which is known to be metrizable. Moreover, the topology defined by the norm k · kw on [0, 1]Z+ is equivalent to the

product topology, which makes it compact. Then, S (cf. Section 2 for its definition) is also compact because it is a closed subset of [0, 1]Z+_{. As a result, we have that the}

prod-uct S × (Z+∪ {∞}) is a compact Polish space under the

product topology. Then,πN ∞

N =1is trivially tight and by

Prohorov’s theorem [6], it is also relatively compact in the weak topology, and any sequence has a weakly convergent subsequence.

Given a weakly convergent subsequenceπNk ∞

k=1and its

limit π, we can use Skorokhod’s representation theorem to construct a probability space (Ω0, A0, P0) and a sequence of

random variables SNk_{(0), M}Nk₍₀₎ distributed according to πNk_{, such that} lim k→∞ S Nk_{(0) − s(0)} w= 0 P0− a.s. (7)

for some random variable s(0) ∼ πs, where πsis the marginal

of π. Furthermore, we use SNk_{(0), M}Nk(0) as the initial

conditions for a sequence of processes SNk_{(t), M}Nk_(t) ∞

k=1,

which makes these processes stationary. Note that the ini-tial conditions, distributed as πNk_{, do not necessarily}

con-verge to a deterministic initial condition (this is actually part of what we are trying to prove), so we cannot use The-orem 2 directly to find the limit of the sequence of processes SNk_(t) ∞

k=1. However, given any ω ∈ Ω0 outside a zero

P0-measure set, we can restrict this sequence of stochastic

processes to the probability space (Ωω, Aω, Pω) =

(ΩD× ΩS× {ω}, AD× AS× {∅, {ω}}, PD× PS× δω)

and apply Theorem 2 to this new space, to obtain lim k→∞_0≤t≤Tsup S Nk_{(t, ω) − s(t, ω)} w = 0, Pω− a.s.,

where s(t, ω) is the fluid solution with initial condition s(0, ω). Since this is true for all ω ∈ Ω0 except for a set of zero P0

-measure, it follows that lim k→∞0≤t≤Tsup S Nk_{(t) − s(t)} w = 0, P − a.s., where P = PD× PS× P0and where s(t) is another stochastic

process whose randomness is only in the initial condition (its trajectory is the deterministic fluid solution for that specific initial condition).

The uniform bound on EπN[V1N] (Lemma 10) and the

con-vergence of πN to π imply that Eπ[v1] ≤ 1 + 2/(1 − λ) and

therefore v1 < ∞ a.s., or equivalently, πs S1 = 1. This

means that every possible initial condition is in S1_{, almost}

surely (and not in S\S1_{). Now suppose that}_SNk₍₀₎ ∞

k=1

does not converge P0-a.s. to s∗(i.e., that πs6= δs∗_{). Then,}

the set of processes with initial conditions in S1\{s∗} has probability πs S1\{s∗}

> 0 and for every such process s(t, ω) with initial condition s(0, ω) we have

ks(0, ω) − s∗k_w> ks(t, ω) − s∗k_w

for t large enough (this is due to the fact that these sample paths converge asymptotically to s∗, by Theorem 1). As a result, Eπsks(0) − s ∗ k_w > Eπsks(t) − s ∗ k_w

(11)

for t large enough, which is a contradiction, because SNk

is stationary and thus s also has to be stationary. Then, the limit of every convergent subsequence of the marginals πN

s

∞

N =1has to be δs∗.

7. PROOF OF THE NEGATIVE RESULT

We now prove our negative result for resource constrained policies. The proof proceeds through a sequence of lemmas. Due to space constraints, we will only prove it for the case in which the distribution for the sampling of servers, νN_(m),

is independent of m. The general case can be proven with a similar line of reasoning.

We first show that for any fixed time t, we can have any given finite number of arrivals before the next message or departure, with probability bounded away from zero, uni-formly for all N .

Lemma 11. Let ηN _{be a memory-based dispatching}

pol-icy with message rate of at most αN for some α > 0. If TtN

is the (random) time of the first departure or message from a server after some fixed time t > 0, then for every k ∈ Z+ we

have that P k arrivals in t, TtN ≥ P (α, λ, k) > 0, where

P (α, λ, k) is a constant independent from N .

Proof. Because of the upper bound for the message rate of any server (cf. part 1 of Definition 2), we have that µN QNn(t)

≤ µN

(0) for all t ≥ 0. In any interval [0, t] we must have at most as many departures as arrivals to the system. Therefore, the time-average number of departures per unit of time is at most λN and thus we will have at least (1 − λ)N idle servers on the average. As a result, µN(0) has to be uniformly upper bounded by α

1−λ in order to have at

most αN messages per unit of time, in average.

Starting at any time t > 0, we have k arrivals after an Erlang(k,λN ) time. Moreover, we will have a departure no sooner than after an exponential time of rate N (which cor-responds to the worst case in which all servers are busy), and we will have a message no sooner than after an expo-nential time of rate N α

1−λ (which corresponds to the worst

case in which all servers are idle). Then, we get the uniform bound P k arrivals in t, TtN ≥ P Erl(k, N λ) < exp N α 1 − λ+ N ≥ P Erl(k, λ) < exp α 1 − λ+ 1 , which is a positive number, independent of N .

We now state a simple lemma regarding the number of probability distributions over N . It implies that unless many servers have the same probability, a distribution will have many distinct permutations.

Lemma 12. If p ∈ ΠN has at most u servers with the

same probability of being selected, then #{σ(p) : σ ∈ SN} ≥ max ( N u ! , dN/ue! ) . The proof consists of a simple counting argument and is thus omitted.

The next step is to characterize the probability distribu-tions on the set of servers induced by the dispatching rule function fN when we take CN to be logarithmic in N .

Lemma 13. Consider a symmetric dispatching policy, with CN= C log2(N ), and dispatching rule function f

N_{, and let}

r be a positive integer. Then for every m ∈ M, for every subset {n1, . . . , nr} ∈ P(N ), and for every (qn1, . . . , qnr) ∈

Qr_{, there exist at least N −C −r servers with the same}

proba-bility of being selected under fN(m, {(n1, q1), . . . , (nr, qr)}).

Proof. Fix (q1, . . . , qr) ∈ Qr. First, note that there are N

r different subsets {n1, . . . , nr}, and N

C_{different memory}

states m. Then, with (q1, . . . , qr) ∈ Qr fixed, the argument

of fN can only take N_rNC

different values, that lead to at most N_rNC

different distributions. Let p ∈ ΠN be one of

those distributions, i.e., p = fN(m, {(n1, q1), . . . , (nr, qr)})

for some m ∈ M and {n1, . . . , nr} ∈ P(N ). Then, σ(p) =

fN_(σ

M(m), {(σ(n1), q1), . . . , (σ(nr), qr)}) is also a possible

distribution due to the symmetry hypothesis. As a result, #{σ(p) : σ ∈ SN} ≤

N r

!

NC, (8)

which is the upper bound for the total number of different distributions implied by the limited number of different ar-guments for fN.

On the other hand, if p has at most u servers with the same probability of being selected, Lemma 12 provides the lower bound #{σ(p) : σ ∈ SN} ≥ max ( N u ! , dN/ue! ) , which combined with Equation (8) yields

max ( N u ! , dN/ue! ) ≤ N r ! NC. (9) Now, if C + r < u < N − C − r, then N_u > N C+r+1 > N rN C

for N large enough, which contradicts Equation (9). If u ≤ C + r, then dN/ue! ≥ dN/(C + r)e! > N_rNC

for N large enough, which also contradicts Equation (9). Thus u ≥ N − C − r.

We are now ready to prove our main result.

Proof of Theorem 5. Consider the system in steady state, fix some time T > 0 and let CN(T ) be the event that there are at least C + 1 arrivals before the next message or departure. This event has probability bounded away from zero due to Lemma 11, and it is an event that depends only on the Poisson processes of arrivals, departures, and mes-sages. We will show that, for N large enough, a policy will send at least one of those C + 1 jobs to a non empty queue, with probability bounded away from zero.

Without loss of generality, we may assume that the ex-pected delay is finite. This implies that there is no accumu-lation of jobs in the system, and that the expected number of busy servers is λN (because our system is in steady state). Then, by the Markov inequality, we have that the probabil-ity of QN(T ) having at least N (λ − ) busy servers is greater than /(1 − λ + ), for all ∈ (0, λ). Furthermore, since the message rate is at most αN , and since the number of sam-pled queues is independent from the arrivals (only depends

(12)

on νN_{), it follows that the expected number of queues}

sam-pled at the time of an arrival is less than or equal to α/λ. Therefore, using the Markov inequality once more, we get that the probability of sampling less than β servers is lower bounded by 1−α/(λβ), which is positive for β > α/λ. Then, by the independence of the sampling, the probability that the number of sampled servers for each of the first C + 1 arrivals after time T are fewer than β, is lower bounded by (1 − α/(λβ))C+1. Moreover, due to the symmetry hypothe-sis, the sampling distribution νN is uniform over all subsets of the same size. This implies that we have a further sub-set of sample paths C0(T ) ⊂ C(T ), with probability bounded away from zero uniformly in N , such that for the first C + 1 arrivals after time T , all sampled servers are busy. From now on, we will focus on those sample paths.

Recall that a policy defines a continuous time Markov pro-cess, QN_{(t), M}N_{(t), over the state space Q}N_{× M. The}

evolution of this process between time T and the time of the (C + 1)-st arrival, for all sample paths in C0(T ), can be described in terms of some basic random variables, as follows.

Let tk be the time of the k-th arrival after time T , let

{U (k)}C+1

k=1 be a collection of independent random subsets

distributed as νN (the sampled servers at the time of each arrival), let {V (k)}C+1

k=1 be a collection of i.i.d. random

vari-ables, uniform in [0, 1] (the seeds for the randomized selec-tion of the destinaselec-tions), and let {W (k)}C+1_k=1 be a collection of i.i.d. exponential random variables with parameter 1 (the workloads of each incoming job). All the randomness in our system is modeled by these random variables, and the pro-cess QN(t), MN(t) between times T and tC+1 will be a

deterministic function of them.

To begin with, the remaining workload of the jobs that are being processed, decreases at a rate of one unit of work per unit of time. Moreover, for all sample paths in C0(T ), the memory process MN(t) is only updated at the time of an ar-rival. Consider the mapping FfN : [0, 1] × M × B → N , such

that F_fN(V (k), m, {(n1, q1), . . . , (nr, qr)}) is distributed

ac-cording to fN(m, {(n1, q1), . . . , (nr, qr)}), and such that

σ FfN(V (k), m, {(n1, q1), . . . , (nr, qr)}) =

FfN(V (k), σM(m), {(σ(n1), q1), . . . , (σ(nr), qr)}), (10)

for all σ ∈ SN. Then, let

X(k) = FfN V (k), MN t−k , n U (k), QN t−k o (11) be the random destination of the new job. At the time of the k-th arrival, the following steps occur sequentially but in zero time.

1. A random subset of servers is chosen (U (k)).

2. Based on U (k) and the state of their queues, a desti-nation for the new job is chosen (X(k)) using the seed V (k).

3. The memory is updated according to MN(tk) = gdN MN(t−k), n U (k), QN(t−k) o , X(k), (12) and a new job with workload W (k) is added to queue X(k) in QN.

Recall that for all sample paths in C0(T ), the memory pro-cess is constant between arrivals, and thus its evolution af-ter time T is completely deaf-termined by the previous up-date equation, for which we have MN_(t−

1) = M

N_{(T ) and}

MN(t−_k) = MN(tk−1) for all k ≥ 2.

For every sample path ω, let Ek(ω) be a least cardinality

set such that fN _MN_(t k−1, ω),

U (k, ω), QN_(t−

k, ω) is

uniform (or zero) in Ek(ω)c, and let EN(C + 1)(ω) be the

union of those Ek(ω) for all 1 ≤ k ≤ C + 1. The sets Ek(ω)

will have cardinality of at most C + β by Lemma 13, for all ω ∈ C0(T ).

Using the construction of QN_{(t), M}N_{(t) described above,}

we can define a mapping G, that takes as arguments • the initial condition MN

(T, ω), QN(T, ω),

• the arrival times {tk(ω)}C+1k=1, and the corresponding

workloads {W (k, ω)}C+1_k=1,

• the subsets of sampled servers {U (k, ω)}C+1

k=1 and seeds

{V (k, ω)}C+1 k=1,

and yields the set EN_{(C + 1)(ω).}

Using the symmetry hypothesis in Equations (11) and (12), the symmetry of the map FfN from Equation (10),

and proceeding recursively, it can be checked that the pre-vious mapping G is “symmetric”. In particular, for every permutation σ ∈ SN such that σ(U (k, ω)) = U (k, ω), if

1 ≤ k ≤ C + 1, we have that σ EN_{(C + 1)(ω) is obtained}

when the argument of G is

• the initial condition σM MN(T, ω) , QN(T, ω),

• the arrival times {tk(ω)}C+1k=1, and the corresponding

workloads {W (k, ω)}C+1_k=1,

• the subsets of sampled servers {U (k, ω)}C+1

k=1 and seeds

{V (k, ω)}C+1 k=1.

We omit the details of this claim due to space constraints. Note that, in order to obtain σ EN_{(C + 1), we only need}

to change the initial memory state to σM MN(T ). Then,

due to the fact that we have only NC memory states, we can only obtain at most NC_{different sets σ(E}N_{(C + 1)) by}

varying σ among those that have σ(U (k, ω)) = U (k, ω), if 1 ≤ k ≤ C + 1, i.e., we have that

#nσEN(C + 1): σ ∈ Ao≤ NC

, (13)

where

A = {σ ∈ SN: σ(U (k)) = U (k), if 1 ≤ k ≤ C + 1} . (14)

Recall that for all ω ∈ C0(T ), the number of sampled servers (#U (k, ω)) is at most β, so these permutations leave at most β servers fixed for each one of the C + 1 arrivals, which translates into at most (C + 1)β servers fixed in total.

Now consider the collection of sample paths C01(T ) ⊂ C 0

(T ) where at least one of the sets Ek(ω) is empty. For those

sample paths, at least one of the jobs is dispatched accord-ing to a uniform distribution, and because we had at least N (λ − ) non empty queues with probability bounded away from zero, at least one job is sent to a non empty queue with probability bounded away from zero (this last prob-ability is conditioned on C10(T )). If this event has overall

(13)

on C10(T )), we are done. On the other hand, for all the

sam-ple paths in C0(T )\C01(T ), none of the Ek(ω) are empty. In

order to derive a contradiction, assume that the first C + 1 jobs are sent to an empty queue for a collection of sample paths with “high probability” (in this context, “high prob-ability” refers to probabilities converging to 1 as N → ∞). This implies that the jobs are sent to servers in EN_{(C + 1)}

with high probability. Otherwise, their destination would be distributed uniformly over Ekc (which has cardinality of at

least N − β − C by Lemma 13), and they would go to a non empty queue with probability bounded way from zero, for the same reason as before. Then, each Ek must contain at

least 1 idle server with high probability (and at most C + β by Lemma 13), and thus EN_{(C + 1) must contain between}

C + 1 and (C + β)(C + 1) idle servers. This means that, if a = #EN(C + 1), then #nσEN(C + 1): σ ∈ SN o = N a ! ≥ N C + 1 ! . If instead of using all permutations σ ∈ SN, we take the

subset A defined in Equation (14), then, if b = #

C+1 S k=1 U (k) and a0= # EN_{(C + 1)\}C+1S k=1 U (k) , we have that #{σ(EN(C + 1)) : σ ∈ A} = N − b a0 ! ≥ N − (C + 1)β C + 1 ! , because all sampled servers U (k) (which are at most (C + 1)β) are busy, and thus disjoint with the idle servers in EN(C + 1) (which are at least C + 1). However, we also had an upper bound for this quantity in Equation (13), which leads to a contradiction because NC _< N −(C+1)β

C+1 for N

large enough. This concludes the proof.

8. CONCLUSIONS AND FUTURE WORK

The main objective of this paper is to find necessary and sufficient conditions on the amount of resources (messages and memory) available to a central dispatcher, in order to achieve a vanishing queueing delay as the system size in-creases. This is done by defining a unified framework for a broad class of dispatching policies and by proving two sep-arate results: First, we show that when we have a limited amount of memory and a modest budget of messages per unit of time, all dispatching policies result in queueing de-lay uniformly bounded away from zero. Second, we present two variants of a simple symmetric pull-based dispatching policy for which it is sufficient to have a little more memory or messages in order to have vanishing queueing delay. As an added bonus, we analyze a third variant of the same dis-patching policy with positive queueing delay. We show that by wisely exploiting an arbitrarily small message rate (but still proportional to the arrival rate) we obtain a queueing delay which is finite and uniformly upper bounded even in heavy traffic, a significant qualitative improvement over the M/M/1 queueing delay obtained in the absence of messages. There are several interesting directions for future research. In light of the symmetry assumption in Theorem 5, an imme-diate open question is whether the result still holds without this assumption. Our proof heavily relies on the symme-try assumption and is difficult to generalize. Second, we

could relax the assumption of homogeneous servers, con-sider pools of servers with different service rates, and study the tradeoffs between resources and the stability region of a policy. In this setting we expect a result similar to our lower bound for queueing delay, stating that a symmetric resource constrained policy cannot be stable for every stabi-lizable system. Last but not least, it would be interesting to extend some or all of these results to the case of general job size distributions (e.g., heavy tailed) and/or different service disciplines such as processor sharing or LIFO, as these are prevalent in many applications.

9. REFERENCES

[1] N. Alon, E. Lubetzky, and O. Gurel-Gurevich. Choice-memory tradeoff in allocations. In Proceedings of FOCS, 2009.

[2] Y. Azar, A. Broder, A. Karlin, and E. Upfal. Balanced allocations. SIAM Journal on Computing,

29(1):180–200, 1999.

[3] R. Badonnel and M. Burgess. Dynamic pull-based load balancing for autonomic servers. In Network Operations and Management Symposium, 2008. [4] I. Benjamini and Y. Makarychev. Balanced

allocations: memory performance tradeoffs. The Annals of Applied Probability, 22(4):1642–1649, 2012. [5] D. Bertsimas, D. Gamarnik, and J. Tsitsiklis.

Performance of multiclass markovian queueing networks via piecewise linear lyapunov functions. The Annals of Applied Probability, 11(4):1384–1428, 2002. [6] P. Billingsley. Convergence of Probability Measures.

Wiley, second edition, 1999.

[7] B. Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 14(3):502–525, 1982.

[8] C. Lenzen and R. Wattenhofer. Tight bounds for parallel randomized load balalcing. Distributed Computing, pages 1–16, 2014.

[9] S. Lobanov and O. Smolyanov. Ordinary differential equations in locally convex spaces. Uspekhi Mat. Nauk, 49:93–168, 1994.

[10] Y. Lu, Q. Xie, G. Kliot, A. Geller, J. Larus, and A. Greenberg. Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services. Performance Evaluation, 68(11):1056–1071, Nov. 2011. [11] M. Mitzenmacher. The power of two choices in

randomized load balancing. PhD thesis, U.C. Berkeley, 1996.

[12] M. Mitzenmacher., B. Prabhakar., and D. Shah. Load balancing with memory. In Proceedings of FOCS, 2002. [13] G. Stamoulis and J. Tsitsiklis. Optimal distributed

policies for choosing among multiple servers. In Proceedings of the CDC, pages 815–820, 1991. [14] A. Stolyar. Pull-based load distribution in large-scale

heterogeneous service systems. Queueing Systems, 80(4):341–361, 2015.

[15] N. Vvedenskaya, R. Dobrushin, and F. Karpelevich. Queueing system with selection of the shortest of two queues: an asymptotic approach. Problems of Information Transmission, 32(1):15–27, 1996. [16] L. Ying, R. Srikant, and X. Kang. The power of

slightly more than one sample in randomized load balancing. In Proceedings of INFOCOM, 2015.