Comparing Markov Chains: Aggregation and Precedence Relations Applied to Sets of States, with Applications to Assemble-to-Order Systems

(1)

HAL Id: hal-00361430

https://hal.archives-ouvertes.fr/hal-00361430v2

Submitted on 20 Aug 2010

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Comparing Markov Chains: Aggregation and

Precedence Relations Applied to Sets of States, with

Applications to Assemble-to-Order Systems

Ana Bušić, Ingrid Vliegen, Alan Scheller-Wolf

To cite this version:

Ana Bušić, Ingrid Vliegen, Alan Scheller-Wolf. Comparing Markov Chains: Aggregation and

Prece-dence Relations Applied to Sets of States, with Applications to Assemble-to-Order Systems.

Mathe-matics of Operations Research, INFORMS, 2012, 37 (2), �10.1287/moor.1110.0533�. �hal-00361430v2�

(2)

Comparing Markov chains: Aggregation and precedence

relations applied to sets of states, with applications to

assemble-to-order systems

Ana Buˇsi´

c

ana.busic@inria.fr

INRIA/ENS

23, Avenue d’Italie, CS 81321, 75214 Paris Cedex 13, France

Ingrid Vliegen

i.m.h.vliegen@utwente.nl

School of Management and Governance, University of Twente

P.O.Box 217, 7500 AE Enschede, the Netherlands

Alan Scheller-Wolf

awolf@andrew.cmu.edu

Tepper School of Business, Carnegie Mellon University

5000 Forbes Ave., Pittsburgh, PA 15213, USA

August 20, 2010

Abstract

Solving Markov chains is in general difficult if the state space of the chain is very large (or infinite) and lacking a simple repeating structure. One alternative to solving such chains is to construct models that are simple to analyze and provide bounds for a reward function of interest. We present a new bounding method for Markov chains inspired by Markov reward theory: Our method constructs bounds by redirecting selected sets of transitions, facilitating an intuitive interpretation of the modifications of the original system. We show that our method is compatible with strong aggregation of Markov chains; thus we can obtain bounds for an initial chain by analyzing a much smaller chain. We illustrate our method by using it to prove monotonicity results and bounds for assemble-to-order systems.

Keywords: Markov chains; precedence relations; bounding models; aggregation; supermodular-ity; assemble-to-order (ATO) systems

Mathematics Subject Classification (MSC2010): 60E15, 60J10; Secondary: 90B25, 90B05

1 Introduction.

In Markov chain modeling, one often faces the problem of combinatorial state space explosion: modeling a system completely requires an unmanageable - combinatorial - number of states. Many high-level formalisms, such as queueing networks or stochastic Petri nets, have been developed to simplify the specification and storage of the Markov chain. However, these models rarely have closed-form solutions, thus one must resort to numerical methods. Unfortunately, numerical methods are inefficient when the size of the state space becomes very large or for models with infinite state space that do not exhibit a special repeating structure that admits a matrix analytic

(3)

approach (Neuts [31]). Typically, the latter approach is quite limited if the state space is infinite in more than one dimension. An alternative approach to cope with state space explosion is to design new models that (i) provide bounds for a specific measure of interest (for instance the probability of a failure in a complex system); and (ii) are simpler to analyze than the original system.

Establishing point (i): the relationship between the original and new (bounding) systems may be based on different arguments. Potentially the most general way of obtaining bounds is by stochastic comparison, which gives bounds for a whole family of reward functions (for instance increasing or convex functions). Furthermore, stochastic comparison provides bounds for both the steady-state and transient behavior of the studied model. Many results have been obtained using strong stochastic order (i.e. generated by increasing functions) and coupling arguments (Lindvall [21]). Recently, an algorithmic approach has been proposed (Fourneau and Pekergin [11]) to construct stochastic bounds, based on stochastic monotonicity; this stochastic monotonicity provides simple algebraic sufficient conditions for the stochastic comparison of Markov chains. Ben Mamoun et al. [3] showed that an algorithmic approach is also possible using increasing convex ordering that allows one to compare variability. The clear advantage of stochastic comparison is its generality: it provides bounds for a whole family of rewards, both for the steady-state and transient behavior of the studied system. Its drawback is that, due to its generality, it requires strong constraints that may not apply to the system of interest. For more details on the theoretical aspects of stochastic comparison we refer the reader to M¨uller and Stoyan [29], and Shaked and Shantikumar [34].

For this reason more specialized methods than stochastic comparison have also been developed, which apply only to one specific function, and only in the steady-state. Muntz et al. [30] proposed an algebraic approach to derive bounds of steady-state rewards without computing the steady-state distribution of the chain, founded on results of Courtois and Semal [6, 7] on eigenvectors of non-negative matrices. This approach was specially designed for reliability analysis of highly reliable systems, and requires special constraints on the structure of the underlying Markov chain. This approach was further improved and generalized by various authors (Lui and Muntz [27], Semal [33], Carrasco [5], Mahevas and Rubino [28]), but the primary assumption for its applicability is still that there is a very small portion of the state space that has a very high probability of occurrence, while the other states are almost never visited.

Similarly, Van Dijk [41] (see also Van Dijk and Van der Wal [42]) proposed a different method for comparing two chains in terms of a particular reward function, often referred to as the Markov reward approach. This method allows the comparison of mean cumulated and stationary rewards for two given chains. A simplified version of the Markov reward approach, called the precedence relation method, was proposed by Van Houtum et al. [44]. The origin of the precedence relation method dates back to Adan et al. [1], and it has been successfully applied to various problems (Van Houtum et al. [43], Tandra et al. [40], Leemans [20]). The advantage of this method is its straightforward description of the modifications of the initial model.

The precedence relation method consists of two steps. Precedence relations are first established on the states of the system, based on the reward function (or family of functions) one wants to study. Then an upper (resp. lower) bound for the initial model can be obtained simply by redirecting the transitions to greater (resp. smaller) states with respect to the precedence relations established in the first step. This can also be thought of as replacing transitions in the original model with ones to greater (resp. smaller) states. A significant drawback of the precedence relation method is that transitions can be redirected only to greater states (in the case of an upper bound) or only to smaller states (lower bound): This method does not allow one to obtain bounding models by changing the probability of transitions, or by redirecting some transitions to greater states and others to smaller states, which is typically needed, for example, if one wants to keep the mean behavior of a part of the system (for instance arrivals to a queue), but change its variability. One small example of such a system is given in Section 3.

We propose a generalization of precedence relations to sets of states. By establishing precedence relations on sets of states, instead of individual states, one can avoid the above problems as long as the new destination states considered together are greater (resp. smaller) than the original set of destinations. As the modification of the probability of a transition can also be seen as replacement

(4)

of one transition by two new transitions, one of which is a loop, this significantly increases the applicability of the precedence relation method, by allowing one to compare systems with different variability.

We now discuss point (ii): how to derive models that are simpler to solve. In the context of stochastic comparison, different types of bounding models have been used: models having closed form solutions, models that are easier to analyze using numerical methods, and aggregation (Fourneau and Pekergin [11], Fourneau et al. [10]). To our knowledge, the precedence relation method has been combined only with the first two simplifications. We show here that it is also compatible with aggregation. Thus we prove the validity of applying the precedence relation method on sets of states, in concert with the simplifying technique of aggregation.

We illustrate our new technique to prove new results for Assemble-to-Order (ATO) systems. To facilitate this, we first present a general framework for ATO systems, and then a detailed description of a class within this framework that we focus on: ATO models with Partial Order Service (ATO-POS). We then prove a monotonicity property with respect to supermodularity for ATO-POS systems, also discussing how this monotonicity extends, or fails to extend, to other classes within our ATO framework. We then apply precedence relations on sets of states to two specific problems in the ATO-POS class: ATO-POS models with individual returns (see [8] for example), and the service tools problem. This latter problem, introduced by Vliegen and Van Houtum [45], models a single location multi-item inventory system in which customers demand different sets of service tools, needed for a specific maintenance action, from a stock point.

Considering the ATO models with POS, we show that in addition to deriving bounding models, our new method can be used to establish monotonicity results with respect to model parameters. We illustrate this in Section 6.1 to prove monotonicity of order fill rates with respect to coupling in arrivals for ATO-POS systems with state dependent replenishment times. This extends results in Dayanik et al. [8] to the case of uncapacitated ATO systems. Then in 6.2 we apply our technique on the service tool problem, proving that the approximation proposed by Vliegen and Van Houtum [45] provides a bound for their original model. Our bounding model has a state space that is highly reduced compared to Vliegen and Van Houtum’s original system: its dimension is equal to the number of different types of tools (I), while the original model has dimension 2I_.

This paper is organized as follows. In Section 2 we give an overview of the precedence relation method proposed by Van Houtum et al. [44]. In Section 3 we show the limits of this method and we propose and prove the validity of our generalization. Section 4 is devoted to aggregation and its connections with our method. In Section 5 we give an overview of known results for our general class of ATO systems and we establish a monotonicity property for supermodular rewards for a class of state dependent ATO models. This property is then used in Section 6 to obtain our results for ATO-POS and service tools systems. Finally, Appendix A contains a supermodularity characterization on a discrete lattice, necessary for our proofs in Sections 5 and 6.

2 Precedence relations.

Let {Xn}n≥0be a discrete time Markov chain (DTMC) on a countable state space X and denote

by P its transition matrix. A probability measure π on X is stationary if πP = π. It is attractive if for any probability measure ν on X , the sequence of Ces`aro averages of νPn _{converges weakly}

to π, i.e. π(x) = lim t→∞ 1 t t−1 X n=0 (νPn)(x), x ∈ X .

A DTMC is said to be stable if it has a unique and attractive stationary probability measure. This implies in particular that the graph of the Markov chain has a unique strongly connected component (recurrent class) that will be reached with probability one from any other state. We consider in the following only stable DTMC.

(5)

For a given reward (or cost) function r : X → R, the mean stationary reward is given by: a = X

x∈X

r(x)π(x).

Directly computing the stationary distribution π is often difficult if, for instance, the state space is infinite in many dimensions or finite, but prohibitively large. The main idea of the precedence relation method proposed by Van Houtum et al. [44] is to obtain upper or lower bounds for a without explicitly computing π. By redirecting selected transitions of the original model, the graph of the chain is modified to obtain a new chain that is significantly easier to analyze. For example, one might truncate the chain by blocking the outgoing transitions from a subset of states.

Some special care needs to be taken in order to ensure that the reward function of the new chain provides a bound on the reward function of the original chain. We denote by vt(x) (resp.

by_evt(x)) the expected cumulated reward over the first t periods for the original (resp. modified)

chain when starting in state x ∈ X :

vt+1(x) = r(x) +

X

y∈X

P [x, y]vt(y), t ≥ 0, (1)

where v0(x) := 0, ∀x ∈ X . Then for any state x that belongs to the unique recurrent class, the

mean stationary reward satisfies:

a = lim

t→∞

vt(x)

t . If we can show that

vt(x) ≤v_et(x), ∀x, ∀t ≥ 0, (2)

then we have also the comparison of mean stationary rewards: a = lim t→∞ vt(x) t ≤ limt→∞ e vt(x0) t =ea,

where x and x0 belong respectively to the unique recurrent class of the initial and the modified chain.

In order to establish (2), a precedence relation is defined on state space X as follows: x y if vt(x) ≤ vt(y), ∀t ≥ 0.

When talking about rewards, we will often say that a state x is less attractive than y (for the reward function r) if x y.

The following theorem states that redirecting (i.e. replacing with) transitions to less (more) attractive states results in an lower (upper) bound for mean stationary reward (Van Houtum et al. [44, Theorem 4.1]):

Theorem 1. Let {Xn} be a DTMC and let {Yn} be a chain obtained from {Xn} by replacing

some transitions (x, y) with transitions (x, y0) such that y y0. Then: vt(x) ≤v_et(x), ∀x, ∀t ≥ 0.

If both chains are stable, then the mean stationary rewards satisfy a ≤_ea.

The above theorem allows one to easily construct bounding models by redirecting possibly only a few transitions. Van Houtum et al. [44] illustrated their approach on the example of a system with the Join the Shortest Queue routing. In the following section we illustrate some of the limits of the precedence relation approach before proposing its generalization.

(6)

3 Precedence relations on sets of states.

The precedence relation method allows one to redirect transitions: the destination of the tran-sition is modified, while its probability remains the same. Redirecting trantran-sitions to less (more) attractive states results in an lower (upper) bound. However, the following simple example shows that we cannot use the precedence relation method to compare models with the same average arrival rate, but different variabilities. In that case, the modified model is obtained by redirecting simultaneously some transitions to more and some to less attractive states, so Theorem 1 no longer applies.

Example 1 (Single queue with batch arrivals). We consider a single queue with two types of jobs: • Class 1 jobs arrive individually following a Poisson process with rate λ1.

• Class 2 jobs arrive by batches of size 2, following a Poisson process with rate λ2.

We assume a single exponential server with rate µ, and let x denote the number of jobs in the system. Then the following events can occur in the system:

event rate transition condition type 1 arrival λ1 x + 1

-type 2 arrival λ2 x + 2

-service µ x − 1 x > 0

Without loss of generality, we assume that λ1+ λ2+ µ = 1. Thus we can consider λ1, λ2 and µ

as the probabilities of the events in the corresponding discrete time (uniformized) chain. Suppose that we are interested in the mean number of jobs. The appropriate reward function is thus r(x) = x, ∀x. The corresponding t-period rewards satisfy:

vt+1(x) = r(x) + λ1vt(x + 1) + λ2vt(x + 2) + µvt(x − 1)1{x>0}+ µvt(x)1{x=0}, x ≥ 0, t ≥ 0,

with v0(x) := 0, ∀x ≥ 0. Denote respectively by A1, A2 and S the t-period rewards in new states

after an arrival of type 1, an arrival of type 2 and a service in state x: • A1(x, t) = vt(x + 1), • A2(x, t) = vt(x + 2), • S(x, t) = vt(x − 1)1{x>0}+ vt(x)1{x=0}. Then: vt+1(x) = r(x) + λ1A1(x, t) + λ2A2(x, t) + µS(x, t), x ≥ 0, t ≥ 0. (3) For all t ≥ 0: vt(x) ≤ vt(x + 1), x ≥ 0, (4)

i.e. x x + 1, x ≥ 0. This can be easily shown by induction on t. Assume (4) holds for a given t ≥ 0 (for t = 0, v0 := 0 is obviously non-decreasing). We can consider separately one period

rewards, arrivals of each type, and service.

• One period rewards. v1= r is obviously non-decreasing.

• Type 1 arrivals. For x ≥ 0, A1(x + 1, t) − A1(x, t) = vt(x + 2) − vt(x + 1) ≥ 0, since vtis

non-decreasing.

• Type 2 arrivals. For x ≥ 0, A2(x + 1, t) − A2(x, t) = vt(x + 3) − vt(x + 2) ≥ 0, since vtis

non-decreasing.

• Service. For x ≥ 0, S(x + 1, t) − S(x, t) = vt(x) − vt(x − 1)1{x>0}− vt(x)1{x=0}= (vt(x) −

(7)

It follows from relation (3) that vt+1is also non-decreasing, so (4) holds for all t ≥ 0.

Now, suppose some class 1 jobs become class 2 jobs, keeping the total arrival rate constant. This means that these jobs arrive less often (only half of the previous rate), but they arrive in batches of size 2 (Figure 1). Then:

λ0₁= λ1− and λ 0

2= λ2+

2,

where 0 < ≤ λ1. The total arrival rate is the same in both models, but the arrival process of

the second system is more variable.

λ₂ 1 λ µ µ λ2 1 λ + ε/2 − ε

Figure 1: Batch arrivals.

Different transitions for both models are shown in Figure 2. Note that a part of the transition rate that corresponds to the arrivals of type 1 is replaced by a new transition that corresponds to the arrivals of type 2, but the rate is divided by two. This can be also seen as replacing one transition with rate by two transitions, each with rate /2: a transition to state x + 2 and a loop (see Figure 2). Thus we cannot apply Theorem 1, since it allows neither replacing only a part of a transition, nor replacing one transition with two new ones: one going to a more and one to a less attractive state.

00 00 11 11 0 0 1 1 00 11 0011 0 0 1 1 ₀₁ ₀₀00₁₁11 00 11 µ µ _λ1λ2 x 00 00 11 11 0 0 1 1 00 11 0011 0 0 1 1 ₀₁ ₀₀00₁₁11 00 11 µ µ _λ1λ2_{− ε}+ ε/2 ε/2 x

Figure 2: Batch arrivals: redirecting transitions.

In the following we propose a more general method that allows us to replace a transition, or more generally a set of transitions, by another set of transitions having the same aggregate rate. Also, only a portion of some transitions may be redirected.

3.1 Generalization of precedence relations.

To aid intuition, we will introduce the main ideas by considering a single state x ∈ X . (The general result will be proved later, in Theorem 2.) Assume we want to replace (redirect) the outgoing transitions from x to a subset A with transitions to another subset B. For instance, in Figure 3 we want to replace transitions to A = {a1, a2} (on the left) with transitions to B = {b1, b2, b3}

(on the right). We might also have some transitions from x to states that are not in A and that we do not want to redirect (transitions to states u and v in Figure 3).

Furthermore, we might want to redirect transitions only partially: in Figure 4 only the half of probability of transitions to A is replaced by transitions to B. Thus in order to describe the redirection of a set of transitions we will need to provide:

(8)

0 0 1 1 01 00 00 11 11 00 11 0 1 00 00 11 11 00 11 0 0 1 1 x 1 3 2 b b b a1 a2 0.1 0.4 0.25 0.25 u v 0 0 1 1 01 00 00 11 11 00 11 0 1 00 00 11 11 00 11 0 0 1 1 x 1 3 2 b b b a1 a2 0.1 0.4 0.1 0.1 0.3 u v

Figure 3: Redirecting the sets of transitions.

• the weight factor ∆ (the amount of each transition to be redirected; the same scalar ∆ is applied to all transitions to states in A). Because we can include only part of a transition in set A, this common ∆ is not restrictive.

0 0 1 1 ₀₁ 00 00 11 11 00 11 0 1 00 00 11 11 00 11 0 0 1 1 x 1 3 2 b b b a1 a2 0.1 0.4 0.25 0.25 u v 0 0 1 1 ₀₁ 00 00 11 11 00 11 0 1 00 00 11 11 00 11 0 0 1 1 x 1 3 2 b b b a1 a2 0.1 0.4 0.125 0.125 0.15 0.05 0.05 v u

Figure 4: Partial redirection of transitions.

Information on sets A and B and the corresponding transition probabilities will be given by two vectors α = (α(z))z∈X and β = (β(z))z∈X. Since the information on the amount to be

redirected will be given by a weight factor, we only need the relative probabilities of transitions to the respective sets A and B. Therefore it is convenient to renormalize the vectors α and β. The modifications in Figures 3 and 4 can now be described by vectors α = 0.5δa1+ 0.5δa2 and

β = 0.2δb1+ 0.2δb2+ 0.6δb3, where for any y ∈ X , δy denotes the Dirac measure in y:

δy(z) =

1, z = y,

0, z 6= y , z ∈ X . The weight factor ∆ is equal to 0.5 in Figure 3 and to 0.25 in Figure 4.

We now formally generalize the previous example. Let α and β be two stochastic vectors: α(z) ≥ 0, β(z) ≥ 0, ∀z ∈ X and ||α||1 = ||β||1 = 1 (where ||α||1 := P_z∈Xα(z) is the usual

1-norm). Let {Xn} be a DTMC with transition probability matrix P and t-period reward functions

vt, satisfying the following relation:

X z∈X α(z)vt(z) ≤ X z∈X β(z)vt(z), ∀t ≥ 0. (5)

Let A and B denote the supports of vectors α and β respectively:

A = supp(α) = {z ∈ X : α(z) > 0}, B = supp(β) = {z ∈ X : β(z) > 0}. For the example in Figures 3 and 4, A = {a1, a2} and B = {b1, b2, b3}.

If (5) holds, we will say that the set of states A is less attractive than the set B with respect to probability vectors α and β, and we will denote this:

(9)

We will show that if A α,β B, replacing the outgoing transitions to A (with probabilities α) by

the outgoing transitions to B (with probabilities β) leads to an upper bound for t-period rewards (and thus also for the mean stationary reward, when it exists). Before giving this result in Theorem 2, note that relation (5) is indeed a generalization of precedence relations of states:

Remark 1. Suppose x y, for some x, y ∈ X . Set α = δx and β = δy. Then (5) becomes:

vt(x) ≤ vt(y), t ≥ 0,

which is equivalent to x y by definition.

To see that (5) indeed is more general than the precedence relation method (Van Houtum et al. [44]), let α = δx and β = 12δy+12δz, x, y, z ∈ X . Then (5) becomes:

vt(x) ≤

1 2vt(y) +

1

2vt(z), t ≥ 0. We can write this {x} _δ

x,12δy+12δz {y, z}. By taking y = x + 1 and z = x − 1 this is exactly the

relation we need to prove in Example 1. The proof of this relation for Example 1 will be given in Section 3.2.

Replacing the outgoing transitions that correspond to the set A and probabilities α by the tran-sitions that correspond to the set B and probabilities β (called (α, β)-redirection in the following), can be also represented in matrix form. The matrix Tα,β(x) defined as:

Tα,β(x)[w, z] =

β(z) − α(z), w = x, 0, w 6= x,

describes the (α, β)-redirection of the outgoing transitions from state x ∈ X . The transition matrix of the modified chain after (α, β)-redirection of the outgoing transitions from state x ∈ X is then given by:

e

P = P + ∆α,β(x)Tα,β(x),

with the weight factor ∆α,β(x), 0 ≤ ∆α,β(x) ≤ 1. (Note that if ∆α,β(x) = 0, we do not modify

the chain.) In order for eP to be a stochastic matrix, the weight factor ∆α,β(x) must satisfy:

0 ≤ P [x, y] + ∆α,β(x)(β(y) − α(y)) ≤ 1, y ∈ X , (6)

which can be also written as: ∆α,β(x) ≤ min min y : α(y)>β(y) _{P [x, y]} α(y) − β(y) , min y : α(y)<β(y) 1 − P [x, y] β(y) − α(y) .

Without loss of generality, we can assume that the supports of α and β are disjoint: A ∩ B = ∅. Indeed, if there exists y ∈ X such that α(y) > 0 and β(y) > 0, then we can define new vectors α0 = _1−c1 (α − cδy) and β0 = _1−c1 (β − cδy), where c = min{α(y), β(y)}. Relation (5) is then

equivalent to: X z∈X α0(z)vt(z) ≤ X z∈X β0(z)vt(z), t ≥ 0.

Assuming A and B are disjoint, relation (6) has an intuitive interpretation given as Proposition 1: one can only redirect existing transitions.

Proposition 1. For vectors α and β with supports A ∩ B = ∅ condition (6) is equivalent to: α(y)∆α, β(x) ≤ P [x, y], y ∈ X . (7)

Proof. Relation (7) follows trivially from (6) as α(y) > 0 and A ∩ B = ∅ imply β(y) = 0. In order to see that (7) implies (6), we will consider the following cases for an arbitrary y ∈ X :

(10)

• α(y) > 0. Then β(y) = 0 so relation (6) becomes: 0 ≤ P [x, y] − ∆α,β(x)α(y) ≤ 1. As we

assumed that 0 ≤ ∆α,β(x) ≤ 1, and 0 ≤ α(y) ≤ 1, the right inequality is trivial and the left

one is simply relation (7).

• β(y) > 0. Then α(y) = 0 so relation (6) becomes: 0 ≤ P [x, y] + ∆α,β(x)β(y) ≤ 1. The left

inequality is trivial. For the right one we have, using the fact that P is a stochastic matrix and β(y) ≤ 1: P [x, y] + ∆α,β(x)β(y) ≤ 1 − X z6=y P [x, z] + ∆α,β(x) ≤ 1 −X z6=y ∆α,β(x)α(z) + ∆α,β(x) = 1 + ∆α,β(x)  1 −X z6=y α(z)   = 1 + ∆α,β(x)α(y) = 1,

where the second inequality follows from (7). • Finally the case α(y) = β(y) = 0 is trivial.

Until now we have considered only one state x and only one relation (α, β). Typically, we will redirect outgoing transitions for a subset of the state space, and we may need more than one relation. Let R be a set of relations that are valid for our model (i.e. they satisfy (5)). We will denote by Rx ⊂ R the set of all relations that will be used for a state x ∈ X . (If the outgoing

transitions for a state x ∈ X do not change, we will set Rx := ∅.) Note that the same relation

(α, β) could be applied to different states x and x0 (i.e. (α, β) ∈ Rx∩ Rx0). In that case the

corresponding weight factors ∆(α,β)(x) and ∆(α,β)(x0) need not to be equal. Also, there could be

different relations (α, β) and (α0, β0) that have the same supports A and B; we might even have that A (α,β) B, but B (α0_,β0₎ A (if supp(α) = supp(β0) = A and supp(β) = supp(α0) = B). Thus

our method can be made arbitrarily general. The following theorem states that under conditions similar to (5) and (6) (for all states and for a family of precedence relations), the t-period rewards e

vtof the modified chain satisfy:

vt(x) ≤evt(x), x ∈ X , t ≥ 0.

Theorem 2. Let {Xn} be a DTMC with transition probability matrix P and a reward r that is

bounded from below:

∃m ∈ R, r(x) ≥ m, ∀x ∈ X . (8) Denote by vt, t ≥ 0, the corresponding t-period rewards. Let R be a set of pairs of stochastic

vectors such that for all (α, β) ∈ R: X y∈X α(y)vt(y) ≤ X y∈X β(y)vt(y), t ≥ 0. (9)

Let Rx⊂ R, x ∈ X (the precedence relations that will be applied to a state x ∈ X ). Let {Yn} be

a DTMC with transition probability matrix eP given by: e P = P +X x∈X X (α,β)∈Rx ∆α,β(x)Tα,β(x),

where the factors ∆α,β(x), x ∈ X , (α, β) ∈ Rx, satisfy:

0 ≤ P [x, y] + X

(α,β)∈Rx

(11)

(i.e. 0 ≤ eP [x, y] ≤ 1, x, y ∈ X ).

Then the t-period rewards _evt of the modified chain satisfy:

vt(x) ≤evt(x), x ∈ X , t ≥ 0. (11) Symmetrically, if (9) is replaced by:

X y∈X α(y)vt(y) ≥ X y∈X β(y)vt(y), t ≥ 0, (12)

for all (α, β) ∈ R, then the t-period rewards_evt of the modified chain satisfy:

vt(x) ≥evt(x), x ∈ X , t ≥ 0. (13) Proof. We will prove (11) by induction on t. For t = 0 we have v0(x) =ve0(x) := 0, x ∈ X , so (11) is trivially satisfied. Suppose (11) is satisfied for t ≥ 0. Then for t + 1 we have:

e vt+1(x) = r(x) + X y∈X e P [x, y]_evt(y) ≥ r(x) +X y∈X e P [x, y]vt(y) = r(x) +X y∈X  P [x, y] + X (α,β)∈Rx ∆α,β(x)Tα,β(x)[x, y]  vt(y).

Relation (10) implies that for all y ∈ X the series absolutely converges (in R+∪ {+∞}) if vt is

bounded from below. (Note that if r is bounded from below then vt is bounded from below for

each t.) Thus, simplifying and interchanging summation:

e vt+1(x) ≥ vt+1(x) + X (α,β)∈Rx ∆α,β(x) X y∈X

(β(y) − α(y)) vt(y)

≥ vt+1(x).

Remark 2. If for all x ∈ X

|Rx| < ∞ and for all (α, β) ∈ Rx, |supp(α) ∪ supp(β)| < ∞, (14)

then the result of Theorem 2 holds without condition (8). In other words, if for all x ∈ X , the initial and the modified chain differ only in a finite number of the outgoing transitions, then the reward function r can be unbounded.

Corollary 1. Under the same conditions as in Theorem 2, if both the initial and the modified chain are stable, then the mean stationary reward of the modified chain {Yn} is an upper bound

(a lower bound in case of (12)) for the mean stationary reward of the original chain {Xn}.

3.2 Proving the relations.

Thus the steps in order to prove a bound are to first identify the set R, and then to prove the corresponding relations for the t-period rewards. We will illustrate this steps on our simple example of a queue with batch arrivals, discussed in Example 1.

Example 2. Consider again the two models from Example 1. We will show that:

(12)

where vt denotes the t-period rewards for the original chain and _evt for the modified chain for

reward function r(x) = x, ∀x. For each x ≥ 0, we want to replace a part of the transition that goes to x + 1 by two new transitions that go to x and x + 2. We will define vectors αxand βxas

follows:

αx= δx+1, βx=

1

2(δx+ δx+2) .

Let Rx= {(αx, βx)}, x ∈ X . Then R = ∪x∈XRx= {(αx, βx) : x ∈ X }. Furthermore, let:

∆(x) := ∆αx,βx(x) = , x ∈ X ,

with 0 < ≤ λ1. Let P be the transition probability matrix of the original discrete time model.

The transition matrix of the modified chain is then given by: e P = P +X x∈X X (α,β)∈Rx ∆α,β(x)Tα,β(x) = P + X x∈X Tαx,βx(x).

Relation (9) for (αx, βx), x ≥ 0, is equivalent to convexity of functions vt, t ≥ 0:

vt(x + 1) ≤

1

2vt(x + 2) + 1

2vt(x), x ≥ 0, t ≥ 0.

Thus, if we prove that vt, t ≥ 0 are convex, then Theorem 2 implies (15), as our reward function

r is positive, and (10) holds from the definition of .

In the proof of convexity of vt, t ≥ 0, we will also use the fact that for each t ≥ 0, the function

vtis non-decreasing, which was shown in Example 1, relation (4).

Assume that vt is convex for a given t ≥ 0 (for t = 0, v0:= 0 is obviously convex). Then for

t + 1 we have (see (3) in Example 1):

vt+1(x) = r(x) + λ1A1(x, t) + λ2A2(x, t) + µS(x, t), x ≥ 0.

We consider separately one period rewards, arrivals of each type, and service. • One period rewards. v1= r is obviously convex.

• Type 1 arrivals. For x ≥ 0, A1(x + 2, t) + A1(x, t) − 2A1(x + 1, t) = vt(x + 3) + vt(x + 1) −

2vt(x + 2) ≥ 0, since vtis convex.

• Type 2 arrivals. For x ≥ 0, A2(x + 2, t) + A2(x, t) − 2A2(x + 1, t) = vt(x + 4) + vt(x + 2) −

2vt(x + 3) ≥ 0, since vtis convex.

• Service. For x ≥ 0, S(x + 2, t) + S(x, t) − 2S(x + 1, t) = vt(x + 1) + vt(x − 1)1{x>0} +

vt(x)1{x=0}− 2vt(x) = 1{x>0}(vt(x + 1) + vt(x − 1) − 2vt(x)) + 1{x=0}(vt(x + 1) − vt(x)) ≥ 0,

since vtis non-decreasing and convex.

Thus vt+1is convex, so by induction on t, vtis convex for all t ≥ 0. Applying Theorem 2 we have

vt(x) ≤evt(x), x ≥ 0, t ≥ 0, that is the mean number of jobs in the system increases if we have more variable arrivals.

Remark 3. The goal of this example is only to illustrate the generalization of the precedence rela-tion method. Note that the above result can be also obtained by using stochastic recurrences and icx-order (see for instance Baccelli and Br´emaud [2]) or using event based dynamic programming (Koole [18], Koole [19]).

A primary use of precedence relations is to enable bounds to be established by analyzing simpler (smaller) systems. As mentioned in the introduction, one common simplification of a chain is to reduce its state space using aggregation. We examine this technique next.

(13)

4 Aggregation.

In this and the following sections we assume that the state space of the chain is finite, as we will use results in Kemeny and Snell [16] on the aggregation of finite Markov chains. Let C = {Ck}k∈K

be a partition of the state space X into macro-states:

∪k∈KCk = X , Ci∩ Cj= ∅, ∀i 6= j.

Definition 1. (Kemeny and Snell [16]) A Markov chain X = {Xn}n≥0is strongly aggregable (or

lumpable) with respect to partition C if the process obtained by merging the states that belong to the same set into one state is still a Markov chain, for all initial distributions of X0.

There are necessary and sufficient conditions for a chain to be strongly aggregable:

Theorem 3. (Matrix characterization, Kemeny and Snell [16, Theorem 6.3.2]) A DTMC X = {Xn}n≥0 with probability transition matrix P is strongly aggregable with respect to C if and only

if:

∀i ∈ K, ∀j ∈ K, X

y∈Cj

P [x, y] is constant for all x ∈ Ci. (16)

Then we can define a new (aggregated) chain Y = {Yn}n≥0 with transition matrix Q. For all

i, j ∈ K:

Q[i, j] = X

y∈Cj

P [x, y], x ∈ Ci.

There are many results on aggregation of Markov chains, however they primarily consider the stationary distribution. Surprisingly, we were not able to find the following simple property, so we provide it here with a proof.

Proposition 2. Let X = {Xn}n≥0 be a Markov chain satisfying (16) and Y = {Yn}n≥0 the

aggregated chain. Let r : X → R be a reward function that is constant within each macro-state, i.e. there exist rk ∈ R, k ∈ K such that for all k ∈ K:

r(x) = rk, ∀x ∈ Ck.

Denote by vt and wt the t-period rewards for chains X and Y . Then for all k ∈ K:

vt(x) = wt(k), x ∈ Ck, t ≥ 0. (17)

Proof. We will show (17) by induction on t.

Suppose that (17) is satisfied for t ≥ 0 (for t = 0 this is trivially satisfied). Then for t + 1 and k ∈ K: vt+1(x) = r(x) + X y∈X P [x, y]vt(y) = rk+ X j∈K X y∈Cj P [x, y]vt(y).

By the induction hypothesis vt(y) = wt(j), j ∈ K, y ∈ Cj, and from Theorem 3, for x ∈ Ck,

P y∈CjP [x, y] = Q[k, j], j ∈ K. Thus: vt+1(x) = rk+ X j∈K Q[k, j]wt(j) = wt+1(k).

By taking the limit, the above result also gives us the equality of mean stationary rewards. Corollary 2. Let X and Y be two Markov chains satisfying the assumptions of Proposition 2. If both chains are stable (which is equivalent to having only one recurrent class in the case of a finite chain), then the mean stationary rewards satisfy a =_ea (where a = limt→∞vt(x)_t , for any x

from the unique recurrent class of the chain X, and_ea = limt→∞ wt(k)

t for any k from the unique

(14)

5 Assemble-to-order systems.

Although assemble-to-order (ATO) systems typically refer to systems in which components are kept to stock and the end-product is only assembled when a customer demand occurs, the same model can also be used in much more general settings. The ATO model can be used to model processes at, for instance, supermarkets, Dell (Kapuscinski et al. [14]), Amazon (Xu et al. [46]), maintenance companies (Vliegen and van Houtum [45]) and for assortment planning (K¨ok et al. [17]).

In this section, we first present a general framework for ATO systems, in which these different applications can be included (Section 5.1). Then, in Section 5.2, we give a detailed model de-scription for one instance of our framework, the so-called ATO models with partial order service (POS). For this specific model we prove a monotonicity property with respect to supermodularity (Section 5.3). Afterwards, in Section 5.4, we discuss the other instances within the framework, and whether or not the same property holds. In Section 6, we apply this monotonicity property to prove results for two specific examples from our framework.

5.1 Framework.

ATO systems have received a lot of attention in literature over the last decades. For an overview of research in this area, see Song and Zipkin [39] and, more recently, Bijvank [4].

In these overview papers (Song and Zipkin [39], and Bijvank [4]), several distinguishing charac-teristics are mentioned. A first distinction is between one-period models, periodic review models, and continuous review ATO models. We focus on continuous-review models. A second character-istic is the replenishment of the components, which is influenced by the service capacity and the lead time distribution of the replenishment times. We focus on exponential replenishment times. Third, the way an out-of-stock situation is handled is important. Three options are possible: A demand can be backordered; fulfilled partly (just the components that are available), referred to as partial order service (POS); or lost fully, referred to as total order service (TOS). We discuss all three of these options below. Finally, Song and Zipkin [39] distinguish between studies on optimal policies and studies on performance evaluation for a given policy. In this work, we focus on the performance evaluation of ATO systems. As is common in the ATO literature, we assume at most one component of each type will be demanded for any item (the unit demand cases).

One characteristic not yet studied extensively in the field of ATO systems is the effect of cou-pling in returns/replenishments of components used in the assembly of the end-product. It is usually assumed that components have independent replenishment times. However, when service tools for maintenance actions are used, they are returned together (Vliegen and van Houtum [45]). This is also true for other products that are borrowed or rented, i.e., a rental car together with the child seat or the GPS system. But even for replenishment systems, a coupling between the replenishment moments of components might exist: a supplier might use the same shipment for sending all components. Therefore, we distinguish between two different situations for the replen-ishment/return of components: components are either returned individually (and independently) or jointly. This distinction can be seen on the vertical axis of our framework (see Table 1). On the horizontal axis of the framework, we distinguish the different policies for an out-of-stock situation, as also seen in Song and Zipkin [39]: backordering, total order service, and partial order service. Thus six different returns/out of stock combinations can be distinguished. We now shortly discuss the related literature of all these pairs.

A1. In the upper left corner (A1), ATO systems with individual returns and total order service are considered, for instance, Dell computers (Kapuscinski et al. [14]). A customer wants to have a computer with some specific elements. If one of these parts is not available, the customer does not want to have the computer at all: the demand for all parts is lost. After parts are used in the assembly of a computer, they are replenished with part specific replenishment rates. This kind of model has also been studied by Song et al. [37]. They present an exact evaluation of ATO systems with backordering (backlog queue is infinite) and lost sales (with both POS and TOS; backlog queue equal to zero) by using a matrix geometric approach. This exact method, however, needs a

(15)

Returns/Out-of-stock 1. TOS 2. POS 3. Backordering

A. Individual Song et al. [37] Song et al. [37] Song et al. [37]

Kapuscinski et al. [14] Dayanik et al. [8] Song and Yao [38] Iravani et al. [13] Iravani et al. [13] Lu et al. [25]

Lu et al. [26] Lu [23]

B. Joint Kelly [15] Vliegen and Van Houtum [45]

Table 1: A framework for ATO systems.

considerable amount of time for large problem instances. Iravani et al. ([13]) extended the model of Song et al. by allowing customers to be selective and/or flexible. Selective customers are not going to buy the final product unless all key items are available. If all key-items are available, the product is bought, even if other non-key-items are missing. Flexible customers are willing to substitute one or more non-available items by an item that is available. The ATO-TOS model is a special case of the model of Iravani et al. ([13]).

B1. In the lower left corner (B1), models are considered that have joint returns of resources used plus total order service. The best known example of this is telecommunication systems, or specifically loss networks (Kelly [15]). In these networks, demands arrive, for example a phone call, that need several links to be simultaneously available. If all links are available, the call is completed. After the call is finished, all links simultaneously become available again. When one or more of the links is not available, the call is not completes,. and the demand for all links is lost (total order service). Although loss networks have a product-form solution (Ross [32]), exactly computing the blocking probabilities for this system is known to be a #P-complete problem (Louth et al. [22]) due to the normalizing constant.

A2. In the upper middle section (A2), ATO systems with individual returns and partial order service are considered. A example from practice is grocery shopping: Customer come to a supermarket with a list of things they want to buy. When one of the items is not available, the customer still takes the rest of the items home. He or she, however, will be less satisfied after leaving the supermarket. Furthermore, there is no direct relationship between the items that were demanded together and their replenishment time; especially when items have different suppliers, there is no coupling in returns. In the literature, this kind of model has been studied by, among others, Song et al. [37]. As already discussed, Song et al. [37] studied a model that could apply to both POS and TOS, and presented an exact, but time-consuming, evaluation for this model. Dayanik et al. [8] studied bounds on the performance of the system. Furthermore, the ATO-POS model is another special case of the model discussed in Iravani et al. ([13]).

B2. In the lower middle section (B2), ATO systems with joint returns and partial order service are considered. An example of this element of the framework is the service tool problem as considered in Vliegen and van Houtum [45]. In this problem, to perform a maintenance action, several service tools are needed at the same time. Whenever one or more tools are not present, they are sent by an emergency shipment to be able to start the maintenance action as soon as possible. For the supply location under consideration the demand for these emergency shipped tools is lost. Furthermore, after usage all tools return to the location they were sent from together. So, there is also a coupling in returns. Vliegen and van Houtum [45] developed three different approximations, of which the third one (a weighted average of approximation 1 and 2) leads to accurate and, for small instances, efficient results. For larger instances, however, these approximations are still time consuming.

A3. In the upper right corner (A3), ATO systems with individual returns and backordering are considered. Within this portion of the framework (and also in B3), a distinction needs to be made with regard to the order in which backorders are filled. While most papers assume the First-Come-First-Served (FCFS) rule, Lu et al. [24] analyze the class of no-holdback (NHB) rules. Under NHB rules, a demand is backordered if and only if at least one of its components is out of

(16)

stock. In this case, components are not put aside as committed stock as is done in FCFS. In our framework, we consider the FCFS rule only.

A real-life example for section A3 is a mail-order or e-commerce retailer. Often customer orders consisting of multiple items are shipped only when all items are present; the whole order therefore might be delayed/backordered. This model is among others studied by Song et al. [37] in the case where the backlog queue is infinitely long. Song and Yao [38] study a single-item model for which they derive an exact evaluation as well as easy-to-computer bounds. Lu et al. [25] extend this model to a multi-item setting and derive the joint queue-length distribution, and Lu et al. [26] study the optimal component inventories under a given budget so at to minimize the weighted average backorders. Furthermore, they derive bounds and approximations for the expected number of backorders, formulate surrogate optimization problems and develop algorithms to solve these problems. Lu [23] generalizes the model of Lu et al. [26] to a situation with general assumptions of demand patterns and replenishment leadtime distributions.

B3. In the lower right corner (B3), ATO systems with joint returns and backordering are considered. This situation can be seen in online retail settings, where items are supplied from the same supplier. In this case, the customer order, which is either fulfilled or backlogged, triggers a replenishment order for the items demanded. If these items are delivered by the same supplier, a coupling in the return epoch might exist. We are not aware of any papers in this part of the framework that assume exponential lead times. So, the model of this part of our framework has not been applied so far. However, for a slightly different situation, where equal deterministic replenishment times are assumed (in special cases of Song [35], Song [36]), joint returns in an ATO system with backordering do exist.

We can conclude that for all sections of our framework bounding models and/or approximations have been developed, or might be useful. For loss networks (B1), however, a product-form solution exists, which makes the exact calculation somewhat easier. We therefore focus on the other sections of the framework.

In the next subsection, we will give a detailed model description for the ATO model with POS and individual returns (A2). For this specific model we prove a monotonicity property with respect to supermodularity in Section 5.3. Afterwards, in Section 5.4, we discuss the other categories of the framework, and whether or not the same property holds.

5.2 Model description ATO-POS (A2).

We consider an assemble-to-order system with POS and individual returns, that we will call the ATO-POS model. We discuss some possible extensions to other variants (backordering and TOS) in Section 5.4.

The set of items is denoted by I and the number of item types I = |I|. We assume a base stock replenishment policy and denote by S = {S1, . . . , SI} the base stock vector. The state

of the system is given by vector x = (x1, . . . , xI), where xi is the number of items of type i in

replenishment. We assume exponential demands and replenishments. There is a server for each item type and the replenishment rate of item i may be state dependent: the service rate for item type i in state x ∈ X is equal to µi(x). Furthermore, we assume that replenishment rates for type

i items depend only on xi, i.e.:

for all x, y ∈ X such that xi= yi, µi(x) = µi(y),

and we will write µi(xi) for µi(x). We will assume further that

X

i∈I

µi(xi) > 0, ∀x 6= (0, . . . , 0). (18)

We can have demands for sets of items: λA denotes the demand rate of subset ∅ 6= A ⊂ I.

Then λ = P

∅6=A⊂IλA denotes the total demand rate. We assume partial order fulfillment: if

some of demanded items are not on stock, then the present items are sent from stock and the missing items are considered as lost sales. A system with 2 items is given in Figure 5.

(17)

µ2(x2) µ1(x1) S1− x1 λ2 λ1,2 λ1 S2− x2

Figure 5: A system with 2 item types.

Markov chain description. The Continuous Time Markov Chain (CTMC) of the ATO-POS model has I-dimensional state space:

X = {x = (x1, . . . , xI) | 0 ≤ xi≤ Si, i ∈ I} ,

where xi denotes the number of items of type i in replenishment. In the following, for i ∈ I, we

denote by ei the state with all the components equal to 0 except the component i that is equal

to 1.

There are two types of transitions:

• Demands. For each subset ∅ 6= A ⊂ I, a demand of set A occurs with rate λA. The new

state is described by operator dA: X → X :

dA(x) = x +

X

i∈A

1_{xi<Si}ei.

• Replenishment. For each item type i ∈ I, the new state after replenishment of item i is described by operator ri: X → X :

ri(x) = x − 1{xi>0}ei.

Denote by γi the maximal service rate for item type i:

µi(xi) ≤ γi, x ∈ X .

Then the outgoing rate for each state is smaller than P

∅6=A⊂IλA+Pi∈Iγi. We will take the

somewhat unconventional uniformization constant: ν = X ∅6=A⊂I |A|λA+ X i∈I γi.

The choice of this uniformization constant will be very useful in Section 6, where we will establish a monotonicity property for the ATO-POS model (A2) with respect to the amount of coupling in the system.

We consider a Discrete Time Markov Chain (DTMC) after uniformization with parameter ν. To simplify notation and without loss of generality, we assume that ν = 1. Then a demand of item set A in the uniformized chain occurs with probability λA and the replenishment probability of

item i is state-dependent and equal to µi(xi) in state x ∈ X . Figure 6 illustrates the state space

and transitions for four different states in the 2-item case (the loops are omitted). Relation 18 implies that the Markov chain has only one recurrent class (and it is finite so this implies stability), so that the mean stationary rewards will be always well defined for our ATO-POS model.

(18)

00 00 11 11 00 11 0011 00 00 11 11 00 00 11 11 0 0 1 1 00 00 11 11 0 0 1 1 00 00 11 11 00 11 0 0 1 1 00 00 11 11 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 00 11 00 11 00 11 00 11 00 11 x1 S1 S2 x2 µ1(2) µ1(1) λ2 λ2 λ1 λ1 µ2(4) λ1+ λ1,2 µ1(5) λ2+ λ1,2 λ1,2 λ1,2 µ1(3) µ2(2) µ2(2)

Figure 6: State space for the 2-item case.

5.3 ATO-POS model (A2) and supermodular rewards.

Let r : X → R be any reward function and denote by vt: X → R, t ≥ 0, its cumulative t-period

reward: vt+1(x) = r(x)+ X ∅6=A⊂I λAvt(dA(x))+ X i∈I µi(xi)vt(ri(x))+  ν − X ∅6=A⊂I λA− X i∈I µi(xi)  vt(x), t ≥ 0, (19) with v0(x) := 0, x ∈ X . To simplify the discussion, denote by DA(x, t) : X × N0 → R and

Ri(x, t) : X × N0→ R respectively the operators for demand and replenishment terms:

DA(x, t) = λAvt(dA(x)), ∅ 6= A ⊂ I

Ri(x, t) = µi(xi)vt(ri(x)), i ∈ I.

Furthermore, we define the uniformization operator U : X × N0→ R as:

U (x, t) =  ν − X ∅6=A⊂I λA− X i∈I µi(xi)  vt(x) =   X ∅6=A⊂I (|A| − 1)λA+ X i∈I (γi− µi(xi))  vt(x). Then: vt+1(x) = r(x) + X ∅6=A⊂I DA(x, t) + X i∈I Ri(x, t) + U (x, t), x ∈ X , t ≥ 0, (20) with v0(x) := 0, x ∈ X .

We now prove the monotonicity property of the ATO-POS model with respect to supermodu-larity:

Proposition 3. If the reward function r is supermodular, then for all t ≥ 0, t-period reward vtis

also a supermodular function.

Proof. After Proposition 7 in Appendix A, proving supermodularity of vtis equivalent to showing

that for all i, j ∈ I, i 6= j, and for all x ∈ X such that x + ei+ ej∈ X :

vt(x + ei+ ej) + vt(x) ≥ vt(x + ei) + vt(x + ej), ∀t ≥ 0. (21)

We will show relation (21) by induction on t. Suppose that relation (21) holds for a given t ≥ 0 (the case t = 0 is trivial). We will show that then it also holds for t + 1. Let i, j ∈ I, i 6= j, be arbitrary and fixed. We need to show that for all x ∈ X such that x + ei+ ej ∈ X :

(19)

We will prove this by considering demands separate from replenishments and uniformization terms. For demands we will prove the following relations for all x ∈ X such that x + ei+ ej∈ X :

DA(x + ei+ ej, t) + DA(x, t) ≥ DA(x + ei, t) + DA(x + ej, t), ∅ 6= A ⊂ I. (23)

Due to the state-dependent replenishment rates, we need to consider replenishments and uni-formization terms together. The uniuni-formization term U (x, t) can be decomposed into the following I + 1 parts: U (x, t) = UD(x, t) + X k∈I Uk(x, t), (24) where: UD(x, t) =   X ∅6=A⊂I (|A| − 1)λA  vt(x), Uk(x, t) = (γk− µk(xk))vt(x), k ∈ I.

We will prove the following relations involving replenishments and uniformization terms for all x ∈ X such that x + ei+ ej∈ X :

Rk(x + ei+ ej, t) + Uk(x + ei+ ej, t) + Rk(x, t) + Uk(x, t)

≥ Rk(x + ei, t) + Uk(x + ei, t) + Rk(x + ej, t) + Uk(x + ej, t), k ∈ I, (25)

and

UD(x + ei+ ej, t) + UD(x, t) ≥ UD(x + ei, t) + UD(x + ej, t). (26)

Then (22) follows from (20), (23), (24), (25), (26), and supermodularity of the reward function r.

Demands. Consider a demand of an arbitrary and fixed subset A ⊂ I, A 6= ∅, and assume that λA> 0 (the claim is trivial otherwise). Denote by x0 = x +Pa∈A1{xa<Sa}ea. We have 3 different

cases:

• Case 1: i 6∈ A and j 6∈ A. Then (23) is equivalent to:

vt(x0+ ei+ ej) + vt(x0) ≥ vt(x0+ ei) + vt(x0+ ej),

which follows by induction hypothesis for state x0.

• Case 2: i ∈ A and j 6∈ A (case i 6∈ A and j ∈ A is symmetric). Then we have: vt(x0+ 1{xi<Si−1}ei+ ej) + vt(x0) ≥ vt(x0+ 1{xi<Si−1}ei) + vt(x0+ ej).

If xi = Si− 1, then this trivially holds. If xi< Si− 1, then the above relation follows by the

induction hypothesis for state x0. • Case 3: i ∈ A and j ∈ A. Then we have:

vt(x0+ 1{xi<Si−1}ei+ 1{xj<Sj−1}ej) + vt(x 0_{) ≥ v}

t(x0+ 1{xi<Si−1}ei) + vt(x + 1{xj<Sj−1}ej).

If at least one of xi= Si− 1 or xj= Sj− 1 holds, then this is trivially satisfied. If xi< Si− 1

and xj< Sj− 1, then the above relation follows by the induction hypothesis for state x0.

Replenishments. Consider item k ∈ I. We have 2 different cases: • Case 1: k 6∈ {i, j}. Then (25) is equivalent to:

µk(xk)vt(x + ei+ ej− 1{xk>0}ek) + (γk− µk(xk))vt(x + ei+ ej)

+ µk(xk)vt(x − 1{xk>0}ek) + (γk− µk(xk))vt(x)

≥ µk(xk)vt(x + ei− 1{xk>0}ek) + (γk− µk(xk))vt(x + ei)

(20)

By the induction hypothesis for state x − 1{xk>0}ek:

vt(x + ei+ ej− 1{xk>0}ek) + vt(x − 1{xk>0}ek)

≥ vt(x + ei− 1{xk>0}ek) + vt(x + ej− 1{xk>0}ek),

and by the induction hypothesis for state x:

vt(x + ei+ ej) + vt(x) ≥ vt(x + ei) + vt(x + ej),

so (27) holds.

• Case 2: k = i (k = j is symmetric). Then (25) becomes:

µi(xi+ 1)vt(x + ej) + (γi− µi(xi+ 1))vt(x + ei+ ej)

+ µi(xi)vt(x − 1{xi>0}ei) + (γi− µi(xi))vt(x)

≥ µi(xi+ 1)vt(x) + (γi− µi(xi+ 1))vt(x + ei)

+ µi(xi)vt(x + ej− 1{xi>0}ei) + (γi− µi(xi))vt(x + ej).

This is equivalent to:

(γi− µi(xi+ 1)) (vt(x + ei+ ej) + vt(x) − vt(x + ei) − vt(x + ej) | {z } I ) + µi(xi)(vt(x + ej) + vt(x − 1{xi>0}ei) − vt(x) − vt(x + ej− 1{xi>0}ei) | {z } II ) ≥ 0.

For part I, by the induction hypothesis for state x:

vt(x + ei+ ej) + vt(x) − vt(x + ei) − vt(x + ej) ≥ 0.

For part II:

– If xi> 0, then by the induction hypothesis for state x − ei:

vt(x + ej) + vt(x − ei) − vt(x) − vt(x + ej− ei) ≥ 0.

– If xi= 0, then:

vt(x + ej) + vt(x) − vt(x) − vt(x + ej) = 0.

Uniformization terms. Assume P

∅6=A⊂I(|A| − 1)λA > 0 (the claim is trivial otherwise). Then

(26) is equivalent to:

vt(x + ei+ ej) + vt(x) ≥ vt(x + ei) + vt(x + ej), (28)

which is satisfied by the induction hypothesis for state x.

It is easy to check that the proof remains valid if we invert the inequalities. Thus, the ATO-POS model is also monotone with respect to submodularity:

Proposition 4. If the reward function r is submodular, then for all t ≥ 0, t-period reward vt is

(21)

Examples. We give now some examples of supermodular reward functions.

Order fill rates for individual demand streams. Denote by OFRA : X → {0, 1} the

order fill rate for the demand stream for set A ⊂ I, A 6= ∅: OFRA(x) =

Y

i∈A

1{xi<Si}, x ∈ X . (29)

Lemma 1. Reward function OFRA is supermodular for all A ⊂ I, A 6= ∅.

Proof. Let A ⊂ I, A 6= ∅, and i, j ∈ I, i 6= j. We need to show that for all x ∈ X such that x + ei+ ej∈ X (see Proposition 8):

OFRA(x + ei+ ej) + OFRA(x) ≥ OFRA(x + ei) + OFRA(x + ej). (30)

Suppose first that i, j ∈ A. We have 4 different cases:

• There is a k ∈ A\{i, j} such that xk = Sk. Then OFRA(x + ei + ej) = OFRA(x) =

OFRA(x + ei) = OFRA(x + ej) = 0, and relation (30) clearly holds.

• For all k ∈ A\{i, j}, xk< Sk, xi = Si− 1 and xj = Sj− 1. Then OFRA(x) = 1 and all the

other terms are equal to 0, thus relation (30) holds.

• For all k ∈ A\{i, j}, xk < Sk, xi = Si− 1 and xj < Sj− 1 (xi < Si− 1 and xj = Sj− 1 is

symmetrical). Then OFRA(x) = OFRA(x + ej) = 1 and the other two terms are equal to 0,

so the both sides of relation (30) are equal to 1.

• Finally, if xk < Sk, ∀k ∈ A\{i, j}, xi< Si− 1, and xj< Sj− 1, then the both sides of (30)

are equal to 2.

If i 6∈ A, then OFRA(x + ei+ ej) = OFRA(x + ej) and OFRA(x) = OFRA(x + ei) so (30) clearly

holds.

Aggregated order fill rate. Denote by OFR : X → {0, 1} the aggregate order fill rate: OFR(x) = 1

λ X

∅6=A⊂I

λAOFRA(x), x ∈ X . (31)

Lemma 2. The aggregated order fill rate OFR is supermodular.

Proof. A linear combination of supermodular functions is supermodular.

Since the proofs for other reward functions are straightforward analogs to the proofs of Lemma 1 and 2, further proofs are omitted. Other supermodular reward functions, interesting in the framework of ATO systems, are:

• Item fill rates for individual items. Denote by IFRi: X → {0, 1} the item fill rate for

item i ∈ I:

IFRi(x) = 1xi<Si, x ∈ X (32)

• Aggregated item fill rate. Denote by IFR : X → {0, 1} the aggregated item fill rate: IFR(x) =X

i∈I

fi1xi<Si, x ∈ X, (33)

where fi≥ 0 is the weight factor for item i,Pifi= 1. This would be an interesting reward

function when the availability of the items is not equally important, and the experience of the customers is not solely determined by the availability of all items.

(22)

• Average amount of backorders/queue length for individual items. Denote by BOi:

X → R+ _{the average amount of backorders for item i ∈ I:}

BOi(x) = (xi− Si)+, x ∈ X (34)

This would be an interesting reward function for the right portion of the framework, ATO systems with backordering (A3 and B3).

• Aggregate amount of backorders/queue length. Denote by BO : X → R+ _the

aggre-gate average amount of backorders: BO(x) =X

i∈I

fi(xi− Si)+, x ∈ X, (35)

where fi ≥ 0 is the weight factor for item i,P_ifi= 1. Also this reward function is interesting

for the right portion of the framework, ATO systems with backordering (A3 and B3).

5.4 Further discussion and extensions.

In Section 5.3, we proved a monotonicity property with respect to supermodularity (Proposition 3) or submodularity (Proposition 4) for the ATO-POS model (A2). We consider here the possible extensions of these results to the individual return model with backordering (A3) and TOS (A1). For the lower part (B) of the framework, supermodularity is not well-defined. The monotonicity property shown for the ATO-POS model therefore does not make sense for the models in this part. However, the precedence relation method combined with aggregation can be used to prove bounds for the performance of the models, as is shown in Section 6.2.

Backordering (A3). Assume now that demands that cannot be immediately fulfilled are back-ordered. More precisely, the items that are available are sent immediately to the customer (or kept apart as reserved) and the missing items are sent as soon as they become available. The state-space of this model, referred to as ATO-B in the following, is:

X = {x = (x1, . . . , xI) | xi≥ 0, i ∈ I} ,

where xi denotes the number of items of type i in replenishment, as before.

Returns remain the same as in the ATO-POS model, while demands become simpler as we no longer have the boundary effect at xi= Si for i ∈ I. The operator dAbecomes:

dA(x) = x +

X

i∈A

ei.

In this case, similar results as in propositions 3 and 4 hold for the ATO-B model:

Proposition 5. Let X be a Markov chain of the ATO-B model. If the reward function r is supermodular (resp. submodular), then for all t ≥ 0, t-period reward vt is also a supermodular

(resp. submodular) function.

Proof. The proof is similar to the proof of Propositions 3. The models differ only in demands, so we only need to prove the analog to relation (23), for any subset A ⊂ I, A 6= ∅, such that λA> 0.

As there is no boundary effect for demands, we no longer need to consider three different cases. Denote x0= x +P

a∈Aea. Then for the ATO-B model relation (23) is equivalent to:

vt(x0+ ei+ ej) + vt(x0) ≥ vt(x0+ ei) + vt(x0+ ej),

(23)

TOS (A1). Unfortunately, supermodularity does not hold for ATO systems with TOS. The primary reason for this is that whenever one item is out-of-stock this influences the demand for other items.

To illustrate this, we now present a small example of this fact; see Table 5.4. In this example, we compare two situations, both including three items with S = {2, 2, 2}. Situation 1 has more coupling in demands than situation 2, while the aggregate demand for each item remains the same. Specifically in situation 2, the demand for set {1, 3} is partly split into separate demands for item {1} and item {3}. This decoupling leads to decreased service for sets {1, 3} and {1, 2, 3} (see Table 5.4), since the stock levels for items 1 and 3 are typically lower in situation 2. The reason for this is that the unavailability of one item does not lead to a lost demand for the other item as frequently. However, since the service level for set {1, 2, 3} is also lower in situation 2, the availability of item 2 will be higher. Therefore, the fill rate for set {2} will be higher (see Table 5.4). While in the other models (ATO-POS and ATO-B) the decoupling of demand leads to a situation where either the service levels decrease or stay equal, decoupling in the ATO-TOS model can also lead to an increase in service for some demand streams.

Input Situation 1 Situation 2 µ 1 1 λ1 0 0.3 λ2 1 1 λ3 0 0.3 λ1,2 0 0 λ1,3 0.5 0.2 λ2,3 0 0 λ1,2,3 0.5 0.5 Output Situation 1 Situation 2 OFR1 - 0.8549 OFR2 0.7132 0.7159 OFR3 - 0.8549 OFR1,3 0.7662 0.7143 OFR1,2,3 0.5521 0.5362 OFR 0.6862 0.7153 Table 2: Counterexample for ATO-TOS model.

6 Applications.

In this section we illustrate our analytical technique - applying precedence relations on sets of states - on two applications. In Section 6.1, we consider ATO-POS model and we prove a monotonic-ity property with respect to the amount of coupled demands, valid for any supermodular reward function. This result has been previously established in Dayanik et al. [8] for capacitated (i.e. single server) ATO-POS model and for order fill rates. We extend here their result to the more general case of state dependent replenishment times (this covers many standard assumptions, such as capacitated server, ample server, or many server case) and for general supermodular rewards. In Section 6.2, we consider the service tool problem, introduced by Vliegen and Van Houtum [45]. We apply our generalized precedence relation method together with aggregation to establish a formal proof of bounds conjectured in [45] based on numerical experiments.

(24)

6.1 ATO-POS and coupling in demands.

Let X be a Markov chain of the ATO-POS model as described in Section 5.2. Consider a new ATO-POS model Y that is obtained from X by modifying its demand rates in order to decrease the coupling in demands. More formally, for each A ⊂ I such that |A| > 1 let A be a constant

0 ≤ A≤ λA. We can define the chain Y by modifying the demand rates to:

λ0_A = λA− A,

λ0_i = λi+

X

{A⊂I | |A|>1, i∈A}

A.

We have the following monotonicity property of ATO-POS model with respect to the amount of coupling in the system:

Proposition 6. For any supermodular reward function, the chain Y gives lower bounds for mean cumulated and stationary rewards of the chain X.

Proof. Chain Y is obtained from X by replacing the demands of sets of items by individual demands. To describe this transformation formally, recall that for each state x ∈ X and each A ⊂ I, A 6= ∅, dA(x) = x +Pi∈A1{xi<Si}ei denotes the state after a demand of set A. For each

x ∈ X and each A ⊂ I such that |A| > 1, we will define probability vectors αx,A and βx,A as

follows:

αx,A(z) =

1

|A| δdA(x)(z) + (|A| − 1)δx(z) , βx,A(z) =

1 |A|

X

i∈A

δd{i}(x)(z), z ∈ X ,

and the weight factor ∆x,A = A. Let P be the transition matrix of the original chain X. The

transition matrix P0 _{of the chain Y is then given by:}

P0= P +X

x∈X

X

A⊂I, |A|>1

|A|Tαx,A,βx,A(x).

Let r : X → R be any supermodular reward function and denote by vtits cumulative t-period

reward, defined by relation (19). Similarly, denote by wtthe t-period rewards for the chain Y . If

we show that for all x ∈ X and for all A ⊂ I such that |A| > 1, functions vtsatisfy:

X z∈X αx,A(z)vt(z) ≥ X z∈X βx,A(z)vt(z), t ≥ 0, (36)

then by Theorem 2 it follows that:

vt(x) ≥ wt(x), x ∈ X , t ≥ 0. (37)

For x ∈ X and A ⊂ I such that |A| > 1, relation (36) is equivalent to: vt(x + X i∈A 1{xi<Si}ei) + (|A| − 1)vt(x) ≥ X i∈A vt(x + 1{xi<Si}ei), t ≥ 0. (38)

By Proposition 3, for any t ≥ 0, vt is a supermodular function. From Proposition 7 3 in

Appendix A, it follows that for all ∅ 6= A ⊂ I and x ∈ X such that x +P

i∈Aei ∈ X : vt(x + X i∈A ei) + (|A| − 1)vt(x) ≥ X i∈A vt(x + ei), t ≥ 0. (39)

(25)

00 00 11 11 00 11 0011 00 00 11 11 00 00 11 11 0 0 1 1 00 00 11 11 0 0 1 1 00 00 11 11 00 11 0 0 1 1 00 00 11 11 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 00 11 00 11 00 11 00 11 00 11 x1 S1 S2 x2 µ1(2) µ1(1) λ2+ λ1,2 λ2+ λ1,2 λ1+ λ1,2 µ2(4) λ1+ λ1,2 µ1(5) λ2+ λ1,2 µ1(3) µ2(2) µ2(2) λ1+ λ1,2

Figure 7: State space of ISA model.

In the extreme case, if we split all the joint demands into individual demands (i.e. if we take A = λA, for all A ⊂ I such that |A| > 1), we obtain the independent stock assumption (ISA)

model. In Figure 7 we give the state space and transitions for four different states in the 2-item case ISA model.

We can also split a demand of set A into demands for sets A1, . . . , AK such that A = ∪Kk=1Ak

and Ai∩ Aj = ∅, i 6= j. Again using supermodularity characterization (53) (Proposition 7 in

Appendix A) and the general precedence relation method, it can be easily proved that this is also a lower bound for the original model, which is tighter than the ISA model. This lower bound is also easier to solve than the original model as it allows us to decompose the problem into smaller subproblems.

More precisely, let I1, . . . , IK be any decomposition of the item set I (i.e. I = ∪Kk=1Ik and

Ii∩Ij= ∅, i 6= j). Then for any subset ∅ 6= A ⊂ I, we can define the sets Ak := A∩Ik, 1 ≤ k ≤ K.

In the bounding model, a demand of set A is decomposed into demands of sets A1, . . . , AK. (If

some of the sets Ak, 1 ≤ k ≤ K are empty sets, we can ignore them.) This bounding model can

then be decomposed into K subproblems where subproblem k involves only items in set Ik. These

bounds are generalization to the state dependent replenishment case of setwise bounds, derived in Dayanik et al. [8] for the capacitated replenishment case.

Remark 4. The results remain valid for the backordering case (A3) of our framework, by using Proposition 5 for the ATO-B model (instead of Proposition 3) in the proof of Proposition 6. We only modify a finite number of transitions per state, so the result holds even for unbounded super-modular functions. However, in the case of the ATO-B model, stability is no longer guaranteed by relation (18), so special care should be taken to check whether the mean stationary rewards are well defined for given model parameters, which will typically be implied by

µi(xi) > X i∈A λA, xi> ki, ∀i ∈ I, for some ki≥ 0, i ∈ I.

6.2 Service tools.

The service tool model was introduced by Vliegen and Van Houtum [45]. In our framework in Table 1, this model corresponds to the B2 case.

As the items that are delivered together return together, we need to keep track of the sets of items delivered together. Consider a very simple case of two different item types. Then the state of the system is given by a vector (n{1}, n{2}, n{1,2}) where n{1} (resp. n{2}) is the number of

(26)

items of type 1 (resp. 2) at the customer that were delivered individually, and n{1,2}is the number

of sets {1, 2} at the customers that were delivered together and that will return together. Given, for example stock levels S1 = 1, S2 = 2, all possible states of the system are: (0, 0, 0), (1, 0, 0),

(0, 1, 0), (1, 1, 0), (0, 0, 1), (0, 2, 0), (1, 2, 0) and (0, 1, 1). Note that if set {1, 2} is demanded, and item type 2 is out of stock, this becomes a demand for item type 1 (and similarly if item 1 is out of stock). The Markov chain for this case is given in Figure 8.

0,1,0 0,2,0 1,1,0 1,0,0 0,0,0 1,2,0 0,0,1 0,1,1

Figure 8: Markov chain for the service tools model for I = 2, S1 = 1 and S2= 2.

Markov chain description. For the general I-item case, the state of the system is given by a vector n = (nA)∅6=A⊂I where nA≥ 0 is the number of sets A at the customer that were delivered

together. For each i ∈ I, we denote by ξi(n) the total number of items of type i at the customer:

ξi(n) =

X

A⊂I

1{i∈A}nA. (40)

The state space of the model is then:

E =n = (nA)∅6=A⊂I : ξi(n) ≤ Si, ∀i ∈ I .

For each A ⊂ I, A 6= ∅, we will denote by eA the state in which all the components are equal to

0, except the component A that is equal to 1. We have the following transitions:

• Demands. For each subset ∅ 6= A ⊂ I, a demand of set A occurs with rate λA. For all i 6∈ A

the amount of items of type i at the customer stays the same. In state n ∈ E , we can deliver an item of type i only if ξi(n) < Si. The new state is described by operator φA: E → E :

φA(n) = n + e{i∈A : ξi(n)<Si}.

• Returns. In state n ∈ E, for each subset ∅ 6= A ⊂ I such that nA > 0, a return of set A

occurs with rate µnA. The new state is described by operator ψA: E → E :

ψA(n) = n − 1{nA>0}eA.

We will consider the uniformized chain with uniformization constant

ν = X ∅6=A⊂I |A|λA+ I X i=1 Si ! µ.