Robustness Properties for Simulations of Highly Reliable Systems

Bruno Tuffin∗, Pierre L’Ecuyer†, and Werner Sandmann‡

Abstract

Importance sampling (IS) is the primary technique for constructing reliable estimators in the context of rare-event simulation. The asymptotic robustness of IS estimators is often qualified by properties such as bounded relative error (BRE) and asymptotic optimality (AO). These properties guarantee that the estimator’s relative error remains bounded (or does not increase too fast) when the rare events become rarer. Other recently introduced characterizations of IS estimators are bounded normal approximation (BNA), bounded relative efficiency (BREff), and asymptotic good estimation of mean and variance.

In this paper we introduce three additional properties, named bounded relative error of empirical variance (BREEV), bounded relative efficiency of empirical variance (BREffEV), and asymptotic optimality of empirical variance (AOEV), which state that the empirical variance itself has the BRE, BREff and AO property, respectively, as an estimator of the true variance. We then study the hierarchy between all these different characterizations for a model of highly reliable Markovian systems (HRMS) where the goal is to estimate the failure probability of the system. In this setting, we show that BRE, BREff and AO are equivalent, that BREffEV, BREEV and AOEV are also equivalent, and that the latter three properties are strictly stronger than all the other properties just mentioned. We also obtain a necessary and sufficient condition for BREEV in terms of quantities that can be readily verified from the parameters of the model.

1 Introduction

Rare event simulation has received a lot of attention due to its frequent occurrence in areas such as reliability, telecommunications, finance, and insurance, among others [3, 11, 12]. In typical rare-event settings, Monte Carlo simulation is not viable unless special “acceleration” techniques are used to make the important rare events occur frequently enough for moderate sample sizes. The two main families of techniques for doing that are splitting [8, 13, 22] and importance sampling (IS) [3, 9, 11].

Asymptotic analysis of rare-event simulations is usually made in an asymptotic regime where rarity is controlled by a parameter ε > 0; the rare events become increasingly rare when ε → 0 and we are interested in asymptotic properties of a given (unbiased) estimator Y in the limit. (Some authors use a parameter m that goes to infinity instead, but this is equivalent; it suffices to take ε = 1/m to recover our framework.) Asymptotic characterizations of estimators in this setting include the widely used concepts of bounded relative error (BRE) and asymptotic optimality (AO) [11, 12], as well as the lesser-known properties of bounded relative efficiency (BREff) [5], bounded normal approximation (BNA) and asymptotic good estimation of the mean (AGEM) and of the variance (AGEV) (also called probability and variance well-estimation) [19, 20].

BRE means that the relative error (the standard deviation divided by the mean) of the estimator Y = Y(ε) remains bounded when ε → 0. AO requires that when the mean converges to zero exponentially fast in ε, the standard deviation converges at the same exponential rate. In general, this is a weaker condition than BRE [11, 16]. BREff generalizes BRE by taking into account the computational time associated with the estimator Y, which may vary with ε. BNA implies that if we approximate the distribution of the average of n i.i.d. copies of Y by the normal distribution (e.g., to compute a confidence interval), the quality of the approximation does not degrade when ε → 0. AGEM and AGEV have been defined in the context of estimating a probability in an HRMS, and basically mean that the sample paths that contribute the most to the estimator and its second moment are not rare under the sampling scheme that is examined. The main goal of splitting and IS, from the asymptotic viewpoint, is to design estimators that enjoy some (or all) of these properties when the original (crude or naive) estimator does not satisfy them.

∗IRISA/INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France, btuffin@irisa.fr

†DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal (Québec), Canada, H3C 3J7, lecuyer@iro.umontreal.ca

‡Otto-Friedrich-Universität Bamberg, Feldkirchenstr. 21, D-96045 Bamberg, Germany, werner.sandmann@wiai.uni-bamberg.de

An important difficulty that often arises in rare-event simulation is that of estimating the variance of the mean estimator: reliable variance estimators are typically more difficult to obtain than reliable mean estimators, because the rare events have a stronger influence on the variance than on the mean. Variance estimators are important because we need them to assess the accuracy of our mean estimators, e.g., via confidence intervals. They are also frequently used when we compare the efficiencies of alternative mean estimators; poor variance estimators can easily yield misleading results in this context. This motivates our introduction of three additional characterizations of estimators: bounded relative error of empirical variance (BREEV), bounded relative efficiency of empirical variance (BREffEV), and asymptotic optimality of empirical variance (AOEV). BREEV means that the empirical variance has the BRE property, BREffEV that it has the BREff property, and AOEV that it has the AO property.

In this paper, we focus on IS and its application to an important HRMS model studied by several authors [4, 10, 11, 14, 15, 17, 19, 20], and used for reliability analysis of computer and telecommunication systems.

In this model, a smaller value of the rarity parameter ε implies a smaller failure rate for the system’s components, and we want to estimate the probability that the system reaches a “failed” state before it returns to a state where all the components are operational. This probability converges to 0 when ε → 0.

In general, IS consists in simulating the original model with different (carefully selected) probability laws for its input random variables, and counter-balancing the bias caused by this change of measure with a weight called the likelihood ratio. For the HRMS model, we actually simulate a discrete-time Markov chain whose transitions correspond to failures and repairs of individual components and IS generally increases [decreases] the probabilities of the failure [repair] transitions.

For this particular HRMS model, specific conditions on the model parameters and IS probabilities have been obtained for the BRE property [15], for BNA [19, 20], and for AGEM and AGEV [20]. It is also shown in [20] that BNA implies AGEV, which implies BRE, which implies AGEM, and that for each implication the converse is not true. In this paper we extend this hierarchy to incorporate AO, BREff, BREEV, BREffEV, and AOEV. We show that in our context, BRE, BREff and AO are equivalent, BREEV, BREffEV and AOEV are equivalent, and the latter three properties are strictly stronger than all the others. We also obtain a necessary and sufficient condition on the model parameters and the IS measure for BREEV, BREffEV and AOEV to hold.

The remainder of the paper is organized as follows. In Section 2, we give formal definitions of the asymptotic characterizations discussed so far: BRE, AO, BREff, BNA, AGEV, AGEM, BREEV, BREffEV, and AOEV, in a general rare-event framework. In Section 3, we recall the basic definition of IS in its general form. In Section 4, we describe the HRMS model and how IS is applied to this model. Section 5 is devoted to studying asymptotic robustness properties in the HRMS context. We establish a complete hierarchy between these properties and derive easily verifiable conditions for BREEV, BREffEV and AOEV. Finally, in Section 6, we conclude and highlight perspectives for further research.

The following notation is used throughout the paper. For a function f : (0,∞) → R, we say that f(ε) = o(ε^d) if f(ε)/ε^d → 0 as ε → 0; f(ε) = O(ε^d) if |f(ε)| ≤ c1 ε^d for some constant c1 > 0 for all ε sufficiently small; $f(\varepsilon) = \underline{O}(\varepsilon^d)$ if |f(ε)| ≥ c2 ε^d for some constant c2 > 0 for all ε sufficiently small; and f(ε) = Θ(ε^d) if f(ε) = O(ε^d) and $f(\varepsilon) = \underline{O}(\varepsilon^d)$.

2 Asymptotic Robustness Properties in a Rare-Event Setting

Rare-event framework. We want to estimate a positive value γ = γ(ε) that depends on a rarity parameter ε > 0. We assume that γ is a monotone (strictly) increasing function of ε and that lim_{ε→0⁺} γ(ε) = 0. We have at our disposal a family of estimators Y = Y(ε) such that E[Y(ε)] = γ(ε) for each ε > 0. Recall that the variance and relative error of Y(ε) are defined by

$$\sigma^2(\varepsilon) = \mathrm{Var}[Y(\varepsilon)] = E[(Y(\varepsilon) - \gamma(\varepsilon))^2]$$

and

$$\mathrm{RE}[Y(\varepsilon)] = (\mathrm{Var}[Y(\varepsilon)])^{1/2} / \gamma(\varepsilon).$$

In applications, γ(ε) is usually a performance measure in the model, defined as a mathematical expectation, and some model parameters are defined as functions of ε. For example, in queueing systems, the service time and inter-arrival time distributions and the buffer sizes might depend on ε, while in Markovian reliability models, the failure rates and repair rates might be functions of ε. The convergence γ(ε) → 0 can be exponential, polynomial, etc.; this depends on the application and how the model is parameterized. Note that in all cases, the limit as γ(ε) → 0⁺ is the same as the limit as ε → 0⁺, because of the strict monotonicity.

We now define several properties that the family of estimators {Y(ε), ε > 0} can have. In these definitions (and elsewhere) we use the shorthand notation Y(ε) to refer to this family (a slight abuse of notation). We write “→ 0” to mean “→ 0⁺.” In typical rare-event settings, these properties do not hold for the naive Monte Carlo estimators and the aim is to construct alternative unbiased estimators (e.g., via IS or other methods) for which they hold.

Bounded relative error.

Definition 1 (BRE) The estimator Y(ε) has the BRE property if

$$\limsup_{\varepsilon \to 0} \mathrm{RE}[Y(\varepsilon)] < \infty. \tag{1}$$

When computing a confidence interval on γ(ε) based on i.i.d. replications of Y(ε) and the (classical) central-limit theorem, for a fixed confidence level, the width of the confidence interval is (approximately) proportional to the standard deviation σ(ε). The BRE property means that this width decreases at least as fast as γ(ε) when ε → 0.
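To see why BRE typically fails for naive Monte Carlo, consider the simplest case where the crude estimator is a single Bernoulli indicator with mean γ, so that its variance is γ(1 − γ). The following minimal sketch computes the exact relative error in that case; the function name `crude_relative_error` is an illustrative choice and does not come from the paper.

```python
import math

def crude_relative_error(gamma: float) -> float:
    """Exact relative error of one crude Monte Carlo indicator with mean gamma."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    variance = gamma * (1.0 - gamma)    # variance of a Bernoulli(gamma) variable
    return math.sqrt(variance) / gamma  # equals sqrt((1 - gamma) / gamma)

# The relative error grows like gamma**(-1/2) as the event becomes rarer,
# so the lim sup in Definition 1 is infinite for this estimator.
for gamma in (1e-2, 1e-4, 1e-6):
    print(f"gamma={gamma:g}  RE={crude_relative_error(gamma):.1f}")
```

An estimator with BRE would instead keep this ratio below a fixed constant for all small ε.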

Asymptotic optimality. For several rare-event applications where γ(ε) decreases exponentially fast (e.g., in queueing and finance), it has not been possible to find practical BRE estimators so far, but estimators with the (weaker) AO property have been constructed by exploiting the theory of large deviations [1, 7, 11, 12, 18]. AO means that when γ²(ε) converges to zero exponentially fast, the second moment E[Y²(ε)] also converges exponentially fast and at the same exponential rate. This is the best possible rate; it cannot converge at a faster rate because we always have E[Y²(ε)] − γ²(ε) = σ²(ε) ≥ 0.

Definition 2 (AO) The estimator Y(ε) is AO if

$$\lim_{\varepsilon \to 0} \frac{\ln E[Y^2(\varepsilon)]}{\ln \gamma(\varepsilon)} = 2. \tag{2}$$

AO is generally weaker than BRE [11, 16]. But there are situations where the two are equivalent; this is what will happen in our HRMS setup in Section 4. The following examples illustrate the two possibilities.

Example 1 Suppose that γ(ε) = exp[−k/ε] for some constant k > 0 and that our estimator has σ²(ε) = q(1/ε) exp[−2k/ε] for some polynomial function q. Then, the AO property is easily verified, whereas BRE does not hold because RE²[Y(ε)] = q(1/ε) → ∞ when ε → 0.
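The AO claim of Example 1 can be checked directly. A short derivation, writing E[Y²(ε)] = σ²(ε) + γ²(ε) and assuming k > 0 and q(1/ε) > 0 for all small ε (so that the logarithm is well defined):

```latex
\frac{\ln E[Y^2(\varepsilon)]}{\ln \gamma(\varepsilon)}
  = \frac{\ln\!\big[(q(1/\varepsilon) + 1)\, e^{-2k/\varepsilon}\big]}{-k/\varepsilon}
  = 2 - \frac{\varepsilon \,\ln\big(q(1/\varepsilon) + 1\big)}{k}
  \;\longrightarrow\; 2
  \qquad (\varepsilon \to 0).
```

The limit holds because ln(q(1/ε) + 1) grows only like a multiple of ln(1/ε) for a polynomial q, and ε ln(1/ε) → 0. The polynomial factor is therefore invisible on the logarithmic scale used by AO, although it makes the relative error unbounded.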

Example 2 Suppose now that γ²(ε) = q1(ε) = ε^{t₁} + o(ε^{t₁}) and E[Y²(ε)] = q2(ε) = ε^{t₂} + o(ε^{t₂}). That is, both converge to 0 at a polynomial rate. Clearly, t₂ ≤ t₁, because E[Y²(ε)] − γ²(ε) ≥ 0. We have BRE if and only if (iff) q2(ε)/q1(ε) remains bounded when ε → 0, iff t₂ = t₁. On the other hand, −ln q1(ε) = −ln(ε^{t₁}(1 + o(1))) = −t₁ ln(ε) − ln(1 + o(1)) and similarly for q2(ε) and t₂. Then,

$$\lim_{\varepsilon \to 0} \frac{\ln E[Y^2(\varepsilon)]}{\ln \gamma(\varepsilon)} = \lim_{\varepsilon \to 0} \frac{t_2 \ln \varepsilon}{(t_1/2) \ln \varepsilon} = \frac{2 t_2}{t_1}.$$

Thus, AO holds iff t₂ = t₁, which means that BRE and AO are equivalent in this case.

Bounded relative efficiency.

Definition 3 (BREff) Let t(ε) be the expected computational time to generate the estimator Y(ε), whose variance is σ²(ε). The relative efficiency of Y(ε) is defined by

$$\mathrm{REff}[Y(\varepsilon)] = \frac{\gamma^2(\varepsilon)}{\sigma^2(\varepsilon)\, t(\varepsilon)} = \frac{1}{\mathrm{RE}^2[Y(\varepsilon)]\, t(\varepsilon)}.$$

We will say that Y(ε) has bounded relative efficiency (BREff) if lim inf_{ε→0} REff[Y(ε)] > 0.

BREff basically looks at the BRE property, but for a given computational budget: the computation time may vary with ε, and BREff takes this variation into account whereas BRE ignores it.

Example 3 If t(ε) = Θ(1), then BREff and BRE are equivalent properties.

Example 4 In [5], an example with BREff but without BRE is exhibited. In that example, t(ε) = O(ε) but RE[Y(ε)] = O(ε⁻¹). Conversely, we might have examples such that t(ε) = O(ε⁻¹) and RE[Y(ε)] = Θ(1) so that BRE is verified, but not BREff.
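The interplay between BRE and BREff can be made concrete with a small numerical sketch of Definition 3. The two rate profiles below are hypothetical choices made only for illustration (they are not the example of [5]): in the first, the relative error stays at 1 while the time grows like 1/ε; in the second, the relative error grows like 1/ε while the time shrinks like ε².

```python
def relative_efficiency(gamma: float, sigma2: float, time: float) -> float:
    """REff[Y] = gamma^2 / (sigma^2 * t), following Definition 3."""
    if sigma2 <= 0.0 or time <= 0.0:
        raise ValueError("variance and computing time must be positive")
    return gamma ** 2 / (sigma2 * time)

def bre_without_breff(eps: float) -> float:
    # Hypothetical profile: RE = 1 (bounded) but t = 1/eps, so REff = eps -> 0.
    gamma = eps
    return relative_efficiency(gamma, sigma2=gamma ** 2, time=1.0 / eps)

def breff_without_bre(eps: float) -> float:
    # Hypothetical profile: RE = 1/eps (unbounded) but t = eps^2, so REff = 1.
    gamma = eps
    return relative_efficiency(gamma, sigma2=(gamma / eps) ** 2, time=eps ** 2)

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, bre_without_breff(eps), breff_without_bre(eps))
```

In the first profile the relative efficiency vanishes although the relative error is constant; in the second it stays at 1 although the relative error diverges.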

Bounded normal approximation. We mentioned earlier the computation of a confidence interval on γ(ε) based on the central-limit theorem. This type of confidence interval is reliable if the sample average has approximately the normal distribution, so it is relevant to examine the quality of this normal approximation when ε → 0. An error bound for this approximation is provided by the following version of the Berry-Esseen theorem [2]:

Theorem 1 (Berry-Esseen) Let Y1, . . . , Yn be i.i.d. random variables with mean 0, variance σ², and third absolute moment β3 = E[|Y1|³]. Let Ȳn and S²n be the empirical mean and variance of Y1, . . . , Yn, and let Fn denote the distribution function of the standardized sum (or Student statistic)

$$S_n^* = \sqrt{n}\, \bar{Y}_n / S_n.$$

Then, there is an absolute constant a < ∞ such that for all x ∈ R and all n ≥ 2,

$$|F_n(x) - \Phi(x)| \le \frac{a \beta_3}{\sigma^3 \sqrt{n}},$$

where Φ is the standard normal distribution function. The classical result usually has σ in place of Sn in the definition of S*n [6]; in that case one can take a = 0.8 [21].

This result motivated the introduction of the BNA property in [19], which requires that the Berry-Esseen bound remains O(n^{−1/2}) when ε → 0.

Definition 4 (BNA) The estimator Y(ε) is said to have the BNA property if

$$\limsup_{\varepsilon \to 0} \frac{E\big[|Y(\varepsilon) - \gamma(\varepsilon)|^3\big]}{\sigma^3(\varepsilon)} < \infty. \tag{3}$$

The BNA property implies that √n |Fn(x) − Φ(x)| remains bounded as a function of ε, i.e., that the approximation error of Fn by the normal distribution remains in O(n^{−1/2}). The reverse is not necessarily true, however. It might seem more natural to define the BNA property as meaning that √n |Fn(x) − Φ(x)| remains bounded, but we keep Definition 4 because it has already been adopted in several papers and because it is often easier to obtain necessary and sufficient conditions for BNA with this definition.

If a confidence interval of level 1 − α is obtained using the normal distribution while the true distribution is Fn, the error of coverage of the computed confidence interval does not exceed 2 sup_{x∈R} |Fn(x) − Φ(x)|. If that confidence interval is computed from an i.i.d. sample Y1(ε), . . . , Yn(ε) of Y(ε), BNA implies that the coverage error remains in O(n^{−1/2}) when ε → 0, with a hidden constant that does not depend on ε, so it is controlled.
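As an illustration of how the ratio in Definition 4 can blow up, the sketch below evaluates it in closed form for a crude Bernoulli indicator with success probability p (a simple stand-in for a naive rare-event estimator, chosen here for illustration only). The helper name `third_moment_ratio_bernoulli` is hypothetical.

```python
def third_moment_ratio_bernoulli(p: float) -> float:
    """E|Y - p|^3 / sigma^3 for Y ~ Bernoulli(p), the quantity bounded under BNA."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Y - p equals 1 - p with probability p, and -p with probability 1 - p.
    centered_abs3 = p * (1.0 - p) ** 3 + (1.0 - p) * p ** 3
    sigma3 = (p * (1.0 - p)) ** 1.5
    return centered_abs3 / sigma3

# The ratio behaves like p**(-1/2) for small p, so the Berry-Esseen bound
# degrades as the event becomes rarer.
for p in (0.5, 1e-2, 1e-4, 1e-6):
    print(f"p={p:g}  ratio={third_moment_ratio_bernoulli(p):.2f}")
```

Multiplying this ratio by a/√n gives the Berry-Esseen bound of Theorem 1, which shows how large n must be to compensate for a small p.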

Bounded relative error, bounded relative efficiency, and asymptotic optimality of the empirical variance. The next properties concern the stability of the empirical variance as an estimator of the true variance σ²(ε). Let Y1(ε), . . . , Yn(ε) be an i.i.d. sample of Y(ε), where n ≥ 2. The empirical mean and empirical variance are Ȳn(ε) = (Y1(ε) + · · · + Yn(ε))/n and

$$S_n^2 = S_n^2(\varepsilon) = \frac{1}{n-1} \sum_{i=1}^{n} \big(Y_i(\varepsilon) - \bar{Y}_n(\varepsilon)\big)^2.$$

When the variance and/or the relative error of an estimator are estimated by simulation in a rare-event setting, it happens frequently that S²n(ε) takes a very small value (orders of magnitude smaller than the true variance, because the important rare events did not happen) with large probability 1 − p(ε), and an extremely large value with very small probability p(ε), where p(ε) → 0 when ε → 0. This gross underestimation of the variance leads to wrong conclusions on the accuracy of the simulation, with high probability. This motivates the following definition.

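Definition 5 (BREEV, BREffEV, and AOEV) The estimator Y(ε) has the BREEV property if

$$\limsup_{\varepsilon \to 0} \mathrm{RE}[S_n^2(\varepsilon)] < \infty. \tag{4}$$

It has the BREffEV property if

$$\liminf_{\varepsilon \to 0} \mathrm{REff}[S_n^2(\varepsilon)] > 0. \tag{5}$$

It has the AOEV property if

$$\lim_{\varepsilon \to 0} \frac{\ln E[S_n^4(\varepsilon)]}{\ln \sigma^2(\varepsilon)} = 2. \tag{6}$$

A classical result states that

$$\mathrm{Var}[S_n^2] = \frac{1}{n} \left( E\big[(Y(\varepsilon) - E[Y(\varepsilon)])^4\big] - \frac{n-3}{n-1}\, \sigma^4 \right). \tag{7}$$

Thus, the BREEV, BREffEV, and AOEV properties are linked with the fourth moment of Y(ε).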
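Formula (7) can be checked numerically on a case small enough to enumerate. The sketch below, written for illustration only, computes Var[S²n] exactly for n i.i.d. Bernoulli(p) variables by listing all 2ⁿ samples and compares it with the closed form. The function names and the Bernoulli choice are assumptions of this example, not part of the paper.

```python
from itertools import product

def var_s2_formula(n: int, mu4: float, sigma2: float) -> float:
    """Variance of the empirical variance according to formula (7)."""
    return (mu4 - (n - 3) / (n - 1) * sigma2 ** 2) / n

def var_s2_bernoulli_exact(n: int, p: float) -> float:
    """Exact Var[S_n^2] for n i.i.d. Bernoulli(p) draws, by full enumeration."""
    first = second = 0.0
    for sample in product((0, 1), repeat=n):
        ones = sum(sample)
        prob = p ** ones * (1.0 - p) ** (n - ones)
        mean = ones / n
        s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
        first += prob * s2        # accumulates E[S_n^2]
        second += prob * s2 ** 2  # accumulates E[S_n^4]
    return second - first ** 2

p, n = 0.1, 4
sigma2 = p * (1.0 - p)                         # variance of Bernoulli(p)
mu4 = p * (1.0 - p) ** 4 + (1.0 - p) * p ** 4  # fourth central moment
exact = var_s2_bernoulli_exact(n, p)
closed_form = var_s2_formula(n, mu4, sigma2)
print(exact, closed_form)
```

Dividing either value by σ⁴ gives the squared relative error of S²n, which is the quantity that BREEV requires to stay bounded.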

Asymptotic good estimation of the mean and of the variance. AGEM and AGEV are two additional robustness properties introduced in [20], under the name of “well estimated mean and variance,” in the context of the application of IS to an HRMS model. Here we provide more general definitions of these properties. We assume that Y(ε) is a discrete random variable, which takes value y with probability p(ε, y) = P[Y(ε) = y], for y ∈ R. We also assume that its mean and variance are polynomial functions of ε: γ(ε) = Θ(ε^{t₁}) and σ²(ε) = Θ(ε^{t₂}) for some constants t₁ ≥ 0 and t₂ ≥ 0. AGEM and AGEV state that the sample paths that contribute to the highest-order terms in these polynomial functions are not rare.

Definition 6 (AGEM and AGEV) The estimator Y(ε) has the AGEM property if y p(ε, y) = Θ(ε^{t₁}) implies that p(ε, y) = Θ(1) (or equivalently, that y = Θ(ε^{t₁})). It has the AGEV property if [y − γ(ε)]² p(ε, y) = Θ(ε^{t₂}) implies that p(ε, y) = Θ(1) (or equivalently, that [y − γ(ε)]² = Θ(ε^{t₂})).


These properties mean that for the realizations y of Y that provide the leading contributions to the estimator, the contributions decrease only because of decreasing values of y, and not because of decreasing probabilities. In a setting where IS is applied and Y is the product of an indicator function by a likelihood ratio (this will be the case in Sections 4.2 and 5), this means that the value of the likelihood ratio when y p(ε, y) contributes to the leading term must converge at the same rate as this leading term when ε → 0.

3 Importance Sampling

The aim of IS is to reduce the variance by simulating the model with different probability laws for its input random variables and correcting the estimator by a multiplicative weight called the likelihood ratio to recover an unbiased estimator. In rare-event simulation, the probability laws are changed so that the rare events of interest occur more frequently under the new probability measure. We briefly recall the basic definition of IS; for comprehensive overviews see, e.g., [3, 11, 12].

In a general measure theoretic setting, IS is based on the application of the Radon-Nikodym theorem, and the likelihood ratio corresponds to the Radon-Nikodym derivative. All applications of IS are special cases of this setting.

Consider two probability measures P and P^∗ on a measurable space (Ω, A), where P is absolutely continuous with respect to P^∗, which means that for all A ∈ A, P^∗{A} = 0 ⇒ P{A} = 0. Then, the Radon-Nikodym theorem guarantees that for P-almost all ω ∈ Ω, the Radon-Nikodym derivative L(ω) = (dP/dP^∗)(ω) exists, and that

$$P\{A\} = \int_A L(\omega)\, dP^*(\omega) \quad \text{for all } A \in \mathcal{A}.$$

In the context of IS, P^∗ is called the IS measure and we refer to the random variable L = L(ω) as the likelihood ratio. If Y = Y(ω) is a random variable defined on (Ω, A), and if dP^∗(ω) > 0 whenever Y(ω) dP(ω) > 0, then

$$E_P[Y] = \int Y(\omega)\, dP(\omega) = \int Y(\omega) L(\omega)\, dP^*(\omega) = E_{P^*}[Y L].$$

As a special case, consider a discrete-time Markov chain {Xj, j ≥ 0} with a discrete state space S, initial distribution µ over S, and probability transition matrix P. This defines a probability measure over the sample paths of the chain. We are interested in a random variable Y = g(X0, X1, . . . , Xτ) where τ is a random stopping time and g is a real-valued function. Let µ^∗ be another initial distribution and let P^∗ be another probability transition matrix such that µ^∗(x0) ∏_{j=1}^{τ} P^∗(x_{j−1}, x_j) > 0 whenever g(x0, x1, . . . , xτ) µ(x0) ∏_{j=1}^{τ} P(x_{j−1}, x_j) > 0. Let P^∗ also denote the corresponding probability measure on the Markov chain trajectories. When the sample path is generated from P^∗, the likelihood ratio that corresponds to a change from (µ, P) to (µ^∗, P^∗) and realization (X0, . . . , Xτ) is the random variable

$$L(\omega) = L(X_0, X_1, \dots, X_\tau) = \frac{\mu(X_0) \prod_{j=1}^{\tau} P(X_{j-1}, X_j)}{\mu^*(X_0) \prod_{j=1}^{\tau} P^*(X_{j-1}, X_j)}$$

if µ^∗(X0) ∏_{j=1}^{τ} P^∗(X_{j−1}, X_j) ≠ 0, and 0 otherwise. Hence,

$$\begin{aligned}
E_P[g(X_0, \dots, X_\tau)]
&= \sum_{n=0}^{\infty} \sum_{(x_0, x_1, \dots, x_n) \in S^n} 1\{\tau = n\}\, g(x_0, x_1, \dots, x_n)\, \mu(x_0) \prod_{j=1}^{n} P(x_{j-1}, x_j) \\
&= \sum_{n=0}^{\infty} \sum_{(x_0, x_1, \dots, x_n) \in S^n} 1\{\tau = n\}\, g(x_0, x_1, \dots, x_n)\, L(x_0, x_1, \dots, x_n)\, \mu^*(x_0) \prod_{j=1}^{n} P^*(x_{j-1}, x_j) \\
&= E_{P^*}[g(X_0, \dots, X_\tau)\, L(X_0, \dots, X_\tau)].
\end{aligned}$$

Note that P^∗(X0, . . . , Xτ) = 0 is allowed only if g(X0, . . . , Xτ) P(X0, . . . , Xτ) = 0. An IS estimator generates a sample path X0, . . . , Xτ using P^∗ and computes

$$Y = g(X_0, X_1, \dots, X_\tau)\, L(X_0, X_1, \dots, X_\tau) \tag{8}$$

as an estimator of E_P[g(X0, X1, . . . , Xτ)].
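The estimator (8) can be sketched in a few lines for a finite chain. The code below is a minimal illustration under several assumptions that are not in the paper: the initial distribution is a point mass at `start` under both measures, transition matrices are stored as dictionaries of dictionaries, the stopping time is the first hitting time of `stop_states` after step 0, and the three-state toy chain and the name `is_sample` are made up for the example.

```python
import random

def is_sample(P, P_star, start, stop_states, g, rng):
    """Simulate one path under P_star and return g(path) times the likelihood ratio."""
    path = [start]
    likelihood = 1.0
    state = start
    while True:
        successors = list(P_star[state])
        weights = [P_star[state][y] for y in successors]
        nxt = rng.choices(successors, weights=weights)[0]
        # Multiply by the one-step ratio P(x, y) / P*(x, y) of the sampled transition.
        likelihood *= P[state].get(nxt, 0.0) / P_star[state][nxt]
        path.append(nxt)
        state = nxt
        if state in stop_states:
            return g(path) * likelihood

# Toy chain (hypothetical): from state 0 move to 1; from 1 fail to 2 with a
# small probability or return to 0. The target is the probability of reaching 2.
eps = 1e-3
P = {0: {1: 1.0}, 1: {0: 1.0 / (1.0 + eps), 2: eps / (1.0 + eps)}}
P_star = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}}  # failure made likely under P*

def hits_failure(path):
    return 1.0 if path[-1] == 2 else 0.0

rng = random.Random(12345)
n = 20000
estimate = sum(is_sample(P, P_star, 0, {0, 2}, hits_failure, rng) for _ in range(n)) / n
gamma = eps / (1.0 + eps)  # exact value for this toy chain
print(estimate, gamma)
```

In this toy case each replication equals either 0 or 2γ with probability 1/2 each, so the relative error of a single replication is 1 whatever the value of `eps`.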

4 Importance Sampling for a Highly Reliable Markovian System

4.1 The Model

We consider an HRMS with c types of components and ni components of type i, for i = 1, . . . , c. Each component is either in a failed state or an operational state. The state of the system is represented by a vector x = (x^(1), . . . , x^(c)), where x^(i) is the number of failed components of type i. Thus, we have a finite state space S of cardinality (n1 + 1) · · · (nc + 1). We suppose that S is partitioned in two subsets U and F, where U is a decreasing set (i.e., if x ∈ U and x ≥ y ∈ S, then y ∈ U) that contains the state 1 = (0, . . . , 0) in which all the components are operational. We say that y ≺ x when y ≤ x and y ≠ x.

We assume that the times to failure and times to repair of the individual components are independent exponential random variables with respective rates

$$\lambda_i(x) = a_i(x)\, \varepsilon^{b_i(x)} = O(\varepsilon) \quad \text{and} \quad \mu_i(x) = \Theta(1)$$

for type-i components when the current state is x, where a_i(x) is a strictly positive real number and b_i(x) a strictly positive integer for each i. The parameter ε ≪ 1 represents the rarity of failures; the failure rates tend to zero when ε → 0. Failure propagation is allowed: from state x, there is a probability p_i(x, y) (which may depend on ε) that the failure of a type-i component directly drives the system to state y, in which there could be additional component failures. Thus, the net jump rate from x to y is

$$\lambda(x, y) = \sum_{i=1}^{c} \lambda_i(x)\, p_i(x, y) = O(\varepsilon).$$

Similarly, the repair rate from state x to state y is µ(x, y) (with possible grouped repairs), where µ(x, y) does not depend on ε (i.e., repairs are not rare events when they are possible). The system starts in state 1 and we want to estimate the probability γ(ε) that it reaches the set F before returning to state 1. Estimating this probability is relevant in many practical situations [11, 12].

This model evolves as a continuous-time Markov chain (CTMC) {Y(t), t ≥ 0}, where Y(t) is the system’s state at time t. Its canonically embedded discrete-time Markov chain (DTMC) is {Xj, j ≥ 0}, defined by Xj = Y(ξj) for j = 0, 1, 2, . . ., where ξ0 = 0 and 0 < ξ1 < ξ2 < · · · are the jump times of the CTMC. Since the quantity of interest here, γ(ε), does not depend on the jump times of the CTMC, it suffices to simulate the DTMC. This chain {Xj, j ≥ 0} has transition probability matrix P with elements

$$P(x, y) = P[X_j = y \mid X_{j-1} = x] = \lambda(x, y) / q(x)$$

if the transition from x to y corresponds to a failure and

$$P(x, y) = \mu(x, y) / q(x)$$

if it corresponds to a repair, where

$$q(x) = \sum_{y \in S} \big(\lambda(x, y) + \mu(x, y)\big)$$

is the total jump rate out of x, for all x, y in S. We will use P to denote the corresponding measure on the sample paths of the DTMC.

Let Γ denote the set of pairs (x, y) ∈ S² for which P(x, y) > 0. Our final assumptions are that the DTMC is irreducible on S and that for every state x ∈ S, x ≠ 1, there exists a state y ≺ x such that (x, y) ∈ Γ (that is, at least one repairman is active whenever a component is failed).

Again, our goal is to estimate γ(ε) = P[τF < τ₁], where τF = inf{j > 0 : Xj ∈ F} and τ₁ = inf{j > 0 : Xj = 1}. It has been shown [17] that for this model, there is an integer r > 0 such that γ(ε) = Θ(ε^r), i.e., the probability of interest decreases at a polynomial rate when ε → 0.

4.2 IS for the HRMS Model

Naive Monte Carlo estimates γ(ε) by simulating sample paths with the transition probability matrix P and counting the fraction of those paths for which τF < τ₁. But since γ(ε) = Θ(ε^r), the relative error of this estimator increases toward infinity when ε → 0 and something else must be done to obtain a viable estimator.

Several IS schemes have been proposed in the literature for this HRMS model; see, e.g., [4, 15, 17]. Here we limit ourselves to the so-called simple failure biasing (SFB), also named Bias1. SFB changes the matrix P to a new matrix P^∗ defined as follows. For states x ∈ F ∪ {1}, we have P^∗(x, y) = P(x, y) for all y ∈ S, i.e., the transition probabilities are unchanged. For any other state x, a fixed probability ρ is assigned to the set of all failure transitions, and a probability 1 − ρ is assigned to the set of all repair transitions. In each of these two subsets, the individual probabilities are taken proportionally to the original ones. Under certain additional assumptions, this change of measure increases the probability of failure when the system is up, in a way that failure transitions are no longer rare events, i.e., P^∗[τF < τ₁] = Θ(1).
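The construction of the SFB matrix described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: matrices are dictionaries of dictionaries, `is_failure(x, y)` is a user-supplied predicate, `unchanged` plays the role of F ∪ {1}, and rows that contain only failures or only repairs are left untouched (a convention chosen here because the text does not cover that case). The four-state toy system is hypothetical.

```python
def simple_failure_biasing(P, is_failure, rho, unchanged):
    """Build an SFB transition matrix P* from P (both as dicts of dicts)."""
    P_star = {}
    for x, row in P.items():
        failures = {y: p for y, p in row.items() if is_failure(x, y)}
        repairs = {y: p for y, p in row.items() if not is_failure(x, y)}
        if x in unchanged or not failures or not repairs:
            P_star[x] = dict(row)  # keep the original probabilities
            continue
        fail_mass = sum(failures.values())
        repair_mass = sum(repairs.values())
        # Total mass rho on failures and 1 - rho on repairs, each split
        # proportionally to the original probabilities.
        new_row = {y: rho * p / fail_mass for y, p in failures.items()}
        new_row.update({y: (1.0 - rho) * p / repair_mass for y, p in repairs.items()})
        P_star[x] = new_row
    return P_star

# Toy system (hypothetical): state = number of failed components, F = {3}.
eps = 0.01
P = {
    0: {1: 1.0},
    1: {0: 1.0 / (1.0 + eps), 2: eps / (1.0 + eps)},
    2: {1: 1.0 / (1.0 + eps), 3: eps / (1.0 + eps)},
    3: {2: 1.0},
}
P_star = simple_failure_biasing(P, lambda x, y: y > x, rho=0.5, unchanged={0, 3})
print(P_star)
```

Each biased row still sums to one, and the failure probability out of states 1 and 2 no longer depends on `eps`.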

For a given sample path ending at step τ = min(τF, τ₁), the likelihood ratio for this change of measure can be written as

$$L = L(X_0, \dots, X_\tau) = \frac{P[(X_0, \dots, X_\tau)]}{P^*[(X_0, \dots, X_\tau)]} = \prod_{j=1}^{\tau} \frac{P(X_{j-1}, X_j)}{P^*(X_{j-1}, X_j)}$$

and the corresponding (unbiased) IS estimator of γ(ε) is given by (8), with g(X0, . . . , Xτ) = 1{τF < τ₁}. Thus, the random variable Y(ε) of Section 2 is

$$Y(\varepsilon) = 1\{\tau_F < \tau_1\}\, L(X_0, \dots, X_\tau). \tag{9}$$

We will now examine the robustness properties of this estimator under the SFB sampling.

5 Asymptotic Robustness Properties for the HRMS Model Under IS

A characterization of the IS schemes for the HRMS model that satisfy the BRE property was obtained in [15]. AO is weaker than BRE in general. However, our first result states that for the HRMS model, the two are equivalent. This was mentioned without proof in [11].

Theorem 2 In our HRMS framework, with SFB, AO and BREff are equivalent to BRE.

Proof. Recall that γ(ε) = Θ(ε^r) for some integer r > 0. It has also been shown in [19] that for this model, E[Y²(ε)] = Θ(ε^s) for some s ≤ 2r, where Y(ε) is defined in (9). Also, t(ε) = Θ(1) for static changes of measure such as SFB. The equivalence between AO and BRE then follows from Example 2, and the equivalence between BREff and BRE follows from Example 3.

A characterization of IS measures that satisfy BNA for the HRMS model is given in [19, 20] and the following relationships between measures of robustness were proved in [20]:

Theorem 3 In our HRMS framework, BNA implies AGEV, which implies BRE, which implies AGEM. For each of these implications, the converse is not true.


Our next results characterize BREEV and AOEV in the HRMS framework. They require additional notation. We will restrict our change of measure for IS to a class I of measures P^∗ defined by a transition probability matrix P^∗ with the following properties: whenever (x, y) ∈ Γ and P(x, y) = Θ(ε^d), if y ≻ x ≠ 1, then P^∗(x, y) = O(ε^{d−1}), whereas if x ≻ y or if y ≻ x = 1, then P^∗(x, y) = O(ε^d). This class I was introduced in [19]; these measures increase the probability of each failure transition from a state x ≠ 1 and satisfy the assumptions of Lemma 1 of [19] (we will use this lemma to prove our next results). From now on, we assume that P^∗ satisfies these properties.

We define the following sets of sample paths:

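$$\Delta_m = \big\{(x_0, \dots, x_n) : n \ge 1,\ x_0 = \mathbf{1},\ x_n \in F,\ x_j \notin \{\mathbf{1}, F\} \text{ and } (x_{j-1}, x_j) \in \Gamma \text{ for } 1 \le j \le n, \text{ and } P\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} = \Theta(\varepsilon^m)\big\};$$

$$\Delta_{m,k} = \big\{(x_0, \dots, x_n) \in \Delta_m : P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} = \Theta(\varepsilon^k)\big\};$$

$$\Delta'_t = \bigcup_{\{m, k \,:\, m - k = t\}} \Delta_{m,k};$$

and let s be the integer such that σ²_{P^∗}(ε) = Θ(ε^s).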

A necessary and sufficient condition on P^∗ for BREEV is as follows. This result means that a path cannot be too rare under the IS measure P^∗ if BREEV is to hold. Similar results were obtained under the same conditions for BRE in [15], where it is shown that k ≤ 2m − r is needed when ∆m,k ≠ ∅, and for BNA in [19, 20], where the necessary and sufficient condition is k ≤ 3m/2 − 3s/4.

Theorem 4 For an IS measure P^∗ ∈ I, we have BREEV if and only if for all integers k and m such that m − k < r and all (x0, · · · , xn) ∈ ∆m,k,

$$P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} = O(\varepsilon^{4m/3 - 2s/3}).$$

In other words, we must have k ≤ 4m/3 − 2s/3 whenever ∆m,k ≠ ∅.

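Proof. (a) Necessary condition. Suppose that there exist k, m ∈ N and (x0, · · · , xn) ∈ ∆m,k such that k = 4m/3 − 2s/3 + k′ with k′ > 0 and m − k < r. This means that P^∗{(X0, · · · , Xτ) = (x0, · · · , xn)} = Θ(ε^{4m/3−2s/3+k′}). Since L(x0, · · · , xn) = Θ(ε^{m−k}) dominates γ(ε) = Θ(ε^r) when m − k < r, we have

$$\begin{aligned}
E[(Y(\varepsilon) - \gamma(\varepsilon))^4]
&\ge [L(x_0, \dots, x_n) - \gamma(\varepsilon)]^4\, P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} \\
&= \Theta(\varepsilon^{4(m-k)+k}) \\
&= \Theta(\varepsilon^{2s - 3k'}).
\end{aligned}$$

Thus E[(Y(ε) − γ(ε))⁴]/σ⁴_{P^∗}(ε) is at least of order ε^{−3k′}, which is unbounded when ε → 0. By (7), the relative error of S²n(ε) is then unbounded as well.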

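(b) Sufficient condition. Let (x0, · · · , xn) ∈ ∆′_t such that t < r (i.e., m − k < r). Since

$$P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} = O(\varepsilon^{4m/3 - 2s/3})$$

for all (x0, · · · , xn) ∈ ∆m,k if m − k < r, we have

$$(L(x_0, \dots, x_n) - \gamma(\varepsilon))^4\, P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} = \frac{\Theta(\varepsilon^{4m})}{\Theta(\varepsilon^{4k})}\, \Theta(\varepsilon^{k}) = O(\varepsilon^{2s}).$$

Using the fact that Σ_{t<r} |∆′_t| < ∞ and the first part of Lemma 1 of [19], we have

$$\sum_{t < r}\ \sum_{(x_0, \dots, x_n) \in \Delta'_t} (L(x_0, \dots, x_n))^4\, P^*\{(X_0, \dots, X_{\tau_F}) = (x_0, \dots, x_n)\} = O(\varepsilon^{2s}).$$

Since

$$\sum_{t=r}^{\infty}\ \sum_{(x_0, \dots, x_n) \in \Delta'_t} P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} \le 1$$

and

$$\sum_{(x_0, \dots, x_n) \in \Delta'_t} P^*\{(X_0, \dots, X_{\tau_F}) = (x_0, \dots, x_n)\} \le 1 \quad \text{for all } t,$$

Lemma 1 of [19] implies that

$$\begin{aligned}
&\sum_{t=r}^{\infty}\ \sum_{(x_0, \dots, x_n) \in \Delta'_t} (L(x_0, \dots, x_n) - \gamma(\varepsilon))^4\, P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} \\
&\quad \le \sum_{t=r}^{\infty}\ \sum_{(x_0, \dots, x_n) \in \Delta'_t} \big(\gamma^4(\varepsilon) + 4\gamma^3(\varepsilon)\kappa\eta^{t}\varepsilon^{t} + 6\gamma^2(\varepsilon)\kappa^2\eta^{2t}\varepsilon^{2t} + 4\gamma(\varepsilon)\kappa^3\eta^{3t}\varepsilon^{3t} + \kappa^4\eta^{4t}\varepsilon^{4t}\big) \cdot P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} \\
&\quad = \gamma^4(\varepsilon) \sum_{t=r}^{\infty}\ \sum_{(x_0, \dots, x_n) \in \Delta'_t} P^*\{(X_0, \dots, X_{\tau_F}) = (x_0, \dots, x_n)\} \\
&\qquad + \sum_{t=r}^{\infty} \big(4\gamma^3(\varepsilon)\kappa\eta^{t}\varepsilon^{t} + 6\gamma^2(\varepsilon)\kappa^2\eta^{2t}\varepsilon^{2t} + 4\gamma(\varepsilon)\kappa^3\eta^{3t}\varepsilon^{3t} + \kappa^4\eta^{4t}\varepsilon^{4t}\big) \cdot \sum_{(x_0, \dots, x_n) \in \Delta'_t} P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\} \\
&\quad \le \gamma^4(\varepsilon) + 4\gamma^3(\varepsilon)\kappa \sum_{t=r}^{\infty} (\eta\varepsilon)^{t} + 6\gamma^2(\varepsilon)\kappa^2 \sum_{t=r}^{\infty} (\eta^2\varepsilon^2)^{t} + 4\gamma(\varepsilon)\kappa^3 \sum_{t=r}^{\infty} (\eta^3\varepsilon^3)^{t} + \kappa^4 \sum_{t=r}^{\infty} (\eta^4\varepsilon^4)^{t} \\
&\quad = \Theta(\varepsilon^{4r}) + \Theta(\varepsilon^{4r}) + \Theta(\varepsilon^{4r}) + \Theta(\varepsilon^{4r}) + \Theta(\varepsilon^{4r}) \\
&\quad = O(\varepsilon^{2s}),
\end{aligned}$$

because 2r ≥ s.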

Theorem 5 BREEV, BREffEV, and AOEV are equivalent.

Proof. This follows again directly from Example 2, using the fact that σ²(ε) = Θ(ε^s) and E[S⁴n(ε)] = Θ(ε^t) with t ≤ 2s, and from Example 3 since t(ε) = Θ(1).

Next we show that BREEV and AOEV are the strongest properties in our list.

Theorem 6 BREEV implies BNA.

Proof. This is a direct consequence of the necessary and sufficient conditions over the paths for the BNA and BREEV properties. These conditions are that for all k and m such that m − k < r, whenever ∆m,k is non-empty, we must have k ≤ 4m/3 − 2s/3 for BREEV and k ≤ 3m/2 − 3s/4 for BNA. But 4m/3 − 2s/3 = (8/9)(3m/2 − 3s/4), so the theorem is proved if we always have 3m/2 − 3s/4 ≥ 0, i.e., 2m ≥ s, which is true since 2m ≥ 2r ≥ s.

The following counter-example shows that the converse is not true: there exist systems and IS measures P^∗ for which BNA is verified but not BREEV.

Example 5 Consider the example of Figure 1, using SFB failure biasing as shown in Figure 2. The states where the system is down are colored in grey.

For this model, as can easily be seen in Figure 1, r = 6 and ∆6 is comprised of the single path (<2,2>, <1,2>, <0,2>). Moreover, s = 12 and the sole path in ∆ such that

$$\frac{P^2\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\}}{P^*\{(X_0, \dots, X_\tau) = (x_0, \dots, x_n)\}} = \Theta(\varepsilon^{12})$$

is the path in ∆6, which Figure 2 shows to be Θ(1) under the probability measure P^∗. It can also be readily checked that k ≤ 3m/2 − 3s/4 for all paths, meaning that BNA is verified.

However, the path (<2,2>, <2,1>, <2,0>) is in ∆m,k with m = 14 and k = 12. Then 12 = k > 4m/3 − 2s/3 = 32/3, so the necessary and sufficient condition of Theorem 4 is not verified.
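The arithmetic behind this counter-example can be reproduced with exact rational numbers. The sketch below encodes the two path conditions quoted in the text (k ≤ 4m/3 − 2s/3 for BREEV and k ≤ 3m/2 − 3s/4 for BNA) and evaluates them for the path with m = 14, k = 12 and s = 12. The function names are illustrative, and the check covers only this single path, not the whole model.

```python
from fractions import Fraction

def breev_path_condition(m: int, k: int, s: int) -> bool:
    """Path condition of Theorem 4: k <= 4m/3 - 2s/3."""
    return k <= Fraction(4 * m - 2 * s, 3)

def bna_path_condition(m: int, k: int, s: int) -> bool:
    """Path condition for BNA quoted from [19, 20]: k <= 3m/2 - 3s/4."""
    return k <= Fraction(6 * m - 3 * s, 4)

m, k, s = 14, 12, 12                      # values given in Example 5
breev_bound = Fraction(4 * m - 2 * s, 3)  # threshold 4m/3 - 2s/3
bna_bound = Fraction(6 * m - 3 * s, 4)    # threshold 3m/2 - 3s/4
print(breev_bound, bna_bound)
print(breev_path_condition(m, k, s), bna_path_condition(m, k, s))
```

The BREEV threshold is always 8/9 of the BNA threshold, which is why a path can sit exactly on the BNA boundary, as here, and still violate the BREEV condition.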

Figure 1: A two-dimensional model with its transition probabilities.

Figure 2: A two-dimensional example with SFB transition probabilities.

6 Conclusions

We have extended the hierarchy of robustness properties of IS estimators for an HRMS model by adding AO and the newly defined BREEV, BREffEV, and AOEV, which assert the stability of the relative size of the confidence interval for independent samples as rarity increases. The complete hierarchy can be summarized as:

(BREEV ⇔ BREffEV ⇔ AOEV) ⇒ BNA ⇒ AGEV ⇒ (BRE ⇔ BREff ⇔ AO) ⇒ AGEM.
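As a reading aid, the chain above can be encoded as an ordered list of equivalence classes, from strongest to weakest. This is a minimal sketch with hypothetical names (HIERARCHY, level, implied_by_chain); it only records the implications summarized for the HRMS setting and says nothing about other models.

```python
# Equivalence classes of properties, ordered from strongest to weakest,
# as summarized for the HRMS model.
HIERARCHY = [
    {"BREEV", "BREffEV", "AOEV"},
    {"BNA"},
    {"AGEV"},
    {"BRE", "BREff", "AO"},
    {"AGEM"},
]


def level(prop):
    """Index of the equivalence class containing prop (0 = strongest)."""
    for i, group in enumerate(HIERARCHY):
        if prop in group:
            return i
    raise KeyError(f"unknown property: {prop}")


def implied_by_chain(a, b):
    """True if the summarized chain gives 'a implies b'."""
    return level(a) <= level(b)


print(implied_by_chain("BREEV", "AO"))
print(implied_by_chain("AO", "BNA"))
```

A False result means only that the chain does not provide the implication; the counter-examples needed to show that a converse actually fails are given separately, as in Example 5.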

All these properties have some practical relevance, and understanding the links between them is certainly of high interest. BREEV is the strongest, but it may be difficult to verify in some applications. A direction of future research is to study this hierarchy of properties in more general (or different) settings, for example in situations where γ(ε) converges to zero exponentially fast. It is already known that AO is generally not equivalent to BRE in this case. What about the other implications in the hierarchy?

References

[1] S. Asmussen. Large deviations in rare events simulation: Examples, counterexamples, and alternatives. In K.-T. Fang, F. J. Hickernell, and H. Niederreiter, editors, Monte Carlo and Quasi-Monte Carlo Methods 2000, pages 1–9, Berlin, 2002. Springer-Verlag.

[2] V. Bentkus and F. Götze. The Berry-Esseen bound for Student’s statistic. The Annals of Probability, 24(1):491–503, 1996.

[3] J. A. Bucklew. Introduction to Rare Event Simulation. Springer-Verlag, New York, 2004.

[4] H. Cancela, G. Rubino, and B. Tuffin. MTTF estimation by Monte Carlo methods using Markov models. Monte Carlo Methods and Applications, 8(4):312–341, 2002.

[5] H. Cancela, G. Rubino, and B. Tuffin. New measures of robustness in rare event simulation. In M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines, editors, Proceedings of the 2005 Winter Simulation Conference, pages 519–527, 2005.

[6] W. Feller. An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley, New York, second edition, 1971.

[7] P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer-Verlag, New York, 2004.

[8] P. Glasserman, P. Heidelberger, P. Shahabuddin, and T. Zajic. A large deviations perspective on the efficiency of multilevel splitting. IEEE Transactions on Automatic Control, AC-43(12):1666–1679, 1998.

[9] P. W. Glynn and D. L. Iglehart. Importance sampling for stochastic simulations. Management Science, 35:1367–1392, 1989.

[10] A. Goyal, P. Shahabuddin, P. Heidelberger, V. F. Nicola, and P. W. Glynn. A unified framework for simulating Markovian models of highly reliable systems. IEEE Transactions on Computers, C-41:36–51, 1992.

[11] P. Heidelberger. Fast simulation of rare events in queueing and reliability models. ACM Transactions on Modeling and Computer Simulation, 5(1):43–85, 1995.

[12] S. Juneja and P. Shahabuddin. Rare event simulation techniques: An introduction and recent advances. In S. G. Henderson and B. L. Nelson, editors, Stochastic Simulation, Handbooks of Operations Research and Management Science. Elsevier Science, circa 2006. Chapter 11, to appear.

[13] P. L’Ecuyer, V. Demers, and B. Tuffin. Rare-events, splitting, and quasi-Monte Carlo. ACM Transactions on Modeling and Computer Simulation, 2006. Submitted.

[14] E. E. Lewis and F. Böhm. Monte Carlo simulation of Markov unreliability models. Nuclear Engineering and Design, 77:49–62, 1984.

[15] M. K. Nakayama. General conditions for bounded relative error in simulations of highly reliable Markovian systems. Advances in Applied Probability, 28:687–727, 1996.

[16] W. Sandmann. Relative error and asymptotic optimality in estimating rare event probabilities by importance sampling. In Proceedings of the OR Society Simulation Workshop (SW04) held in cooperation with the ACM SIGSIM, Birmingham, UK, March 23–24, 2004, pages 49–57. The Operational Research Society, 2004.

[17] P. Shahabuddin. Importance sampling for the simulation of highly reliable Markovian systems. Management Science, 40(3):333–352, 1994.

[18] D. Siegmund. Importance sampling in the Monte Carlo study of sequential tests. The Annals of Statistics, 4:673–684, 1976.

[19] B. Tuffin. Bounded normal approximation in simulations of highly reliable Markovian systems. Journal of Applied Probability, 36(4):974–986, 1999.

[20] B. Tuffin. On numerical problems in simulations of highly reliable Markovian systems. In Proceedings of the 1st International Conference on Quantitative Evaluation of SysTems (QEST), pages 156–164, University of Twente, Enschede, The Netherlands, September 2004. IEEE CS Press.

[21] P. van Beek. An application of Fourier methods to the problem of sharpening the Berry-Esseen inequality. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 23:187–196, 1972.

[22] M. Villén-Altamirano and J. Villén-Altamirano. On the efficiency of RESTART for multidimensional systems. ACM Transactions on Modeling and Computer Simulation, 16(3):251–279, 2006.