Software Reliability Modeling

(1)

HAL Id: hal-00853317

https://hal.archives-ouvertes.fr/hal-00853317

Submitted on 22 Aug 2013

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Software Reliability Modeling

James Ledoux

To cite this version:

James Ledoux. Software Reliability Modeling. Hoang Pham. Handbook of Reliability Engineering, Springer-Verlag London, pp.213-234, 2003, Reliability, 1-85233-453-3. �hal-00853317�

(2)

Software Reliability Modeling

James LEDOUX

Centre de Math´ematiques INSA & IRMAR

20 Avenue des Buttes de Co¨esmes

35043 Rennes Cedex

France

July 27, 2002

1 Introduction

This chapter proposes an overview of some aspects of Software Reliability (SR) engineering. Most systems are now driven by software. So that, it is well-recognized that assessing reliabil-ity of software applications is a major issue in reliabilreliabil-ity engineering, particularly in terms of cost. But predicting software reliability is not easy. Perhaps the major difficulty is that we are concerned primarily with design faults which is a very different situation from that tackled by the conventional hardware theory. A fault (or bug ) refers to the manifestation in the code of a mistake made by the programmer or designer with respect to the specification of the software. Activation of a fault by an input value leads to an incorrect output. Detection of such an event corresponds to an occurrence of a software failure. Input values may be considered as arriving to the software randomly. So although software failure may be not generated stochastically, it may be detected in such a manner. Therefore, this justifies the use of stochastic models of the underlying random process that governs the software failures. We briefly recall in Section 2, basic concepts of stochastic modeling for reliability. Two approaches are used in SR modeling. The prevalent is the so-called black-box one, in which only the interactions of the software with

(3)

the environment are considered. Following [1] and [2], we use in Section 3 the self-exciting

point processes as basic tool to model the failure process. That enables an overview of most

of published SR models. A second approach, called the white-box one, incorporates in models, information on the structure of the software. This is presented in Section 4. Section 5 proposes basic techniques for calibrating black-box models. The last section tries to give an account for the current practices in SR modeling and to point out some challenging issues for future research.

Note that this chapter does not aspire to cover the whole topic of SR engineering. In partic-ular, we do not discuss: fault prevention, fault removal, fault tolerance which are three methods to achieve reliable software. We focus here on methods to forecast failure times. For a more complete view, we refer to [3], the handbooks [4] and [5]. We have used the two recent books [2] and [6] to prepare this chapter. We also recommend to read the short paper [7], that describes, in particular, the available software reliability toolkits (with additional relevant reference [8]). Fi-nally, the bibliography of the chapter gives a good account for journals which propose research and tutorial papers on SR.

2 Basic concepts of stochastic modeling

Reliability of a software is defined in [9] as a measure of the continuous delivery of the correct

service by the software under a specified environment. This is a measure of the time to failure.

2.1 Metrics with regard to the first failure

Metrics of the first time to failure of a system are standard from [10], [11] and are now recalled. The first failure time is a random variable (rv)

T

with distribution function

(4)

If

F

has a probability density function (pdf)

f

then we define the hazard rate of the rv

T

by

r

(

t

) =

_R

f

(

₍

t

_t

)

₎

; t

0 :

with

R

(

t

) = 1

F

(

t

) =

Pf

T > t

g. We will also use the term failure rate. Function

R

(

t

)

is

called the survivor function of the rv

T

. Hazard rate function is interpreted to be

r

(

t

)

dt

Pf

t < T

t

+

dt

j

T > t

g

Pfa failure occurs in

]

t;t

+

dt

]

given that no failure occurred up to time

t

g

Thus, the phenomenon of reliability growth (“wear-out”) may be represented by an decreasing (increasing) hazard rate.

When

F

is continuous, hazard rate function characterizes the probability distribution of

T

through the exponentiation formula

R

(

t

) = exp

Z t 0

r

(

s

)

ds

:

Finally, the mean time to failure, denoted by

MTTF

, is the expectation

E

[

T

]

of the waiting time of the first failure. Note that

E

[

T

]

is also

R +1

0

R

(

s

)

ds

.

A basic model for the nonnegative rv

T

is the Weibull distribution with parameters

; >

0

:

f

(

t

) =

t

1

exp

t

1

]0;+1[

(

t

)

R

(

t

) = exp

t

; r

(

t

) =

t

1

;

MTTF = 1

1= Z +1 0

u

1=

exp(

u

)

du:

Note that the hazard rate is increasing for

>

1

, decreasing for

<

1

, constant for

= 1

. For

= 1

we obtain the exponential model with parameter

.

2.2 Stochastic process of times of failure

The failure process can be thought of as a point process (pp), i.e. a sequence of rvs

(

T

_i

)

_i0

(5)

define the sequence of rv

X

_i

=

T

_i

T

_i 1 for

i

1

.

X

iis

i

th inter-failure time. We define the

counting process

N

(

)

associated with a pp by

N

(

t

) =

X

i0

1

]0;t]

(

T

i

) (

N

(0) = 0)

:

N

(

t

)

is the number of observed failures up to time

t

. A pp will refer to any of

(

T

i

)

,

(

X

i

)

or

N

(

)

. Standard metrics associated with a counting process are [11]:

– the mean value function:

M

(

t

) =

E

[

N

(

t

)]

– the rate of occurrence of failures at time

t

:

ROCOF(

t

) =

dM

_dt

(

t

)

.

In such a context, we define the (conditional) reliability function at time

t

0

by

R

t

(

s

) =

Pf

N

(

t

+

s

)

N

(

t

) = 0

j

N

(

t

)

;T

1

;::: ;T

N(t)

g

; s

0 :

This is a measure of the continuous delivery of correct service during the mission interval

]

t;t

+

s

]

. At time

t

=

T

i, this function is nothing else but the conditional survivor function of rv

X

i+1

=

T

i+1

T

i given

T

1

;::: ;T

i. This will be denoted by

R

i

(

s

)

. We also define the

(conditional) mean time to failure at time

t

,

MTTF(

t

)

, by

MTTF(

t

) =

Z +1 0

R

t

(

s

)

ds:

The mean time to failure at

t

=

T

_iwill also denoted by

MTTF

_iand is

E

[

X

_i+1 j

T

1

;::: ;T

i

]

.

During the operational life of a software, repairs are carried out when it fails to perform correctly. In such a case, time to repair, time to reboot the system and others factors affect the dependability of a product. Thus, we may define the software availability as a measure of the delivery correct service with respect to the alternation correct and incorrect service. Availability is highly dependent on the maintenance policies of the software. We do not go into further

(6)

details on dependability in operational phase. Indeed, we focus here on the reliability attribute of the software as most of the literature on software reliability modeling does. We refer to [4, Chap 2] for an account for dependability during the operational phase.

3 Black-box software reliability models

In this section, only dynamic models will be discussed. That is, we are only concerned with models which consider failure process as a stochastic process. In other words, time is an essen-tial component of the description of the models. On the other hand, static models are essenessen-tially capture-recapture models. For a good account for static models, we refer to [12, Chap 5], [6]. A recent evaluation of capture-recapture models in software engineering context is [13]. Our overview of dynamic models closely follows [1, Chap 2], [14], [2], [15]. We assume throughout this section that any corrective action is instantaneous and each detected fault is removed.

A basic way to represent time evolution in confidence in a software is as follows. At instant

0

, the first failure occurs at time

t

1 according a rv

X

1

=

T

1 with hazard rate

r

1. Given time

T

1

=

t

1, we observe a second failure at time

t

2 at rate

r

2. Function

r

2 is the hazard rate of the

inter-failure rv

X

2

=

T

2

T

1 given

T

1

=

t

1. Choice of

r

2 is based on the fact that one fault

was detected at time

t

1. At time

t

2, a third failure occurs at

t

3 with failure rate

r

3. Function

r

3 is the hazard rate of the rv

X

3

=

T

3

T

2 given

T

1

=

t

1

;T

2

=

t

2 and is selected according

to the “past” of the failure process at time

t

2: two observed failures at times

t

1 and

t

2. And

so on. It is expected that, due to a fault removal activity, confidence in the software’s ability to deliver a proper service will be improved during its lifecycle. Therefore, a basic model in SR has to capture a phenomenon of reliability growth. Reliability growth will basically follow

(7)

from a sequence of inequalities of the following form

r

i+1

(

t t

i

)

r

i

(

t

i

)

on

t

i (1)

and/or from selection of decreasing hazard rates

r

i

(

)

. We illustrate this “modeling process” on

the celebrated Jelinski-Moranda model (JM) [16]. We assume a priori that software includes only a finite number

N

of faults. The first hazard rate is

r

1

(

t

;

;N

) =

N

where

is some

nonnegative parameter. From time

T

1

=

t

1, a second failure occurs with the constant failure

rate

r

2

(

t

;

;N

) =

(

N

1)

,

:::

In a more formal setting, the two parameters

N

and

will be

encompassed in what we call a background history F

0, which is any background information

that we may have about the software. Then “the failure rate” of the software is represented by the function 8

t

0 ; r

C

(

t

;

F 0

) =

+1 X i=1

r

i

(

t T

i 1

;

F 0

)

1

[T i 1;Ti [

(

t

)

(2)

which is called the concatenated failure rate function in [2]. An appealing graphical display of a path of this stochastic function is in Figure 1 for (JM). We can rewrite (2) as

r

C

(

t

;

;N

)

N

0 t

1

t

2

t

3

t

Figure 1: Concatenated failure rate function for (JM)

(

t

;

F

0

;N

(

t

)

;T

1

;::: ;T

N(t)

) =

(

N N

(

t

))

(3)

Function

(

)

will be called the stochastic intensity of the pp

N

(

)

. We see that stochastic

(8)

of failure results in a failure rate whose value decreases of amount

. This suggests that no new fault is inserted during a corrective action and any bug contributes in the same manner to the “failure rate” of the software.

To go further, we replace our intuitive presentation in a stochastic modeling framework. Justification for what follows is that almost all published software reliability models can be interpreted in the foregoing framework. Specifically, this allows a complete overview of the stochastic properties of the panoply of available models without referring to their original pre-sentations.

3.1 Self-Exciting Point Processes

A slightly more formal presentation of the previous construction of the point process

(

T

i

)

would be: we have to specify all the conditional distributions

L

(

X

i jF

0

;T

i 1

;:::T

1

)

; i

1 :

The sequence of conditioningF 0

;

fF 0

;T

1

g

;

fF

0

;T

1

;T

2

g

;:::

should be thought of as the natural

or internal history on the pp at times

0 ;T

1

;T

2

;:::

respectively. So that, function

r

iis the hazard

rate of the conditional distributionL

(

X

i jF

0

;T

i 1

;::: ;T

1

)

, that is, when it has a pdf

r

i

(

t

;

F 0

;T

i 1

;::: ;T

1

) =

f

Xi jF 0;Ti 1;:::;T1

(

t

)

R

Xi jF 0;Ti 1;:::;T1

(

t

)

:

(4)

This leads to the following expression of the stochastic intensity

(

t

;

F 0

;N

(

t

)

;T

1

;::: ;T

N(t)

) =

+1 X i=1

f

XijF0;T i 1;:::;T1

(

t T

i 1

)

R

Xi jF 0;Ti 1;:::;T1

(

t T

i 1

)

1

[T i 1;Ti [

(

t

)

:

(5)

If we turn back to the (JM) model, we haveF 0

=

f

;N

gand

f

Xi jF 0;Ti 1;:::;T1

(

t

) =

(

N

(

i

1))exp(

(

N

(

i

1))

t

)

1

[0;+1[

(

t

)

:

(9)

Continuing in this way, this should lead to the martingale approach for analyzing pp, that es-sentially adheres to the concepts of compensator and stochastic intensity with respect to the internal history of the counting process. In particular, the left continuous version of the stochas-tic intensity defined in (5) may be thought of as the usual predictable intensity of a pp in the martingale point of view (see e.g. [17, Th11] and references therein). van Pul gives in [18] a good account for what can be done using the so-called dynamic approach of pp. We do not go into further details here. We prefer embrace the engineering point of view developed in [15].

It is clear from (5) that the stochastic intensity is excited by the history of the pp itself. Such stochastic processes are usually called a self-exciting point process (SEPP). What follows is from [1], [2], [15].

1. Ht

=

f

N

(

t

)

;T

1

;::: ;T

N(t)

gwill denote the internal history of

N

(

)

up to time

t

.

2.

N

(

)

is said to be conditionally orderly if for any

Q

t Ht, we have

Pf

N

(

t

+

dt

)

N

(

t

)

2

jF 0

;Q

t

g

=

Pf

N

(

t

+

dt

)

N

(

t

) = 1

jF 0

;Q

t

g

O

(

dt

)

:

where

O

(

dt

)

is some real-valued function such that

lim

dt!0

O

(

dt

) = 0

.

With

Q

t

=

;and Formula (6), we get Pf

N

(

t

+

dt

)

N

(

t

)

2

j F 0

g

=

o

(

dt

)

where

o

(

dt

)

is some real-valued function such that

lim

dt!0

o

(

dt

)

=dt

= 0

. This is the usual orderliness or

regular property of a pp [11]. Conditional orderliness is to be interpreted as saying that given

Q

t andF

0, as

dt

decreases to zero, the probability of at least two failures occurring in a time

interval of length

dt

tends to zero at a rate higher than the probability that exactly one failure in the same interval does.

(10)

1.

N

(

)

is conditionally orderly;

2. there exists a nonnegative function

(

;

F 0

;

Ht

)

such that Pf

N

(

t

+

dt

)

N

(

t

) = 1

jF 0

;

Htg

=

(

t

;

F 0

;

Ht

)

dt

+

o

(

dt

)

(6) and

E

(

t

;

F 0

;

Ht

)

<

+

1for any

t >

0

3. Pf

N

(0) = 0

jF 0 g

= 1

Function

is called the stochastic intensity of the SEPP.

We must think

(

t

;

F 0

;

Ht

)

of as a function ofF

0,

t

and

N

(

t

)

;T

1

;::: ;T

N

(t). Degree to which

(

t

)

depends onHt is formalized in the notion of memory. A SEPP is of memory-

m

, if

– for

m

= 0

:

(

)

depends onHt only through

N

(

t

)

the number of observed failures at time

t

;

– for

m

= 1

:

(

)

N

(

t

)

and

T

N (t);

– for

m

2

:

(

)

N

(

t

)

;T

N

(t)

;:::T

N(t) m+1;

– for

m

=

1:

(

)

is independent of the historyHt of the pp. We also say that stochastic

intensity has no memory.

When stochastic intensity depends only on a background historyF

0, then we get a Doubly

Stochastic Poisson Process (DSPP). Thus, the class of SEPP also encompasses the family of

Poisson Processes. If intensity is a non-random constant

, we have the Homogeneous Pois-son Process (HPP). A SEPP with no memory and a deterministic intensity function

(

)

, is a

NonHomogeneous Poisson Process (NHPP). In particular, if we turn back to the concatenated

failure rate function (see (2)), then selecting a NHPP model corresponds to selecting some con-tinuous deterministic function as

r

_C. We see that, given

T

_i

=

t

_i, the hazard rate of

X

_i is

(11)

r

i+1

(

) =

r

C

(

+

t

i

) =

(

+

t

i

)

. We retrieve a well-known fact for a NHPP: the (conditional)

hazard rate between

i

th and

i

+ 1

th failure times and the intensity function

(

)

only differ

through the initial time of observation of the two functions. We list properties of SEPP [15] which are of some value for analyzing the main characteristics of models. We will omit to write the dependence inF

0.

Counting statistics for SEPP

The probability distribution of rv

N

(

t

)

is strongly related to the conditional expectation

b

(

t

;

N

(

t

)) =

E

(

t

;

Ht

)

j

N

(

t

)

:

This function is called the count-conditional intensity. For a SEPP,

b

(

;

N

(

t

))

satisfies b

(

t

;

N

(

t

)) = lim

_dt !0 Pf

N

(

t

+

dt

)

N

(

t

) = 1

j

N

(

t

)

g

dt

= lim

_dt !0 Pf

N

(

t

+

dt

)

N

(

t

)

1

j

N

(

t

)

g

dt

:

Then we can obtain the following explicit representation forPf

N

(

t

) =

n

gwith

n

1

Pf

N

(

t

) =

n

g

=

Z 0<t 1< <t n<t n Y i=1 b

(

t

i

;

i

1)exp

n X i=0 Z t i+1 ti b

(

u

;

i

)

du

dt

1

:::dt

n (7) with

t

0

= 0

and

t

n+1

=

t

.

ROCOF(

t

)

, defined as the derivate of

M

(

t

)

, is then

ROCOF(

t

) =

E

h b

(

t

;

N

(

t

))

i

=

E

[

(

t

;

Ht

)]

:

We see that notion of

ROCOF(

t

)

and stochastic intensity coincide only if intensity is a deter-ministic function of time, i.e. the pp is a NHPP.

(12)

Likelihood function for a SEPP

Assume that we observe a fixed number

i

of failures. Then the likelihood function is

f

T1;:::;Ti

(

t

1

;::: ;t

i

) =

(

t

1

; 0)

i Y k=2

(

t

k

;

k

1 ;t

1

;::: ;t

k 1

)

exp

Z t 1 0

(

s

; 0)

ds

Xi k=2 Z t k tk 1

(

s

;

k

1 ;t

1

;::: ;t

k 1

)

ds

(8)

If we observe the failure process up to time

t

, the joint distribution of

N

(

t

)

;T

1

;::: ;T

N(t)is

given in [15, Th6.2.2].

Reliability and MTTF functions

R

t

(

s

) = exp

Z t +s t

(

u

;

N

(

t

)

;T

1

;::: ;T

N(t)

)

du

In particular at instant

T

i, we obtain

R

i

(

s

) =

8 < :

exp

Rs 0

(

u

; 0)

du

if

i

= 0

exp

RT i +s Ti

(

u

;

i;T

1

;::: ;T

i

)

du

if

i

1

.

The following characterization of

0

-memory SEPP is intuitively clear from the definition of the stochastic intensity of a SEPP.

Theorem 3.2

N

(

)

is a SEPP with

0

-memory is equivalent to

N

(

)

is a Markov process.

This result explains why a very large part of SR models may be developed in a Markov frame-work (see e.g. [12], [3, Chap 10]). In particular, all NHPP models are of Markov-type. In fact, it can be shown [15] that a SEPP with

m

-memory corresponds to a process

(

T

_i

)

which is a

m

-order Markov chain, that isL

(

T

i

+1 j

T

i

;::: ;T

1

) =

L

(

T

i +1 j

T

i

;:::T

i m +1

)

for

i

m

.

(13)

Theorem 3.3 A

1

-memory SEPP with a stochastic intensity satisfying

(

t

;

N

(

t

)

;T

N(t)

) =

f

(

N

(

t

)

;t T

N(t)

)

for some real-valued function

f

, is characterized by a sequence of independent inter-failure durations

(

X

i

)

. In this case, density probability function of

X

i is

f

Xi

(

x

i

) =

f

(

i

1 ;

x

i

)exp

Z x i 0

f

(

i

1 ;

u

)

du

:

(9)

Such a SEPP was called a generalized renewal process in [19] because

T

iis the sum of indepen-dent but not iindepen-dentically distributed rv. Moreover, it is an usual renewal process when function

f

does not depend on

N

(

t

)

.

To close this presentation of self-exciting processes, we point out that we only use a “con-structive” point of view. Our purpose, here, is not to discuss the existence of point processes with a fixed concatenated failure rate function or stochastic intensity. However, we emphasize that orderliness condition and existence of the limit in (6) in Definition 3.1 are enough to spec-ify the pp (see e.g. [20]). It is also shown in [14, Th4.1] that, under conditional orderliness condition, concatenated failure rate function well-defines a SEPP with respect to operational Definition 3.1. Moreover, an easily checked criterion for conditional orderliness is given [14, Th4.2]. In particular, if hazard rates in (4) are locally bounded then conditional orderliness holds.

3.2 Classification of SR models

We obtain from the concept of memory for a SEPP, a classification of the existing models. It is appealing to define a model with a high memory. But, as usual, the pay-off is the complexity in the statistical inference and the amount of data to be collected.

(14)

3.2.1

0

-memory SEPP

A first type of

0

-memory SEPP is when

(

t

;

Ht

;

F

0

) =

f

(

N

(

t

)

;

F

0

)

. We get major common

properties of this first class of models from Subsection 3.1.

–

N

(

)

is a Markov process (a pure birth Markov process).

–

(

X

_i

)

are independent (given F

0). From (9), rv

X

i has an exponential distribution with

pa-rameter

f

(

i

1 ;

F

0

)

. This easily gives a likelihood function given inter-failure durations

(

X

i

)

. –

T

_i

=

Pi

k=1

X

k

is an Hypoexponential distributed rv as the sum of

i

independent and expo-nentially distributed rvs [21].

–

R

t

(

s

) = exp(

f

(

N

(

t

)

;

F

0

)

s

)

. The reliability function only depends on the current number

of failures

N

(

t

)

. We have

MTTF(

t

) = 1

=f

(

N

(

t

)

;

F 0

)

.

Example 3.1 (JM) Jelinski-Moranda model has been introduced in Section 3. This model has

to be considered as a benchmark model, since all authors designing a new model emphasize that their model includes JM as particular case. The stochastic intensity is given in (3) (see Figure 1 for a path). Besides properties common to the class of SEPP considered in this paragraph, we had the following additional assumptions: the software includes a finite

N

of bugs in the program and no new fault is inserted during debugging. We also noted that each fault has the same contribution to the un-reliability of the software. These assumptions are generally considered as questionable. We derive from (7) that the distribution of rv

N

(

t

)

is Binomial with parameters

N

and

1 exp(

t

)

. The main reliability metrics are:

ROCOF(

t

) =

N

exp(

t

)

;

MTTF

i

= 1

_{N i;R}

i

(

s

) = exp

(

N i

)

s

(15)

We also mention the Geometric model of Moranda [22] where the stochastic intensity is

(

t

;

Ht

;;c

) =

c

N

(t)

where

0

and

c

2

]0

;

1[

.

A second class of

0

-memory SEPP is when the stochastic intensity is actually a func-tion of time

t

,

N

(

t

)

(and F

0). In fact we only have in this category of models, SEPP with

(

t

;

Ht

;

F

0

) = (

N N

(

t

))

'

(

t

)

for some deterministic function

'

(

)

of time

t

. Note that

'

(

t

) =

gives (JM).

–

N

(

)

is a Markov process.

– Let us denote

Rt

0

'

(

s

)

ds

by

(

t

)

. We deduce from (7) that

N

(

t

)

is a Binomial distributed rv with parameters

N

and

p

(

t

) = 1 exp( (

t

))

. This leads to the term Binomial-type model in [23]. It follows that

E

[

N

(

t

)] =

Np

(

t

)

and

ROCOF(

t

) =

N'

(

t

)exp( (

t

))

.

–

R

t

(

s

) = exp

(

N N

(

t

))((

s

+

t

) (

t

))

.

– We get the likelihood function given failure times

(

T

_i

)

from (8)

f

T1;:::;Ti

(

t

1

;::: ;t

i

) =

N

!

(

N i

)! exp

(

N i

)(

t

i

)

i Y j=1

'

(

t

j

)exp( (

t

j

))

– The pdf of rv

T

i is

f

Ti

(

t

i

) =

i

N i

exp( (

t

i

))

'

(

t

i

)[exp( (

t

i

)) 1]

i 1 .

Littlewood’s model in [24] is an instance of a Binomial model with

'

(

t

) =

=

(

+

t

)

and

; >

0

.

3.2.2 NHPP model:

(

t

;

Ht

;

F

0

) =

f

(

t

;

F

0

)

and is deterministic

A large part of models is of NHPP-type. We refer to [25] and [6, Chap 5] for a complete list of such models. The very appealing properties of the counting process explains the widely use of this family of pp in SR modeling.

(16)

–

N

(

)

is a Markov process.

– We deduce from Definition 3.1 that for any

t

,

N

(

t

)

is a Poisson distributed rv with parameter

(

t

) =

Rt 0

f

(

s

;

F

0

)

ds

. In fact,

N

(

)

has independent (but nonstationary) increments, that is,

N

(

t

1

)

;N

(

t

2

)

N

(

t

1

)

;::: ;N

(

t

i 1

)

N

(

t

i

)

are independent rvs for any

(

t

1

;::: ;t

i

)

.

– The mean value function

M

(

t

)

is

(

t

)

. Then

ROCOF(

t

) =

f

(

t

;

F

0

)

. Such a model is called

a finite NHPP model if

M

(+

1

)

<

+

1and an infinite one when

M

(+

1

) = +

1. Indeed,

if

M

(+

1

)

<

+

1then we only consider a finite number of failure times with probability

1

.

–

R

_t

(

s

) = exp

((

t

+

s

) (

t

))

.

– Likelihood function given failure times

(

T

_i

)

is from (8)

f

T1;:::;Ti

(

t

1

;::: ;t

i

) = exp( (

t

i

))

i Y k=1

f

(

t

k

;

F 0

)

:

(11)

Example 3.2 (GO) This model is characterized by the following mean value function and

in-tensity function:

(

t

) =

M

(1 exp(

t

))

and

(

t

;

;M

) =

M

exp(

t

)

for

t

0

.

Parameters

and

M

are the failure rate per fault and the finite expected (initial) number of faults contained in the software. We see that, at any time

t

, the intensity function is propor-tional to the expected remaining number of faults

(

t

;

;M

) =

(

M

(

t

))

. Thus, (GO) is essentially a NHPP-version of (JM).

Example 3.3 (MO) For the Musa-Okumoto model [3], mean value function and intensity

func-tion are

(

t

) = ln(

t

+ 1)

=

and

(

t

;

) =

=

(

t

+ 1)

respectively.

is the initial value of the intensity function and

is called the failure intensity decay parameter. It is easily seen that

(

t

;

) =

exp(

(

t

))

. Thus, intensity function exponentially decreases with the

(17)

expected number of failures and shows that (MO) may be understood as a NHPP version of Moranda’s Geometric model.

As quoted in [26], using a NHPP model may appear inappropriate to describe reliability growth of a software. Indeed, this is debugging which modifies the reliability. Thus, the true intensity function changes probably in a discontinuous manner during corrective actions. How-ever, Miller showed in [27] that a binomial-type model of Subsection 3.2.1 may be transformed in a NHPP variant assuming that the initial number of faults is a Poisson distributed rv with expectation

N

. For instance, (JM) is transformed into (GO). Moreover, Miller showed that binomial-type model and its NHPP variant are indistinguishable from a single realization of the failure process. But, these two models differ as prediction model because estimates of parame-ters are different.

3.2.3

1

-memory SEPP with

(

t

;

Ht

;

F

0

) =

f

(

N

(

t

)

;t T

N(t)

;

F

0

)

N

(

)

is not Markovian. But the inter-failure durations

(

X

i

)

are independent rvs given

F

0 (see

Theorem 3.3) and the pdf of rv

X

iis given in (9).

Example 3.4 (LV) Stochastic intensity of the Littlewood-Verrall model [28] is

(

t

;

Ht

;;

(

)) =

(

N

(

t

) + 1) +

t T

N(t)

(12) for some nonnegative function

(

)

. We briefly recall the Bayesian rationale underlying the

def-inition of this model. Uncertainty about the debugging operation is represented by a sequence of stochastically decreasing failure rates

(

_i

)

_i1, that is

j st

j 1

;

i.e 8

t

2R

:

Pf

j

t

gPf

j 1

t

g

:

(13)

Thus, using stochastic order allows a decay of reliability which takes place when fault are inserted. Prior distribution of rv

_i is Gamma with parameters

et

(

i

)

. It can be shown that

(18)

t

4

t

3

t

2

t

1

(

:

)

0

Figure 2: A path of the stochastic intensity for (LV)

inequality (13) holds when

(

)

is a monotonic increasing function of

i

. Given

i

=

i, rv

X

i has an exponential distribution with parameter

_i. Unconditional distribution of rv

X

_i is a Pareto distribution with pdf

f

(

x

i

;

(

i

)) =

₍

_x

(

i

)

i

+

(

i

))

+1

:

Thus, the “true” hazard rate of

X

i is

r

i

(

t

;

(

i

)) =

=

(

t

+

(

i

))

. In [29], parameters

and are estimated using Bayesian method. The corresponding model is called a Hierarchical Bayesian model in [2] and is also a

1

-memory SEPP. If

(

)

is linear in

i

, we have

R

i

(

t

) =

(

i

+ 1)

t

+

(

i

+ 1)

;

MTTF

i

=

(

i

+ 1)

₁

:

We also mention the Schick-Wolverton’s model [30] where

(

t

;

Ht

;

;N

) =

(

N

(

t

))(

t T

N(t)

)

and

>

0 ;N

are the same parameters as for (JM).

3.2.4

m

2

-memory

Instances of such models are rare. For

m

= 2

, we have the time-series models in [31], [32]. For

m >

2

, we have the adaptative concatenated failure rate model from [33] and the Weibull models of Pham [34]. We refer to original contributions for details.

(19)

4 White-box modeling

Most work on software reliability assessment adopts the black-box view of the system, in which only the interactions with the environment are considered. The white-box (or structural) point of view is an alternative approach in which the structure of the system is explicitly taken into account. This is advocated for instance in [35], [36]. Specifically, the structure-based approach allows analyzing the sensitivity of the reliability of the system with respect to the reliability of its components. Up to recently, only a few papers proposed structure-based software reliability models. A representative sample was (in discrete time) [35], [37], [38] and (in continuous time) [39],[36], [19],[40]. An up-to-date review on the architecture-based approach is given in [41]. We will present the main features of the basic Littlewood’s model which are common to most previous cited works. The discrete time counterpart is Cheung’s model.

In a first step, Littlewood defines an execution model of the software. The basic entity is the standard software engineering concept of module as for instance in [35]. The software structure is then represented by the call graph of the setMof the modules. These modules interact by

execution control transfer and, at each instant, control lies in one and only one of the modules which is called the active one. From such a view of the system, we build up a continuous time stochastic process

(

X

_t

)

_t0which indicates the active module at each time

t

.

(

X

t

)

t0is assumed

to be a homogeneous Markov process on the setM.

In a second step, Littlewood describes the failure processes associated with execution ac-tions. Failure may happen during a control transfer between two modules or during an execution period of any module. During a sojourn of the execution process in the module

i

, failures are part of Poisson process having parameter

_i. When control is transferred from module

i

2 M

(20)

modules, the failure processes associated with each state are independent. Also, the interface failure events are independent on each other and on the failure processes occurring when a module is active.

The architecture of the software is combined with the failure behavior of the modules and that of interfaces into a single model which can then be analyzed. This method is referred as the “composite-method” according to the classification of Markov models in the white-box approach proposed in [41]. Basically, we are still interested in the counting process

N

(

)

.

Another interesting point process is obtained by assuming that the probability of a secondary failure during a control transfer is

0

. Thus, assuming that

(

i;j

) = 0

for all

i;j

2 M in

the previous context, we get a Poisson process whose parameter is modulated by the Markov process

(

X

_t

)

_t0. This is also called a Markov Modulated Poisson Process (MMPP). It is

well-known (e.g. [17]) that the stochastic intensity of a MMPP with respect to the historyHt _F 0, whereF 0

=

(

X

s

;s

0)

, is

X t. Thus,

N

(

)

is an instance of a DSPP.

Asymptotic analysis and transient assessment of distribution of rv

N

(

t

)

may be carried out in observing that the bivariate process

(

X

t

;N

(

t

))

t0is a jump Markov process with state space MN. Computation of all standard reliability metrics may be performed as in [42],[43]. We

refer to these papers for details and for calibration of the models.

It can be argued that models of Littlewood-type are inappropriate to capture a reliability growth of the software. In fact, Laprie et al. [19] has developed a method to incorporate such a phenomenon which can be used, for instance, in the Littlewood’s model to take into account reliability growth of modules in the assessment of the overall reliability (see [43]).

(21)

failure parameters decrease to

0

then

N

(

)

is asymptotically a HPP with parameter

=

X i2M

(

i

)

X j2M

Q

(

i;j

)

(

i;j

) +

i

;

where

Q

is the irreducible generator of the Markov process

(

X

t

)

t0 and

its stationary

dis-tribution.

(

i

)

is to be interpreted as the proportion of time the software passed in module

i

(over a long time period). We just discuss the case of a MMPP. Convergence to

0

of the failure parameters

i(

i

2M) may be achieved in multiplying each of them by a positive scalar

"

, and

in considering that

"

decreases to

0

. So that, the new stochastic intensity of the MMPP is

"

Xt.

As

"

tends to

0

, it is easily seen from a MMPP with a modulating two-states Markov process

(

X

t

)

t0 that, for the first failure time

T

, probability

Pf

T > t

gconverges to

1

. Therefore, we

can not obtain an exponential approximation to the distribution of rv

T

as

"

tends to

0

. Thus, we can not expect to derive a Poisson approximation to the distribution of

N

(

)

. In fact, the right

statement is: if failure parameters are much smaller than the switching rates between modules,

N

(

)

is approximately a HPP with parameter

. For a MMPP, a proof is given in [44] using

martingale theory. Moreover, the rate of convergence in total variation of finite dimensional-distributions of

N

(

)

to those of the HPP is shown to be in

"

. This last fact is important because

user has no information on the quality of the Poissonian approximation given in [36]. However, there is no rule to decide a priori if the approximation is optimistic or pessimistic (see [43]). A similar approach to [44] is used in [45] to derive Poisson approximation and rate of conver-gence for more general pp than MMPP, including the complete Littlewood’s counting process [46], the counting model of [43]. Such asymptotic results give a “hierarchical-method” [41] for reliability prediction: we solve the architectural model and superimpose the failure behavior of the modules and that of the interfaces on to the solution to predict reliability.

(22)

5 Calibration of model

Suppose that we have selected one of the black-box models of Section 3. We obtain reliability metrics which depend on the unknown parameters of the model. Thus, we hate to estimate these metrics from the failure data. We briefly review standard methods to get point-estimates in Subsections 5.1, 5.2.

One major goal of the SR modeling is to predict the future value of metrics from the gath-ered failure data. It is clear from Section 3 that a central problem in SR is to select a model because the huge number of available models. Criteria to compare SR models are listed in [3]. The authors propose quality of assumptions, applicability, simplicity and predictive validity. To assess the predictive validity of models, we need methods which are not only based on a goodness-of-fit approach. Various techniques may be used: u-plot, prequential likelihood, etc. We do not discuss this issue here. A good account for predictive validation methods are given in [47], [48], [4, Chap 4], [3], [11] where comparisons between models are also carried out. Note also that predictive quality of a SR model may be drastically improved using preprocessing of data. In particular, statistical tests have been designed to capture trend in data. Thus, reliability trend analysis allows using SR models which are adapted to reliability growth, stable reliability and reliability decrease respectively. We refer to [49], [4, Chap 10] and references therein for details. Parameters will be denoted by

(it can be multivariate).

5.1 Frequentist procedures

Parameter

is considered as taking an unknown but fixed value. Two basic methods to estimate the value of

are: method of maximum likelihood (ML), the least squares method. That is, we have to optimize with respect to

an objective function which depends on

and collected

(23)

failure data to get a point-estimate. Another standard procedure, interval estimation, gives an interval of values as estimate of

. We only present point-estimation by method of maximum likelihood on (JM) and NHPP models. Others models may be analyzed in a similar way. ML estimations possess several appealing properties that make the procedure widely used. Two of these properties are the consistency and the asymptotic normality to get confidence interval for the point-estimate. Another one is that the ML estimates of

f

(

)

(for one-to-one function

f

) is simply

f

(

b

)

where

b

is the ML estimate of

. We refer to [3, Ch 12] for a complete view of the frequentist inference procedures in SR modeling context.

Example 5.1 (JM) Parameters are

and

N

. Assume that failure data are given by observed values

x

= (

x

1

;::: ;x

i

)

of rv’s

X

1

;::: ;X

i. If

(

;N

)

are the true values of parameters, then

the likelihood to observe

x

is defined as

L

(

;N

;

x

) =

f

X1;:::;Xi

(

x

;

;N

)

(14)

where

f

_X

1;:::;Xi

(

;

;N

)

is the pdf of the joint distribution of

X

1

;::: ;X

i. The estimate

(

;

b

N

b

)

of

(

;N

)

will be the value of

(

;N

)

which maximizes the likelihood to observe data

x

:

f

X1;:::;Xi

(

x

;

b

;

N

b

) = max

;N

L

(

;N

;

x

)

. Maximizing the likelihood function is equivalent to maximizing the log-likelihood function

ln

L

(

;N

;

x

)

. From the independence of rv’s

(

X

i

)

and (14), we obtain that

ln

L

(

;N

;

x

) = ln

Yi k=1

(

N

(

k

1))exp

(

N

(

k

1))

x

k

:

Estimates

(

;

b

N

b

)

are solution of _@@

ln

L

(

;N

;

x

) =

_@N@

ln

L

(

;N

;

x

) = 0

: b

=

i

b

N

Pi k=1

x

k Pi k=1

(

k

1)

x

k i X k=1

1

b

N

(

k

1) =

N

b

i

1 P i k =1x k Pi k=1

(

k

1)

x

k

(24)

The second equation may be solved by numerical techniques and then the solution is put into the first equation to get

b

. These estimates are plugged into formulae (10) to get ML estimates of reliability metrics.

Example 5.2 (NHPP models) Assume that failure data are

t

= (

t

1

;::: ;t

i

)

the observed

fail-ure times. The likelihood function is given by (11). Thus, for (MO) model, ML estimates of parameters

0

;

1 (where

0

= 1

=;

1

=

) are solution of [3]

b

0

=

i

ln(1 +

b 1

t

i

)

;

1

b

1 i X k=1

1 1 +

b 1

t

k

=

_{(1 +}

i t

i b

1

t

i

)ln(1 +

b

1

t

i

)

:

Assume now that failure data are the cumulative numbers of failures

n

1

;::: ;n

i at some

instants

d

1

;::: ;d

i. The likelihood function is from the independence and Poisson distribution

of the increments of the counting process

N

(

)

L

(

;

t

) =

Yi k=1

(

d

k

)

(

d

k 1

))

nk nk 1

(

n

k

n

k 1

)!

exp

(

d

k

)

(

d

k 1

))

(

d

0

=

n

0

= 0)

;

where

(

)

is the mean value function of

N

(

)

given that

is the true value of the parameter.

For (GO) model, parameters are

M;

and their ML estimates are

c

M

=

_{1 exp(}

n

i

_d

i

)

d

i

exp(

d

i

)

n

i

1 exp(

d

i

) =

i X k=1

(

n

k

n

k 1

)(

d

k

exp(

d

k

)

d

k 1

exp(

d

k 1

))

exp(

d

k 1

) exp(

d

k

)

:

The second equation is solved by numerical techniques and the solution is incorporated into the first equation to get

M

c

. Estimation of reliability metrics are obtained as for (JM).

5.2 Bayesian procedure

An alternative to frequentist procedures is to use Bayesian statistics. Parameter

is considered as a value of a rv

. A prior distribution of

has to be selected. This distribution represents the a priori knowledge on the parameter and is assumed to have a pdf

(

)

. Now, from the

(25)

failure data

x

, we have to update our knowledge on

. If

L

(

;

x

)

is the likelihood function of the data, then Bayes theorem is used as updating formula: pdf of the posterior distribution of

given data

x

is

f

jx

(

) =

L

(

;

x

)

(

)

R R

L

(

;

x

)

(

)

d:

Now, we find an estimate

bof

by minimizing the so-called posterior expected loss: R

R

l

(

b

;

)

f

jx

(

)

d

where

l

(

;

)

is the loss function. With a quadratic loss function

l

(

b

;

) = (

b

)

2

, it is well-known that the minimum is obtained by the conditional expectation

b

=

E

[

j

x

] =

Z

R

f

jx

(

)

d:

It is just the mean of the posterior distribution of

. Note that all SR models involve two or more parameters, so that the previous integral must be considered as multidimensional. Therefore computation of such integrals are by numerical techniques. Since a decade, progress has been made on such methods: e.g. Monte Carlo Markov Chain methods (see e.g. [50]).

Note that prior distribution can also be parametric. In general, Gamma or Beta distribu-tions are used. So that, additional parameters need to be estimated. This may be carried out by using the ML method. For instance, this is the case for function

(

)

in the (LV) model in

[28]. However, such estimates may be derived in the Bayesian framework and we obtain a hi-erarchical Bayesian analysis of the model. Many models of Section 3 have been analyzed from a Bayesian point of view using MCMC methods like Gibbs sampling, or data augmentation method (see [51] for (JM), (LV); [52] for NHPP model, [53] for S-shaped models and reference therein).

(26)

6 Current issues

6.1 Black-box modeling

We list some issues which are not new but covering well-documented limitations of popular black-box models. Most of them have been recently addressed and practical validation is needed. We will see that the prevalent approach is to use the NHPP modeling framework. Indeed, an easy way to incorporate in a NHPP model various factors affecting the reliability of a software, is to select a suitable parameterization of the intensity function (or ROCOF). Any alternative model combining most these factors will be of value.

6.1.1 Imperfect debugging

The problem of imperfect debugging may be naturally addressed in the Bayesian framework. Reliability growth is captured through deterministically nonincreasing sequence of failure rates

(

r

i

(

))

(see (1)). In Bayesian framework, parameters of

r

i

(

)

are considered as random. So that,

we can deal with stochastically decreasing sequence of rvs

(

r

i

)

i(see (13)), which allows to take into account the uncertainty on the effect of a corrective action. An instance of this approach is given by the (LV) model (see also [29], [34]).

Note that the binomial class of models can incorporate a defective correction of a detected bug. Indeed, assume that each fault detected has a probability

p

to be removed from the soft-ware. The hazard rate after

(

i

1)

repairs is

(

N p

(

i

1))

(see [23]). But, the problem of eventual introduction of new faults is not addressed. Kremer [54] solves the case of a single in-sertion using a nonhomogeneous birth-death Markov model. This has been extended to multiple introductions in [55]. Shanthikumar and Sumita [56] proposed a multivariate model where mul-tiple removing and insertions of faults are allowed at each repair. This model involved complex

(27)

computational procedures and is not considered in literature. Recent advance in addressing the problem of eventual insertion of new faults is concerned with finite NHPP models. It consists in generalizing the basic proportionality between the intensity function and the expected number of remaining faults at time

t

of (GO) model (see Example 3.2) in

(

t

) =

(

t

)[

n

(

t

) (

t

)]

where

(

t

)

represents a time-dependent detection-rate of a fault;

n

(

t

)

is the number of faults in the software at time

t

, including those already detected and removed and those inserted during the debugging process (

(

t

) =

and

n

(

t

) =

M

in (GO)). Making

(

)

time-dependent allows

representing a phenomenon of learning process which is closely related to the changes in the efficiency of testing. This function can monotonically increase during testing period. Select a nondecreasing S-shaped curve as

(

)

gives an usual S-shaped NHPP model. Further details

may be found in [6, Chap 5] and references therein.

Another basic way to consider insertion of new faults is to use a marked point process (MPP) (e.g. [15, Chap 4]). We have the pp of failure detection times

T

1

< T

2

<

and

with each date

T

i, we associate a mark

M

i which represents the cumulative number of faults removed and inserted during the debugging phase. We retrieve a usual pp if all marks are

1

. For such a model, we are interested in the mark-accumulator process

PN (t)

i=0

M

i

(

M

0

= 0

). A basic

instance of MPP is the compound Poisson Process where

N

(

)

is a HPP and

M

i’s are i.i.d. and

independent of

N

(

)

. This may also be used to model clustering of failures (see [57]). Such

MPP are not SEPP of Section 3.1 because orderliness condition fails. This framework was used by van Pul [18] to extend models of (JM) type to incorporate the possibility of inserting new faults during repair.

(28)

6.1.2 Early prediction of software reliability

A major limitation of SR models of Section 3 for software engineering community is to provide no help for managing in the earlier phase of development (or testing) of a product. Indeed, calibration of these black-box models requires a relatively large set of failure data. This is rarely encountered in the earlier life-cycle of a software. In some sense, we are now concerned with the general topic of software quality assessment (which includes dependability concepts) with no failure data. Thus, we have to develop statistical models of quality which are not directly related to the knowledge of a part of the failure process. In such a case, model must be based on a priori information on the product: judgment of experts, quality of the development process, similar existing products, software complexity, etc. Incorporating subjective information leads naturally to Bayesian statistics. We refer to [2, Chap 5,6] for discussion in this context. Since we are mainly interested in reliability assessment, we restrict ourselves to more and less recent issues relying quality control to the software reliability.

A widespread idea is that complexity of a software is an influent factor of the reliability attributes. Much work has been devoted to quantify the software complexity through software

metrics (see e.g. [58]). Typically, we compute Halstead and McCabe metrics which are program

size and control flow measures respectively. It is worthy of note that most software complexity metrics are strongly related to the concept of structure of software code. Thus, including com-plexity factor in SR may be thought of as a first attempt to take into account the architecture of a software in reliability assessment. We turn back to this issue in Subsection 6.2. Now, how to include complexity attributes in earlier reliability analysis? Most of recent research focus on the identification of software modules which are likely fault-prone from data of various com-plexity metrics. In fact, we are faced with a typical problem of data analysis that explains why

(29)

literature on this subject is mainly concerned with procedures of multivariate analysis: linear and nonlinear regression methods, classification methods, techniques of discriminant analysis. We refer to [4, Chap 12], [59],[60] and references therein for details.

Another empirical evidence suggests that the higher the test coverage, the higher the relia-bility of the software would be. Thus, a model which incorporates information on functional testing as soon as it is available is of value. This issue is addressed in a NHPP model proposed in [61]. It consists in defining an appropriate parameterization of a finite NHPP model which relates software reliability to the measurements that can be obtained from the code during func-tional testing. Let

a

be the expected number of faults that would be detected given infinite time testing. The intensity function

(

)

is assumed to be proportional to the expected number of

remaining failures:

(

t

) = [

a

(

t

)]

(

t

)

where

(

t

)

is the hazard rate per-fault. Finally, the time-dependent function

(

t

)

is of the form

(

t

) =

dc(t)

dt

1 c

(

t

)

where

c

(

t

)

is the coverage function. That is, the ratio of the number of potential fault-sites covered by time

t

divided by the total number of potential fault-sites under consideration during testing. Function

c

(

t

)

is assumed to be continuous and monotone as function of time testing. Specific forms of function

c

(

)

allow retrieving some well-known finite failure models:

expo-nential function

c

(

t

) = 1 exp(

t

)

corresponds to the (GO); Weibull coverage function

c

(

t

) = 1 exp(

t

₎

_{corresponds to the generalized (GO) model [62]; S-shaped coverage}

function correspond to S-shaped models [25], etc. Gokhale et al propose to use a log-logistic function. We refer to [61] for details. Such a parameterization leads to estimate

a

and the pa-rameters of function

c

(

)

. The model may be calibrated according to the different phase of the

(30)

metrics (using procedures of multivariate analysis) and measure coverage during the functional testing using a coverage measurement tool (see e.g. [4, Chap 13]). Thus, we get early prediction of reliability (see [63] for an alternative using information from testing phases of similar past projects).

6.1.3 Environmental factors

Most SR models in Section 3 ignore the factors affecting software reliability. In some sense, previously issues discussed in this section can be considered as an attempt to capture some environmental factors. Imperfect debugging is related to the fact that new faults may be inserted during a repair. Complexity attributes of a software is strongly correlated to its fault-proness. Empirical investigations show that the development process, testing procedure, programmer skill, human factors, the operational profile and many others factors affect the reliability of a product (see e.g. [64], [4, Chap 13], [65], [66] and references therein). A major issue is to incorporate all these attributes into a single model. At the present time, investigation focus on functional relationship between the hazard rate

r

i

(

)

of the software and quantitative measures

of the various factors. In this context, a well-known model is the so-called Cox proportional hazard model (PHM) where

r

i

(

)

is assumed to be an exponential function of the environmental

factors:

r

i

(

t

) =

r

(

t

)exp

n X j=1

j

z

j

(

i

)

! (15) where

z

_j

(

)

’s, called the explanatory variables or covariates, are the measures of the factors and

j’s are the regression coefficients.

r

(

)

is a baseline hazard rate that gives the hazard rate when

all covariates are set to

0

. Therefore, given

z

= (

z

1

;::: ;z

n

)

, the reliability function

R

i is

R

i

(

t

j

z

) =

R

(

t

)exp

n X j=1

j

z

j

(

i

)

! (16)

(31)

with

R

(

t

) = exp

Rt

0

r

(

s

)

ds

. Formula (15) expresses the effect of accelerating or deceler-ating the time to failure given

z

. Note that covariates may be time-dependent, random. In this last case the reliability function will be the expectation of function in (16). Baseline hazard rate may be any of the hazard rates used in Section 3. The family of parameters can be estimated using ML. Note that one of the reasons for the popularity of PHM is that the unknown

_j’s may be estimated by the partial likelihood approach without putting a parametric structure on the baseline hazard rate. We refer to [67], [68], [11], [69] for general discussion on Cox regression models. Applications of PHM to software reliability modeling are given in [12, Chap 7], [70], [71] and references therein. Recently, Pham derives in [72] an enhanced proportional hazard (JM) model.

A general way to represent influence of environmental factors on reliability is to assume that stochastic intensity of the counting process

N

(

)

is a function of some

m

stochastic processes

E

1

(

t

)

;::: ;E

m

(

t

)

or covariates

(

t

;

Ht

;

F

0

) =

f

(

t;E

1

(

t

)

;::: ;E

m

(

t

)

;T

1

;::: ;T

N(t)

;N

(

t

))

whereHt is the past up to time

t

of the pp andF

0 encompasses the specification of the paths

of all covariates. Thus, function

(

t

;

Ht

;

F

0

)

may be thought of as the stochastic intensity

of a SEPP driven or modulated by the multivariate environmental process

(

E

1

(

t

)

;::: ;E

m

(

t

))

.

DSPP of Subsection 3.1 is a basic instance of such models and have been widely used in com-munication engineering and in reliability. Castillo and Siewiorek proposed in [73], a DSPP with a cyclo-stationary stochastic intensity to represent the effect of the workload (measure of system usage) on failure process. That is, intensity is a stochastic process assumed to have periodic mean and autocorrelation function. A classic form for intensity of a DSPP is

(

E

_t

)

, where

(

E

_t

)

is a finite Markov process. This is the MMPP discussed in Section 4, where

(

E

_t

)

(32)

represented the control flow structure of the software. In the same spirit of system in a random environment, ¨Ozekici and Sofer [65] use a pp whose stochastic intensity is

(

N N

(

t

))

(

E

t

)

, where

N N

(

t

)

is the remaining number of faults at time

t

and

E

tis the operation performed by the system at

t

.

(

E

_t

)

is also assumed to be a finite Markov process. Transient analysis of

N

(

)

may be carried out as in [42] from the Markov property of the bivariate process

(

N N

(

t

)

;E

t

)

. We refer to [65] for details. Note that both pp are instances of SEPP with respective stochastic intensities

E

[

(

E

t

)

jHt

]

and

(

N N

(

t

))

E

[

(

E

t

)

jHt

]

(Ht is the past of the counting process

up to time

t

). The practical purpose of such models has to be addressed. In particular, further investigations are needed to estimate parameters (see [74]).

6.1.4 conclusion

There exists other issues which are of value in SR engineering. In the black-box modeling framework, we can think about alternative to approaches reported in Section 3. The problem of SR growth assessing may be thought of as a problem of statistical analysis of data. Therefore, prediction techniques developed in this area of research can be used. For instance, some authors have considered neural networks (NN). The main interest as SR model is to be nonparametric. Thus, we rejoin discussion on statistical issues of Subsection 6.3. NN may also be used as a classification tool. For instance, identifying fault-prone modules may be performed with a NN classifier. We refer to [4, Chap 17] and references therein for an account of the NN approach. Empirical comparison of the predictive performance of NN models and recalibrated standard models (as defined in [75]) is given in [76]. NN is found to be a good alternative to the standard models.

Computing dependability metrics is not an end in itself in software engineering. A major question is the time to release a software. In particular we have to decide when to stop testing.