To cite this version: Franck Cassez, Stavros Tripakis. Fault Diagnosis with Static and Dynamic Observers. Fundamenta Informaticae, Polskie Towarzystwo Matematyczne, 2008, 88 (4), pp. 497–540. HAL Id: inria-00351836, https://hal.inria.fr/inria-00351836.


Fault Diagnosis with Static and Dynamic Observers

Franck Cassez

CNRS/IRCCyN

1 rue de la Noë, BP 92101, 44321 Nantes Cedex 3, France. Email: franck.cassez@cnrs.irccyn.fr

Stavros Tripakis

Cadence Research Laboratories

2150 Shattuck Avenue, 10th floor, Berkeley, CA, 94704, USA

and CNRS, Verimag Laboratory, Centre Equation, 2, avenue de Vignate, 38610 Gières, France. Email: tripakis@cadence.com

Abstract. We study sensor minimization problems in the context of fault diagnosis. Fault diagnosis consists in synthesizing a diagnoser that observes a given plant and identifies faults in the plant as soon as possible after their occurrence. Existing literature on this problem has considered the case of fixed static observers, where the set of observable events is fixed and does not change during execution of the system. In this paper, we consider static observers where the set of observable events is not fixed, but needs to be optimized (e.g., minimized in size). We also consider dynamic observers, where the observer can "switch" sensors on or off, thus dynamically changing the set of events it wishes to observe. It is known that checking diagnosability (i.e., whether a given observer is capable of identifying faults) can be solved in polynomial time for static observers, and we show that the same is true for dynamic ones. On the other hand, minimizing the number of (static) observable events required to achieve diagnosability is NP-complete. We show that this is true also in the case of mask-based observation, where some events are observable but not distinguishable. For dynamic observers' synthesis, we prove that a most permissive finite-state observer can be computed in doubly exponential time, using a game-theoretic approach. We further investigate optimization problems for dynamic observers and define a notion of cost of an observer. We show how to compute an optimal observer using results on mean-payoff games by Zwick and Paterson.

Preliminary versions of parts of this paper appeared in [2] and [1].


1. Introduction

Monitoring, Testing, Fault Diagnosis and Control. Many problems concerning the monitoring, testing, fault diagnosis and control of embedded systems can be formalized using finite automata over a set of observable events Σ, plus a set of unobservable events [15,19]. The invisible actions can often be represented by a single unobservable event τ. Given a finite automaton over Σ ∪ {τ} which is a model of a plant (to be monitored, tested, diagnosed or controlled) and an objective (good behaviours, what to test for, faulty behaviours, control objective), we want to check if a monitor/tester/diagnoser/controller exists that achieves the objective, and if possible to synthesize one automatically.

The usual assumption in this setting is that the set of observable events is fixed (and this, in turn, determines the set of unobservable events as well). Observing an event usually requires some detection mechanism, i.e., a sensor of some sort. Which sensors to use, how many of them, and where to place them are some of the design questions that are often difficult to answer, especially without knowing what these sensors are to be used for.

In this paper we study problems of sensor minimization. These problems are interesting since observing an event can be costly in terms of time or energy: computation time must be spent to read and process the information provided by the sensor, and power is required to operate the sensor (as well as to perform the computations). It is then essential that the sensors used really provide useful information. It is also important for the computer to discard any information given by a sensor that is not really needed. In the case of a fixed set of observable events, not all sensors always provide useful information, and sometimes energy (used for sensor operation and computer treatment) is spent for nothing. As an example, consider the system described by the automaton M of Fig. 1.

Figure 1. The automaton M.

M models a system that has two possible behaviors, a faulty behavior that starts with the fault event f, and a non-faulty behavior that starts with event b. Events a and b are assumed to be observable, whereas events f and τ are unobservable. In this case, a simple way for an observer to detect fault f is to watch for both events a and b: if the sequence a.b is observed, then f must have occurred; if, on the other hand, b.a is observed, then we can be certain that f did not occur (assuming the system under observation behaves precisely as the model M).

Although this is a correct way of performing diagnosis in this case, there exists a "cheaper" way. Initially, the observer turns on only the a sensor (that is, it is only able to observe a but not b). When a is observed (eventually it will be, since a occurs in both behaviors) the observer turns off the a sensor and turns on the b sensor. If b is observed, then we can conclude that f occurred. As long as b is not observed, f has not occurred.

The former diagnosis method uses a "static" observer in the sense that it watches for the same set of events throughout the entire observation process. The latter method uses a dynamic observer, which turns sensors on and off as necessary. The latter method can be less costly, as in the example described above.

Sensor Minimization and Fault Diagnosis. We focus our attention on sensor minimization, without looking at problems related to sensor placement, choosing between different types of sensors, and so on. We also focus on a particular observation problem, that of fault diagnosis. We believe, however, that the results we obtain are applicable to other contexts as well.

Fault diagnosis consists in observing a plant and detecting whether a fault has occurred or not. We follow the discrete-event system (DES) setting of [16] where the behavior of the plant is assumed to be known: this includes both the nominal, non-faulty behavior of the plant, as well as the faulty behavior of the plant (after a fault occurs). In particular, we assume that we have a model of the plant in the form of a finite-state automaton over Σ ∪ {τ, f}, where Σ is the set of potentially observable events, τ represents the unobservable events, and f is a special unobservable event that corresponds to the faults.1

Checking diagnosability (whether a fault can be detected) for a given plant and a fixed set of observable events can be done in polynomial time [10,23]. In the general case, synthesizing a diagnoser involves determinization and thus cannot be done in polynomial time.

We examine sensor optimization problems with both static and dynamic observers. A static observer always observes the same set of events, whereas a dynamic observer can modify the set of events it wishes to observe during the course of the plant execution.

In the static observer case, we consider both the standard setting of observable/unobservable events as well as the setting where the observer is defined as a mask which allows some events to be observable but not distinguishable (e.g., see [3]). Our first contribution is to show that the problems of minimizing the number of observable events (or of distinct observable outcomes in the case of the mask) are NP-complete. Membership in NP can be easily derived by reducing these problems to the standard diagnosability problem, once a candidate minimal solution is chosen non-deterministically. NP-hardness can be shown using reductions of well-known NP-hard graph problems, namely the clique and coloring problems.

In the dynamic observer case, we assume that an observer can decide after each new observation the set of events it is going to watch. We provide a definition of the dynamic observer synthesis problem and then show that computing a dynamic observer for a given plant can be reduced to a problem of computing strategies in a game. We further investigate optimization problems for dynamic observers and define a notion of cost of an observer. Finally, we show how to compute an optimal (cost-wise) dynamic observer.

Related work. In Section 3.1 we give a new polynomial time algorithm to check diagnosability. This result itself (testing diagnosability in polynomial time) is not new and was already reported in [10,23]. Nevertheless the proof we give is original and very simple, and applies to plants that are specified by non-deterministic automata, with no assumption on non-observable loops as in [10,23]. We can also derive easily a polynomial time algorithm to compute the minimum k such that a DES is k-diagnosable and thus improve a result from [20], but again with a very simple proof.

1 Different types of faults could also be considered, by having different fault events f1, f2, and so on. Our methods can be extended in a straightforward way to deal with multiple faults. We restrict our presentation to a single fault event for the sake of simplicity. Also note that f is assumed to be unobservable. In the case where f can be directly observed, the task of fault diagnosis becomes trivial, since it suffices to report f as soon as it is observed.

NP-hardness of finding minimum-cardinality sets of observable events so that diagnosability holds under the standard, projection-based setting has been previously reported in [22]. Our result of section 3.2 can be viewed as an alternative shorter proof of this result. Masks have not been considered in [22]. As we show in section 4, a reduction from the mask version of the problem to the standard version is not straightforward. Thus the result in section 4 is useful and new.

The complexity of finding “optimal” observation masks, i.e., a set that cannot be reduced, has been considered in [11] where it was shown that the problem is NP-hard for general properties. [11] also shows that finding optimal observation masks is polynomial for “mask-monotonic” properties where increasing the set of observable (or distinguishable) events preserves the property in question. Diagnosability is a mask-monotonic property. However, the notion of “optimality” considered in this work differs from ours: [11] considers the computation of a minimal partition of the set of events w.r.t. partition refinement ordering, and not the computation of the minimum cardinality of the partitions that ensure diagnosability. Notice that optimal observation masks are not the same as minimum-cardinality masks that we consider in this paper.

In [6], the authors investigate the problem of computing a minimal-cost strategy that allows one to find a subset of the set of observable events such that the system is diagnosable. It is assumed that each such subset has a known associated cost, as well as a known a-priori probability for achieving diagnosability. Dynamic observers are not considered in this work.

Dynamic observers have received little attention in the context of fault diagnosis. The only work we are aware of is the one described in [18]. Our work has been developed independently (Cf. previous versions of this paper [1,2]). Although the general goals of this work and ours are the same, namely, dynamic on/off switching of sensors in order to achieve cost savings, the setting and the approach of [18] are quite different from ours:

• The definitions of dynamic observers differ in the two settings: for instance, for the example of section 4.2.1 of [18], there is no finite-cost solution for Problem CD of [18] (which corresponds to Problem 5 in our paper). In our setting the same example admits a finite-cost solution.

• The purpose of [18] is to compute one optimal observation policy (using dynamic programming). In contrast, we solve two problems: first, we compute the set of all dynamic observers that can be used to diagnose a system; second, we compute all the optimal observers. We do this using well-known game-theoretic approaches.

• There is no upper bound on the complexity of the procedures given in [18]. We show that our solutions are in 2EXPTIME.

• In [18], the authors compute dynamic observers for stochastic systems, which we do not consider. They also consider the control problem under dynamic observation, which we do not consider. Notice that the control problem they consider is a Ramadge & Wonham control problem, i.e., on languages of finite words, and thus the method we propose in paragraph 5.4 could be applied as well.

Organisation of the paper. In Section 2 we fix notation and introduce finite automata with faults to model DES. In Section 3 we present a new algorithm for testing diagnosability. We also show NP-completeness of the sensor minimization problem for the standard projection-based observation setting.


In Section 4 we show NP-completeness of the sensor minimization problem for the mask-based setting. In Section 5 we introduce and study dynamic observers and show that the most permissive dynamic observer can be computed as the most permissive (finite-memory) winning strategy in a safety 2-player game.

We also define a notion of cost for dynamic observers in Section 6 and show that the cost of a given observer can be computed using Karp's algorithm. Finally, we define the optimal-cost observer synthesis problem and show it can be solved using Zwick and Paterson's result on graph games.

2. Preliminaries

2.1. Words and Languages

Let Σ be a finite alphabet and Στ = Σ ∪ {τ}. Σ∗ (resp. Σω) is the set of finite (resp. infinite) words over Σ and contains ε, the empty word. We let Σ+ = Σ∗ \ {ε}. A language L is any subset of Σ∗ ∪ Σω. Given two words ρ ∈ Σ∗ and ρ′ ∈ Σ∗ ∪ Σω, we denote by ρ.ρ′ the concatenation of ρ and ρ′, which is defined in the usual way. |ρ| stands for the length of the word ρ (the length of the empty word is zero, the length of an infinite word is ∞) and |ρ|λ, with λ ∈ Σ, stands for the number of occurrences of λ in ρ. We also use the notation |S| to denote the cardinality of a set S. Given Σ1 ⊆ Σ, we define the projection operator on finite words, π/Σ1 : Σ∗ → Σ1∗, recursively as follows: π/Σ1(ε) = ε and, for a ∈ Σ, ρ ∈ Σ∗, π/Σ1(a.ρ) = a.π/Σ1(ρ) if a ∈ Σ1 and π/Σ1(a.ρ) = π/Σ1(ρ) otherwise.
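The projection operator is straightforward to implement; the following small Python sketch (our own illustration, with a word represented as a list of event names) mirrors the recursive definition above.

```python
def project(word, sigma1):
    """Projection pi_/Sigma1: erase every event not in sigma1, keeping the order."""
    return [a for a in word if a in sigma1]

# For instance, project(["a", "tau", "b", "c"], {"a", "b"}) == ["a", "b"].
```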

2.2. Automata

Definition 2.1. (Finite Automaton)

An automaton A is a tuple (Q, q0, Στ, δ, F, R) with Q a set of states, q0 ∈ Q the initial state, δ ⊆ Q × Στ × 2^Q the transition relation, and F ⊆ Q, R ⊆ Q respectively the sets of final and repeated states. We write q −λ→ q′ if q′ ∈ δ(q, λ). For q ∈ Q, Enabled(q) is the set of actions enabled at q, i.e., the set of λ such that q −λ→ q′ for some q′.

If Q is finite, A is a finite automaton. An automaton is deterministic if for any q ∈ Q, |δ(q, τ)| = 0 and for any λ ∈ Σ, |δ(q, λ)| ≤ 1. A labeled automaton is a pair (A, L) where A = (Q, q0, Στ, δ, F, R) is an automaton and L : Q → P, where P is a finite set of observations.

A run ρ from state s in A is a finite or infinite sequence of transitions

s0 −λ1→ s1 −λ2→ s2 · · · sn−1 −λn→ sn · · ·   (1)

s.t. λi ∈ Στ and s0 = s. If ρ is finite, of length n, and ends in sn we let tgt(ρ) = sn. The set of finite (resp. infinite) runs from s in A is denoted Runs(s, A) (resp. Runsω(s, A)) and we define Runs(A) = Runs(q0, A) and Runsω(A) = Runsω(q0, A). The trace of the run ρ, denoted tr(ρ), is the word obtained by concatenating the symbols λi appearing in ρ, for those λi different from τ. The finite word w ∈ Σ∗ is accepted by A if w = tr(ρ) for some ρ ∈ Runs(A) with tgt(ρ) ∈ F. The language of finite words L(A) of A is the set of words accepted by A. The infinite word w ∈ Σω is accepted by A if w = tr(ρ) for some ρ ∈ Runsω(A) and there is an infinite set of indices I = {i0, i1, · · · , ik, · · · } s.t. sl ∈ R for each l ∈ I. The language of infinite words Lω(A) of A is the set of infinite words accepted by A.

2.3. Discrete-Event Systems

In this paper we use finite automata that generate prefix-closed languages of finite words to model discrete-event systems, hence we do not always need a set of final or repeated states (we assume F = Q and R = ∅ if not otherwise stated).

Let f ∉ Στ be a fresh letter that corresponds to the fault action. We also let Στ,f = Στ ∪ {f} and A = (Q, q0, Στ,f, δ). Given V ⊆ Runs(A), Tr(V) = {tr(ρ) for ρ ∈ V} is the set of traces of the runs in V. A run ρ as in equation (1) is k-faulty if there is some 1 ≤ i ≤ n s.t. λi = f and n − i ≥ k. Notice that ρ can be infinite and in this case n = ∞ and n − i ≥ k always holds. Faulty≥k(A) is the set of k-faulty runs of A. A run is faulty if it is k-faulty for some k ∈ N and Faulty(A) denotes the set of faulty runs. It follows that Faulty≥k+1(A) ⊆ Faulty≥k(A) ⊆ · · · ⊆ Faulty≥0(A) = Faulty(A). Finally, NonFaulty(A) = Runs(A) \ Faulty(A) is the set of non-faulty runs of A. We let Faultytr≥k(A) = Tr(Faulty≥k(A)) and NonFaultytr(A) = Tr(NonFaulty(A)) be the sets of traces of faulty and non-faulty runs.

We assume that each faulty run of A of length n can be extended into a run of length n + 1. This is required for technical reasons (in order to guarantee that the set of faulty runs where sufficient "logical" time has elapsed after the fault is well-defined) and can be achieved by adding τ loop-transitions to each deadlock state of A. Notice that this transformation does not change the observations produced by the plant, thus any observer synthesized for the transformed plant also applies to the original one.

2.4. Product of Automata

The product of automata with τ-transitions is defined in the usual way: the automata synchronize on common labels except for τ. Let A1 = (Q1, q0^1, Στ1, δ1) and A2 = (Q2, q0^2, Στ2, δ2). The product of A1 and A2 is the automaton A1 × A2 = (Q, q0, Στ, δ) where:

• Q = Q1 × Q2,
• q0 = (q0^1, q0^2),
• Σ = Σ1 ∪ Σ2,
• δ ⊆ Q × Στ × Q is defined by (q1, q2) −σ→ (q1′, q2′) if:
  – either σ ∈ Σ1 ∩ Σ2 and qk −σ→k qk′, for k = 1, 2,
  – or σ ∈ (Σi \ Σ3−i) ∪ {τ} and qi −σ→i qi′ and q′3−i = q3−i, for i = 1 or i = 2.
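As an illustration of this definition, here is a small Python sketch (not from the paper) that computes the reachable part of the synchronized product, for automata encoded as lists of (state, label, state) triples in which "tau" stands for the invisible label and is assumed not to belong to either alphabet.

```python
from collections import defaultdict

def product(trans1, init1, sigma1, trans2, init2, sigma2):
    """Reachable transitions of the synchronized product A1 x A2 (Section 2.4)."""
    d1, d2 = defaultdict(list), defaultdict(list)
    for s, a, t in trans1:
        d1[s].append((a, t))
    for s, a, t in trans2:
        d2[s].append((a, t))

    init = (init1, init2)
    seen, stack, prod_trans = {init}, [init], []
    while stack:
        q1, q2 = stack.pop()
        moves = []
        for a, t1 in d1[q1]:
            if a in sigma1 and a in sigma2:
                # common visible label: both components move together
                moves += [(a, (t1, t2)) for b, t2 in d2[q2] if b == a]
            else:
                # tau or a label private to A1: A1 moves alone
                moves.append((a, (t1, q2)))
        for a, t2 in d2[q2]:
            if a == "tau" or a not in sigma1:
                # tau or a label private to A2: A2 moves alone
                moves.append((a, (q1, t2)))
        for a, target in moves:
            prod_trans.append(((q1, q2), a, target))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return prod_trans, init
```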

3. Sensor Minimization with Static Observers

In this section we address the sensor minimization problem for static observers. We point out that the result in this section was already obtained in [21] and we only give here an alternative shorter proof. We are given a finite automaton A = (Q, q0, Στ,f, δ). Such an automaton A models a plant. Notice that A is as in Definition 2.1 with the addition that its alphabet includes both the unobservable non-fault event τ and the fault event f.

A diagnoser is a device that observes the plant and raises an "alarm" whenever it detects a fault. We allow the diagnoser to raise an alarm not necessarily immediately after the fault occurs, but possibly some time later, as long as this time is bounded by some k ∈ N, where N is the set of non-negative integers. We model time by counting the "moves" the plant makes (including observable and unobservable ones). If the system generates a word ρ but only a subset Σo ⊆ Σ is observable, the diagnoser can only see π/Σo(ρ).

Definition 3.1. ((Σo, k)-Diagnoser)

Let A be a finite automaton over Στ,f, k ∈ N, Σo ⊆ Σ. A mapping D : Σo∗ → {0, 1} is a (Σo, k)-diagnoser for A if:

• for each ρ ∈ NonFaulty(A), D(π/Σo(tr(ρ))) = 0,
• for each ρ ∈ Faulty≥k(A), D(π/Σo(tr(ρ))) = 1.

A is (Σo, k)-diagnosable if there is a (Σo, k)-diagnoser for A. A is Σo-diagnosable if there is some k ∈ N s.t. A is (Σo, k)-diagnosable.

Remark 3.1. At this stage, we do not require the mapping D to be effectively computable or even regular.

Example 3.1. Let A be the automaton shown in Fig. 2. The run with one action f is in Faulty≥0(A), the run f.a is in Faulty≥1(A) and a.τ^2 is in NonFaulty(A). A is neither {a}-diagnosable nor {b}-diagnosable. This is because, for any k, the faulty run f.a.b.τ^k gives the same observation as the non-faulty run a.τ^k (in case a is the observable event) or the non-faulty run b.τ^k (in case b is the observable event). Consequently, the diagnoser cannot distinguish between the two no matter how long it waits. If both a and b are observable, however, we can define: D(a.b.ρ) = 1 for any ρ ∈ {a, b}∗ and D(ρ) = 0 otherwise. D is a ({a, b}, 2)-diagnoser for A, thus A is {a, b}-diagnosable.

Figure 2. The automaton A.

3.1. Checking Diagnosability

Let A = (Q, q0, Στ,f, →) be a finite automaton over Στ,f. In this subsection we give a polynomial time algorithm to check diagnosability.

As reported in [10,23], checking whether A is Σ-diagnosable can be done in polynomial time in the size of A, more precisely in O(|A|^2). We improve the previous results of [10,23,20] on this subject with respect to the following directions:

• compared to [23], the product automaton (denoted A1 × A2 in the sequel) that we build has fewer states; moreover our result applies even if the plant is given by a non-deterministic automaton with non-observable loops.

• in [10], in case there are many types of faults, the algorithm for checking diagnosability runs in exponential time. We can use our algorithm to check each fault type one at a time and obtain a polynomial-time algorithm for multiple types of faults.

• to compute the maximum delay k to diagnose a system, [20] computes the strongly connected components of a product automaton and then reduces the problem of computing k to a shortest-path problem in a graph. This proves that k can be computed in cubic time in the size of the plant. We improve this result with our construction and obtain a quadratic algorithm.

Altogether, this section provides a uniform and elegant setting to check diagnosability of discrete-event systems with multiple fault types and to compute the maximum delay. Moreover the proofs are very easy and short. It also reduces the diagnosability problem to the Büchi emptiness problem: very efficient algorithms [4] are known to solve this problem, for instance on-the-fly algorithms [8]. In this respect, our algorithm can be very useful because Büchi emptiness checking is implemented in very efficient tools like SPIN [9].

An Algorithm to Check Diagnosability. Let A = (Q, q0, Στ,f, →) be a finite automaton over Στ,f. We assume that Σo = Σ, i.e., all the events in Σ are observable and thus τ is the only unobservable action. By Definition 3.1, A is diagnosable iff ∃k ∈ N s.t. A is (Σ, k)-diagnosable. This is equivalent to:

A is not diagnosable ⇔ ∀k ∈ N, A is not (Σ, k)-diagnosable   (2)
⇔ ∀k ∈ N, ∃ρ ∈ NonFaulty(A) and ∃ρ′ ∈ Faulty≥k(A) s.t. π/Σ(tr(ρ)) = π/Σ(tr(ρ′))   (3)

There is also a language-based definition of (Σ, k)-diagnosability: A is (Σ, k)-diagnosable iff

π/Σ(Faultytr≥k(A)) ∩ π/Σ(NonFaultytr(A)) = ∅   (4)

or in other words, there is no pair of runs (ρ1, ρ2) with ρ1 ∈ Faulty≥k(A), ρ2 ∈ NonFaulty(A) s.t. ρ1 and ρ2 give the same observations on Σ. To check diagnosability, we build a product automaton A1 × A2 such that A1 behaves like A but records whether a fault occurred or not, and A2 produces only the non-faulty runs of A. Define A1 = (Q × {0, 1}, (q0, 0), Στ, →1) s.t.

• (q, n) −l→1 (q′, n) iff q −l→ q′ with l ∈ Σ;
• (q, n) −τ→1 (q′, 1) iff q −f→ q′ (n is set to 1 after a fault occurs);
• (q, n) −τ→1 (q′, n) iff q −τ→ q′.

Define A2 = (Q, q0, Στ, →2) with

• q −l→2 q′ if q −l→ q′ and l ∈ Σ;
• q −τ→2 q′ if q −τ→ q′.

Let A1 × A2 be the synchronized product of A1 and A2: synchronization happens on common actions except τ, i.e., actions in Σ, and otherwise interleaving (cf. section 2.4). →1,2 stands for the transition relation of A1 × A2. We assume we have a predicate A1Move (resp. A2Move) on transitions of A1 × A2 which is true when A1 (resp. A2) participates in an A1 × A2 transition (i.e., A1Move(t) is true for all transitions except τ transitions taken by A2). For a run ρ ∈ Runs(A1 × A2) we let ρ|1 be the run of A1 defined by the sequence of A1Move transitions of ρ, and ρ|2 be the sequence of A2Move moves. Notice that π/Σ(tr(ρ|1)) = π/Σ(tr(ρ|2)) by definition of A1 × A2. Let Runs≥k(A1 × A2) be the set of runs in A1 × A2 s.t. a (faulty) state ((q1, 1), q2) is followed by at least k A1-actions. Then we have the following lemmas:

Lemma 3.1. Let ρ ∈ Runs≥k(A1 × A2). Then ρ|1 ∈ Faulty≥k(A) and ρ|2 ∈ NonFaulty(A). Moreover π/Σ(tr(ρ|1)) = π/Σ(tr(ρ|2)).

Lemma 3.2. Let ρ1 ∈ Faulty≥k(A) and ρ2 ∈ NonFaulty(A) s.t. π/Σ(tr(ρ1)) = π/Σ(tr(ρ2)). Then there is some ρ ∈ Runs≥k(A1 × A2) s.t. ρ|1 = ρ1 and ρ|2 = ρ2.

Proof:
By definition of A1 × A2, it suffices to notice that a run in Runs≥k(A1 × A2) can be split into a run of A1 and a run of A2 having the same projection on Σ. □

We can define an automaton B which is an extended version of A1 × A2: we add a boolean variable z that is set to 0 in the initial state of B. An extended state of B is a pair (s, z) with s a state of A1 × A2. Whenever A1 participates in an A1 × A2-action, z is set to 1, and when only A2 makes a move in A1 × A2, z is set to 0. We denote →B the new transition relation between extended states: (s, z) −σ→B (s′, z′) if there is a transition t : s −σ→1,2 s′, and z′ = 1 if A1Move(t), and z′ = 0 otherwise. Define the set of states RB = {(((q, 1), q′), 1) | ((q, 1), q′) ∈ A1 × A2}. The Büchi automaton B is the tuple ((Q × {0, 1} × Q) × {0, 1}, (((q0, 0), q0), 0), Στ, →B, ∅, RB) with RB the set of repeated states. The following theorem holds:

Theorem 3.1. Lω(B) ≠ ∅ ⇔ A is not Σ-diagnosable.

Proof:
⇒ Assume Lω(B) ≠ ∅. Let ρ ∈ Lω(B): ρ has infinitely many faulty states and infinitely many A1-actions because of the definition of RB. Let k ∈ N. Let ρ[ik] be a finite prefix of ρ that contains more than k A1-actions (for any k this ik exists because ρ contains infinitely many A1-actions). ρ[ik] ∈ Runs≥k(A1 × A2) and by Lemma 3.1, it follows that ρ[ik]|1 ∈ Faulty≥k(A) and ρ[ik]|2 ∈ NonFaulty(A) and π/Σ(tr(ρ[ik]|1)) = π/Σ(tr(ρ[ik]|2)). Thus equation (3) is satisfied and A is not diagnosable.

⇐ Conversely assume A is not Σ-diagnosable. Then by equation (3) and Lemma 3.2, for any k ∈ N, there is a run ρk ∈ Runs≥k(A1 × A2). As A1 × A2 is finite, there must be a state ((q1, 1), q2) in A1 × A2 which is the source of an infinite number of runs ρ′k having more than k A1-actions. As A1 × A2 is of finite branching, by König's Lemma, there is an infinite run in A1 × A2 containing infinitely many A1-actions from this faulty state; the corresponding run of B visits RB infinitely often, hence Lω(B) ≠ ∅. □

This gives a procedure to check whether an automaton A is diagnosable or not. The (extended) product system built from A1 × A2 has size 4 · |A|^2, i.e., O(|A|^2). Checking emptiness of a Büchi automaton having n states and m transitions can be done in O(n + m). Thus, checking diagnosability of a finite (non-deterministic) automaton can be done in O(|A|^2), i.e., is also polynomial.
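To make the construction concrete, here is a minimal Python sketch of the whole test (our own illustration, not the authors' implementation). It assumes the plant is given as a list of (state, label, state) triples where "tau" stands for the unobservable event, "f" for the fault, every other label is observable, and the plant has already been transformed so that every faulty run can be extended (Section 2.3). For this particular product, Büchi emptiness reduces to searching for a reachable cycle of faulty states that contains an A1-move.

```python
from collections import defaultdict, deque

def is_diagnosable(transitions, initial):
    """Return True iff the plant is Sigma-diagnosable (all non-tau, non-f labels observable)."""
    delta = defaultdict(list)
    for s, a, t in transitions:
        delta[s].append((a, t))

    def successors(state):
        """Moves of A1 x A2 from `state`, tagged with 'did A1 move?'."""
        (q1, fault), q2 = state
        for a, t in delta[q1]:
            if a == "f":
                yield ((t, 1), q2), True          # A1 takes the fault (invisible)
            elif a == "tau":
                yield ((t, fault), q2), True      # A1 moves invisibly
            else:                                 # observable: A1 and A2 synchronise
                for b, t2 in delta[q2]:
                    if b == a:
                        yield ((t, fault), t2), True
        for a, t in delta[q2]:
            if a == "tau":                        # A2 alone on its invisible moves
                yield ((q1, fault), t), False

    # Reachable states of the product.
    init = ((initial, 0), initial)
    reach, stack = {init}, [init]
    while stack:
        s = stack.pop()
        for t, _ in successors(s):
            if t not in reach:
                reach.add(t)
                stack.append(t)

    def reachable_from(src, target):
        seen, queue = {src}, deque([src])
        while queue:
            s = queue.popleft()
            if s == target:
                return True
            for t, _ in successors(s):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
        return False

    # Not diagnosable iff some reachable faulty state lies on a cycle that
    # contains an A1-move (every state reachable from a faulty state is faulty).
    for s in reach:
        if s[0][1] == 1:
            for t, a1_moved in successors(s):
                if a1_moved and reachable_from(t, s):
                    return False
    return True
```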

In the case of a finite number of failure types, it suffices to design A1 × A2 for each type of fault [23] and check for diagnosability. Thus checking diagnosability with multiple types of faults is also polynomial.

Computation of the Maximal Delay to Detect Faults. To compute the smallest k s.t. A is (Σ, k)-diagnosable we proceed as follows: in case A is Σ-diagnosable, there is no infinite faulty run with infinitely many A1-actions in A1 × A2; this implies that each run beginning in a faulty state of A1 × A2 is followed by a finite number of A1-actions and this number is bounded by a constant α. Indeed, otherwise, we could construct an infinite faulty run with an infinite number of A1-actions and A would not be diagnosable. Assume k is the maximum number of A1-actions that can follow a faulty state in A1 × A2. Then A is (Σ, k + 1)-diagnosable: otherwise there would be a faulty run with k + 1 A1-actions after a faulty state in A1 × A2. Also A is not (Σ, k)-diagnosable because there is a faulty run with k A1-actions after a faulty state in A1 × A2, which also means there is another non-faulty run with the same observable actions. Thus k + 1 is the minimum value for which A is Σ-diagnosable, or in other words, it is the maximal delay after which a fault will be announced. Computing k + 1 amounts to computing the maximum number of A1-actions that can follow a faulty state in A1 × A2.

The computation of k amounts to computing the length of a longest path2 in a graph, which can be done in linear time w.r.t. the size of the graph. As the size of A1 × A2 is O(|Q|^2), we can compute k in quadratic time, which is better than the bound O(|Q|^3) of [20].
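Under the same plant encoding as the previous sketch, and assuming the plant has already been checked to be Σ-diagnosable, the minimal k can be sketched as a longest-path computation on the faulty part of A1 × A2; since diagnosability rules out cycles containing A1-moves there, a bounded number of relaxation rounds suffices.

```python
from collections import defaultdict

def minimum_k(transitions, initial):
    """Minimal k such that a Sigma-diagnosable plant is (Sigma, k)-diagnosable:
    1 plus the maximal number of A1-moves that can follow a faulty product state."""
    delta = defaultdict(list)
    for s, a, t in transitions:
        delta[s].append((a, t))

    def successors(state):                     # product moves, weight 1 = A1-move
        (q1, fault), q2 = state
        for a, t in delta[q1]:
            if a == "f":
                yield ((t, 1), q2), 1
            elif a == "tau":
                yield ((t, fault), q2), 1
            else:
                for b, t2 in delta[q2]:
                    if b == a:
                        yield ((t, fault), t2), 1
        for a, t in delta[q2]:
            if a == "tau":
                yield ((q1, fault), t), 0      # A2 alone: not an A1-action

    init = ((initial, 0), initial)
    reach, stack = {init}, [init]
    while stack:
        s = stack.pop()
        for t, _ in successors(s):
            if t not in reach:
                reach.add(t)
                stack.append(t)
    faulty = [s for s in reach if s[0][1] == 1]

    # Longest A1-weighted path from each faulty state.  Diagnosability implies
    # no cycle with an A1-move among faulty states, so |faulty| - 1 rounds of
    # relaxation reach the fixpoint.
    best = {s: 0 for s in faulty}
    for _ in range(max(len(faulty) - 1, 0)):
        for s in faulty:
            for t, w in successors(s):
                best[s] = max(best[s], w + best[t])
    return (max(best.values()) + 1) if best else 0
```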

Synthesis of a Diagnoser. To compute the diagnoser we do the following: take A1 and determinize it (using the standard subset construction). The deterministic automaton A′ obtained this way has at most 2^O(|Q|) states. If a state S of the deterministic version A′ contains only faulty states, i.e., every (q, k) ∈ S is such that k = 1, then declare S as faulty. Define the diagnoser D as follows: for a run ρ ∈ Runs(A), D(ρ) = 1 if S0 −tr(ρ)→ S′ in A′ and S′ is faulty; otherwise D(ρ) = 0. D is a diagnoser for A and the maximum delay to announce a fault is k + 1 steps if k is the minimum value for which A is (Σ, k)-diagnosable. Computing the diagnoser is thus exponential in the size of A, i.e., can be done in 2^O(|A|).
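A sketch of this determinization step, under the same plant encoding as above, is given below; the resulting deterministic transition table together with the set of "fault-certain" subsets is what the diagnoser D reads its verdict from.

```python
from collections import defaultdict

def synthesize_diagnoser(transitions, initial):
    """Determinize A1 (the plant with a fault flag) w.r.t. observable events and
    flag the subsets that contain only faulty states."""
    delta = defaultdict(list)
    for s, a, t in transitions:
        delta[s].append((a, t))

    def closure(states):
        """Unobservable closure in A1: follow tau and f moves, updating the fault flag."""
        stack, seen = list(states), set(states)
        while stack:
            q, b = stack.pop()
            for a, t in delta[q]:
                nxt = (t, 1) if a == "f" else (t, b) if a == "tau" else None
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return frozenset(seen)

    s0 = closure({(initial, 0)})
    det_delta, faulty_sets = {}, set()
    stack, visited = [s0], {s0}
    while stack:
        S = stack.pop()
        if all(flag == 1 for _, flag in S):
            faulty_sets.add(S)                 # the diagnoser answers 1 on these
        succ = defaultdict(set)
        for q, flag in S:
            for a, t in delta[q]:
                if a not in ("tau", "f"):
                    succ[a].add((t, flag))
        for a, T in succ.items():
            T = closure(T)
            det_delta[(S, a)] = T
            if T not in visited:
                visited.add(T)
                stack.append(T)
    return s0, det_delta, faulty_sets
```

The diagnoser then answers 1 on an observation w iff reading w from s0 in det_delta ends in a member of faulty_sets.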

3.2. Minimum Cardinality Set for Static Observers

In this section we address the problem of finding a set of observable events Σo that allows faults to be detected. We would like to detect faults using as few observable events as possible. We want to decide whether there is a subset Σo ⊊ Σ such that the fault can be detected by observing only events in Σo. Moreover, we would like to find an "optimal" such Σo (e.g., one with a minimal number of events).

2 Actually it amounts to computing the maximum number of A1-actions following a faulty state in A1 × A2. This can be done using a depth-first search algorithm and a tag on each faulty state s that gives the maximal number of A1-actions following s.

Problem 1. (Minimum Number of Observable Events)

INPUT: Plant model A = (Q, q0, Στ,f, δ), n ∈ N s.t. n ≤ |Σ|.
PROBLEMS:
(A) Is there any Σo ⊆ Σ with |Σo| = n, such that A is Σo-diagnosable?
(B) If the answer to (A) is "yes", find the minimum n0 such that there exists Σo ⊆ Σ with |Σo| = n0 and A is Σo-diagnosable.

If we know how to solve Problem 1(A) efficiently then we can also solve Problem 1(B) efficiently: we perform a binary search over n between 0 and |Σ|, and solve Problem 1(A) for each such n, until we find the minimum n0 for which Problem 1(A) gives a positive answer.3 Unfortunately, Problem 1(A) is a combinatorial problem, exponential in |Σ|, as we show next.

Theorem 3.2. Problem 1(A) is NP-complete.

Proof:

Membership in NP is proved using the result in section 3.1: if we guess a solution Σo we can check that A is Σo-diagnosable in time polynomial in |A|. Here we provide the proof of NP-hardness by giving a reduction of the n-clique problem. Let G = (V, E) be an (undirected) graph where V is a set of vertices and E ⊆ V × V is a set of edges (we assume that (v, v′) ∈ E ⇔ (v′, v) ∈ E). A clique in G is a subset V′ ⊆ V such that for all v, v′ ∈ V′, (v, v′) ∈ E. The n-clique problem asks the following: determine whether G contains a clique of n vertices.

The reduction is as follows. Given G, we build a finite automaton AG such that there is a n-clique in G iff AG is Σo-diagnosable, where |Σ \ Σo| = n. Notice that since Σo ⊆ Σ, |Σ \ Σo| = n is equivalent to |Σo| = |Σ| − n. Thus, there is a n-clique in G iff n events from Σ do not need to be observed, i.e., iff Problem 1(A) gives a positive answer for |Σ| − n.

Starting from G we define Σ = V. Then we define AG as shown in Fig. 3: q0 is the initial state and the "branches" f.a.b, f.a′.b′, · · · are obtained from the pairs of nodes (a, b), (a′, b′), · · · that are not linked with an edge in G: (a, b), (a′, b′), · · · ∉ E.

Figure 3. The automaton AG.

3 Notice that knowing n0 does not imply we know the required set of observable events Σo. We can find (one of the possibly many) Σo by searching over all possible subsets Σo ⊆ Σ of size n0 (there are C(|Σ|, n0) such combinations) and checking for each such Σo whether A is Σo-diagnosable, using the methods described in section 3.1.

If part. Assume Problem 1(A) gives a positive answer for |Σ| − n. This means that there exists Σo ⊆ Σ such that |Σo| = |Σ| − n, or |Σ \ Σo| = n, and AG is Σo-diagnosable. Then we claim that C = Σ \ Σo is a clique in G. Assume C is not a clique. Then there must be some {a, b} ⊆ C such that (a, b) ∉ E. By the construction of Fig. 3, there is a branch f.a.b in AG and thus both ρ = ε and ρ′ = f.a.b.τ^k are runs of AG, for any k. As {a, b} ⊆ C, {a, b} ∩ Σo = ∅ and the observation of both runs4 ρ and ρ′ is ε, no diagnoser exists that can distinguish between the two runs, for any k.

Only if part. Assume there exists a n-clique C in G. Let Σo = Σ \ C. We claim that AG is (Σo, 2)-diagnosable (thus also Σo-diagnosable). Suppose not. Then there exist two runs ρ and ρ′ such that ρ′ ∈ NonFaulty(AG) and ρ = ρ1.f.ρ2, with |ρ2| ≥ 2, and π/Σo(ρ) = π/Σo(ρ′). Then ρ must be of the form ρ = f.a.b.τ^k, with (a, b) ∉ E. Also, the only way ρ′ can be non-faulty is ρ′ = ε. Then π/Σo(ρ′) = ε = π/Σo(ρ), thus both a and b must be non-observable, thus {a, b} ∩ Σo = ∅ which entails {a, b} ⊆ C. This implies that (a, b) ∈ E which is a contradiction. □

Remark 3.2. Notice that in Problem 1, the maximum number of steps after which a fault has to be reported, k, is not specified as an input. Adding k to the set of inputs results in answers that may generally depend on k. For instance, in the automaton E of Fig. 4, if we impose that a fault be reported within k = 2 steps, then we need to observe both a and b; whereas if we leave k unspecified, we need only observe a and we can diagnose a fault after 3 steps (i.e., 2 a's).

Figure 4. The automaton E.

4. Sensor Minimization with Masks

So far we have assumed that observable events are also distinguishable. However, there are cases where two events a and b are observable but not distinguishable, that is, the diagnoser knows that a or b occurred, but not which of the two. This is not the same as considering a and b to be unobservable, since in that case the diagnoser would not be able to detect the occurrence of a or b. Distinguishability of events is captured by the notion of a mask [3].

Definition 4.1. (Mask)

A mask (M, n) over Σ is a total, surjective function M : Σ → {1, · · · , n} ∪ {ε}. M induces a morphism M∗ : Σ∗ → {1, · · · , n}∗, where M∗(ε) = ε and M∗(a.ρ) = M(a).M∗(ρ), for a ∈ Σ and ρ ∈ Σ∗. For example, if Σ = {a, b, c, d}, n = 2 and M(a) = M(b) = 1, M(c) = 2, M(d) = ε, then we have M∗(a.b.c.b.d) = 1.1.2.1 = M∗(a.a.d.c.a).
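A mask and its induced morphism are easy to encode; the following short Python sketch (our own notation: the empty string stands for ε) reproduces the example above.

```python
def mask_image(M, word):
    """Apply the morphism M* induced by a mask M (a dict mapping each event
    to 1..n, or to "" for epsilon) to a word given as a list of events."""
    return [M[a] for a in word if M[a] != ""]

# Example from the text: with M = {"a": 1, "b": 1, "c": 2, "d": ""},
# mask_image(M, list("abcbd")) == [1, 1, 2, 1] == mask_image(M, list("aadca")).
```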

4 When a run can be uniquely defined by its sequence of actions, we omit the intermediate states and we can use the trace to characterize the run.

Definition 4.2. (((M, n), k)-diagnoser)

Let (M, n) be a mask over Σ. A mapping D : {1, · · · , n}∗ → {0, 1} is a ((M, n), k)-diagnoser for A if:

• for each ρ ∈ NonFaulty(A), D(M∗(π/Σ(tr(ρ)))) = 0;
• for each ρ ∈ Faulty≥k(A), D(M∗(π/Σ(tr(ρ)))) = 1.

A is ((M, n), k)-diagnosable if there is a ((M, n), k)-diagnoser for A. A is said to be (M, n)-diagnosable if there is some k such that A is ((M, n), k)-diagnosable.

Given A and a mask (M, n), checking whether A is (M, n)-diagnosable can be done in polynomial time. In fact it can be reduced to checking Σo-diagnosability of a modified automaton AM, with Σo = {1, ..., n}. AM is obtained from A by renaming each action a ∈ Σ by M(a), or by τ if M(a) = ε. It can be seen that A is (M, n)-diagnosable iff AM is {1, · · · , n}-diagnosable. Also notice that A is ((M, n), k)-diagnosable iff

M∗(π/Σ(Faultytr≥k(A))) ∩ M∗(π/Σ(NonFaultytr(A))) = ∅.
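This reduction is a simple renaming, sketched below; it reuses the `is_diagnosable` sketch of Section 3.1 and the same plant encoding.

```python
def mask_diagnosable(transitions, initial, M):
    """Rename each observable event a into M(a) (or into "tau" when M(a) is
    epsilon, encoded as "") and reuse the ordinary diagnosability test."""
    renamed = []
    for s, a, t in transitions:
        if a in ("tau", "f"):
            renamed.append((s, a, t))
        else:
            out = M[a]
            renamed.append((s, "tau" if out == "" else str(out), t))
    return is_diagnosable(renamed, initial)   # from the Section 3.1 sketch
```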

As in the previous section, we are mostly interested in minimizing the observability requirements while maintaining diagnosability. In the context of diagnosis with masks, this means minimizing the number n of distinct outputs of the mask M. We thus define the following problem:

Problem 2. (Minimum Mask)

INPUT: Plant model A = (Q, q0, Στ,f, δ), n ∈ N s.t. n ≤ |Σ|.
PROBLEM:
(A) Is there any mask (M, n) such that A is (M, n)-diagnosable?
(B) If the answer to (A) is "yes", find the minimum n0 such that there is a mask (M, n0) such that A is (M, n0)-diagnosable.

As with Problem 1, if we know how to solve Problem 2(A) efficiently we also know how to solve Problem 2(B) efficiently: again, a binary search on n suffices.

We will prove that Problem 2 is NP-complete. One might think that this result follows easily from Theorem 3.2. However, this is not the case. Obviously, a solution to Problem 1 provides a solution to Problem 2: assume there exists Σo such that A is (Σo, k)-diagnosable and Σo = {a1, ..., an}; define a mask (M, n) such that M(ai) = i and, for any a ∈ Σ \ Σo, M(a) = ε. Then A is ((M, n), k)-diagnosable. However, a positive answer to Problem 2(A) does not necessarily imply a positive answer to Problem 1(A), as shown by the example that follows.

Example 4.1. Consider again the automaton A of Fig. 2. Let M(a) = M(b) = 1. Then A is ((M, 1), 2)-diagnosable because we can build a diagnoser D defined by: D(ε) = 0, D(1) = 0, D(1.1.ρ) = 1 for any ρ ∈ 1∗. However, as we said before, there is no strict subset of {a, b} that allows A to be diagnosed.

Instead of using the previous result, we provide a direct proof of NP-hardness of Problem 2.

Theorem 4.1. Problem 2 is NP-complete.

Proof:

Membership in NP is again justified by the fact that checking whether a guessed mask works can be done in polynomial time (it suffices to rename the events of the system according to M and apply the algorithm of section 3.1). We show NP-hardness using a reduction of the n-coloring problem, which asks the following: given an undirected graph G = (V, E), is it possible to color the vertices with colors in {1, 2, · · · , n} so that no two adjacent vertices have the same color?

Let G = (V, E) be an undirected graph. Let E = {e1, e2, · · · , ej} be the set of edges with ei = (ui, vi). We let Σ = V and define the automaton AG as pictured in Fig. 5. The initial state of AG is q0.

Figure 5. Automaton AG for n-colorizability (one widget per edge ei).

We claim that G is n-colorizable iff AG is (M, n)-diagnosable.

If part. Assume AG is (M, n)-diagnosable for n ≥ 0. We first show that for all i = 1, ..., j, M(ui) ≠ ε, M(vi) ≠ ε and M(ui) ≠ M(vi). For any k, we can define the runs ρ = v1.τ.u1.v2.τ.u2 · · · ui.f.vi.τ^k and ρ′ = v1.τ.u1.v2.τ.u2 · · · vi.τ.ui. If either M(ui) = ε or M(vi) = ε or M(ui) = M(vi) holds, then we have M∗(π/Σ(tr(ρ))) = M∗(π/Σ(tr(ρ′))). Thus, for any k, there is a faulty run with more than k events after the fault, and a non-faulty run which gives the same observation through the mask. Hence, for any k, A cannot be ((M, n), k)-diagnosable. Thus A is not (M, n)-diagnosable, which contradicts diagnosability of A.

Note that the above implies in particular that n ≥ 1. We can now prove that G is n-colorizable. Let C be the color mapping defined by C(v) = M(v). We need to prove that C(ui) ≠ C(vi) for any (ui, vi) ∈ E. This holds by construction of AG and the fact that M(ui) ≠ M(vi) as shown above.

Only if part. Assume G is n-colorizable. There exists a color mapping C : V → {1, 2, · · · , n} s.t. if (v, v′) ∈ E then C(v) ≠ C(v′). Define the mask M by M(a) = C(a) for a ∈ V. We claim that AG is ((M, n), 1)-diagnosable (thus also (M, n)-diagnosable). Assume on the contrary that AG is not ((M, n), 1)-diagnosable. Then there exist two runs ρ ∈ Faulty≥1(AG) and ρ′ ∈ NonFaulty(AG) such that M∗(π/Σ(tr(ρ))) = M∗(π/Σ(tr(ρ′))). As ρ is 1-faulty it must be of the form ρ = v1.τ.u1 · · · ui.f.vi.τ^k with 1 ≤ i ≤ j and k ≥ 0. Notice that M(a) ≠ ε for all a ∈ V. Hence M∗(π/Σ(tr(ρ))) = M(v1).M(u1) · · · M(ui).M(vi), and |M∗(π/Σ(tr(ρ)))| = 2i. Consequently, |M∗(π/Σ(tr(ρ′)))| = 2i.

The only possible such ρ′ which is non-faulty is ρ′ = v1.τ.u1 · · · vi.τ.ui. Now, M∗(π/Σ(tr(ρ))) = M∗(π/Σ(tr(ρ′))), which implies M(vi) = M(ui), i.e., C(vi) = C(ui). But (ui, vi) ∈ E, and this contradicts the assumption that C is a valid coloring. □

5. Fault Diagnosis with Dynamic Observers

In this section we introduce dynamic observers. They can choose after each new observation the set of events they are going to watch for. To illustrate why dynamic observers can be useful consider the following example.

Example 5.1. (Dynamic Observation)

Assume we want to detect faults in automaton B of Fig. 6. A static diagnoser that observes Σ = {a, b} works; however, no proper subset of Σ can be used to detect faults in B. Thus the minimum cardinality of the set of observable events for diagnosing B is 2, i.e., a static observer will have to monitor two events during the execution of the DES B. If we want to use a mask, the minimum cardinality for a mask is 2 as well. This means that an observer will have to be receptive to at least two inputs at each point in time to detect a fault in B. One can think of being receptive as switching on a device to sense an event. This consumes energy. We can be more efficient using a dynamic observer that only turns on sensors when needed, thus saving energy. In the case of B, this can be done as follows: in the beginning we only switch on the a-sensor; once an a occurs the a-sensor is switched off and the b-sensor is switched on. Compared to the previous diagnosers we use half as much energy.

Figure 6. The automaton B.

5.1. Dynamic Observers

We formalize the above notion of dynamic observation using observers. The choice of the events to observe can depend on the choices the observer has made before and on the observations it has made. Moreover an observer may have unbounded memory.

Definition 5.1. (Observer)

An observer Obs over Σ is a deterministic labeled automaton Obs = (S, s0, Σ, δ, L), where S is a (possibly infinite) set of states, s0 ∈ S is the initial state, Σ is the set of observable events, δ : S × Σ → S is the transition function (a total function), and L : S → 2^Σ is a labeling function that specifies the set of events that the observer wishes to observe when it is at state s. We require for any state s and any a ∈ Σ, if a ∉ L(s) then δ(s, a) = s: this means the observer does not change its state when an event it has chosen not to observe occurs.

As an observer is deterministic we use the notation δ(s0, w) to denote the state s reached after reading the word w, and L(δ(s0, w)) is the set of events Obs observes after w.

An observer implicitly defines a transducer that consumes an input event a ∈ Σ and, depending on the current state s, either outputs a (when a ∈ L(s)) and moves to a new state δ(s, a), or outputs ε (when a ∉ L(s)) and remains in the same state waiting for a new event. Thus, an observer defines a mapping Obs from Σ∗ to Σ∗ (we use the same name "Obs" for the automaton and the mapping). Given a run ρ, Obs(π/Σ(tr(ρ))) is the output of the transducer on ρ. It is called the observation of ρ by Obs. We next provide an example of a particular case of observer which can be represented by a finite-state machine.

Example 5.2. Let Obs be the observer of Fig. 7. Obs maps inputs as follows: Obs(baab) = ab, Obs(bababbaab) = ab, Obs(bbbbba) = a and Obs(bbaaa) = a. If Obs operates on the DES B of Fig. 6 and B generates f.a.b, Obs will have as input π/Σ(f.a.b) = a.b with Σ = {a, b}. Consequently the observation of Obs is Obs(π/Σ(f.a.b)) = a.b.

Figure 7. A finite-state observer Obs (with L(0) = {a}, L(1) = {b}, L(2) = ∅).
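The following Python sketch (an assumed encoding, not taken from the paper) implements a finite-state observer as a transducer in the sense of Definition 5.1 and reproduces the observations of Example 5.2.

```python
class Observer:
    """Finite-state observer: states, a label L(s) of watched events per state,
    and a transition function that only moves on watched events."""
    def __init__(self, delta, labels, initial):
        self.delta = delta      # dict: (state, event) -> state, for watched events
        self.labels = labels    # dict: state -> set of watched events L(state)
        self.initial = initial

    def observe(self, word):
        """Return Obs(word): the output of the transducer view of the observer."""
        state, out = self.initial, []
        for a in word:
            if a in self.labels[state]:
                out.append(a)
                state = self.delta[(state, a)]
            # otherwise the observer ignores a and stays in the same state
        return "".join(out)

# The observer of Fig. 7: L(0) = {a}, L(1) = {b}, L(2) = {}.
obs = Observer(delta={(0, "a"): 1, (1, "b"): 2},
               labels={0: {"a"}, 1: {"b"}, 2: set()},
               initial=0)
assert obs.observe("baab") == "ab"
assert obs.observe("bbbbba") == "a"
```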

5.2. Checking Diagnosability with Dynamic Diagnosers

Definition 5.2. ((Obs, k)-diagnoser)

Let A be a finite automaton over Στ,f and Obs be an observer over Σ. D : Σ∗ → {0, 1} is an (Obs, k)-diagnoser for A if

• ∀ρ ∈ NonFaulty(A), D(Obs(π/Σ(tr(ρ)))) = 0, and
• ∀ρ ∈ Faulty≥k(A), D(Obs(π/Σ(tr(ρ)))) = 1.

A is (Obs, k)-diagnosable if there is an (Obs, k)-diagnoser for A. A is Obs-diagnosable if there is some k such that A is (Obs, k)-diagnosable.

If a diagnoser always selects Σ as the set of observable events, it is a static observer and (Obs, k)-diagnosability amounts to the standard (Σ, k)-diagnosis problem [16]. In this case A is (Σ, k)-diagnosable iff equation (4) holds.

As for Σ-diagnosability, (Obs, k)-diagnosability can be stated as a language-based equality: A is (Obs, k)-diagnosable iff

Obs(π/Σ(Faultytr≥k(A))) ∩ Obs(π/Σ(NonFaultytr(A))) = ∅.


Problem 3. (Finite-State Obs-Diagnosability)

INPUT: Plant model A = (Q, q0, Στ,f, →), Obs = (S, s0, Σ, δ, L) a finite-state observer.
PROBLEM:
(A) Is A Obs-diagnosable?
(B) If the answer to (A) is "yes", compute the minimum k such that A is (Obs, k)-diagnosable.

Theorem 5.1. Problem 3 is in P.

Proof:

We first prove that Problem 3(A) is in P. The proof runs as follows: we build a product automaton A ⊗ Obs such that: A is (Obs, k)-diagnosable ⇔ A ⊗ Obs is (Σ, k)-diagnosable.

Let A = (Q, q0, Στ,f, →) be a finite automaton and Obs = (S, s0, Σ, δ, L) be a finite-state observer. Define the automaton A ⊗ Obs = (Q × S, (q0, s0), Στ,f, →) as follows:

• (q, s) −β→ (q′, s′) iff ∃λ ∈ Σ s.t. q −λ→ q′, s′ = δ(s, λ) and β = λ if λ ∈ L(s), β = τ otherwise;
• (q, s) −λ→ (q′, s) iff ∃λ ∈ {τ, f} s.t. q −λ→ q′.

In this proof we use Obs(ρ) instead of Obs(π/Σ(tr(ρ))) to simplify notations. To prove Theorem 5.1 we use the following lemmas:

Lemma 5.1.

1. Let ρ ∈ Faulty≥k(A). There is a word ν ∈ Faultytr≥k(A ⊗ Obs) s.t. Obs(ρ) = π/Σ(ν).
2. Let ρ′ ∈ NonFaulty(A). There is a word ν′ ∈ NonFaultytr(A ⊗ Obs) s.t. Obs(ρ′) = π/Σ(ν′).

Proof:

Let r ∈ Runs(A) s.t. r = q0 −a1→ q1 · · · qn−1 −an→ qn. By definition of A ⊗ Obs, the run r̃ = (q0, s0) −b1→ (q1, s1) · · · (qn−1, sn−1) −bn→ (qn, sn) such that for 1 ≤ i ≤ n:

• if ai ∈ {τ, f}, bi = ai and si = si−1;
• if ai ∈ L(si−1), bi = ai and si = δ(si−1, ai);
• if ai ∉ L(si−1), bi = τ and si = si−1;

is in Runs(A ⊗ Obs). Moreover (i) a1a2 · · · an ∈ Faultytr≥k(A) ⇔ b1b2 · · · bn ∈ Faultytr≥k(A ⊗ Obs) and (ii) Obs(r) = π/Σ(b1b2 · · · bn), which means that Obs(r) = π/Σ(tr(r̃)).

By (i), if r ∈ Faulty≥k(A), tr(r̃) ∈ Faultytr≥k(A ⊗ Obs), and if r ∈ NonFaulty(A), tr(r̃) ∈ NonFaultytr(A ⊗ Obs), and in both cases Obs(r) = π/Σ(tr(r̃)). □

Lemma 5.2.

1. Let ν ∈ Faultytr≥k(A ⊗ Obs). There is a run ρ ∈ Faulty≥k(A) s.t. Obs(ρ) = π/Σ(ν).
2. Let ν′ ∈ NonFaultytr(A ⊗ Obs). There is a run ρ′ ∈ NonFaulty(A) s.t. Obs(ρ′) = π/Σ(ν′).

Proof:

This proof is the counterpart of the proof of Lemma 5.1. Let r = (q0, s0) −b1→ (q1, s1) · · · (qn−1, sn−1) −bn→ (qn, sn) be in Runs(A ⊗ Obs). Define a run r̃ = q0 −a1→ q1 · · · qn−1 −an→ qn with, for 1 ≤ i ≤ n:

• if bi = f, then ai = f,
• if bi ∈ Σ, then ai = bi,
• if bi = τ, choose some ai ∈ {τ} ∪ Σ \ L(si−1) s.t. qi−1 −ai→ qi.

r̃ is a run in Runs(A) and Obs(r̃) = π/Σ(tr(r)). It is then easy to see that Lemma 5.2 holds. □

Now assume A is (Obs, k)-diagnosable and A ⊗ Obs is not (Σ, k)-diagnosable. There are two words ν ∈ Faultytr≥k(A ⊗ Obs) and ν′ ∈ NonFaultytr(A ⊗ Obs) s.t. π/Σ(ν) = π/Σ(ν′). By Lemma 5.2, there are two runs ρ ∈ Faulty≥k(A) and ρ′ ∈ NonFaulty(A) s.t. Obs(ρ) = π/Σ(ν) = π/Σ(ν′) = Obs(ρ′), and thus A is not (Obs, k)-diagnosable, which is a contradiction.

Assume A ⊗ Obs is (Σ, k)-diagnosable and A is not (Obs, k)-diagnosable. There are two runs ρ ∈ Faulty≥k(A) and ρ′ ∈ NonFaulty(A) with Obs(ρ) = Obs(ρ′). By Lemma 5.1, there are two words ν ∈ Faultytr≥k(A ⊗ Obs) and ν′ ∈ NonFaultytr(A ⊗ Obs) s.t. Obs(ρ) = π/Σ(ν) and π/Σ(ν′) = Obs(ρ′), and thus π/Σ(ν) = π/Σ(ν′). This would imply that A ⊗ Obs is not (Σ, k)-diagnosable, which is a contradiction.

The number of states of A ⊗ Obs is at most |Q| × |S| and the number of transitions is bounded by the number of transitions of A. Hence the size of the product is polynomial in the size of the input |A| + |Obs|. Checking that A ⊗ Obs is diagnosable can be done in polynomial time (section 3.1), thus Problem 3(A) is in P.

To solve Problem 3(B), we can use the same algorithm as for finding the minimum k for a static diagnoser, explained in section 3.1. In this case we apply it to the product A ⊗ Obs. The algorithm is polynomial in the size of A ⊗ Obs, thus Problem 3(B) is also in P. This completes the proof. □

Example 5.3. Let A be the DES given in Fig. 2 and Obs the observer of Fig. 7. The product A ⊗ Obs used in the above proof is given in Fig. 8. (Notice that this is different from the synchronized product (see section 2.4) A × Obs since it uses the labeling information available in Obs.) Using an algorithm for checking Σ-diagnosability of A ⊗ Obs we obtain that it is (Σ, 2)-diagnosable (and 2 is the minimum value). Hence A is (Obs, 2)-diagnosable, with 2 the minimum value.
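A sketch of the product A ⊗ Obs used in the proof, with the plant and observer encodings of the earlier sketches, is given below; the diagnosability test and the minimum-k computation of Section 3.1 can then be reused on its output.

```python
from collections import defaultdict

def plant_times_observer(transitions, initial, obs_delta, obs_labels, obs_init):
    """Reachable transitions of A (x) Obs: the observer follows the observable
    events of the plant, and an event the observer is not currently watching is
    relabelled "tau"."""
    delta = defaultdict(list)
    for s, a, t in transitions:
        delta[s].append((a, t))

    init = (initial, obs_init)
    seen, stack, prod = {init}, [init], []
    while stack:
        q, s = stack.pop()
        for a, t in delta[q]:
            if a in ("tau", "f"):
                target, label = (t, s), a                      # observer does not move
            elif a in obs_labels[s]:
                target, label = (t, obs_delta[(s, a)]), a      # observed event
            else:
                target, label = (t, s), "tau"                  # unobserved: hidden
            prod.append(((q, s), label, target))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return prod, init
```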

For Problem 3, we have assumed that an observer was given. It would be even better if we could synthesize an observer Obs such that the plant is Obs-diagnosable. Before attempting to synthesize such an observer, we should first check that the plant is Σ-diagnosable: if it is not, then obviously no such observer exists; if the plant is Σ-diagnosable, then the trivial observer that observes all events in Σ at all times works8. As a first step towards synthesizing non-trivial observers, we can attempt to compute the set of all valid observers, which includes the trivial one but also non-trivial ones (if they exist).

8 Notice that this also shows that existence of an observer implies existence of a finite-state observer, since the trivial observer is finite-state.

Figure 8. The product A ⊗ Obs of the automaton of Fig. 2 and the observer of Fig. 7.

Problem 4. (Dynamic-Diagnosability)

INPUT: Plant model A = (Q, q0, Στ,f, δ).
PROBLEM: Compute the set O of all observers such that A is Obs-diagnosable iff Obs ∈ O.

We do not provide a solution to Problem 4 in this paper. The problem is left open and is part of our future work. Instead, we focus on the following problem:

Problem 5. (Dynamic-k-Diagnosability)

INPUT: Plant model A = (Q, q0, Στ,f, δ), k ∈ N.
PROBLEM: Compute a set of observers O such that for any observer Obs, A is (Obs, k)-diagnosable iff Obs ∈ O. That is, O includes all observers (and only those) Obs for which A is (Obs, k)-diagnosable.

In the next subsection, we reduce Problem 5 to a safety control problem with partial information. Before giving a solution to Problem 5, we review some useful results on safety games that will be needed in the sequel.

5.3. Some Results on Safety Games Under Partial Observation

Computing controllers for systems under partial observation has already been studied in [7,13,14]. These results apply to general alternating two-player games under partial observation and also for very powerful control objectives (Büchi control objectives). In the sequel we need to use results of the type of those presented in [7,13,14] but for a very particular type of two-player games, and only for safety objectives. This is why we provide hereafter the definitions and algorithms to solve this particular instance.

We model a control problem as a game problem between two players. In our setting, partial observation only comes from the fact that Player 1 cannot observe all Player 2 moves. Still, Player 1 has to enforce a safety objective, i.e., to force the system to stay in a designated set of (safe) states. Player 1 must also play according to the sequence of events it has observed so far: the strategy of Player 1 must be trace-based. In the sequel we formalize these assumptions and give some useful results on this type of games.

5.3.1. Games, Plays, Safety Objective

In each state of the game, it is either the turn of Player 1 to play or the turn of Player 2 to play, but not both (i.e., the moves of the two players alternate). The game starts in a Player 1 state. The rules of the game are as follows:


• Player 1 always plays one move (in the alphabet Σ1) and hands it over to Player 2;
• Player 2 can play two types of moves: invisible moves (τ is the invisible action) and visible moves (Σ2-actions). If Player 2 plays an invisible move τ, it is again his turn to play. Otherwise, if he plays a visible move, the turn switches back to Player 1.

Let ⊎ denote the union of disjoint sets.

Definition 5.3. (Two-Player τ-Games)

A two-player τ-game G is a tuple (Q1 ⊎ Q2, q0, Σ1 ⊎ Στ2, δ) with:

• Qi, i = 1, 2, a finite set of states for player i;
• q0 ∈ Q1 the initial state of the game;
• Σi a finite set of visible actions for player i, i = 1, 2;
• δ ⊆ (Q1 × Σ1 × Q2) ∪ (Q2 × {τ} × Q2) ∪ (Q2 × Σ2 × Q1) the transition relation.

Remark 5.1. The type of δ is set according to the rules of the game.

We further assume that the game is deterministic w.r.t. Player 1 moves, i.e., for all q1 ∈ Q1, σ1 ∈ Σ1, there is at most one state q2 ∈ Q2 such that (q1, σ1, q2) ∈ δ. For (q, a, q′) ∈ δ we also use the notation q −a→ q′ to write sequences of transitions in a compact manner. If G contains no τ transitions, G is an alternating full-observation two-player game (we use the term two-player game in this case).

Definition 5.4. (Play, Trace)

A play in G is a finite or infinite sequence

ρ = q0 ℓ1 q1 · · · qn ℓn+1 qn+1 · · ·

such that for each i, qi −ℓi+1→ qi+1. We write q0 −ℓ1ℓ2···ℓn→ qn if q0 ℓ1 q1 · · · ℓn qn is a finite play in G. We let Runs(G) be the set of finite plays in G. Runsi(G), i = 1, 2, are the sets of finite plays ending in a Qi-state. The state sequence of a play ρ, denoted States(ρ), is the sequence q0 q1 · · · qn · · ·. We let ρ(i) be the prefix of ρ up to state qi, so ρ(0) = q0, ρ(n) = q0 ℓ1 q1 · · · qn, and so on.

Definition 5.5. (Player 1 Strategy)

A strategy for Player 1 is a mapping f : Runs1(G) → Σ1. A strategy f is trace-based if for all ρ, ρ′ ∈ Runs1(G), tr(ρ) = tr(ρ′) implies f(ρ) = f(ρ′). A strategy f is memoryless if tgt(ρ) = tgt(ρ′) implies f(ρ) = f(ρ′).

Definition 5.6. (Outcome)

Given a strategy f for Player 1, the outcome Out(G, f) of the game G under (or controlled by) f is the set of state sequences States(ρ) for the plays ρ = q0 ℓ1 q1 · · · qn ℓn+1 qn+1 · · · such that for each qi ∈ Q1, ℓi+1 = f(ρ(i)).

Definition 5.7. (Safety Control Objective)

An objective φ for Player 1 is a subset of (Q1 ∪ Q2)ω ∪ (Q1 ∪ Q2)∗. φ is a safety objective if there is a set F ⊆ Q1 ∪ Q2 such that φ = Fω ∪ F∗ (F is the base set of safe states).

Given a pair (G, φ), a strategy f is winning if Out(G, f) ⊆ φ. If G is a τ-game and φ a safety objective based on the set F, we say that the pair (G, F) is a safety game. A state q of G is winning if there is a winning strategy from q.

5.3.2. Safety Two-Player τ-Games

The usual control problem on two-player τ-games asks the following:

Problem 6. (Control Problem)
INPUT: A two-player τ-game G, a safety control objective F.
PROBLEM: Is there a winning strategy for (G, F)?

To solve safety two-player τ-games we define the Cpre operator [17]:

Cpre(S) = {s ∈ Q1 | ∃σ1 ∈ Σ1 s.t. δ(s, σ1) ∈ S}   (5)
        ∪ {s ∈ Q2 | ∀σ2 ∈ Στ2, δ(s, σ2) ⊆ S}   (6)

It is well-known [17] that iterating Cpre and computing the fix-point Cpre∗(F) gives the set of winning states10 of the game. In case G is finite this computation terminates and can be done in linear time for safety games [17]. If q0 ∈ Cpre∗(F), there is a strategy for Player 1 to win. Moreover, for this type of games, memoryless strategies are sufficient to win. Indeed, as we can observe the state of the game, τ transitions are not totally unobservable (or invisible) and thus knowing the current state gives some useful information. Also there is a most permissive strategy F : Q1 ∩ Cpre∗(F) → 2^Σ1 \ ∅ defined by: F(q) = {σ | δ(q, σ) ∈ Cpre∗(F)}. This is the most permissive strategy in the following sense: f is a winning strategy for (G, F) iff for any finite run ρ ∈ Out(G, f) ending in a Q1-state, f(ρ) ∈ F(tgt(ρ)), i.e., every move defined by f is a move of the most permissive strategy and vice versa.
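For finite safety games of this kind the Cpre fixpoint is a simple iteration; the following Python sketch (our own encoding of the game graph) computes the winning set, from which the most permissive memoryless strategy can be read off.

```python
def winning_states(q1_states, delta, safe):
    """Greatest fixpoint of the Cpre operator (equations (5)-(6)) restricted to
    `safe`.  `delta` maps a state to its list of (action, successor) pairs;
    Player 1 states need some move staying in the current set, Player 2 states
    need all moves (including tau) to stay in it."""
    win = set(safe)
    changed = True
    while changed:
        changed = False
        for s in list(win):
            succ = [t for _, t in delta.get(s, [])]
            ok = any(t in win for t in succ) if s in q1_states \
                 else all(t in win for t in succ)
            if not ok:
                win.discard(s)
                changed = True
    return win

# The most permissive memoryless strategy allows, in a winning Player 1 state q,
# exactly the moves sigma whose delta-successor lies inside the returned set.
```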

In our setting, τ is not observable. What we should ask for to win a game under partial observation is to base our choice of move solely on the observable moves that have occurred so far. The problem we want to address is the following:

Problem 7. (Trace-Based Control Problem)
INPUT: A game G, a safety objective F.
PROBLEM: Is there a trace-based winning strategy for (G, F)?

This problem is more demanding than the usual Control Problem (Problem 6), which asks only for a winning strategy for Player 1, i.e., a strategy in which full observation of the state is assumed. To solve Problem 7, we reduce it to a full-observation two-player game (using a construction like the one given in [7]).


To do this, we first define the following operator for σ ∈ Στ2 and s ∈ Q2:

Next2(s, σ) = {s′ | s −τ∗→ s1 −σ→ s′}    (7)

It follows that if σ ∈ Σ2, Next2(s, σ) ⊆ Q1, and if σ = τ, Next2(s, σ) ⊆ Q2. Let σ ∈ Σ1 and Q ⊆ Q1. We define

Next1(Q, σ) = {s′ ∈ Q2 | s′ = δ(s, σ) with s ∈ Q}    (8)
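A direct way to compute these operators is a τ-closure followed by one visible step. The sketch below uses the same assumed encoding as before, except that delta2 is taken here as a function returning the (move, successor) pairs of a Player 2 state, with the move "tau" for silent steps; the names are ours.

def tau_closure(s, delta2):
    """All states reachable from s by a (possibly empty) sequence of tau transitions."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        for move, succ in delta2(cur):
            if move == "tau" and succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

def next2(s, sigma, delta2):
    """Next2(s, sigma): states reached by tau* followed by one sigma transition (Eq. (7))."""
    return {succ for s1 in tau_closure(s, delta2)
                 for move, succ in delta2(s1) if move == sigma}

def next1(S, sigma, delta1):
    """Next1(S, sigma): image of a set S of Player 1 states under the move sigma (Eq. (8))."""
    return {delta1[(s, sigma)] for s in S if (s, sigma) in delta1}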

To solve (G, F), we build a full-observation two-player game (GH, FH) (which contains no τ transitions) such that:

Theorem 5.2. There is a winning strategy in (GH, FH) iff there is a trace-based winning strategy in (G, F).

The proof of Theorem 5.2 is given in Appendix A. The definition of the game (GH, FH) is as follows:

Definition 5.8. Assume G = (Q1 ⊎ Q2, q0, Σ1 ⊎ Στ2, δ). The game GH = (W, s0, Σ′, ∆) is defined by:
• W = W1 ⊎ W2, where W1 = 2Q1 ∪ {⊥1} are the Player 1 states and W2 = 2Q2 ∪ {⊥2} are the Player 2 states;
• s0 = {q0};
• Σ′ = Σ1 ∪ Σu2, where u is a fresh name, Σ1 are Player 1 moves, and Σu2 are Player 2 moves;
• ∆ ⊆ (W1 × Σ1 × W2) ∪ (W2 × Σu2 × W1) is defined by: (S, σ, S′) ∈ ∆ iff one of the three conditions holds:
C1: S ⊆ Q1, σ ∈ Σ1, and S′ = Next1(S, σ) if σ ∈ Enabled(s) for all s ∈ S, and S′ = ⊥2 otherwise;
C2: S ⊆ Q2, σ ∈ Σ2, S′ = Next2(S, σ) and S′ ≠ ∅;
C3: S ⊆ Q2, σ = u, Next2(S, τ) ∩ ((Q1 ∪ Q2) \ F) ≠ ∅ and S′ = ⊥1.

We let FH = {S ∈ W | S ⊆ F}, i.e., FH is the set of safe states for GH; ⊥1 and ⊥2 are not safe states. (GH, FH) is a safety game as well. Notice also that GH is a full-observation turn-based two-player game in which the moves of the two players alternate. To solve Problem 7 we can first solve the game (GH, FH), i.e., compute the set of winning states of GH. This can be done in linear time in the size of (GH, FH), and from this we obtain an algorithm for Problem 7. As the size of GH is exponential in the size of G:
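The reachable part of GH can be built by a standard subset construction on top of next1 and next2. The sketch below only illustrates conditions C1–C3 under our own encoding (q1_states identifies the Player 1 states of G, F is the base set of safe states, and BOT1/BOT2 stand for ⊥1/⊥2); it is not claimed to be the construction used in the paper's proofs.

BOT1, BOT2 = "bot1", "bot2"   # the sink states of GH

def build_gh(q0, q1_states, sigma1, sigma2, delta1, delta2, F):
    """Explore the reachable states of GH and return its transitions (S, move, S')."""
    s0 = frozenset([q0])
    trans, seen, todo = [], {s0}, [s0]
    while todo:
        S = todo.pop()
        if S in (BOT1, BOT2):
            continue                                    # the sinks need no successors here
        if all(s in q1_states for s in S):              # Player 1 knowledge state
            for a in sigma1:                            # condition C1
                if all((s, a) in delta1 for s in S):
                    S2 = frozenset(next1(S, a, delta1))
                else:
                    S2 = BOT2
                trans.append((S, a, S2))
                if S2 not in seen:
                    seen.add(S2); todo.append(S2)
        else:                                           # Player 2 knowledge state
            for a in sigma2:                            # condition C2
                S2 = frozenset().union(*(next2(s, a, delta2) for s in S))
                if S2:
                    trans.append((S, a, S2))
                    if S2 not in seen:
                        seen.add(S2); todo.append(S2)
            # condition C3: some tau-successor leaves the safe set, so Player 2 may move "u" to the losing sink
            if any(t not in F for s in S for t in next2(s, "tau", delta2)):
                trans.append((S, "u", BOT1))
    return trans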

Theorem 5.3. Problem 7 is in EXPTIME.

Indeed, to solve Problem 7, we compute the set of winning states WH of (GH, FH) and check that the initial state of GH is winning. For safety games like GH, with full observation, we can also compute the most permissive strategy, as emphasised earlier. Given GH and FH, let FH be the most permissive strategy: it is defined on the winning Player 1 states by FH(q) = {σ ∈ Σ1 | Next1(q, σ) ∈ WH}. In other words, the most permissive strategy gives, for each state, the set of moves that keep the system within the set of winning states WH.

Given FH, we can build a strategy for G. Let β(w) be the (unique) run q0 −w→ s′ in GH. We define the mapping F on Runs1(G) by: F(ρ) = FH(tgt(β(tr(ρ)))). Intuitively, this means that, given a run ρ in G ending in a Q1-state, to decide what to play, we compute the (unique) state q = tgt(β(tr(ρ))) that can be reached in GH; then we can play any move from the ones allowed by the most permissive strategy FH(q).
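In practice, following F only requires the controller to replay the observed trace in the (deterministic) game GH and to consult FH at the state reached. A minimal sketch, assuming our own names gh_delta (the transition function of GH) and fh (the most permissive strategy of GH, e.g. computed as sketched above):

def allowed_moves(observed_trace, s0, gh_delta, fh):
    """F(rho) = FH(tgt(beta(tr(rho)))): replay the trace of rho in GH and read off FH."""
    S = s0
    for a in observed_trace:      # beta(tr(rho)): the unique run of GH over the observed trace
        S = gh_delta[(S, a)]
    return fh[S]                  # any move in this set keeps the play winning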

Theorem 5.4. F is the most permissive trace-based strategy for G.

Proof:

First, F is trace-based. Let f be a trace-based winning strategy. We define fH as in the "If" part of the proof of Theorem 5.2. fH is winning and thus for any run w, we have fH(w) ∈ FH(tgt(w)). By definition of fH, f(ρ) = fH(β(tr(ρ))) and f(ρ) ∈ FH(tgt(β(tr(ρ)))) = F(ρ).
Now assume f is a trace-based strategy such that for each run ρ ∈ Runs1(G), f(ρ) ∈ F(ρ). Build fH as before. Then fH is winning: indeed, fH(w) = f(ρ) for any ρ s.t. tr(ρ) = tr(w), hence fH(w) ∈ FH(tgt(w)). Now from fH build a strategy ˜f as in the "Only If" part of the proof of Theorem 5.2. It turns out that ˜f = f and, as ˜f is winning, f is winning. ⊓⊔

Corollary 5.1. The most permissive trace-based strategy F for (G, F) can be effectively computed and is represented by an automaton which has at most an exponential number of states in the size of G.

This follows from the fact that GH is exponential in the size of G. The most permissive trace-based strategy is obtained from GH by removing from each state q the transitions that are not in FH(q).
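This pruning step is straightforward; a sketch, with our own names gh_trans (the transitions of GH), fh (the most permissive strategy) and is_player1_state (a predicate identifying Player 1 states of GH):

def prune_gh(gh_trans, fh, is_player1_state):
    """Keep, in each Player 1 state S of GH, only the moves allowed by FH(S); Player 2 moves are kept."""
    return [(S, a, S2) for (S, a, S2) in gh_trans
            if not is_player1_state(S) or a in fh.get(S, set())]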

5.4. Problem 5 as a Game Problem

To solve Problem 5 we reduce it to a safety 2-player τ-game. In short, the reduction we propose is the following:

• Player 1 chooses the set of events it wishes to observe, then it hands over to Player 2;

• Player 2 chooses an event and tries to produce a run which is the observation of both a k-faulty run and a non-faulty run.

Player 2 wins if he can produce such a run; otherwise Player 1 wins. Player 2 has complete information about Player 1's moves (i.e., it can observe the sets that Player 1 chooses to observe). Player 1, on the other hand, has only partial information about Player 2's moves because not all events are observable (details follow). Let A = (Q, q0, Στ,f, →) be a finite automaton. To define the game, we use two copies of the automaton A: Ak1 and A2. The accepting states of Ak1 are those corresponding to runs of A which are faulty and where more than k steps occurred after the fault. A2 is a copy of A where the f-transitions have been removed. The game we are going to play is the following (see Fig. 9, where Player 1 states are depicted with square boxes and Player 2 states with round shapes):


Figure 9. Game reduction for Problem 5.

1. the game starts in a state (q1, q2) corresponding to the initial states of the automata Ak1 and A2. Initially, it is Player 1's turn to play. Player 1 chooses a set of events he is going to observe, i.e., a subset X of Σ, and hands it over to Player 2;

2. assume the automata Ak1 and A2 are in states (q1, q2). Player 2 can change the state of Ak1 and A2 by:

(a) firing an action which is not in X (like λ1, λ2, λ3, λ4 in Fig. 9): it is an unobservable action and thus it can be taken independently by either Ak1 or A2 (they do not need to synchronise). In this case a new state (q, q′) is reached and Player 2 can play again from this state. Notice that Player 2's moves do not change the set X;

(b) firing an action in X (like σ1, σ2 in Fig. 9): to do this, both Ak1 and A2 must be in a state where such an action (e.g., σ1, σ2) is enabled (synchronization); after the action is fired a new state (q′1, q′2) is reached: now it is Player 1's turn to play, and the game continues as in step 1 above from the new state (q′1, q′2).

Player 2 wins if he can reach a state (q1, q2) in the product of Ak1 and A2 where q1 is an accepting state of Ak1 (this means that Player 1 wins if he can avoid this set of states ad infinitum). In this sense this is a safety game for Player 1 (and a reachability game for Player 2). Formally, the game GA = (S1 ⊎ S2, s0, Σ1 ⊎ Σ2, δA) is defined as follows (⊎ denotes union of disjoint sets):

• S1 = (Q × {−1, · · · , k}) × Q is the set of Player 1 states; a state ((q1, j), q2) ∈ S1 indicates that Ak1 is in state q1, j steps have occurred after a fault, and q2 is the current state of A2. If no fault has occurred, j = −1, and if more than k steps occurred after the fault, we use j = k.

• S2 = (Q × {−1, · · · , k}) × Q × 2Σ is the set of Player 2 states. For a state ((q1, j), q2, X) ∈ S2, the triple ((q1, j), q2) has the same meaning as for S1, and X is the set of events Player 1 has chosen to observe on its last move.

• s0 = ((q0, −1), q0) is the initial state of the game and belongs to Player 1;

• Σ1 = 2Σ is the set of moves of Player 1; Σ2 = Στ is the set of moves of Player 2 (as we encode the fault into the state, we do not need to distinguish f from τ).
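The transition relation δA formalises the moves described in steps 1 and 2 above. As an illustration only, the Player 2 moves can be sketched as follows, under our own encoding (trans(q) enumerates the (label, successor) pairs of A, FAULT is the label f); the counting convention for the counter j may differ in minor ways from the formal definition of δA.

FAULT = "f"                                   # the (unobservable) fault label

def bump(j, label, k):
    """Update the 'steps after the fault' counter of Ak1 when it fires `label`."""
    if j >= 0:
        return min(j + 1, k)                  # already faulty: one more step, capped at k
    return 0 if label == FAULT else -1        # the first f starts the counter

def player2_moves(q1, j, q2, X, trans, k):
    """Enumerate the successors of the Player 2 state ((q1, j), q2, X) of GA."""
    moves = []
    for label, q1n in trans(q1):
        if label not in X:                    # unobservable move: Ak1 moves alone
            moves.append((((q1n, bump(j, label, k)), q2, X), "player2"))
    for label, q2n in trans(q2):
        if label not in X:                    # unobservable move: A2 moves alone (A2 has no f-transitions)
            moves.append((((q1, j), q2n, X), "player2"))
    for label, q1n in trans(q1):
        if label in X:                        # observable move: both copies synchronise on the label
            for lab2, q2n in trans(q2):
                if lab2 == label:
                    moves.append((((q1n, bump(j, label, k)), q2n), "player1"))
    return moves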

