Models - Advanced Information and Knowledge Processing

In this section, we explain the various models that are relevant. We will first discuss system models, followed by fault models.

6.3.1 System Models

Using a software architecture approach, an application can be seen as a collection of components that offer and require services [19]. A component is an artefact (software, hardware, middleware) that implements a set of services. A service is an abstraction of a given functionality. It can be regarded as a unit of work. The set of services is partitioned between exported and imported services. A service exported by a component is one that is provided by the component to other components, while a service imported by a component is one that is used by the component and is provided by another component. Some components do not need to import other services to provide their own services, e.g., a time server/clock. Figure 6.2 represents a component. In general, a component publishes both its imported and exported services.

Fig. 6.2 A component with three imported services (Input 1, Input 2, Input 3) and two exported services (Output 1, Output 2). The imported services can be provided by different components. Similarly, the two exported services can be provided to different components
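The component model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Component`, the helper `unresolved_imports`, and the example services (`current_time`, `write_log`, `storage`) are hypothetical names introduced here, not part of the original model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A component publishes the services it exports and the ones it imports."""
    name: str
    exported: frozenset = field(default_factory=frozenset)
    imported: frozenset = field(default_factory=frozenset)

    def is_self_contained(self) -> bool:
        # True when the component needs no service from any other component.
        return not self.imported


def unresolved_imports(component, providers):
    """Return the imported services that none of the given providers exports."""
    offered = set()
    for provider in providers:
        offered |= provider.exported
    return set(component.imported) - offered


# Hypothetical example: a clock imports nothing, a logger imports two services.
clock = Component("clock", exported=frozenset({"current_time"}))
logger = Component(
    "logger",
    exported=frozenset({"write_log"}),
    imported=frozenset({"current_time", "storage"}),
)
```

Here `clock` plays the role of the time server mentioned in the text: it exports a service without importing any, whereas `logger` still has an import (`storage`) that the clock alone cannot satisfy.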

There exists a special class of components, called connectors, that enable the interconnection of non-connector components. Specifically, the tasks of the connectors can range from basic information transmission between components to providing the security and fault tolerance guarantees necessary during data transmission. Because services can be provided by different third parties, incompatibilities between offered and required services will exist. To redress this problem, connectors are used to bridge the gap. At its simplest, i.e., where services match, a connector can just relay information from one component to another. However, when incompatibilities exist, connectors are used to provide the required functionality to enable the composition of the two non-connector components. For example, if a component makes a service request without any security guarantee, which is in turn required by the service provider, a connector can be used to provide the necessary security clearance before relaying the request to the provider. The modularisation of a system into components and services provides a very elegant approach to system design.
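The two roles of a connector, plain relay and bridging an incompatibility, can be illustrated with a minimal sketch. All names (`provider`, `relay_connector`, `security_connector`) and the token-based check are assumptions made for the example; the text does not prescribe any particular security mechanism.

```python
from typing import Callable, Dict

Request = Dict[str, str]
Service = Callable[[Request], str]


def provider(request: Request) -> str:
    """Hypothetical provider that insists on a security token."""
    if request.get("token") != "valid-token":
        raise PermissionError("security guarantee missing")
    return "result for " + request["payload"]


def relay_connector(target: Service) -> Service:
    """Simplest connector: the services match, so the request is just relayed."""
    return lambda request: target(request)


def security_connector(target: Service, token: str) -> Service:
    """Connector that adds the credential the provider requires."""
    def connect(request: Request) -> str:
        # Copy the request so the caller's data is left untouched.
        secured = dict(request, token=token)
        return target(secured)
    return connect
```

A request without a token fails through `relay_connector(provider)` but succeeds through `security_connector(provider, "valid-token")`, which mirrors the security clearance example in the text.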

Although it can be beneficial to access a given service through a semantically well-defined interface, a greater value is obtained when higher-level services are provided, made up from lower-level services. This can be achieved through the composition of components, possibly via the use of connectors. Composition (of services or components) then means that the output of a service is fed into the input of another component, which in turn may need input from services from more than one component. In a distributed system, such as the Web, a component can use different connectors as appropriate for the network conditions and requirements of the end-user. For dependability purposes, the separation of concerns between components/services and connectors allows failures due to the network to be handled cleanly and differently from failures due to computational (component/service) errors.
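Composition in this sense, where outputs of lower-level services become the inputs of another service, can be sketched as follows. The helper `compose` and the three pricing services are hypothetical and exist only to make the data flow visible.

```python
def compose(consumer, *suppliers):
    """Build a higher-level service: each supplier's output is one input of consumer."""
    def composed(request):
        inputs = [supplier(request) for supplier in suppliers]
        return consumer(*inputs)
    return composed


# Hypothetical lower-level services, possibly offered by different components.
def net_price(order):
    return order["quantity"] * order["unit_price"]


def shipping_cost(order):
    return 5.0 if order["quantity"] < 10 else 0.0


def total(net, shipping):
    return net + shipping


# Higher-level service that needs input from more than one lower-level service.
order_total = compose(total, net_price, shipping_cost)
```

In a distributed setting each call to a supplier would pass through a connector, which is where network failures could be handled separately from errors in the services themselves.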

6.3.2 Fault Models

When developing a dependable system, it is very important to determine, in advance, the classes of faults that can affect the system. This can be a very difficult and error-prone process. To help mitigate the problems associated with an incomplete fault model, which may sometimes have catastrophic consequences, the concept of multitolerance has been advocated [1,16]. However, to help ease the development of a fault model, it becomes important to adopt a systematic approach to determine the various faults that can occur at various levels.

From Figure 6.1, there are several levels at which faults can occur. For example, faults can occur during the service publishing phase. Faults can also occur during service discovery. Overall, in a service-oriented architecture, any of the following service-oriented architecture-specific faults can occur [3]:

• publishing fault,

• discovery fault,

• composition fault,

• binding fault, and

• execution fault.

In addition to these faults, a number of other failures can occur in a system, including distributed systems. These can be network failures, hardware crashes, or middleware (such as OS) failures. However, we will focus on the failures specific to service-oriented architectures.
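For use in code, the five fault classes listed above can be captured as an enumeration, with the out-of-scope failures kept apart. The names `SoaFault`, `OTHER_FAILURES` and `in_scope` are illustrative assumptions, not terminology from the fault model itself.

```python
from enum import Enum


class SoaFault(Enum):
    """The five service-oriented architecture-specific fault classes."""
    PUBLISHING = "publishing fault"
    DISCOVERY = "discovery fault"
    COMPOSITION = "composition fault"
    BINDING = "binding fault"
    EXECUTION = "execution fault"


# Failures the text mentions but deliberately leaves out of scope.
OTHER_FAILURES = {"network failure", "hardware crash", "middleware failure"}


def in_scope(label: str) -> bool:
    """True when the label names one of the SOA-specific fault classes."""
    return any(label == fault.value for fault in SoaFault)
```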

Publishing faults.

During the publishing phase, the service is deployed on a server so it can be executed, and the service description is made public. Faults that can occur at this level are service description faults and service deployment faults.

A service description fault arises from problems in describing the service. Either the service is not completely described, or the service is wrongly described. These faults can lead to problems during the discovery phase, or during the execution phase, which we will detail in later sections. On the other hand, service deployment faults occur when any aspect of the deployment is incorrect. For example, a service deployment fault can occur when the service is deployed without the required resources.

Discovery faults.

In the discovery phase, three possible failures can occur, namely:

• the service is not found,

• wrong service is found, and

• timed out, in a distributed setting.

However, these failures can be brought about by problems occurring in other parts of the system or process. To be able to identify the potential sources of these problems, we build a fault tree [21]. A fault tree analysis is a logical, structured process that can help in identifying potential causes of system failure before the actual failure occurs. It is performed using a top-down approach, i.e., starting from a top-level system failure, fault tree analysis is performed by working down to evaluate all contributing events that may ultimately cause the top-level system failure. Fault tree analysis helps determine the possible combinations of software and hardware failures that can lead to the overall system failure. At the core of fault tree analysis is a structure called the fault tree. The root of the tree is a top-level system failure whose possible sources we want to determine. Nodes in the tree represent intermediate component failures. Basic failure events are the leaves of the tree. One additional component in the structure is the use of boolean connectives to connect lower-level failure events into a higher-level failure event. For example, a fully functional CD-player will not work (top-level failure) if there is no battery (1st lower level failure) AND the player is not connected to the mains (2nd lower level failure).
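A fault tree of this kind can be represented and evaluated with a small recursive structure. The sketch below is one possible encoding, assuming only AND and OR gates; the class names `Basic` and `Gate` and the function `occurs` are hypothetical. It encodes the CD-player example from the text.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Basic:
    """A leaf of the tree: a basic failure event."""
    name: str


@dataclass(frozen=True)
class Gate:
    """An inner node combining lower-level failures with a boolean connective."""
    name: str
    kind: str  # "AND" or "OR"
    children: Tuple["Node", ...]


Node = Union[Basic, Gate]


def occurs(node, observed):
    """Return True if the failure at this node follows from the observed basic events."""
    if isinstance(node, Basic):
        return node.name in observed
    results = [occurs(child, observed) for child in node.children]
    return all(results) if node.kind == "AND" else any(results)


# The CD-player example: the top-level failure needs both lower-level failures.
cd_player = Gate(
    "CD player does not work",
    "AND",
    (Basic("no battery"), Basic("not connected to mains")),
)
```

With only `"no battery"` observed, `occurs(cd_player, {"no battery"})` is false, because the AND gate requires the mains failure as well.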

[Fault tree: top-level failure “Service not found” with an OR gate over “Search incorrect”, “Service not published” and “Service does not exist”; “Search incorrect” has the children “Wrong service search” and “Publishing fault”]

Fig. 6.3 A fault tree showing the possible failure sources for “Service not found” system failure

Figure 6.3 depicts a fault tree for the system failure “Service not found”. Events leading to such a failure can be one of the following.

1. No such service exists. If no such service exists, then the system will always fail whenever the service is required.

2. Service not published in registry. It can be the case that a service exists within the system, but has not yet been published. In such cases, until the service appears in the registry, the system will fail.

3. Incorrect search has been performed. It can be the case that a service search is performed with the wrong number of parameters, or with the wrong functional or non-functional requirements, leading to a failure in discovering the correct service. Alternatively, a correct search may have been performed but, because of publishing faults, no service is found.

The events in ovals represent basic events, while events in diamonds represent undeveloped events. Undeveloped failure events are those whose sources are not further investigated, i.e., the failure in itself is more important than its sources (for the given system failure). However, for another system failure, it may be possible to investigate the failure sources further. In Figure 6.3, an incorrect search can occur if either (i) a wrong service search has been performed, or (ii) a seemingly correct search has been performed but publishing faults exist.
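Because every connective described for Figure 6.3 is an OR, any single leaf event is enough to cause the top-level failure. The following sketch encodes the tree as nested `(name, children)` pairs and lists its leaf events. The encoding and the function names are assumptions for illustration, and the sketch does not distinguish basic from undeveloped events.

```python
# (name, children) pairs; every gate in this particular tree is an OR gate.
SERVICE_NOT_FOUND = ("Service not found", [
    ("Service does not exist", []),
    ("Service not published", []),
    ("Search incorrect", [
        ("Wrong service search", []),
        ("Publishing fault", []),
    ]),
])


def leaf_events(node):
    """Collect the leaf events below a node, left to right."""
    name, children = node
    if not children:
        return [name]
    leaves = []
    for child in children:
        leaves.extend(leaf_events(child))
    return leaves


def explains(node, observed):
    """With OR gates only, the failure occurs as soon as one leaf below it is observed."""
    return any(leaf in observed for leaf in leaf_events(node))
```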

Further, it can be argued that, if a service does not exist or has not yet been published, several services can be composed to obtain the required one. However, problems can still occur during the composition phase. Problems occurring at that level will be classified under composition failures. Unless compatible services are found, a failure will occur.

Composition faults.

When an exact service match cannot be found, it is possible to compose different services so as to provide the required functionality. However, failures can occur in this phase too. Three types of failures can occur. These are:

• timed out, in a distributed system environment,

• no valid composition, and

• composition faults.

When a composition fault occurs, this indicates that contracts between components are not being respected. On the other hand, if there exists no compatible service, then a “no valid composition” fault occurs.

[Fault tree: top-level failure “No valid composition” with an OR gate over “Incompatible components” and “Components missing”]

Fig. 6.4 A fault tree showing the possible failure sources for “No valid composition” system failure

In Figure 6.4, when either of the two faults (incompatible components, or components missing) occurs, a “no valid composition” fault occurs during the composition phase.
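The OR condition of Figure 6.4 can be sketched as a check over a planned chain of services. This is a simplified illustration under strong assumptions: the composition is modelled as a linear chain, availability is a set of service names, and compatibility is a set of (producer, consumer) pairs. None of these representations come from the text.

```python
def composition_problems(required, available, compatible):
    """Return (missing, incompatible) for a linear chain of services.

    required:   service names in the order they are chained
    available:  set of service names that can actually be found
    compatible: set of (producer, consumer) pairs whose interfaces match
    """
    missing = [name for name in required if name not in available]
    incompatible = [
        (producer, consumer)
        for producer, consumer in zip(required, required[1:])
        if producer in available
        and consumer in available
        and (producer, consumer) not in compatible
    ]
    return missing, incompatible


def no_valid_composition(required, available, compatible):
    """OR gate: the failure occurs if components are missing or incompatible."""
    missing, incompatible = composition_problems(required, available, compatible)
    return bool(missing) or bool(incompatible)
```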

Binding faults.

During binding, the client and the service provider negotiate conditions to execute the service. The following binding failures can occur:

• timed out, in a distributed system environment,

• bound to the wrong service, and

• binding denied.

During binding, one fault that can occur is “bound to wrong service”. This can occur when there has been a “service description” fault during the publishing phase.

A “binding denied” fault occurs when some authorisation has not been granted by the authorisation component.

Execution faults.

Execution faults occur when the outcome of a service does not match the result expected by the client. The following failures can occur:

• timed out, in distributed systems,

• service crashed, in distributed systems, and

• incorrect result.

An incorrect result can occur if the wrong service has been selected. It can also occur if a transient fault occurs in the service provider. On the other hand, a service provider can crash, causing the service to be unavailable.
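From the client's point of view, the three execution failures can be told apart roughly as follows. The mapping of Python's built-in `TimeoutError` and `ConnectionError` to “timed out” and “service crashed” is an assumption made for this sketch, as is the client-supplied predicate `is_acceptable` that stands in for “the result expected by the client”.

```python
def classify_execution(call, is_acceptable):
    """Invoke a service call and label the outcome as the client sees it."""
    try:
        result = call()
    except TimeoutError:
        # No answer arrived in time.
        return "timed out"
    except ConnectionError:
        # Assumed here to signal that the provider is no longer available.
        return "service crashed"
    return "ok" if is_acceptable(result) else "incorrect result"


# Hypothetical service stubs used to exercise each branch.
def slow_service():
    raise TimeoutError("no reply")


def crashed_service():
    raise ConnectionError("provider unavailable")


def wrong_service():
    return -1
```

Note that this only detects an incorrect result when the client can state what an acceptable result looks like; a plausible but wrong answer would pass unnoticed.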

To be able to recover from an erroneous situation (error state), one needs to be able to detect that a fault has occurred. However, some faults may not be detected during the same phase in which they occurred. For example, it may not always be possible to detect a “service description” fault during the service publishing phase; it may only become apparent when an “incorrect result” fault occurs during the execution phase.

6.4 Dependability Enhancement in a Service Oriented
