
5 Terminology and basic algorithms

5.2 Classifications and basic concepts

5.2.1 Application executions and control algorithm executions

The distributed application execution comprises the execution of instructions, including the communication instructions, within the distributed application program. The application execution represents the logic of the application. In many cases, a control algorithm also needs to be executed in order to monitor the application execution or to perform various auxiliary functions. The control algorithm performs functions such as: creating a spanning tree, creating a connected dominating set, achieving consensus among the nodes, distributed transaction commit, distributed deadlock detection, global predicate detection, termination detection, global state recording, checkpointing, and also memory consistency enforcement in distributed shared memory systems.

The code of the control algorithm is allocated its own memory space. The control algorithm execution is superimposed on the underlying application execution, but does not interfere with it. In other words, the control algorithm execution, including all its send, receive, and internal events, is transparent to (not visible to) the application execution.

The distributed control algorithm is also sometimes termed a protocol, although the term protocol is also loosely used for any distributed algorithm.


In the literature on formal modeling of network algorithms, the term protocol is more commonly used.

5.2.2 Centralized and distributed algorithms

In a distributed system, a centralized algorithm is one in which a predominant amount of work is performed by one (or possibly a few) processors, whereas the other processors play a relatively smaller role in accomplishing the joint task. The roles of the other processors are usually confined to requesting information or supplying information, either periodically or when queried.

A typical system configuration suited for centralized algorithms is the client–server configuration. Presently, much commercial software is written using this configuration, and it is adequate for many applications. From a theoretical perspective, however, the single server is a potential bottleneck for both processing and bandwidth access on the links. The single server is also a single point of failure. Of course, these problems are alleviated in practice by using replicated servers distributed across the system, and then the overall configuration is not as centralized any more.

A distributed algorithm is one in which each processor plays an equal role in sharing the message overhead, time overhead, and space overhead. It is difficult to design a purely distributed algorithm (that is also efficient) for some applications. Consider the problem of recording a global state of all the nodes. The well-known Chandy–Lamport algorithm, which we studied in Chapter 4, is distributed – yet one node, typically the initiator, is responsible for assembling the local states of the other nodes, and hence plays a slightly different role. Algorithms that are designed to run on a logical-ring superimposed topology tend to be fully distributed to exploit the symmetry in the connectivity. Algorithms that are designed to run on the logical tree and other asymmetric topologies with a predesignated root node tend to have some asymmetry that mirrors the asymmetric topology. Although fully distributed algorithms are ideal, partly distributed algorithms are sometimes more practical to implement in real systems. At any rate, the advances in peer-to-peer networks, ubiquitous and ad-hoc networks, and mobile systems will require distributed solutions.

5.2.3 Symmetric and asymmetric algorithms

A symmetric algorithm is an algorithm in which all the processors execute the same logical functions. An asymmetric algorithm is an algorithm in which different processors execute logically different (but perhaps partly overlapping) functions.

A centralized algorithm is always asymmetric. An algorithm that is not fully distributed is also asymmetric. In the client–server configuration, the clients and the server execute asymmetric algorithms. Similarly, in a tree configuration, the root and the leaves usually perform some functions that are different from each other, and that are different from the functions of the internal nodes of the tree. Applications where there is inherent asymmetry in the roles of the cooperating processors will necessarily have asymmetric algorithms. A typical example is where one processor initiates the computation of some global function (e.g., min, sum).
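Such asymmetry can be made concrete with a small sketch (ours, not from the text): a global minimum computed by convergecast on a tree, where leaves and internal nodes report upward while only the root learns the result. The tree shape and values below are illustrative.

```python
# Hypothetical convergecast sketch: the root's role differs from the others'.
# The tree is given as child lists; each node reports the min of its own value
# and its children's reports, so only the root ends up with the global minimum.

def convergecast_min(tree, values, root):
    """tree: node -> list of children; values: node -> local value."""
    def report(node):
        m = values[node]
        for child in tree.get(node, []):
            m = min(m, report(child))   # a child's asymmetric role: report upward
        return m
    return report(root)                 # the root's asymmetric role: hold the result

tree = {0: [1, 2], 1: [3, 4], 2: []}
values = {0: 7, 1: 3, 2: 9, 3: 1, 4: 5}
print(convergecast_min(tree, values, 0))  # -> 1
```

The root is the only node that needs a "return the answer" step; every other node runs the same report-upward rule, which is the asymmetry the text describes.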

5.2.4 Anonymous algorithms

An anonymous system is a system in which neither processes nor processors use their process identifiers and processor identifiers to make any execution decisions in the distributed algorithm. An anonymous algorithm is an algorithm which runs on an anonymous system and therefore does not use process identifiers or processor identifiers in the code.

An anonymous algorithm possesses structural elegance. However, it is equally hard, and sometimes provably impossible, to design – as in the case of designing an anonymous leader election algorithm on a ring [1]. If we examine familiar examples of multiprocess algorithms, such as the famous Bakery algorithm for mutual exclusion in a shared memory system, or the "wound-wait" and "wait-die" algorithms used for transaction serializability in databases, we observe that the process identifier is used in resolving ties or contentions that are otherwise unresolved despite the symmetric and noncentralized nature of the algorithms.
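The tie-breaking role of the identifier can be seen in a sketch of the Bakery-style entry order (a simplification of ours, not the full algorithm): processes that draw equal ticket numbers are ordered by their unique process ids, which is exactly why the algorithm cannot run on an anonymous system.

```python
# Hedged sketch of the Bakery algorithm's tie-break rule: two processes that
# draw the same ticket number are ordered by their (unique) process ids.

def bakery_order(tickets):
    """tickets: dict pid -> ticket number. Returns the entry order into the CS."""
    # Lexicographic (ticket, pid) comparison is the tie-break that makes the
    # algorithm non-anonymous: remove the pid and equal tickets cannot be ordered.
    return sorted(tickets, key=lambda pid: (tickets[pid], pid))

print(bakery_order({2: 5, 0: 5, 1: 3}))  # -> [1, 0, 2]: equal tickets broken by pid
```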

5.2.5 Uniform algorithms

A uniform algorithm is an algorithm that does not use n, the number of processes in the system, as a parameter in its code. A uniform algorithm is desirable because it allows scalability transparency: processes can join or leave the distributed execution without intruding on the other processes, except that the immediate neighbors need to be aware of the change in their local topology. Algorithms that run on a logical ring and have nodes communicate only with their neighbors are uniform. In Section 5.10, we will study a uniform algorithm for leader election.
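To see what uniformity means in code, here is a hedged sketch of max-id forwarding on a ring, in the style of the Chang–Roberts algorithm (the algorithm in Section 5.10 may differ): each node's rule uses only its own id and the incoming message, never n. The variable n below belongs to the simulator, not to any node's code.

```python
# Simulation of a uniform ring algorithm: each node forwards the larger of its
# own id and the incoming id; a node that receives its own id back is elected.

def elect(ids):
    """ids: unique node ids in ring order. Returns the elected leader."""
    n = len(ids)                       # simulator bookkeeping only -- the
    msg = list(ids)                    # per-node rule below never uses n
    leader = None
    while leader is None:
        new = []
        for i in range(n):
            incoming = msg[i - 1]      # from the counter-clockwise neighbor
            if incoming == ids[i]:     # own id came all the way around
                leader = ids[i]
            new.append(max(ids[i], incoming))
        msg = new
    return leader

print(elect([3, 1, 2]))  # -> 3
```

Because no node's rule mentions n, a new node can be spliced in by relinking two neighbor pointers; no other node's code or state changes, which is the scalability transparency the text describes.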

5.2.6 Adaptive algorithms

Consider the context of a problem X. In a system with n nodes, let k, k ≤ n, be the number of nodes "participating" in the context of X when the algorithm to solve X is executed. If the complexity of the algorithm can be expressed in terms of k rather than in terms of n, the algorithm is adaptive. For example, if the complexity of a mutual exclusion algorithm can be expressed in terms of the actual number of nodes contending for the critical section when the algorithm is executed, then the algorithm would be adaptive.
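A toy illustration of adaptivity (our example, not a real mutual exclusion protocol): the cost below depends only on the set of contenders, so it is bounded by a function of k even when n is enormous.

```python
# Toy cost model: each contender exchanges a permission message with every
# other contender, giving k*(k-1) messages -- a function of k, not of n.

def contention_cost(contenders):
    """contenders: set of nodes currently contending for the critical section."""
    k = len(contenders)
    return k * (k - 1)

# Two contenders cost the same whether the system has 10 nodes or 10 million.
print(contention_cost({"p3", "p7"}))  # -> 2
```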


5.2.7 Deterministic versus non-deterministic executions

A deterministic receive primitive specifies the source from which it wants to receive a message. A non-deterministic receive primitive can receive a message from any source – the message delivered to the process is the first message that is queued in the local incoming buffer, or the first message that arrives subsequently if no message is queued in the local incoming buffer. A distributed program that contains no non-deterministic receives has a deterministic execution; if it contains at least one non-deterministic receive primitive, it is said to have a non-deterministic execution.
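The two receive flavors can be sketched as follows, assuming (our assumption) a per-source incoming buffer at each process; the class and method names are illustrative, not from the text.

```python
# Sketch of deterministic vs. non-deterministic receive over per-source buffers.
# A deterministic receive names its source; the non-deterministic one returns
# whichever message was queued earliest across all sources.

class Inbox:
    def __init__(self):
        self.seq = 0
        self.buffers = {}                      # source -> list of (arrival, msg)

    def deliver(self, source, msg):            # called by the "network"
        self.buffers.setdefault(source, []).append((self.seq, msg))
        self.seq += 1

    def recv_from(self, source):               # deterministic receive
        return self.buffers[source].pop(0)[1]

    def recv_any(self):                        # non-deterministic receive
        source = min((b[0][0], s) for s, b in self.buffers.items() if b)[1]
        return self.buffers[source].pop(0)[1]

inbox = Inbox()
inbox.deliver("q", "m1"); inbox.deliver("r", "m2")
print(inbox.recv_from("r"))  # -> m2 (source named explicitly)
print(inbox.recv_any())      # -> m1 (earliest queued message from any source)
```

In a real system the outcome of `recv_any` depends on message arrival order, which varies across re-executions; that variability is what the next paragraphs analyze.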

Each execution defines a partial order on the events in the execution.

Even in an asynchronous system (defined formally in Section 5.2.9), for any deterministic (asynchronous) execution, repeated re-execution will reproduce the same partial order on the events. This is a very useful property for applications such as debugging, detection of unstable predicates, and reasoning about global states.

Given any non-deterministic execution, any re-execution of that program may result in a very different outcome, and any assertion about a non-deterministic execution can be made only for that particular execution. Different re-executions may result in different partial orders because of variable factors such as (i) the lack of an upper bound on message delivery times and unpredictable congestion; and (ii) local scheduling delays on the CPUs due to timesharing. As such, non-deterministic executions are difficult to reason about.

5.2.8 Execution inhibition

Blocking communication primitives freeze the local execution1 until some actions connected with the completion of that communication primitive have occurred. But from a logical perspective, is the process really prevented from executing further? The non-blocking flavors of those primitives can be used to eliminate the freezing of the execution, and the process invoking that primitive may be able to execute further (from the perspective of the program logic) until it reaches a stage in the program logic where it cannot execute further until the communication operation has completed. Only now is the process really frozen.

Distributed applications can be analyzed for freezing. Often, it is more interesting to examine the control algorithm for its freezing/inhibitory effect on the application execution. Here, inhibition refers to protocols delaying actions of the underlying system execution for an interval of time. In the literature on inhibition, the term "protocol" is used synonymously with the term "control algorithm." Protocols that require processors to suspend their normal execution until some series of actions stipulated by the protocol have been performed are termed inhibitory or freezing protocols [10].

1 The OS dispatchable entity – the process or the thread – is frozen.

Different executions of a distributed algorithm can result in different inter-leavings of the events. Thus, there are multiple executions associated with each algorithm (or protocol). Protocols can be classified as follows, in terms of inhibition:

• A protocol is non-inhibitory if no system event is disabled in any execution of the protocol. Otherwise, the protocol is inhibitory.

• A disabled event e in an execution is said to be locally delayed if there is some extension of the execution (beyond the current state) such that: (i) the event becomes enabled after the extension; and (ii) there is no intervening receive event in the extension. Thus, the interval of inhibition is under local control. A protocol is locally inhibitory if any event disabled in any execution of the protocol is locally delayed.

• An inhibitory protocol for which there is some execution in which some delayed event is not locally delayed is said to be globally inhibitory. Thus, in some (or all) executions of a globally inhibitory protocol, at least one event is delayed waiting to receive communication from another processor.

An orthogonal classification is that of send inhibition, receive inhibition, and internal event inhibition:

• A protocol is send inhibitory if some delayed events are send events.

• A protocol is receive inhibitory if some delayed events are receive events.

• A protocol is internal event inhibitory if some delayed events are internal events.

These classifications help to characterize the degree of inhibition necessary to design protocols to solve various problems. Problems can be theoretically analyzed in terms of the possibility or impossibility of designing protocols to solve them under the various classes of inhibition. These classifications also serve as a yardstick to evaluate protocols. The more stringent the class of inhibition, the less desirable is the protocol. In the study of algorithms for recording global states and algorithms for checkpointing, we have the opportunity to analyze the protocols in terms of inhibition.

5.2.9 Synchronous and asynchronous systems

A synchronous system is a system that satisfies the following properties:

• There is a known upper bound on the message communication delay.

• There is a known bounded drift rate for the local clock of each processor with respect to real-time. The drift rate between two clocks is defined as the rate at which their values diverge.

• There is a known upper bound on the time taken by a process to execute a logical step in the execution.
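One practical consequence of these known bounds is that a process can compute a timeout after which a missing message definitely indicates a failure rather than slowness. The sketch below is ours; the symbol names are illustrative, not from the text.

```python
# Illustrative timeout computation from the three synchronous-system bounds:
# a message expected in response to a step should arrive within the step bound
# plus the message delay bound, inflated because the local clock may run fast.

def round_timeout(msg_delay_bound, step_bound, drift_rate_bound):
    """Worst-case wait for an expected message, as read on the local clock."""
    worst_real_time = msg_delay_bound + step_bound    # sender's step + transit
    return worst_real_time * (1 + drift_rate_bound)   # inflate for clock drift

print(round_timeout(0.010, 0.002, 0.001))  # ~0.012 s
```

In an asynchronous system no such timeout exists, because none of the three bounds is known; this is exactly why the two models admit such different algorithms.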


An asynchronous system is a system in which none of the above three properties of synchronous systems is satisfied. Clearly, systems can be designed that satisfy some combination, but not all, of the criteria that define a synchronous system. The algorithms to solve any particular problem can vary drastically based on the model assumptions; hence it is important to clearly identify the system model beforehand. Distributed systems are inherently asynchronous; later in this chapter, we will study synchronizers that provide the abstraction of a synchronous execution.

5.2.10 Online versus offline algorithms

An on-line algorithm is an algorithm that executes as the data is being generated. An off-line algorithm is an algorithm that requires all the data to be available before algorithm execution begins. Clearly, on-line algorithms are more desirable. Debugging and scheduling are two example areas where on-line algorithms offer clear advantages. On-line scheduling allows for dynamic changes to the schedule to account for newly arrived requests with closer deadlines. On-line debugging can detect errors as they occur, as opposed to collecting the entire trace of the execution and then examining it for errors.
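The debugging example can be made concrete with a small sketch (ours): the off-line version scans a complete trace, while the on-line version flags each error event the moment it is fed in.

```python
# Off-line: the whole trace is available before the scan begins.
def offline_find_errors(trace):
    return [i for i, event in enumerate(trace) if event == "error"]

# On-line: a generator-based monitor that is fed one event at a time and
# reports the error positions found so far after each event.
def online_error_monitor():
    i = 0
    found = []
    while True:
        event = yield found
        if event == "error":
            found.append(i)
        i += 1

trace = ["send", "error", "recv", "error"]
mon = online_error_monitor(); next(mon)        # prime the generator
for e in trace:
    found = mon.send(e)                        # errors flagged as they occur

print(found)                        # -> [1, 3]
print(offline_find_errors(trace))   # -> [1, 3]
```

Both find the same positions, but the on-line monitor already knew about position 1 before events 2 and 3 were generated, which is what makes on-line detection useful.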

5.2.11 Failure models

A failure model specifies the manner in which the component(s) of the system may fail. There exists a rich class of well-studied failure models. It is important to specify the failure model clearly because the algorithm used to solve any particular problem can vary dramatically, depending on the failure model assumed. A system is t-fault tolerant if it continues to satisfy its specified behavior as long as no more than t of its components (whether processes or links or a combination of them) fail. The mean time between failures (MTBF) is usually used to specify the expected time until failure, based on statistical analysis of the component/system.
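As a hedged arithmetic sketch (our numbers and model, not the text's), t-fault tolerance and MTBF can be combined: assuming independent, exponentially distributed component lifetimes, the probability that a t-fault-tolerant system of n components is still correct at a given time horizon is the probability that at most t components have failed by then.

```python
from math import comb, exp

def survival_prob(n, t, mtbf, horizon):
    """P(system still correct at `horizon`): at most t of n components have
    failed, with independent exponential lifetimes of mean `mtbf`."""
    p_fail = 1 - exp(-horizon / mtbf)          # per-component failure prob.
    return sum(comb(n, k) * p_fail**k * (1 - p_fail)**(n - k)
               for k in range(t + 1))          # binomial tail up to t failures

# 4 components, tolerate 1 failure, MTBF 1000 h, ask about a 100 h mission:
print(round(survival_prob(4, 1, 1000.0, 100.0), 4))  # -> 0.9523
```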

Process failure models [26]

Fail-stop [31] In this model, a properly functioning process may fail by stopping execution from some instant thenceforth. Additionally, other processes can learn that the process has failed. This model provides an abstraction – the exact mechanism by which other processes learn of the failure can vary.

Crash [21] In this model, a properly functioning process may fail by ceasing to function from some instant thenceforth. Unlike the fail-stop model, other processes do not learn of this crash.

Receive omission [27] A properly functioning process may fail by intermittently receiving only some of the messages sent to it, or by crashing.

Send omission [16] A properly functioning process may fail by intermittently sending only some of the messages it is supposed to send, or by crashing.

General omission [27] A properly functioning process may fail by exhibiting either or both of send omission and receive omission failures.

Byzantine or malicious failure, with authentication [22] In this model, a process may exhibit any arbitrary behavior. However, if a faulty process claims to have received a specific message from a correct process, then that claim can be verified using authentication, based on unforgeable signatures.

Byzantine or malicious failure[22] In this model, a process may exhibit any arbitrary behavior and no authentication techniques are applicable to verify any claims made.

The above process failure models, listed in order of increasing severity (except for send omissions and receive omissions, which are incomparable with each other), apply to both synchronous and asynchronous systems.
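The severity order just described is partial, not total. A small sketch (our encoding, with abbreviated model names) represents it as a DAG and checks reachability, confirming for instance that send omission and receive omission are incomparable.

```python
# Direct "strictly more severe" edges of the process failure model hierarchy.
SEVERITY = {
    "fail-stop": ["crash"],
    "crash": ["send-omission", "receive-omission"],
    "send-omission": ["general-omission"],
    "receive-omission": ["general-omission"],
    "general-omission": ["byzantine-auth"],
    "byzantine-auth": ["byzantine"],
    "byzantine": [],
}

def less_severe(a, b):
    """True iff model a is strictly less severe than model b (DAG reachability)."""
    stack, seen = list(SEVERITY[a]), set()
    while stack:
        m = stack.pop()
        if m == b:
            return True
        if m not in seen:
            seen.add(m)
            stack.extend(SEVERITY[m])
    return False

print(less_severe("crash", "byzantine"))                 # -> True
print(less_severe("send-omission", "receive-omission"))  # -> False (incomparable)
```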

Timing failures can occur in synchronous systems, and manifest themselves as some or all of the following at each process: (i) general omission failures; (ii) process clocks violating their prespecified drift rate; and (iii) the process violating the bounds on the time taken for a step of execution. In terms of severity, timing failures are more severe than general omission failures but less severe than Byzantine failures with message authentication.

The failure models less severe than Byzantine failures, and timing failures, are considered "benign" because they do not allow processes to arbitrarily change state or to send messages that the algorithm does not prescribe.

Benign failures are easier to handle than Byzantine failures.

Communication failure models

Crash failure A properly functioning link may stop carrying messages from some instant thenceforth.

Omission failures A link carries some messages but not the others sent on it.

Byzantine failures A link can exhibit any arbitrary behavior, including creating spurious messages and modifying the messages sent on it.

The above link failure models apply to both synchronous and asynchronous systems. Timing failures can occur in synchronous systems, and manifest themselves as links transporting messages faster or slower than their specified behavior.

5.2.12 Wait-free algorithms

A wait-free algorithm is an algorithm that can execute (synchronization operations) in an (n − 1)-process fault-tolerant manner, i.e., it is resilient to n − 1 process failures [18, 20]. Thus, if an algorithm is wait-free, then the (synchronization) operations of any process must complete in a bounded number of steps irrespective of the failures of all the other processes.

Although the concept of a k-fault-tolerant system is very old, wait-free algorithm design in distributed computing received attention in the context of mutual exclusion synchronization for the distributed shared memory abstraction. The objective was to enable a process to access its critical section even if the process in the critical section fails or misbehaves by not exiting from the critical section. Wait-free algorithms offer a very high degree of robustness. Designing a wait-free algorithm is usually very expensive, and may not even be possible for some synchronization problems, e.g., the simple producer–consumer problem. Wait-free algorithms will be studied in Chapters 12 and 14. Wait-free algorithms can be viewed as a special class of fault-tolerant algorithms.
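A minimal sketch of the wait-free idea (ours, not an algorithm from the text): each process writes only its own register, and a reader collects all registers in a plain scan of exactly n steps, so both operations complete in a bounded number of steps no matter how many of the other processes have crashed. (A true atomic snapshot is subtler; this only illustrates the bounded-steps property.)

```python
# Single-writer registers: registers[i] is written only by process i.
N = 4
registers = [0] * N

def update(pid, value):
    registers[pid] = value      # one step; never waits on any other process

def collect():
    return list(registers)      # exactly N steps; never waits on any process

# Even if processes 0, 1, and 3 have crashed, process 2's update and any
# reader's collect still complete in a bounded number of steps.
update(2, 42)
print(collect())                # -> [0, 0, 42, 0]
```

Contrast this with a lock-based design, where a reader blocked on a lock held by a crashed process would wait forever; that unbounded wait is what wait-freedom rules out.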

5.2.13 Communication channels

Communication channels are normally first-in first-out (FIFO) queues. At the network layer, this property may not be satisfied, giving non-FIFO channels. These and other properties, such as causal order of messages, will be studied in Chapter 6.
