• Aucun résultat trouvé

Basic Concepts on Network Survivability

Network Survivability: End-to-End Recovery Using Local Failure Information

5.1 Basic Concepts on Network Survivability

Network survivability reflects the ability of a network to maintain service continu-ity during and after failures. Although this abilcontinu-ity is of relevance for any transport network, it is essential for operators of optical networks, where a single fiber cut, for example, can lead to severe service disruption and to loss of revenue, affecting thousands of end users whose traffic is transported by high-capacity DWDM links.

Naturally, failures are not limited to fiber optic cables; other components such as multiplexers, optical cross-connects (OXC), and repeaters can also fail. Moreover, the causes of failure are equally diverse, from aging of the physical components to natural dissasters to errors caused by human intervention.

Recovery is the name given to the sequence of events and actions taken after the detection of a failure in order to keep the service in operation –whether in degraded mode or not– and return the network to the preferred state upon the completion of the repair procedures [20]. In this chapter, we consider that the unit of service of the optical network, and therefore the subject of recovery, is a connection, i.e., the virtual communication channel created between two designated nodes, with a given capacity and duration. In line with the concepts of GMPLS, we assume that any such connection is served by paths inside the network.

The body of knowledge on network recovery for optical networks is extensive.

However, as this chapter is focused on end-to-end recovery using local failure in-formation, the reader interested in the general topic of network recovery is referred to [13], which surveys the major techniques for next generation networks, and to [6], which offers several options for classifying them, as well as an evaluation of their features from the perspective of quality of service and differentiation. Nevertheless, we will devote the following subsections to presenting essential concepts and termi-nology on survivability, necessary for understanding the problems addressed in the rest of the chapter as well as the applicability of the proposed solutions.

5.1.1 Protection and Restoration

The existing recovery techniques differ in objectives (i.e., offer very fast recovery time, maximize network utilization, accomodate conflicting QoS requirements, or a combination of several objectives) as well as approaches to specific issues such as the provisioning method employed, the protection scope, the signaling require-ments, and the layers, topologies and transmission technologies to which they are applicable. Despite their differences, however, recovery generally implies that the traffic affected by a failure is switched to a backup path. The moment in time in which this backup path is established gives rise to two general approaches, one is called restoration and the other is called protection. Under restoration, backup paths are discovered on demand, and spare capacity is dynamically allocated thereafter, whereas under protection both steps are completed at service setup time, whether any failure arises or not later on during the service lifetime.

In general, protection favors quality of recovery in the sense that, by doing pre-allocation, there is guarantee that resources are readily available at failure time and less steps are necessary to successfully complete the recovery process. But this guar-antee comes at the expense of network utilization: capacity for recovery is essen-tially locked even if finally it is never needed. As discussed in Section 5.1.3, net-work utilization can be improved by sharing the resources for recovery at the cost of losing the certainty on successful recovery.

5.1.2 The Scope of Backup Paths

The conceptually simplest method of path assignment is called global backup path or just path protection, whereby two disjoint paths are assigned to one connection, as illustrated in Figure 1(a). The path that carries the traffic under normal conditions is called the working path, and it is covered by the backup path. When the working path is affected by a failure, the connection is rerouted on an end-to-end basis, that is, the switchover is usually performed at the source node, irrespective of the actual failure location. Unless 1+1 protection is used (discussed in the next subsection), the source node must be notified upon failure, which increases the recovery time as well as the risk of losing traffic. Its advantage is that only the source node needs switchover functionality.

(a) Global backup path

(b) Local backup path

(c) Segment-based backup path Fig. 5.1 Scope of backup paths

The local path method can be employed to overcome the drawbacks of using global backup paths. As illustrated in Figure 1(b), the working path is divided into shorter reroutable paths so that the node detecting the failure (node q in the example) can immediately switch to the alternate path (passing through r in the example) and route the connection around the failed element. However, this method requires switchover functionality in every node.

An intermediate solution is offered by the segment-based method [4]. For in-stance, if a failure is detected between nodes t and d in Figure 1(c), the failure no-tification is sent to q, which performs the switchover. The nodes between the failed network element and the source node of the connection are called upstream nodes, while the ones between the failure and the destination node are called downstream nodes.

The idea of restricting the notification to the node closer to the point of failure is employed by the local-to-egress restoration method [2]. In this case, the affected traffic is rerouted between the upstream node adjacent to the failure and the egress node of the connection, combining short notification time and high resource ef-ficiency. This form of recovery is used by the Shortcut Span Protection method presented in Section 5.4.1.

5.1.3 Shareability of Protection Resources

Based on whether or not the sharing of network resources between connections is allowed, a protection scheme can be categorized as shared or dedicated. An ex-ample of dedicated protection that uses global backup path is 1+1 protection. In 1+1 protection, traffic is sent simultaneously over both the working and the backup paths; the destination node monitors continuously the reception for choosing the most convenient path. By not requiring failure notification, the recovery time can be very short. Another variation is called 1:1 protection, in which, unlike in 1+1 protection, traffic is sent over the working path only until a failure is detected.

In shared protection, however, the backup paths are provisioned only at the con-trol plane to facilitate the sharing of the recovery resources of several working paths.

It is also called soft provisioning [34] in GMPLS terminology and backup mul-tiplexing [23] in optical transport domain. Shared protection is usually combined with single link failure protection. This combination yields to low bandwidth uti-lization and is relatively simple to implement. Section 5.4.4 studies the availability of connections protected with both dedicated path protection (DPP) and shared path protection (SPP).

The concept of shared protection is to activate a single protection path after the interruption of working bandwidth at a preselected upstream node (a branch node in GMPLS terminology [34]); the protection path merges back to the working route at a preselected downstream node called merging node. In SPP, the switching node is always the source node, while the merging node is always the destination node.

Shared Link Protection (SLP) typically protects link failures only, and the switching

node is the neighboring upstream node, while the merging node is the neighboring downstream node. SLP is favored for its simple and local fault recovery. For Shared Segment Protection (SSP, a.k.a. Sub-Path Protection [25]) the working path is di-vided into segments and the starting node of the segment is the switching node and the last node is the merging node. The segments may be overlapped but not embed-ded; thus, the closest upstream switching node is always the switching node of the corresponding segment. It provides an explicit mapping between the network ele-ment and the segele-ment. SSP provides the finest compromise between fast restoration time and bandwidth utilization efficiency [34].

s d

SPP

a c

SSP b

SLP

upstream nodes downstream nodes

Fig. 5.2 Shared path/segment/link protection