Figure 2-13. Two different views of future transport networking.

Figure 2-13(a) shows the view of pure router-handled IP transport. There are no OXCs or OADMs. All client services of the IP backbone transport network interface at the IP packet level to the backbone router node. Point-to-point unprotected optical links interconnect these super-routers directly, and the routers look after all connection management, QoS, and protection or restoration needs. Advantages of this architecture are the uniformity with which all services and traffic are handled and the capacity efficiency that arises from re-consolidation of traffic onto each link leaving the node. This allows the highest utilization of the point-to-point links between routers. On the other hand, because physical network graphs are typically rather sparse, router-centric transport implies more router-hops in the average path than with an OXC-managed optical layer that can create numerous direct inter-router links. In addition, most of the packet flows forwarded through such backbone routers will be transiting traffic.
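To make the hop-count argument concrete, the following minimal sketch compares the average number of router hops over a sparse physical graph with the single-hop paths available when an optical layer provides direct lightpaths between every router pair. The five-node topology is a toy example invented for illustration, not a network from the text:

```python
from collections import deque
from itertools import combinations

# A small illustrative physical topology (hypothetical node names and links).
sparse_links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A"), ("B", "E")]
nodes = sorted({n for link in sparse_links for n in link})

def avg_hops(links, nodes):
    """Average shortest-path length in hops over all node pairs (BFS)."""
    adj = {n: set() for n in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    total, pairs = 0, 0
    for s, t in combinations(nodes, 2):
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total += dist[t]
        pairs += 1
    return total / pairs

print(avg_hops(sparse_links, nodes))   # 1.4 hops on the sparse physical graph
full_mesh = list(combinations(nodes, 2))   # lightpaths make every pair adjacent
print(avg_hops(full_mesh, nodes))      # 1.0 hop when the optical layer is rich
```

Even on this tiny graph the sparse physical topology forces 40% more router hops on average; on realistic continental topologies, where nodal degree is typically 2 to 4, the gap is considerably larger.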

As network traffic grows, router-centric transport also requires continually higher-capacity routers, and there are definite limits to the size and speed of an electronic router in terms of space, power, and reliability. Power consumption limits of 7 kW for a 7-foot equipment rack are already a dominant limiting factor in the design of high-capacity routers. Routers are also extremely software-intensive and highly dependent on accumulated network state information. In this regard it seems ironic that the number of lines of software in a conventional digital circuit switch was often pointed to as a key reason these systems had become so unmanageable and, ultimately, so expensive. Yet it seems reasonable that the amount of software and database dependencies in the "big router" model would be at least as great if it were to control all service layer functions and all of the logical transport constructs they require simultaneously. The router-centric architecture is thus likely to be most effective for private networks that are not mission-critical and not of extremely high capacity.

For carrier-grade networks where extremely high availability and scalability for growth are paramount concerns, the "smart optics" architecture in Figure 2-13(b) has more advantages. Operating below the service layer, OXCs can provide flexible optical pipes and much faster restoration against physical failures. The optical network layer also decouples the router connectivity from limitations of the physical graph, providing a wavelength networking layer that can richly connect the routers, reducing the load on each router and the average packet delay throughout the network, as well as permitting smaller, less costly routers. Through coordination among OXC nodes, using communications overhead within the optical layer itself, the optical layer can establish, adapt, and evolve a set of lightpaths that supports the required time-varying logical connectivity and capacity between routers, as well as between nodes of any other service layer networks.

An important simplifying practice that the OXC-based architecture also enables is the aggregation and grooming of traffic onto lightpaths at or near the edge of the network. By grooming and aggregating packet flows onto optical paths at the OC-48 level or higher, transit flows bypass the routers en route optically, without adding to the forwarding load on the routers. Such optical bypass is especially important as demand becomes independent of distance, increasing the fraction of transiting flow at nodes. The only advantage of handling transiting flow at the packet level at every hop is a theoretical gain in statistical multiplexing efficiency on each hop. But doing so also increases average delay and packet loss and consumes more cost, power, and space in the router. Well-groomed optical paths can bypass the routers at almost as high utilization levels, and without the corresponding risk of sudden extreme congestion that comes with operating single "fat" bit pipes at high utilization. (This transient congestion of large pipes is why we say the higher utilization is only a theoretical benefit; it is risky in practice to run IP pipes at high utilization levels.) In the OXC-based architecture, routers tend to handle only traffic that is within one or two remaining hops of its destination. The optical bypass the OXCs support thus also shifts the problem of scalability into the optical transport domain, where doublings in capacity are far more easily obtained than in router speed. The OXC-based architecture is also more scalable because cross-connects are more compact, more economic, and lower in power consumption than large routers.
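The weight of the transit-traffic argument can be checked with a small back-of-the-envelope calculation. This sketch assumes a hypothetical chain of five routers with uniform demand between all pairs, and simply counts how much of the traffic crossing the middle router neither originates nor terminates there:

```python
# Hypothetical uniform demand on a chain of routers R0..R4: what fraction of
# the traffic crossing the middle node is transit (neither added nor dropped)?
n = 5
mid = n // 2          # index of the middle router on the chain
transit = local = 0.0
for s in range(n):
    for t in range(n):
        if s == t:
            continue
        if not (min(s, t) <= mid <= max(s, t)):
            continue              # this flow never touches the middle router
        if s == mid or t == mid:
            local += 1.0          # traffic added or dropped at the middle router
        else:
            transit += 1.0        # traffic merely passing through
print(transit / (transit + local))   # 0.5 on this tiny chain
```

Even on this five-node chain, half of the flow through the middle router is pure transit; in larger and sparser networks the transit fraction at core nodes is higher still, which is exactly what makes optical bypass attractive.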

The OXC architecture also divides functionality and control complexity into smaller service-specific domains, and its network state is largely physical, not based on software state and databases, and hence intrinsically more robust. Through user-network interfaces to the OXC, a variety of service layer networks can also manage their own internal state and service-related functionality and simply act as clients of the OXC, asking for and/or releasing paths of requested capacity and protection status from the OXC-managed transport network. This further distributes control responsibility, inherently putting fewer eggs in each basket compared to the entirely router-based architecture.
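As a rough illustration of this client relationship, the sketch below models a service layer asking an OXC-managed transport layer for a protected path and later releasing it. All class and method names here are invented for illustration; they do not correspond to an actual UNI signaling standard:

```python
# A minimal, hypothetical sketch of a service layer acting as a UNI client of
# an OXC-managed transport layer. Names and fields are illustrative only.
class TransportLayer:
    """Stand-in for the OXC control plane: grants and releases lightpaths."""
    def __init__(self):
        self._next_id = 0
        self.lightpaths = {}

    def establish_lightpath(self, src, dst, capacity_gbps, protected):
        # Routing, grooming, and protection are the transport layer's job;
        # the client only states what it needs.
        self._next_id += 1
        pid = f"lp-{self._next_id}"
        self.lightpaths[pid] = (src, dst, capacity_gbps, protected)
        return pid

    def release_lightpath(self, pid):
        del self.lightpaths[pid]

# A service-layer network manages its own state and simply asks for capacity:
transport = TransportLayer()
pid = transport.establish_lightpath("router-A", "router-B",
                                    capacity_gbps=10.0, protected=True)
print(pid, transport.lightpaths[pid])
transport.release_lightpath(pid)
```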

2.10.2 Concept of a "Transport Stabilized" Internet

The architecture of Figure 2-13(b) creates the option of using intelligence in the optical transport layer to simplify and enhance both the performance and stability of an overlying Internet layer. Internet "brownouts," routing storms, and other unpredictable and complex performance degradations are often caused by partly self-induced congestion effects, compounded by debilitating interactions with large numbers of IP traffic applications. An unexpected overload on one link may cause millions of TCP/IP sessions to back off, producing a massive self-synchronized load on the network again a few seconds later. In the meantime, a flurry of LSAs may be generated from some router, or a link failure or sheer transient link congestion may send the routers into a signaling-intensive phase of trying to re-synchronize their global database views and redistribute label information. The propensity for such complex dynamics in the Internet is well known. In addition, there are more and more software complexities and configuration and database dependencies in evolving protocols.

But what if the Internet layer effectively never saw either link congestion or link failures? In fact, one of the most common empirical practices today to stabilize IP layer performance is simply to keep all link capacities well over-provisioned; it is called "pouring bandwidth on the problem." If inter-router links in the IP layer are kept at 15% utilization or below, experience shows that delays and packet losses stay low and the risk of complex interactions is minimized. Indeed, despite much research and development, IP layer links that approach 30% utilization tend to mark the onset of loss and stability problems for network operators (signaling the need to pour on more bandwidth, and so on). But this is hardly an efficient or safe way to run a network.
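A simple queueing model suggests why operators aim so low. The sketch below uses the textbook M/M/1 mean-delay formula T = S/(1 - ρ); real IP links are not M/M/1, so treat this only as an indication of how nonlinearly delay grows with utilization:

```python
# Illustrative M/M/1 queueing sketch. Real IP links are burstier than M/M/1,
# so this understates the problem, but the nonlinear trend is the point.
def mm1_delay(utilization: float, service_time: float = 1.0) -> float:
    """Mean sojourn time T = S / (1 - rho) for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

for rho in (0.15, 0.30, 0.50, 0.80, 0.95):
    print(f"rho = {rho:.2f}: mean delay = {mm1_delay(rho):.2f} x service time")
```

At 15% utilization the mean delay stays within about 20% of the no-load service time, while a link pushed to 95% sees a twenty-fold increase, and the delay variance grows even faster.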

A more efficient alternative to pouring bandwidth onto the inter-router links of an inherently all-router transport network is intelligent transport networking below the IP layer. The idea is that many such complex dynamic degradations in the Internet could be avoided by adaptive capacity management in the optical network layer. The IP router layer would work under the simplest forwarding protocols possible, within a richly connected mesh of direct inter-router logical pipes, and with little or no apparent change in the logical link topology, and hence little or no activity for topology and routing updates. In the IP layer the capacity of each pipe in the logical fabric would appear to grow (and shrink) spontaneously, just as needed to match the IP flows. Such an apparently stable and perfectly capacitated environment of network connectivity would be an "artificial world" created for the IP layer by the transport layer. Under such idealized conditions, the simplest suite of basic IP forwarding protocols actually works quite well. This is the vision of a "transport stabilized" Internet: an adaptive and survivable transport layer creates the illusion of an artificially perfect world for the Internet layer. This is done through adaptive creation and deletion of transport layer paths, and through restoration or protection in the tens-of-milliseconds range.

The interaction among OXC nodes would create this apparent world for the IP layer in which:

Every time an IP pipe begins to approach congestion, or crosses a preset utilization threshold, the IP layer link capacity "magically" increases (and later decreases again when reduced traffic makes the change unnoticeable); see the sketch following this list.

The IP links between routers appear not to fail, other than due to single link interface card failures on routers themselves, because physical failures are hidden in the transport layer.

Creation of this artificially ideal world stabilizes the IP layer: routing tables are almost never updated and LSAs are rare.

End applications see a stable, predictable, low-delay environment and rarely need to invoke TCP window-size adjustment and other load-managing dynamics. The underlying transport layer produces this view by self-organizing adaptation of its logical capacity configuration: protecting against actual failures and adaptively configuring physical capacity to best support the current loads on the point-to-point logical links of the overlying IP layer.
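A minimal sketch of the threshold-triggered adaptation in the first point above follows; the thresholds, lightpath granularity, and function names are illustrative assumptions rather than values from the text:

```python
# Sketch of threshold-triggered capacity adaptation: when a logical IP link
# nears congestion the transport layer adds a lightpath; when it empties out,
# one is quietly removed. Thresholds and granularity are assumed values.
GROW_THRESHOLD = 0.30     # add capacity above 30% utilization
SHRINK_THRESHOLD = 0.10   # remove capacity below 10%, if spare exists

def adapt_link(carried_gbps: float, lightpaths: int,
               lightpath_gbps: float = 10.0) -> int:
    """Return the new number of lightpaths serving one logical IP link."""
    utilization = carried_gbps / (lightpaths * lightpath_gbps)
    if utilization > GROW_THRESHOLD:
        return lightpaths + 1          # "magic" growth, invisible to IP layer
    if utilization < SHRINK_THRESHOLD and lightpaths > 1:
        return lightpaths - 1          # quiet shrinkage when unnoticeable
    return lightpaths

# Offered load ramps up and back down; the link quietly follows it.
paths = 1
for load in (1.0, 2.5, 4.0, 6.0, 4.0, 1.5, 0.5):
    paths = adapt_link(load, paths)
    print(f"load = {load:4.1f} Gb/s -> {paths} lightpath(s)")
```

In a real network the growth step would trigger lightpath-establishment signaling among the OXCs, and the shrink step would be deferred until it is invisible to the IP layer, but the control logic is essentially this threshold loop.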


Chapter 3. Failure Impacts, Survivability Principles, and Measures of Survivability

In this chapter we look at the causes of fiber cable failures, identify the impacts of outage, and relate these to goals for restoration speed. We then provide an overview of the basic principles and techniques for network survivability. This gives a first appreciation of the span-, path-, and p-cycle-based approaches to survivability that we treat in depth in later chapters.

The survey of basic mesh-oriented schemes in this chapter also lets the reader see these schemes in contrast to ring-based schemes, which are 100% or more redundant and which we do not consider further in this book. The chapter concludes with a look at quantitative measures of network survivability and the relationships between availability, reliability, and survivability.


3.1 Transport Network Failures and Their Impacts

3.1.1 Causes of Failure

It is reasonable to ask why fiber optic cables get cut at all, given the widespread appreciation of how important it is to physically protect such cables. Isn't it enough to just bury the cables suitably deep, or put them in conduits, and stress that everyone should be careful when digging? In practice, what seems so simple is actually not. Despite best efforts at physical protection, it seems to be one of those large-scale statistical certainties that a fairly high rate of cable cuts is inevitable. This is not unique to our industry. Philosophically, the problem of fiber cable cuts is similar to other problems of operating many large-scale systems. To a lay person it may seem baffling when planes crash, or nuclear reactors fail, or water sources are contaminated, and so on, while experts in the respective technical communities are sometimes amazed it doesn't happen more often! The insider knows of so many things that can go wrong [Vau96]. Indeed, some have gone as far as to say that the most fundamental engineering activity is the study of why things fail [Ada91] [Petr85].

And so it is with today's widespread fiber networks: it doesn't matter how advanced the optical technology is, it is in a cable. When you deploy 100,000 miles of any kind of cable, even with the best physical protection measures, it will be damaged. And with surprising frequency. One estimate is that any given mile of cable will operate about 228 years before it is damaged (4.39 cuts/year/1000 sheath-miles) [ToNe94]. At first that sounds reassuring, but on 100,000 installed route miles it implies more than one cut per day on average. To the extent that construction activities correlate with the working week, such failures may also tend to cluster, producing some single days over the course of a year in which perhaps two or three cuts occur. In 2002 the FCC also published findings that metro networks annually experience 13 cuts for every 1000 miles of fiber, and long haul networks experience 3 cuts for 1000 miles of fiber [VePo02]. Even the lower rate for long haul implies a cable cut every four days on average in a not atypical network with 30,000 route-miles of fiber. These frequencies of cable cut events are hundreds to thousands of times higher than corresponding reports of transport layer node failures, which helps explain why network survivability design is primarily focused on recovery from span or link failures arising from cable cuts.
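The arithmetic behind these frequencies is easy to verify. The short sketch below converts the quoted cut rates into a mean time between cuts per mile and an expected cut frequency for the network sizes mentioned:

```python
# Worked check of the cable-cut arithmetic quoted above.
def cuts_per_year(route_miles: float, cuts_per_1000_miles: float) -> float:
    """Expected cable cuts per year for a network of a given size."""
    return route_miles * cuts_per_1000_miles / 1000.0

rate = 4.39  # cuts/year per 1000 sheath-miles [ToNe94]
print(1000.0 / rate)                           # ~228 years between cuts per mile
print(cuts_per_year(100_000, rate) / 365.0)    # ~1.2 cuts per day on 100,000 miles
print(365.0 / cuts_per_year(30_000, 3.0))      # ~4 days between cuts (FCC long haul)
```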

3.1.2 Crawford's Study

After several serious cable-related network outages in the 1990s, a comprehensive survey on the frequency and causes of fiber optic cable failures was commissioned by regulatory bodies in the United States [Craw93]. Figure 3-1 presents data from that report on the causes of fiber failure. As the euphemism of a "backhoe fade" suggests, almost 60% of all cuts were caused by cable dig-ups. Two-thirds of those occurred even though the contractor had notified the facility owner before digging. Vehicle damage was most often suffered by aerial cables from collision with poles, but also from tall vehicles snagging the cables directly or colliding with highway overpasses where cable ducts are present. Human error is typified by a craftsperson cutting the wrong cables during maintenance or during copper cable salvage activities ("copper mining") in a manhole. Power line damage refers to metallic contact of the strain-bearing "messenger cable" in aerial installations with power lines; the resulting i²R heat dissipation melts the fiber cable. Rodents (mice, rats, gophers, beavers) seem to be fond of the taste and texture of the cable jackets and gnaw on them in both aerial and underground installations. The resulting cable failures are usually partial (not all fibers are severed). It seems reasonable that by partial gnawing at cable sheaths, rodents must also compromise a number of cables which then ultimately fail at a later time. Sabotage failures were typically the result of deliberate actions by disgruntled employees, or vandalism when facility huts or enclosures are broken into. Today, terrorist attacks on fiber optic cables must also be considered.
