• Aucun résultat trouvé

Engineering Reliable, Low-Latency Networks

3. Network Organization

3.2 Wire Length

In this chapter, we have introduced many networks which have wires whose length is a function of the network size. We call a long wire any single run of wire between two switches which has a transit time in excess of the rate at which we could otherwise clock data between the switches. If we required the data to traverse any such wires in a single clock cycle, we would have to increase the clock period to accommodate the longest wire in the system. The longest wires in many of these network will be Ω(p3

N

) due to spatial constraints in three-dimensions. Requiring data to

I

I

I

I

Shown above is a portion of an express mesh after [Dal91]. The components labelled with an I are interchange units which allow connections to be routed along express channels, thereby bypassing intermediate switching nodes.

Figure 3.12: Express Cube Network –

k

=2

Network

T

s

T

t Locality Resources Practical Drawbacks Fully Connected 0 Θ(

N

) Θ(

N

3) Node sizeΘ(

N

)

Distributed Crossbar Θ(

N

) Θ(

N

) some Θ(

N

2)

Hypercube Θ(log(

N

)) Θ(p2

N

) yes Θ(

N

32) Node sizeΘ(log(

N

))

k

-ary-

n

-cube Θ(pk

N

) Θ(p3

N

) yes Θ(

N

)

Flat Multistage Θ(log(

N

)) Θ(p2

N

) no Θ(

N

32)

Fat-Tree Θ(log(

N

)) Θ(p3

N

) yes Θ(

N

)

Express Cube Θ(log(

N

)) Θ(p3

N

) yes Θ(

N

)

Table 3.1: Network Comparison

traverse these wires in a single clock cycle would require our clock period to increase comparably with network size. However, if we pipeline multiple bits on the long wires, we do not have to adjust the clock frequency to accommodate long wires. Our notion of transit latency as proportional to interconnection distance (Equation 2.2), will still hold. Instead of being a continuous equation as given, it becomes discretized in units of the clock period,

t

c.

T

t=X i

&

dvi

t

c

'

t

c (3

:

1)

Equation 3.1 explicitly breaks the total distance into segments (

d

i) between each pair of switching elements in the path between the source and destination nodes to properly account for the effects of this discretization. Techniques for ensuring correct operation when bits are pipelined on the wires are detailed in Section 6.9.

3.3 Fault Tolerance

In order to achieve fault tolerance in the network, we need multiple, distinct paths between any pair of nodes. The more distinct paths our network supports, the more robust the network will be to faults occurring in the network. In this section, we look at the multipath nature of the practical low-latency networks identified in the previous section.

3.3.1 Indirect Routing

If we allow indirect routing, all of the networks examined in this chapter have multiple paths.

With indirect routing, a message may be routed from a source to a destination node by first routing the message through one or more intermediate nodes in the network. That is, when the source cannot reach the destination directly through the network, it is often possible for it to reach another processing node in the network which can, in turn, reach the desired destination node. If we allow arbitrary indirect hops through the network, any message can eventually be routed as long as the transitive closure of the non-faulty direct interconnect covers all the nodes in use in the network.

While indirect routing will allow messages to eventually reach their destination, they do so at an increase in latency. Latency increases due to several effects. First, since messages must cross the network multiple times. Additional overhead is generally required to allow indirection and process messages requiring re-routing. Also, contention latency is increased since each indirected message consumes network bandwidth on each hop through the network.

3.3.2

k

-ary-

n

-cubes and Express Cubes

Direct, cube-based networks, like the

k

-ary-

n

-cube or the express cube, function by indirect routing. Each node is connected to

O

(

k

)neighbors in a regular pattern and all routing is achieved by sending the message to a neighbor node which, generally, moves the message closer to the desired destination. At every hop, the message has a choice of paths to take to the destination, many of which would require the same transit and switching latency. The underlying network thus provides the requisite multiple paths. It is then up to the routing algorithm to efficiently utilize them. If our routing algorithm is omniscient about faults in the network, it can always find the shortest path between points in a faulty network. For many faults, the length of the shortest paths between close nodes will increase. However nodes which are further apart will see no increase in transit or switching latency. The more distant two nodes are from each other, the more minimum length paths there will be between them.

3.3.3 Multiple Networks

A simple technique for adding adding fault tolerance to a network which works for all kinds of networks is to simply replicate a base network. We give each node a connection to each of the networks. As long as there is a non-faulty path on some network between any pair of nodes which must communicate, normal communication may occur with no degradation in switching or transit latency. The originating node need only choose which network to use for each message it needs to deliver. Additionally, the existence of multiple networks increases the bandwidth available in the network and hence can reduce contention latency if utilized efficiently. Unfortunately, the gain in

Two four-stage networks connecting 16 endpoints are attached together at the endpoints.

Each component is a 22, dilation-1 crossbar.

Figure 3.13: Replicated Multistage Network

fault-tolerance is small compared to the costs. Each additional path through the network requires that we construct a complete copy of the original network. Multiple, multistage style networks are used in the telecommunications field to minimize contention and increase available bandwidth over single-path networks [Hui90]. Figure 3.13 shows a 2-replicated bidelta network.

Replicated networks do have one advantage over pure indirect routing schemes including most cube style networks. With multiple networks, each node does have multiple connections both to and from the network. As noted in Section 2.1.2 multiple network i/o connections are key to avoiding a single point of failure which may sever a node completely from the interconnection network.

3.3.4 Extra-Stage, Multistage Networks

When using multistage interconnection networks one can construct extra-stage networks with more switching stages than are actually required to uniquely specify a destination ([LP83], [CYH84]

et. al.) (See Figures 3.10 and 3.14). The set of routing specifications that reach the same physical destination defines a class of equivalent paths. So long as one path of each such class remains intact in a faulty extra-stage network, any endpoint will be able to successfully route to its destination. The extra stages in these schemes result in larger switching and transit latencies than the corresponding baseline network, even in the absence of faults.

If extra stages are added, but the single connection into and out-of each node is retained, extra-stage networks retain a single-point of failure where the nodes connect to the network. To eliminate

Figure 3.14: Extra Stage Network

this problem, the extra-stage network should be constructed such that multiple network endpoints can be assigned to each node of the network.

3.3.5 Interwired, Multipath, Multistage Networks

Multibutterfly style, multipath networks are multistage networks which use dilated crossbar routing components. In addition to being characterized by the radix of the switching element, each dilated crossbar router is characterized by its dilation,

d

. The dilation is the number of logically equivalent outputs in each distinct direction. With a dilation greater than one, redundant routing is provided in each routing direction. Figure 3.11 shows an example of such a network. Figure 3.15 shows some configurations for the dilated routing elements used in Figure 3.11.

This class of multipath networks has a large number of distinct paths between each pair of nodes. The number of different switches in a stage which can be used to route between any pair of routers increases toward the center of the network. Up to the center of the network, the number of routers in any path grows by a factor of the dilation with each successive stage. Past the center of the network, the sorting function performed by the network limits the number of routers in the path to the desired destination. For those later stages, all routers which are in the path to the destination are candidates for use in routing any connection.

For a given number of node connections, the multibutterfly style networks generally have more paths than the comparable replicated network. Consider a k-replicated network. A multipath network can be constructed from the k-replicated network by taking each of the

k

routers in the same location in each of the k-replicated network and creating one dilated router out of them with dilation,

d

=

k

. This will give us a multibutterfly style network. Note that in the replicated network, we were only able to chose which resources to use when the message entered the network. In the multibutterfly network we have the option of switching between networks at each routing stage.

Thus, there are many more paths through the multibutterfly networks. The fine details of how one wires these redundant paths are discussed in Section 3.5.

Dilated Crossbar (no connections)

Logically Equivalent Connection Pairs

Figure 3.15: 42 Crossbar with a dilation of 2

By constructing fat-tree networks using dilated crossbar routers, it is possible to build multipath, fat-tree networks which exhibit the same basic properties. The tree networks will need multiple connections into and out-of the network to avoid single points of failure. Connections made through higher tree-levels have more paths between the source and the destination as they traverse more dilated routers in the network.