Communication Performance - Cost-Based Deflection Routing for Intelligent NoC Switches

Cost-Based Deflection Routing for Intelligent NoC Switches

6.6 Communication Performance

We have performed extensive simulation studies to validate the intelligent, cost-based routing approach. The most significant results are presented below.

6.6.1 Experimental Setup

Our simulation experiments use a two-dimensional mesh topology (cf. Fig. 6.1) of size N_x × N_y = 8 × 8. Links are 128 bits wide and transmit a complete packet in one cycle. Switches implement deflection routing without flow control, as each incoming packet is forwarded to an output in the same cycle. Input ports are registered for synchronous operation. Switches at the mesh boundaries are of the same type as all others, but their unused inputs and outputs are tied off. The only design parameter varied between simulations is the routing policy.
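As an illustration of the topology described above, the following minimal Python sketch enumerates the network ports of every switch in an 8 × 8 mesh and counts how many are connected and how many are tied off at the boundary. The direction names, the coordinate orientation and the helper name `neighbours` are assumptions made for this sketch; they are not taken from the simulation models.

```python
N_X, N_Y = 8, 8  # mesh size used in the experiments

# Assumed port naming and orientation (hypothetical, for illustration only).
DIRECTIONS = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def neighbours(x, y):
    """Map each direction to the adjacent switch, or None if that port is tied off."""
    result = {}
    for name, (dx, dy) in DIRECTIONS.items():
        nx, ny = x + dx, y + dy
        inside = 0 <= nx < N_X and 0 <= ny < N_Y
        result[name] = (nx, ny) if inside else None
    return result


# Collect the port targets of all switches in the mesh.
ports = [p for x in range(N_X) for y in range(N_Y) for p in neighbours(x, y).values()]
connected = sum(p is not None for p in ports)
tied_off = sum(p is None for p in ports)
print(connected, tied_off)  # 224 32
```

Of the 256 network ports in the mesh, 224 drive a link to a neighbouring switch and 32 sit on the boundary, where they would be tied off.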

Table 6.2 Cost and timing performance

Design                          Area (µm²)   Equivalent gate count (NAND2)   Critical path (ns)   Maximum operating frequency (MHz)
Nostrum                         13,570       21,695                          9.36                 106.8
Cost-based (Sect. 6.4.1)        12,320       19,739                          10.67                93.7
Fault-tolerant (Sect. 6.4.2)    16,790       26,961                          12.92                77.4
Retimed version                 19,129       28,546                          9.25                 108.1
Pipelined version               21,580       34,738                          5.45                 183.5
Load balancing (Sect. 6.4.3)    17,299       27,776                          13.13                76.2

86 M. Radetzki and A. Kohler

Resources in our simulation generate and receive network traffic. We have used two synthetic traffic patterns: uniform traffic, with pseudo-random destination addresses uniformly distributed over the address range, and complement traffic, from the resource positioned at (x, y) to destination (N_x - 1 - x, N_y - 1 - y) for all x ∈ {0, ..., N_x - 1}, y ∈ {0, ..., N_y - 1}. Packets are generated at a configurable, constant rate and are stored in an unbounded packet FIFO per resource. A packet is injected into the NoC when the packet FIFO is non-empty and the attached switch has an unused output after routing all other packets.
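The two synthetic patterns described above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that a uniform destination may coincide with the sender's own address, which the text does not specify. As a sanity check, it also computes the mean minimal (deflection-free) hop count of complement traffic, a lower bound for any measured average hop count.

```python
import random

N_X, N_Y = 8, 8  # mesh size used in the experiments


def complement_destination(x, y, n_x=N_X, n_y=N_Y):
    """Destination of complement traffic for the resource at (x, y)."""
    return (n_x - 1 - x, n_y - 1 - y)


def uniform_destination(rng, n_x=N_X, n_y=N_Y):
    """Pseudo-random destination, uniform over the address range.

    Assumption: the sender's own address is not excluded.
    """
    return (rng.randrange(n_x), rng.randrange(n_y))


def minimal_hops(src, dst):
    """Manhattan distance: hop count of a packet that is never deflected."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


sources = [(x, y) for x in range(N_X) for y in range(N_Y)]
mean_min_hops = sum(
    minimal_hops(s, complement_destination(*s)) for s in sources
) / len(sources)
print(mean_min_hops)  # 8.0 for an 8 x 8 mesh

rng = random.Random(1)  # fixed seed so the pseudo-random pattern is reproducible
print(uniform_destination(rng))
```

Under these assumptions, complement traffic in an 8 × 8 mesh needs 8 hops on average even without any deflection, and every minimal path crosses the centre of the mesh.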

Simulation models have been implemented in SystemC [18] based on the Transaction Level Modelling (TLM) extension version 2.0 [19] and the object-oriented approach from [20]. While the simulation allows adaptive switching between modes of different accuracy [21], we employ strictly cycle-accurate simulation here to obtain the best possible accuracy. For each individual parameter set, 2,000 cycles have been simulated.

6.6.2 Latency and Throughput

We evaluate the performance impact of cost-based intelligent routing by comparing its throughput and hop count against a traditional router with weighted priority deflection routing. Simulation has been performed under varying load conditions, shown here for complement traffic, with and without load balancing.

At rates below saturation (linear zone in Fig. 6.4, left), all routing methods provide similar throughput. The cost-based intelligent deflection router reaches a higher saturation throughput than the priority-based two-stage deflection routing mechanism. Load balancing has little impact in the case of intelligent routing, but reduces saturation throughput significantly in conjunction with two-stage deflection. This is because packets are unnecessarily deflected in a network that is fully congested anyway. Cost-based intelligent routing avoids this pitfall by not favoring deflection if stress is equally high in all directions.

Another advantage of the cost-based approach can be seen in Fig. 6.4 (right): it reduces the average hop count significantly compared to the priority-based variant. In the sub-saturation zone, adding load balancing to priority-based routing also provides significant improvement, but it fails under saturation due to unnecessary deflections in a fully congested network. Adding load balancing to cost-based routing generally increases hop count slightly; however, its advantage lies in the reduction of FIFO backlog (cf. Sect. 6.6.4).

Fig. 6.4 Packet throughput (left) and average hop count (right), both plotted over the injection rate [packets per resource and cycle]. Curves: priority based, priority + load balancing, cost driven, cost + load balancing

6.6.3 Fault-Tolerance

Here we measure NoC performance by means of the saturation throughput achievable in the presence of faults and under uniform traffic. Faults have been simulated as permanent or transient with duration t = 1, 4, ..., 256. Figure 6.5 (left) shows maximum packet throughput over varying failure rate, under the assumption that a failure makes a switch fully unavailable. This is the best assumption that can be made (and has been made by previous work) if no intelligence about the internal switch status is available. It results in a significant negative performance impact even at small failure rates. With increasing fault duration, performance degrades more rapidly. In the case of permanent faults, even low failure rates can reduce throughput to near zero.

Figure 6.5 (right) shows the achievable throughput using our fault matrix concept. For bidirectional link faults, i.e. complete rows and columns indicating faults, performance degrades significantly less than in the previous case. When assuming single crossbar connection faults, i.e. singular fault entries in the fault matrix, performance is reduced just slightly, even at the highest failure rates.

Of course, this is because a single connection fault is much less severe than the complete breakdown of a switch. We argue that in practice, most faults would affect only part of a switch. In this case, using intelligence on the switch’s internal status for making routing decisions vastly increases the performance of fault-tolerant routing mechanisms.
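The difference in severity between the three fault assumptions can be made concrete with a small Python sketch of a fault matrix. This is an illustration under stated assumptions, not the data structure of the actual router: it models only four network ports (the resource port is omitted), takes rows as inputs and columns as outputs, and all function names are hypothetical.

```python
PORTS = ["north", "east", "south", "west"]  # assumed port set, resource port omitted


def empty_fault_matrix():
    """Rows are switch inputs, columns are switch outputs; True marks a fault."""
    return [[False] * len(PORTS) for _ in PORTS]


def mark_connection_fault(matrix, inp, out):
    """Single crossbar connection fault: one entry of the matrix."""
    matrix[PORTS.index(inp)][PORTS.index(out)] = True


def mark_link_fault(matrix, port):
    """Bidirectional link fault: the complete row and column of that port."""
    p = PORTS.index(port)
    for i in range(len(PORTS)):
        matrix[p][i] = True  # input side of the link
        matrix[i][p] = True  # output side of the link


def mark_switch_fault(matrix):
    """Switch treated as fully unavailable: every entry is faulty."""
    for row in matrix:
        for o in range(len(row)):
            row[o] = True


def usable_connections(matrix):
    """Number of input-to-output connections that remain fault-free."""
    return sum(not faulty for row in matrix for faulty in row)


single = empty_fault_matrix()
mark_connection_fault(single, "north", "east")
link = empty_fault_matrix()
mark_link_fault(link, "north")
switch = empty_fault_matrix()
mark_switch_fault(switch)
print(usable_connections(single), usable_connections(link), usable_connections(switch))
# 15 9 0
```

In this simplified model, a single connection fault leaves 15 of 16 connections usable, a link fault leaves 9, and a switch assumed to be fully unavailable leaves none, which mirrors the ordering of the throughput results discussed above.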

Fig. 6.5 Throughput under switch faults (left), link and crossbar faults (right). Axes: maximum throughput [packets per resource and cycle] over failure rate [%]


6.6.4 Benefits from Load Balancing

Figure 6.6 shows the average packet FIFO filling levels (backlog) at the different positions in the mesh, without (left) and with (right) load balancing. These filling levels have been obtained with complement traffic at a packet generation rate of 0.21, just slightly below network saturation.

Without load balancing, FIFOs in the center of the network are filled, on average over all simulated cycles, with up to almost 10 packets. Load balancing according to Sect. 6.4.3 significantly improves the situation at the given packet generation rate (right). Note the different scale: the largest FIFO has an average filling level of less than 0.6. Moreover, FIFO backlog is distributed much more evenly over the network than without load balancing, where it is concentrated in the center.
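To make the backlog metric concrete, the following Python sketch computes the time-averaged filling level of a single resource's packet FIFO. It is a simplified stand-in for the cycle-accurate models: the function name and the `free_output` input are hypothetical, packets are generated with a simple rate accumulator, and the FIFO is sampled at the end of each cycle after a possible injection. These are assumptions of the sketch and are not stated in the text.

```python
from collections import deque


def average_backlog(generation_rate, free_output):
    """Time-averaged FIFO filling level of one resource.

    generation_rate: constant packet generation rate in packets per cycle.
    free_output: one bool per cycle, True if the attached switch has an
                 unused output after routing all other packets.
    """
    fifo = deque()  # unbounded packet FIFO
    credit = 0.0    # accumulates the constant generation rate
    total = 0
    cycles = 0
    for cycle, can_inject in enumerate(free_output):
        credit += generation_rate
        if credit >= 1.0:  # a new packet is due
            credit -= 1.0
            fifo.append(cycle)
        if fifo and can_inject:  # injection rule from Sect. 6.6.1
            fifo.popleft()
        total += len(fifo)  # sample the backlog at the end of the cycle
        cycles += 1
    return total / cycles if cycles else 0.0


print(average_backlog(0.5, [True] * 8))   # 0.0: every packet is injected at once
print(average_backlog(0.5, [False] * 4))  # 1.0: packets pile up while blocked
```

A resource whose switch always has a free output shows no backlog, while a blocked resource accumulates packets at the generation rate.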

The above results have been obtained with the parameters m = 4 and n = 12 (cf. Sect. 6.4.3). A shorter window length, e.g. n = 4, and less differentiation through penalty steps, e.g. m = 1, both yield inferior results.

6.7 Conclusion

We have presented an intelligent NoC routing algorithm that uses information on the router’s and its environment’s status. This information is weighted and combined in a cost function which enables the computation of locally optimized routing permutations. The technique has been employed to combine, for the first time in NoC, fault avoidance and congestion avoidance as criteria for selecting deflections. Models of the intelligent cost-based deflection router show performance improvements over previous deflection routing variants employed in NoC.

A VHDL implementation shows that cost-based routing causes no area overhead.

Fig. 6.6 Average packet FIFO filling level without (left) and with load balancing (right)

References

1. Coppola M, Grammatikakis MD, Locatelli R, Maruccia G, Pieralisi L (2008) Design of cost-efficient interconnect processing units: Spidergon STNoC. CRC Press, Boca Raton

2. Furber S (2006) Living with failure: lessons from nature. In: Proceedings of the European test symposium (ETS), pp 1–4

3. Dally WJ, Towles B (2001) Route packets, not wires: on-chip interconnection networks. In: Proceedings of the design automation conference (DAC), pp 684–689

4. Penolazzi S, Jantsch A (2006) A high level power model for the Nostrum NoC. In: Proceedings of the Euromicro conference on digital system design (DSD), pp 673–676

5. Lu Z, Zhong M, Jantsch A (2006) Evaluation of on-chip networks using deflection routing. In: Proceedings of the great lakes symposium on VLSI (GLSVLSI), pp 296–301

6. Raik J, Ubar R, Govind V (2007) Test configurations for diagnosing faulty links in NoC switches. In: Proceedings of the European test symposium (ETS), pp 29–34

7. Grecu C, Ivanov A, Saleh R, Sogomonyan ES, Pande PP (2006) On-line fault detection and location for NoC interconnects. In: Proceedings of the international on-line testing symposium (IOLTS), pp 145–150

8. Alaghi A, Karimi N, Sedghi M, Navabi Z (2007) Online NoC switch fault detection and diagnosis using a high level fault model. In: Proceedings of the international symposium on defect and fault-tolerance in VLSI systems (DFT), pp 21–29

9. Kohler A, Radetzki M (2009) Fault-tolerant architecture and deflection routing for degradable NoC switches. In: Proceedings of the 3rd ACM/IEEE international symposium on networks-on-chip (NOCS), pp 22–31

10. Bogdan P, Dumitras T, Marculescu R (2007) Stochastic communication: a new paradigm for fault-tolerant networks-on-chip. Hindawi VLSI design, p 17

11. Mediratta SD, Draper J (2007) Performance evaluation of probe-send fault-tolerant network-on-chip router. In: Proceedings of the conference on application-specific systems, architectures and processors (ASAP), pp 69–75

12. Wu J, Wang D (2002) Fault-tolerant and deadlock-free routing in 2-d meshes using rectilinear-monotone polygonal fault blocks. In: Proceedings of the international conference on parallel processing, pp 247–254

13. Hu J, Marculescu R (2004) Dyad - smart routing for networks-on-chip. In: Proceedings of the design automation conference (DAC), pp 260–263

14. Zhang Z, Greiner A, Taktak S (2008) A reconfigurable routing algorithm for a fault-tolerant 2d-mesh network-on-chip. In: Proceedings of the design automation conference (DAC), pp 441–446

15. Li M, Zeng Q-A, Jone W-B (2006) DyXY: a proximity congestion-aware deadlock-free dynamic routing method for network on chip. In: Proceedings of the design automation conference (DAC), pp 849–852

16. Nilsson E, Millberg M, Öberg J, Jantsch A (2003) Load distribution with the proximity congestion awareness in a network on chip. In: Proceedings of the design, automation and test in Europe (DATE), pp 1126–1127

17. Kuhn HW (1955) The Hungarian method for the assignment problem. Nav Res Logist Quart 2:83–97

18. IEEE Standard 1666 (2005) SystemC 2.1 language reference manual. IEEE Standards Association, Piscataway

19. Open SystemC Initiative (2008) OSCI TLM-2.0 user manual. Software version TLM-2.0. Document version JA22, http://www.systemc.org

20. Radetzki M (2006) SystemC TLM transaction modelling and dispatch for active objects. In: Proceedings of the forum on design languages (FDL), pp 203–209

21. Radetzki M, Salimi Khaligh R (2008) Accuracy-adaptive simulation of transaction level models. In: Proceedings of the design automation and test in Europe (DATE), pp 788–791


Chapter 7 NOCEXplore

In the document Lecture Notes in Electrical Engineering (pages 90-95)