Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks


1 Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks
Andreas Lankes¹, Soeren Sonntag², Helmut Reinig³, Thomas Wild¹, Andreas Herkersdorf¹
¹ Institute for Integrated Systems
² Lantiq GmbH Deutschland
³ Infineon Technologies AG, Intellectual Property Reuse

2 Networks-on-Chip & Deadlocks
- Packet-switched NoCs susceptible to deadlocks
  - Especially wormhole forwarding
- Routing cycles in channel dependency diagram
[Figure: routing cycle between nodes labelled S and D]
NOCS 2010, Andreas Lankes
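A routing cycle in the channel dependency diagram can be found with a plain depth-first search. The following is a minimal sketch, assuming the diagram is given as a dictionary that maps each channel to the channels a packet may request next; the channel names and the function name `has_cycle` are hypothetical and not taken from the presentation.

```python
def has_cycle(dependencies):
    """Return True if the directed graph {channel: [next channels]} has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(dependencies)
    for targets in dependencies.values():
        nodes.update(targets)
    colour = {node: WHITE for node in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in dependencies.get(node, ()):
            if colour[nxt] == GREY:
                return True  # back edge: nxt is already on the current path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in nodes)


# Hypothetical example: four channels forming a ring, and the same chain cut open.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
chain = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
print(has_cycle(ring), has_cycle(chain))
```

A routing function whose dependency diagram passes this check has no routing cycles; message dependencies, discussed later in the deck, are not covered by it.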

3 Deadlock Prevention
- Removal of routing cycles
  - Implementation of virtual channels and adaptation of routing function
  - Restriction of routing function
[Figure: forbidden turns and allowed turns]
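Dimension-ordered XY routing, which the comparison later in the deck uses, is one well-known restriction of the routing function: a packet first travels along X and then along Y, so a turn from Y back to X never occurs. The sketch below assumes mesh nodes are addressed by `(x, y)` tuples; `xy_route` and `turns` are hypothetical helper names.

```python
def xy_route(src, dst):
    """Return the list of nodes visited from src to dst, X dimension first."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def turns(path):
    """Return the dimension changes along a path, e.g. [("X", "Y")]."""
    dims = ["X" if a[0] != b[0] else "Y" for a, b in zip(path, path[1:])]
    return [(d0, d1) for d0, d1 in zip(dims, dims[1:]) if d0 != d1]


route = xy_route((0, 0), (2, 1))
print(route)
print(turns(route))
```

Because every route contains at most one X-to-Y turn and no Y-to-X turn, the turns needed to close a cycle are never taken.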

4 Message Dependent Deadlocks
- Network itself free of routing cycles
- Communication contains message dependencies
  - Memory access: read request -> read response
  - DMA transaction
  - ...
- N-way protocol: N dependent messages or message types
- Message dependency between request and response packet creates forbidden turn!
[Figure: request packet and response packet exchanged between CPU and memory]

5 Message Dependent Deadlock Avoidance
- Buffer sizing
  - Destination tile guarantees reception of all packets -> huge input buffers
- End-to-end flow control
  - Limitation of sender quota
  - E.g. credit based
- Strict ordering
  - Separation of message types in different networks
  - E.g. virtual channels: buffer size rises with number of dependent messages
[Figure: switch and link with 2 virtual channels]
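Credit-based end-to-end flow control can be sketched as a counter on the sender side. This is an illustration only, assuming one credit corresponds to one packet slot reserved at the destination tile; the class name `CreditSender` and its methods are hypothetical.

```python
class CreditSender:
    """Sender-side quota: one credit per packet slot reserved at the destination."""

    def __init__(self, credits):
        self.credits = credits

    def try_send(self):
        """Consume a credit and return True, or return False if the quota is used up."""
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def credit_returned(self):
        """Called when the destination has consumed a packet and freed its slot."""
        self.credits += 1


sender = CreditSender(credits=2)
attempts = [sender.try_send() for _ in range(3)]  # third attempt exceeds the quota
sender.credit_returned()                          # destination frees one slot
attempts.append(sender.try_send())
print(attempts)
```

Since a packet is only injected when a slot is known to be free at the destination, the packet can always be consumed there and cannot block the network behind it.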

6 Table of Contents
- Introduction
- Message Dependent Deadlock Recovery for NoCs
- Comparison of Deadlock Recovery and Deadlock Avoidance
- Conclusion

7 Deadlock Avoidance
- Strict ordering with virtual channels
- Additional buffer queues per port (number of message types!)
- 2 virtual channel queues PER port
[Figure: router with inputs I0, I1 and outputs O0, O1, input and output buffers, and virtual channel queues]
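Strict ordering with virtual channels gives each message type its own queue at every port, so a blocked request can never hold back a response. A minimal sketch, assuming two message types and a queue depth counted in flits; `InputPort` and its method names are hypothetical.

```python
from collections import deque


class InputPort:
    """One router port with a separate virtual channel queue per message type."""

    def __init__(self, message_types, depth):
        self.depth = depth
        self.queues = {msg_type: deque() for msg_type in message_types}

    def push(self, msg_type, flit):
        """Enqueue a flit on the queue of its message type; False if that queue is full."""
        queue = self.queues[msg_type]
        if len(queue) >= self.depth:
            return False
        queue.append(flit)
        return True

    def buffer_space(self):
        """Total flits of buffering at this port: grows with the number of types."""
        return self.depth * len(self.queues)


port = InputPort(["request", "response"], depth=2)
results = [port.push("request", f) for f in ("r0", "r1", "r2")]  # request queue fills
results.append(port.push("response", "p0"))                      # responses still accepted
print(results, port.buffer_space())
```

The price is visible in `buffer_space`: the buffering per port scales with the number of dependent message types.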

8 Deadlock Recovery in HPC
- Additional channel in the network reserved for deadlocked packets
  - In all routers and network interfaces
  - Central to the router
- Timer based deadlock detection
- Redirection from input and output buffers
- Reserved deadlock channel as virtual channel
[Figure: router with inputs I0, I1, outputs O0, O1, timers T in and T out, deadlock recovery control unit and reserved deadlock channel; normal path of a packet versus redirection of a packet]
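Timer based deadlock detection can be sketched as a counter per buffer queue that runs while the flit at the head of the queue does not advance. This is an illustration under assumptions: the class name `BlockTimer`, the per-cycle `tick` call and the threshold value are hypothetical and not taken from the presentation.

```python
class BlockTimer:
    """Counts consecutive cycles in which the head flit of a queue did not move."""

    def __init__(self, threshold):
        self.threshold = threshold  # detection threshold in cycles (assumed value)
        self.blocked_cycles = 0

    def tick(self, head_advanced):
        """Advance one cycle; return True once the head has been blocked long enough."""
        if head_advanced:
            self.blocked_cycles = 0
        else:
            self.blocked_cycles += 1
        return self.blocked_cycles >= self.threshold


timer = BlockTimer(threshold=3)
flags = [timer.tick(head_advanced=False) for _ in range(3)]  # blocked for 3 cycles
flags.append(timer.tick(head_advanced=True))                 # progress resets the count
print(flags)
```

A timer only observes missing progress, so it flags a heavily congested queue in the same way as a deadlocked one.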

9 Deadlock Recovery for NoCs
- Avoid deadlocks in reserved deadlock channel
  - Strict ordering in deadlock recovery channel
  - Exclusive access to deadlock virtual channels
- Reserved channel with nested deadlock virtual channels
[Figure: router with inputs I0, I1, outputs O0, O1, timers T in and T out and deadlock recovery control unit; normal path of a packet versus redirection of a packet]

10 Access Regulation Scheme
- Exclusive access to each deadlock virtual channel by token based access scheme
- Tokens circle through the token distribution ring network
- On redirection:
  - Token travels with redirected packets
  - Released on reception in the destination
[Figure: tiles with router and network interface, connected by the token distribution ring network]
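The token based access scheme can be sketched for a single deadlock virtual channel as follows. The sketch assumes one token per deadlock virtual channel, that a tile may only redirect a packet while the token is at that tile, and that the token re-enters the ring at the destination tile; the class `TokenRing` and these details are illustrative assumptions.

```python
class TokenRing:
    """One token guarding exclusive access to one deadlock virtual channel."""

    def __init__(self, num_tiles):
        self.num_tiles = num_tiles
        self.position = 0    # tile the circulating token is currently at
        self.in_use = False  # True while the token travels with a redirected packet

    def step(self):
        """Move a free token to the next tile on the ring."""
        if not self.in_use:
            self.position = (self.position + 1) % self.num_tiles

    def try_acquire(self, tile):
        """A tile takes the token for a redirection only if the token is there and free."""
        if not self.in_use and self.position == tile:
            self.in_use = True
            return True
        return False

    def release(self, destination_tile):
        """The destination releases the token back into the ring (assumed behaviour)."""
        self.in_use = False
        self.position = destination_tile


ring = TokenRing(num_tiles=4)
early = ring.try_acquire(2)   # token is still at tile 0
ring.step()
ring.step()
granted = ring.try_acquire(2)
second = ring.try_acquire(2)  # channel is already in exclusive use
ring.release(destination_tile=3)
print(early, granted, second, ring.position)
```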

11 Enable Redirection of Packets
- Problems:
  - Buffers implemented as FIFO queues
  - Wormhole forwarding
- Header flits always at first position in queues
  - Restrict switching function
  - Restrict flow control function
- Reduction of effective buffer size -> reduced throughput
[Figure: router with inputs I0, I1, outputs O0, O1, timers T in and T out and deadlock recovery control unit; packet 1 must not be switched; normal path of a packet versus redirection of a packet]

12 Back-off Mechanism
- Timer based deadlock detection:
  - Congested network
  - Deadlock recovery unit
- Back-off mechanism
  - Disable sending
  - Back-off token in token ring network
  - Forced sending stop for tiles
[Figure: tiles with router and network interface, connected by the token distribution ring network]

13 Table of Contents
- Introduction
- Message Dependent Aware Deadlock Recovery for NoCs
- Comparison of Deadlock Recovery and Deadlock Avoidance
- Conclusion

14 Comparison of Deadlock Avoidance & Recovery
- Common system architecture
  - 8x8 2D mesh architecture
  - XY routing, wormhole forwarding
- Applied traffic
  - Inter processor traffic (uniform distribution, rate constant)
  - Memory access traffic (uniform or varying localization, rate iterated)
- Deadlock Recovery (MeshDr)
- Deadlock Avoidance: strict ordering using virtual channels (Mesh)
[Figure: mesh of CPU tiles with interspersed MEM tiles]

15 Buffer Size Comparison
- Deadlock Recovery saves almost 50% of total buffer space
  - For 2 dependent messages
[Figure: Buffer Space of Networks; y-axis: buffer space [flits]; networks: Mesh, MeshDr, MeshExtBuf, MeshDrExtBuf, MeshExtBuf2, MeshDrExtBuf2; length of routers' buffer queues: 2 flits, 4 flits, 8 flits]
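The shape of this comparison can be illustrated with simple arithmetic. All numbers below are invented inputs and not the presentation's data: 64 routers for the 8x8 mesh, a uniform 5 ports per router, 4-flit queues, and a small extra buffer per router for the reserved deadlock channel are assumptions, as is the helper name `buffer_space`.

```python
def buffer_space(routers, ports, queues_per_port, depth, extra_per_router=0):
    """Total buffer space in flits for a network of identical routers."""
    return routers * (ports * queues_per_port * depth + extra_per_router)


# Avoidance: one virtual channel queue per message type (2 types) at every port.
avoidance = buffer_space(routers=64, ports=5, queues_per_port=2, depth=4)

# Recovery: a single queue per port plus an assumed 2-flit reserved channel buffer.
recovery = buffer_space(routers=64, ports=5, queues_per_port=1, depth=4,
                        extra_per_router=2)

saving = 1 - recovery / avoidance
print(avoidance, recovery, round(saving, 2))
```

With these made-up inputs the saving stays below 50% because of the reserved channel's own buffer, and it would grow if more message types each required a further queue per port in the avoidance scheme.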

16 Memory Throughput
- Deadlock avoidance outperforms deadlock recovery
- Throughput of deadlock recovery depends on timings
[Table: name of timings, deadlock detection threshold [cycles], back-off period [cycles]]
[Figure: Memory Throughput; x-axis: request flit generation rate of one processor, up to 0.020; y-axis: send rate of response flits of a memory, up to 0.6; curves: Mesh, MeshDrT1, MeshDrT2, MeshDrT3, MeshDrT4]

17 Localization of Memory Access Traffic
- Processors prefer nearer memories
- Deadlock recovery profits from localization
[Figure: mesh of CPU tiles with interspersed MEM tiles]
[Figure: Memory Throughput with increasing localization; x-axis: request flit generation rate of one processor [flits/cycle], up to 0.020; y-axis: send rate of response flits of a memory, up to 0.6; curves: Mesh, MeshLoc0, MeshLoc1, T1, T1Loc0, T1Loc1]

18 Comparison of Networks with equal Buffer Space
- Higher throughput for recovery scheme with equal buffer space (for localized memory access traffic)
[Figure: Buffer Space of Networks; y-axis: buffer space [flits]; networks: Mesh, MeshDr, MeshExtBuf, MeshDrExtBuf, MeshExtBuf2, MeshDrExtBuf2; length of routers' buffer queues: 2 flits, 4 flits, 8 flits; networks with approx. equal buffer space marked]
[Figure: Memory Throughput; x-axis: request flit generation rate of one processor [flits/cycle], up to 0.020; y-axis: send rate of response flits of a memory, up to 0.7; curves: MeshLoc1, MeshLoc1ExtBuf, MeshLoc1ExtBuf2, T1Loc1, T1Loc1ExtBuf, T1Loc1ExtBuf2]

19 Table of Contents
- Introduction
- Message Dependent Aware Deadlock Recovery for NoCs
- Comparison of Deadlock Recovery and Deadlock Avoidance
- Conclusion

20 Conclusion
- Significant savings in buffer space
  - For 2 dependent messages almost 50%
  - Savings increase with number of dependent messages
- Comparable buffer space leads to throughput advantage (for localized memory traffic)
- Future work
  - Deadlock detection
  - Random access to buffer queues
  - ...

21 Thank You! Any Questions?

22 Effects of Restricted Switching & Flow Control
- Reduction of effective buffer size
- Reduction of throughput
[Figure: Transfer Latency of Uniform Traffic; x-axis: flit generation rate [flits/cycle], up to 0.3; y-axis: latency [ns]; curves: Mesh:pl=3, MeshDr:pl=3, Mesh:pl=10, MeshDr:pl=]
