Hardware Verification 2IMF20

Size: px

Start display at page:

Download "Hardware Verification 2IMF20"

Russell Stevens
6 years ago
Views:

1 Hardware Verification 2IMF20 Julien Schmaltz Lecture 06: Verification of on-chip communications

2 Living a revolution It is a really exiting time to be in computing right now, because we are [experiencing] a major change. A change that happens maybe once every 20 or 30 years... a fundamental re-thinking of... what computation is William Dally, DAC keynote

3 Multi-core shift reality 3

4 Growing number of cores AMD Opteron 12 cores Sun Niagara3 16 cores Intel 8 cores ~1.8 Bill. T. on 2x3.46cm 2 ~1 Bill. T. on 3.7cm 2 ~2.3 Bill. T. on 6.8cm 2 Intel SCC 48 cores ~1.3 Bill. T. on 5.6cm 2 Intel 4 cores ~582 Mio. T. on 2.86cm 2 Intel Research 80 cores ~100 Mio. T. on 2.75cm 2 Tilera TILEPro64 64 cores Intel 2 cores ~167 Mio. T. on 1.1cm 2 4

5 80 Cores Research Chip (Intel) Teraflops, 62 Watts 100 millions transistors, 275 mm2 25% node area for router ASCI Red Supercomputer Teraflops (Dec. 1996) 10, 000 Pentium Pro 104 cabinets, 230 m2 5

6 It is only the beginning Source. IEEE Computers

7 Communications are key 80 core research chip about 25% area for the NoC about 30% power consumption for the NoC Communication fabrics key to performance and efficiency to functional correctness 7

8 Verification Challenges NoCs are very large systems Methods must scale up to 100s of agents Large number of parameters (routing, switching, buffers, etc.) Regular and irregular structures NoCs must be fault-tolerant Deep sub-micron effect Not all routers/processors are working Static and dynamic fault-models NoCs have intricate message dependencies Mix between interconnect and protocols e.g. cache coherency or master/slave Deadlocks can emerge from deadlock-free routing and protocols 8

9 Protocol'between' applica/ons' Applica/on'layer' Processing'nodes,' routers,'channels' Network'layer' Protocol'between' routers'' Data'link'layer'

10 NoC Example: Hermes 10

11 Application layer Masters send requests and wait for responses Slaves produce responses when receiving requests Deadlock-free protocol No cyclic message dependencies: 11

12 Network layer Deterministic simple routing algorithm First route to the destination column and then to the correct row No cyclic dependencies and thus deadlock-free 12

13 Link layer A packet moves if the next channel has free space p A packet moves if it has sufficiently credits p A packet is joined with another packet p p 13

14 Link layer A packet moves if the next channel has free space p 13

15 Deadlock-free? Deadlock-free application layer + Deadlock-free network layer + Deadlock-free link layer? = Deadlock-free system 14

16 All masters right, slaves left» Is the system deadlock-free? Master Slave 15

17 All masters right, slaves left» Is the system deadlock-free?» Yes! A deadlock would require a response to wait for a request Master Slave 16

18 Masters even columns, slaves odd» Is the system deadlock-free? Master Slave Slave Master Slave Master 17

19 Masters even columns, slaves odd» Is the system deadlock-free?» No if at least four columns, yes otherwise. Slave Slave Slave Master Master Master Purple requests are transformed into orange responses Blue requests are transformed into green responses 18

20 Masters even columns, slaves odd» Is the system deadlock-free?» No if at least four columns, yes otherwise. Slave Slave Slave Master Master Master Orange responses blocked by blue requests Green responses blocked by purple requests 19

21 Deadlock-free application layer + Deadlock-free network layer + Deadlock-free link layer? = Deadlock-free system 20

22 NoC Example (2) - Spidergon Design by STMicroelectronics 21

23 Application layer» Nodes send requests High-level protocol req! 22

24 Network layer Routing logic RelAd = (dest - current ) mod 4 * N if RelAd = 0 then stop elseif 0 < RelAd <= N then go clockwise elseif 3*N <= RelAd <= 4*N then go counter clockwise else go across endif 23

25 The network has a deadlock!» For instance, we can have a cycle of packets

26 Dividing in two networks» Is the system deadlock-free? Send packets Idle cores

27 Dividing in two networks» Is the system deadlock-free?» Yes! None of the dependencies in the right upper quarter occur Send packets Idle cores

28 Slaves in 1st quarter only?» Is the system deadlock-free? Send packets Idle cores

29 Slaves in 1st quarter only?» Is the system deadlock-free?» No! Send packets Idle cores

30 Slaves in 1st quarter only?» Is the system deadlock-free?» No! Send packets Idle cores

31 Slaves in 1st quarter only?» Is the system deadlock-free?» No! Send packets Idle cores

32 Slaves in 1st quarter only?» Is the system deadlock-free?» No! Send packets Idle cores

33 Slaves in 1st quarter only?» Is the system deadlock-free?» No! Send packets Idle cores

34 Deadlock-free application layer + Network layer with deadlocks + Deadlock-free link layer? = Deadlock-free system 31

35 Confusing» We need tools to (quickly) check for deadlocks taking details of all three layers into account in large systems 32

36 Outline» Intel's micro-architectural description language xmas language Capturing high-level structure and message dependencies Extended to MaDL at TU/e. Micro-architectural Description Language» Deadlock verification for MaDL Definition of deadlocks Labelled dependency graph Feasible logically closed subgraph» Conclusion and future work 33

37 Intel s abstraction for networks High-level of abstraction Exploit high-level structure Automatic proofs using invariant generation and hardware model-checking 34

38 xmas Executable Micro-Architectural Specification» Fair sinks and sometimes sources» Diagram is formal model» Friendly to microarchitects 35

39 MaDL (1) Channels with three signals data, input ready, target ready Transfer cycle both input and target are "true" Notion of a dead channel: F (c.irdy & G! c.trdy) {req,rsp} rsp {req} req 36

$Sink(CtrlJoin(long,short)); predicates and functions pred P (p : pkt_t, q : pkt_t ) { p.id == q.id }; function F (p : pkt_t) : pkt_t { flda = p.fldb; fldb = p.$

40 MaDL (2) param int QSIZE; textual input const pkt; chan src := Source(pkt); chan one, two := Fork(src); chan long := Queue(QSIZE,Queue(QSIZE, one)); chan short := Queue(QSIZE, two); Sink(CtrlJoin(long,short)); predicates and functions pred P (p : pkt_t, q : pkt_t ) { p.id == q.id }; function F (p : pkt_t) : pkt_t { flda = p.fldb; fldb = p.flda; }; structured recursive data types IO-FSM additional complex primitives LoadBalancer Joitch Multi-Match struct pkt0_t { flda : [2:0]; fldb : [3:0]; }; union pkt1_t { optiona : pkta_t; optionb : pktb_t; }; enum pkt2_t { optiona; optionb; }; And more: for-loops, if-then-else, uses, const optiona; 37

41 Composing modules via channels» Channels with three signals data, input ready, target ready» Transfer cycle both input and target are "true" 38

42 MaDL example req,rsp q 0 q 1 rsp req q 2 req» Two sources one for requests one for responses 39

43 MaDL example P req,rsp q 0 q 1 rsp req q 2 req» Two sources one for requests one for responses 39

44 MaDL example req,rsp q 0 q 1 P rsp req q 2 req» Two sources one for requests one for responses 39

45 MaDL example req,rsp req P q 0 q 1 P q 2 rsp req» Two sources one for requests one for responses 39

46 MaDL example req,rsp req q 0 q 1 q 2 P P rsp req» Two sources one for requests one for responses 39

47 MaDL example req,rsp q 0 q 1 P rsp req q 2 req» Two sources one for requests one for responses 39

48 Processing node for XY routing in a 2D-mesh L N S E h.x = X h.x X h.y = Y h.y Y h.x > X h.x < X h.y < YN h.y > Y S E W W

49 Processing node with requests and responses L (dst,src,req)! (src, _,rsp) req rsp N S E h.x = X h.x X h.y = Y h.y Y h.x > X h.x < X h.y < YN h.y > Y S E W W

50 A more complex processing nodes with virtual channels and credits

51 Outline» Intel's micro-architectural description language xmas language Capturing high-level structure and message dependencies Extended to MaDL at TU/e. Micro-architectural Description Language» Deadlock verification for MaDL Definition of deadlocks Labelled dependency graph Feasible logically closed subgraph» Conclusion and future work 43

52 Formal definition of "deadlock" in MaDL Intuition is a "dead" channel Formal definition based on Linear Temporal Logic Predicate logic Temporal operators "eventually" ( ) and "globally" ( ) Channel c is dead iff (c.irdy c.trdy)

53 MaDL example (c.irdy c.trdy) requests req,rsp q 0 q 1 rsp req q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk

54 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0

55 MaDL example requests req,rsp q 0 q 1 rsp responses req x q 2 req Inject two requests in q0 Fork creates two copies

56 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk

57 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk

58 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk

59 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk

60 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk

61 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk Inject two responses in q0

62 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk Inject two responses in q0 q2 is permanently idle for responses, q1 is permanently blocking

63 MaDL example requests req,rsp q 0 q 1 rsp responses reqx q 2 req Inject two requests in q0 Fork creates two copies One pair is sunk Inject two responses in q0 q2 is permanently idle for responses, q1 is permanently blocking We have a deadlock without a circular wait!

64 General approach for deadlock detection in MaDL networks Define deadlock equations for all components Equations capture the reason why a component is idle or blocking Build a labelled waiting graph for each queue Labels correspond to the equations Graph captures the topology, i.e., the dependencies between the MaDL components Search for a feasible logically closed subgraph Corresponds to a deadlock situation Feasibility checked using Linear Programming This approach may output unreachable deadlocks A first step generates invariants to rule out false deadlocks Invariants are rather weak and simple - false deadlocks are in theory still possible

65 General approach for deadlock detection in xmas networks Define deadlock equations for all components Equations capture the reason why a component is idle or blocking Build a labelled waiting graph for each queue Labels correspond to the equations Graph captures the topology, i.e., the dependencies between the MaDL components Search for a feasible logically closed subgraph Corresponds to a deadlock situation Feasibility checked using Linear Programming This approach may output unreachable deadlocks A first step generates invariants to rule out false deadlocks Invariants are rather weak and simple - false deadlocks are in theory still possible

66 Deadlock equations for a channel Depends on the target component connected to the channel We look at the input port of the target component trdy data irdy Initiator u target

67 Deadlock equations for a join 2 cases output is blocked the other input is idle data irdy Block(u) = Idle(v) + Block(w) trdy u v w req

68 Deadlock equations for a join 2 cases output is blocked the other input is idle We need to know when a channel is idle! data irdy Block(u) = Idle(v) + Block(w) trdy u v w req

69 Idle equations for a channel Depends on the initiator component connected to the channel We are looking at the input port of the initiator data irdy trdy Initiator u target

70 Idle equations for a join A join is idle if one of the input channels is idle Idle(w) = Idle(u) + Idle(v) data irdy trdy u v w req

71 Idle equations for a fork A fork output is idle if the input is idle or the other output is blocked Idle(w) = Idle(u) + Block(v) data irdy trdy u v w req

72 Idle equations for a queue A queue is idle if it is empty and its input channel is idle This is for one message type which might be blocked by another type data Idle(w) = Empty(q). Idle(u) + Block(w') where w' is a message with a type different from w irdy trdy u w req

73 Our quest for "dead" queues Definition of a deadlock F (u.irdy => G~u.trdy) We look for a "dead" queue with a message in it (u.irdy) output blocked (G~u.trdy) Over approximation configuration not always reachable we may output false deadlocks

74 MaDL Verification: Overview Basic idea: Generate Invariants Generate SMT Problem - Express deadlock in SMT - Over-approximate deadlocks - Use invariants to rule-out false deadlocks - Reachability analysis of found deadlocks SMT Solver (Z3) (unsat) No deadlock! (sat) Generate SMV Model nuxmv Deadlock!

75 General approach for deadlock detection in MaDL networks Define deadlock equations for all components Equations capture the reason why a component is idle or blocking Build a labelled waiting graph for each queue Labels correspond to the equations Graph captures the topology, i.e., the dependencies between the MaDL components Search for a feasible logically closed subgraph Corresponds to a deadlock situation Feasibility checked using Linear Programming This approach may output unreachable deadlocks A first step generates invariants to rule out false deadlocks Invariants are rather weak and simple - false deadlocks are in theory still possible

76 Step 1 / simulation - req queues with req inject req req

77 Step 1 / simulation - req queues with req inject req req

78 Step 1 / simulation - req queues with req inject req req

79 Step 1 / simulation - rsp queues with req and rsp inject rsp req

80 Step 1 / simulation - rsp queues with req and rsp inject rsp req

81 Step 1 / simulation - rsp queues with req and rsp inject rsp req

82 General approach for deadlock detection in MaDL networks Define deadlock equations for all components Equations capture the reason why a component is idle or blocking Build a labelled waiting graph for each queue Labels correspond to the equations Graph captures the topology, i.e., the dependencies between the MaDL components Search for a feasible logically closed subgraph Corresponds to a deadlock situation Feasibility checked using Linear Programming This approach may output unreachable deadlocks A first step generates invariants to rule out false deadlocks Invariants are rather weak and simple - false deadlocks are in theory still possible

83 Step 2 / labelled dependency graph (1) start join req q1 q 1.req 1 join start with a message in q1 and visit the join

84 Step 2 / labelled dependency graph (2) start join u v w req sw mrg2 Block(u) = Idle(v) + Block(w) analyse the join according to its deadlock equation q1 q 1.req 1 join we go forward to the merge and backward to the switch + sw mrg2

85 Step 2 / labelled dependency graph (2) start req join sw u w v mrg2 Block(u) = Block(w) q1 q 1.req 1 join forwards to the switch - then the sink can never be blocked we assume fair sinks + sw mrg2 false sink

86 Step 2 / labelled dependency graph (2) start join req w v u mrg2 sw Idle(u) = Idle(w) backwards to the switch q1 q 1.req 1 join + sw mrg2 false sink

87 Step 2 / labelled dependency graph (2) start join w u req mrg2 sw Idle(u) = Idle(w). Empty(q2) backwards to the queue note that we forgot the Block(w') case q1 q2 q 1.req 1 join + sw q 2.rsp =0 mrg2 false sink

88 Step 2 / labelled dependency graph (2) start join req u v w u mrg2 sw Idle(w) = Idle(u). Idle(v) backwards to the merge and branch note branching is bad for us mrg1 q1 q2 q 1.req 1 join + sw q 2.rsp =0 mrg2 false sink

89 Step 2 / labelled dependency graph (2) start w v u u join req mrg2 Idle(u) = Block(v) + Idle(w) sw backwards to the merge and branch to the source - idle if no type produced to the fork q1 q 1.req 1 join src2. rsp 62 x frk mrg1 q2 + sw q 2.rsp =0 mrg2 false sink

90 Step 2 / labelled dependency graph (2) start join w u u req mrg2 Idle(u) = Idle(w). Empty(q0) sw backwards to q0 and the source src1 q0 q1 q 1.req 1 join false q 0.rsp =0. src2 rsp 62 x frk mrg1 q2 + sw q 2.rsp =0 mrg2 false sink

91 Step 2 / labelled dependency graph (2) start u w join req Block(u) = Block(w). Full(q1) sw mrg2 forwards back to q1 and stop expansion src1 false q0 q 0.rsp =0 src2. + rsp 62 x frk mrg1 q1 q 1 = q 1.size q2 q 1.req 1 join + sw q 2.rsp =0 mrg2 false sink

92 General approach for deadlock detection in MaDL networks Define deadlock equations for all components Equations capture the reason why a component is idle or blocking Build a labelled waiting graph for each queue Labels correspond to the equations Graph captures the topology, i.e., the dependencies between the MaDL components Search for a feasible logically closed subgraph Corresponds to a deadlock situation Feasibility checked using Linear Programming This approach may output unreachable deadlocks A first step generates invariants to rule out false deadlocks Invariants are rather weak and simple - false deadlocks are in theory still possible

93 Step 2 / logically closed subgraph 1 req src1 false q0 q 0.rsp =0 +. src2 rsp 62 x q1 q 1.req 1 join q 1 = q 1.size frk + sw q 2.rsp =0 mrg1 q2 mrg2 false sink

94 Step 2 / logically closed subgraph 1 req src1 false q0 q 0.rsp =0 +. src2 rsp 62 x q1 q 1.req 1 join q 1 = q 1.size frk + sw q 2.rsp =0 mrg1 q2 mrg2 false sink

95 Step 2 / logically closed subgraph 1 req src1 false q0 q 0.rsp =0 +. src2 rsp 62 x q1 q 1.req 1 join q 1 = q 1.size frk + sw q 2.rsp =0 mrg1 q2 mrg2 false sink not feasible

96 Step 2 / logically closed subgraph 2 req src1 false q0 q 0.rsp =0. src2 true + q1 q 1.req 1 join q 1 = q 1.size frk + sw q 2.rsp =0 mrg1 q2 mrg2 false sink

97 Step 2 / logically closed subgraph 2 req src1 false q0 q 0.rsp =0. src2 true + q1 q 1.req 1 join q 1 = q 1.size frk + sw q 2.rsp =0 mrg1 q2 mrg2 sink false q 1.req 1 q 1 = q 1.size q 2.rsp =0 true

98 Invariant Generation (1) Flow invariants: - Automatically generated - Linear equations over the # of packets in channels - Gaussian elimination - Equalities of # of packet between queues - NumX = #In - #Out Linear equations: - #in = #A_in = #B_in - #A = #A_in - #A_out - #B = #B_in - #B_out - #A_out = #B_out = #out After Gaussian elimination: - #A = #B 77

99 [Verbeek et al. 2016] Invariant Generation Highlight (2) Cross-layer invariants: - Automatically generated - Linear equations over: - (1) # transitions and being in a given state - (2) # of events on a channel and # transitions 78

100 Invariant Generation Highlight (2) [Verbeek et al. 2016] = req = ack 79

101 Experimental results 80

102 [Verbeek et al. 2016] Case-study: 2D Mesh - XY routing - MI protocol Interconnect: 81

103 [Verbeek et al. 2016] Case-study: 2D Mesh - XY routing - MI protocol Protocol: L2 caches Directory 82

104 [Verbeek et al. 2016] Case-study: 2D Mesh - XY routing - MI protocol Router MI Cache Core Router MI Directory Core 83

105 Case-study: Experimental results [Verbeek et al. 2016] 84

106 Reachability analysis 85

107 MaDL Verification: Overview Basic idea: Generate Invariants Generate SMT Problem - Express deadlock in SMT - Over-approximate deadlocks - Use invariants to rule-out false deadlocks - Reachability analysis of found deadlocks SMT Solver (Z3) (unsat) No deadlock! (sat) Generate SMV Model nuxmv Deadlock!

108 Reachability checking flow xmas network transfer islands state nuxmv model interleaving, or synchronous invariants reachable (under certain conditions)? definitely unreachable 87

109 Transfer islands 88

110 Enabled island 89

111 Enabled island 90

112 Combination of islands 91

113 Combination of islands 92

114 Conflicts 93

115 Different semantics interleaving asynchronous synchronous 94

116 Two Agents Example c c c c 95

117 Deadlock found by SMT c c "However if credit counters are sized incorrectly to provide more credits (say k + 1) than the capacity k of the ingress queues, the system deadlocks..." -- Chatterjee et al. VMCAI 2011 c c 96 96

118 Deadlock found by SMT c c "However if credit counters are sized incorrectly to provide more credits (say k + 1) than the capacity k of the ingress queues, the system deadlocks..." -- Chatterjee et al. VMCAI 2011 Incorrect! c c 96 96

119 Let s rewind one cycle c c No credit available. (3 requests) c c 97 97

120 Let s move one cycle forward c c 1 credit back. c c 98 98

121 Some experimental results Two agents example IC3 or BDD optional invariants synchronous or interleaving c = 2; k = 2 unreachable deadlock 2; 3 unreachable deadlock 4; 4 unreachable deadlock synchronous/ic3+inv ~3s state space: ~2 40 (about ) 4; 5 unreachable deadlock 4; 6 reachable deadlock 99

122 Outline» Intel's micro-architectural description language xmas language Capturing high-level structure and message dependencies Extended to MaDL at TU/e. Micro-architectural Description Language» Deadlock verification for MaDL Definition of deadlocks Labelled dependency graph Feasible logically closed subgraph» Conclusion and future work 100

123 Research questions and axes Q1: How can we formalise the generic aspects of specific domains? Foundations Formalise domain-specific theories Abstractions Q2: How can we define specification languages with built-in support for formal analyses? Q3: How do we relate the formally proven correct specification to the actual design? Focus on aspects and properties Language to express and analyse Approximations Ignore details to scale-up

124 Research questions Q1: How can we formalise the generic aspects of specific domains? Q2: How can we define specification languages with built-in support for formal analyses? Q3: How do we relate the formal proven correct specification to the actual design?

125 General Theory of Networking Architectures (Q1) GeNoC theory with Productivity Theorem Deadlock-free routing theory & algorithms Languages (Q2) Verification Technologies (Q2 & Q3) MaDL language & algorithms Integration Real World Applications Cooperation with industrial partners (Intel, ARM, NXP?) and academic partners (FBK, UC Irvine, Tsinghua, )

126 Today s focus General Theory of Networking Architectures (Q1) GeNoC theory with Productivity Theorem Deadlock-free routing theory & algorithms Languages (Q2) Verification Technologies (Q2 & Q3) MaDL language & algorithms Integration Real World Applications Cooperation with industrial partners (Intel, ARM, NXP?) and academic partners (FBK, UC Irvine, Tsinghua, )

127 MaDL: Conclusions & Future Work Conclusions Adequate DSL for prototyping network architectures. Being applied to some industrial case-studies Future work Equivalence relations between MaDL and RTL Performance (throughput & latency) evaluation Extend to entire systems

128 THANKS! 106

Formal Verification of Communications in Networks on Chips

Formal Verification of Communications in Networks on Chips Sebastiaan J.C. Joosten Julien Schmaltz Freek Verbeek Bernard van Gastel Open University of the Netherlands, Heerlen Radboud University Nijmegen