Fine-Grained Interconnect Synthesis

Size: px

Start display at page:

Download "Fine-Grained Interconnect Synthesis"

Eustacia Atkins
5 years ago
Views:

1 Fine-Grained Interconnect Synthesis Alex Rodionov, David Biancolin, Jonathan Rose Department of Electrical & Computer Engineering University of Toronto (1)

2 Making Hardware Design Easier n Interconnect: important part of hardware design n Allows functional units to communicate + - ready data ON- CHIP MEM CPU CPU MEM CPU CPU MEM CPU CPU n Observation: it s difficult to design properly Ø Our focus: automatic design and synthesis of interconnect (2)

3 Existing Tools: Coarse-Grained Design n Commercial: Altera Qsys, Xilinx IPI n Academic: Networks-on-Chip (ex: CONNECT) n Generally connect big things: processors and IP blocks CPU MEM H/W Accel n Interfaces: memory-mapped, streaming n Variable-latency-tolerant à plug and play (3) n What about inside IP blocks?

4 Fine-Grained Interconnect Design n smaller functional modules n coordinate data transfer more closely depend on specific interconnect latencies Area is at a premium in fine-grained systems n unicast, but also broadcast/multicast Ø Existing tools aren t good at this S Unicast: Multicast: Broadcast:

5 GENIE: Generic Interconnect Engine n Grand vision wants both: Fine & Coarse Grain n Input: functional modules logical connectivity performance specifications n Output: instantiated functional modules Fine or coarse interconnect optimized to meet constraints n Focus of this work: automatic generation of fine-grained interconnect and some optimization (5)

6 Inputs/Outputs.sv Component Implementa;ons WriMen by designer Untouched/unreferenced.lua Inputs (Lua script code): Component defini;ons Component instan;a;ons Logical E2E connec;vity Topology choice Primi;ves Library.sv GENIE.sv.sv Outputs (SystemVerilog): Components Interconnect (6)

7 GENIE Flow: Input Specification Components Interfaces mysend myrecv foo A B C bar (7)

8 GENIE Flow: Input Specification System A1 A mysend mysend foo bar C1 P Q R Linkpoints Instances myrecv B X Y X Y myrecv myrecv B1 B2 foo C bar (8)

9 GENIE Flow: Synthesizing Logical Connectivity Interconnect System A1 mysend P Q R Split Node FlowID Converters X Y S Merge Node myrecv B1 foo C1 bar M Links(logical) Physical Interconnect (9) X Y myrecv B2

10 GENIE Flow: Design Space Exploration System Alternative topology: Ring A1 mysend M S myrecv B1 M foo C1 bar S S myrecv B2 (10)

11 Optimization Features (11)

12 Removing Unnecessary Arbitration A B M C n Designer: A and B will never try to send to C simultaneously n No competition à No arbitration à Simpler circuit Merge Node (15) Simplified Merge Node

13 Smart Clock Domain Crossing n Applications can use more than one clock domain Need to insert clock crossers where domains meet Typically FIFOs n There can be many choices of where to put the crossers Some choices more expensive than others Cost = f(#crossers, data width) S (clk B) (clk A) Ø Optimization problem: given any topology, where are the crossing points? (13)

14 Latency Introspection n Recall: Fine-grained design scenario n Area/complexity at a premium: don t want overhead of flow control or latency insensitivity n Component must know exact interconnect latency N=5 N S M S (14)

15 Results (15)

16 Measurement and Comparison n Recall Goals: Easy to design hardware Produce interconnect with good performance, low area n Will Compare GENIE with: 1. Manual hand-optimized RTL Human engineer 2. Altera Qsys Commercial Coarse-Grain System Interconnect Tool (16)

17 Metrics 1. Ease-of-use: source code line counts Manual: all Verilog Qsys: Verilog for functional modules + TCL for spec GENIE: Verilog for functional modules + Lua for spec 2. Area 3. Clock Frequency (17)

18 Design Example n Application: Blocked Matrix LU Decomposition Given: matrix A Find: lower/upper-triangular matrices L,U, s.t. LU=A n Parallelized among N Compute Elements (CE) n Fine-grained system: CE interior (16)

19 CE Architecture n 5 caches: 2x(2 double-buffered) + 1 n Data Marshaller: fills/empties caches n Compute pipeline: operates on data in caches n Control block (19)

20 Connections To and From Caches pipe rd req Caches top rd resp pipe Read Paths pipe left0 left1 pipe pipe cur0 pipe marsh cur1 marsh Write Paths pipe marsh wr req top left0 left1 cur0 cur1 = Clock A = Clock B (20)

21 Connections To and From Caches pipeline assumes fixed rd latency Read Paths pipe pipe clock domain crossings pipe rd req Caches top left0 left1 cur0 rd resp pipe pipe mutuallyexclusive access pipe marsh cur1 marsh multicast Write Paths pipe marsh wr req top left0 left1 cur0 cur1 = Clock A = Clock B (21)

22 Experimental Setup n Create three variants of the CE design 1. Manual 2. Altera Qsys Avalon-MM for cache links Avalon-ST for other, misc. point-to-point links 3. GENIE n Compile with Quartus 14.0 for Stratix V (6 seeds) n Measure Source code line counts Area Clock frequency (22)

23 Results: Lines of Code Total Functional Interconnect GENIE vs. Manual -33% -2.6% -72% GENIE vs. QSys Qsys -15% -8.6% -34% (23)

24 Results: Clock Frequencies Clock A Clock B GENIE vs. Manual -1% +9% GENIE vs. Qsys +25% +27% (24)

25 Results: Area Usage ALM M20K GENIE vs. Manual +3.7% +0% GENIE vs. Qsys -17% -57% (25)

26 Results: Observations n RAM block usage: Clock Crossings Qsys inserts too many crossings à High RAM usage GENIE intelligently inserts crossings to reduce area (26)

27 Qualitative Ease-of-Use Advantage Recall: Pipeline s cache reads require known fixed latency n With Qsys had to determine interconnect s contribution by simulation and then modify Verilog source code by hand n With GENIE Used latency introspection to tell pipeline what the latencies are (27)

28 Conclusions 1. For our (single, representative) fine-grained example: Similar to hand-made: +4% area 72% lines of interconnect code Better than Qsys: 25% faster 17% smaller 34% fewer lines of interconnect code 2. Qualitative design flow improvements (fixed latency) (28)

29 Future Work 1. Move towards Grand Vision Explore/exploit topology communication ability Allow performance spec Automatic topology based on communication patterns 2. Evaluate full LU system, and others (coarse-grained), rapid design-space exploration of different topologies 3. Build higher-level memory-mapped protocols on top of existing GENIE infrastructure (29)

30 Software Release n Available at: (30)

31 GenIE Flow Func. Mod. (verilog) Logical Connec;vity current Predefined/Custom Topology future Performance Constraints Genie Phases Instan;ated Components + Full Design of Interconnect (verilog)

32 Some Details on the Input n Data, flow control, and where to go/where it came from n Signals: data: zero or more (tagged), arbitrary width valid ready sop (start of packet) eop (end of packet) lp_id (linkpoint ID, selects/informs linkpoint on interface) n Not all need to be present (32)

33 Example Input: Component Spec (Lua) component('a', 'ver_module_name') clock_sink('inclk', 'in_clk_sig_name') interface('mysend', 'rs', 'out', 'InClk') signal('valid', 'valid_sig_name') signal('ready', 'ready_sig_name') signal('data', 'data_sig_name_x', 8, 'x') signal('data', 'data_sig_name_y', 13, 'y') signal('lp_id', 'lpid_sig_name', 2) linkpoint('p', "2'b00") linkpoint('q', "2'b01") linkpoint('r', "2'b10") A mysend P Q R (33)

34 Interconnect Architecture n Split/Merge (Y. Huan et al., FPT 2012) n Lightweight, composable switching primitives n Split: one to many S n Merge: many to one M n Misc. conversion/utility blocks (34)

35 Split Node payload is broadcast S FlowID à which output(s!) to send to state tracking (35)

36 Optimization: Smart Clock Domain X-ing 1. Create graph: color = clock domain 2. Edge weight = link width (bits) 3. Find min-weight cut = crossing points 4. Assign domains, insert clock converters Clk B Clk A Clk C (36)

37 Latency Introspection L_X=2 L_Y=1 L_X L_Y L_X S?? M S L_Y (37)

Fine-Grained Interconnect Synthesis

Fine-Grained Interconnect Synthesis Alex Rodionov, David Biancolin, and Jonathan Rose The Edward S. Rogers Sr. Department of Electrical & Computer Engineering University of Toronto {arod, Jonathan.Rose}@ece.utoronto.ca,