Clock Speed Optimization of Runtime Reconfigurable Systems by Signal Latency Measurement

Department of Electrical Engineering, Computer Engineering
Helmut Schmidt University, Hamburg
University of the Federal Armed Forces of Germany
Meyer, Haase, Eckert, Klauer
015 11 1 / IECON015

Table of Contents
1 Motivation
2 Approach
3 Experiments
4 Conclusion

Motivation: Partial Runtime Reconfigurable Systems
Reconfiguration of parts of an FPGA while other parts are still active.
Figure: Example partitioning of an FPGA for use with the Xilinx PR design flow [4] (static logic plus the reconfigurable regions RM0 and RM1, each with several partial bitstreams).
Notes: What are PRRS? In a normal reconfigurable system the FPGA is changed as a whole. PRRS use a different design flow to support... Explain the figure, in this case Partial Reconfiguration from Xilinx. It is available on modern FPGAs including Xilinx Virtex, Artix and Kintex.

Motivation: Example System
Figure: Multicore Reconfiguration Platform [2] (FPGA reconfiguration platform with reconfiguration module, ICAP, IOBs, and uplink and downlink via Ethernet/UART).
Notes: Example: the Multicore Reconfiguration Platform. Describe the figure. A CEB is just an RM. The NoC is circuit switched. Next: runtime adaptive multiprocessor system-on-chip (RampSoC).

Motivation: RampSoC
Figure: Runtime adaptive multiprocessor system-on-chip (RampSoC) [1].
Notes: Another example system. Processors and accelerators are reconfigurable. Different NoCs, e.g. bus, circuit switched.

Motivation: General
A Partial Runtime Reconfigurable System (PRRS) consists of a static part, some partially reconfigurable parts, and some Networks-on-Chip (NoCs).

Motivation: Problem
Figure: design flow. The design (size: 80% of the FPGA) and the user constraints (optimize for speed) pass through synthesis, placement and routing. Result: the flow takes a long time, or the design is not routable.
Notes: The problem is now: we feed in the design, in our case 80% of the FPGA area, with constraints for speed optimization (static part and reconfigurable part). The output is a long run of the design flow or no result at all.

Approach: Main Idea
Reduce routing times through relaxed user constraints. Measure signal latencies after configuration. Place components according to their clock speed requirements.
Figure (simplified): FPGA with static part and external I/O, an RO component and a ReRouter component, two RMs, and the clocks Clk 0, Clk 1 and Clk 2.
Notes: Speed up the design flow by reducing the constraints for the reconfigurable part. The obvious consequence is that parts of the design do not meet the required clock speed. Meeting it everywhere is not always necessary, because components have different requirements. Place components according to their requirement, or set the clock speed according to it. One component measures, the other reroutes paths.

Approach: Signal Latency Measurement
Adapted the Ring Oscillator (RO) approach of Ruffoni and Bogliolo [3]. An RO generates a frequency through a path connected as a ring. The period of the RO is twice the propagation delay of its ring. Adding a path to the ring extends the propagation delay of the ring, so the base period $T_0$ grows to $T_1 = T_0 + 2 d_p$. The propagation delay of the added path is therefore $d_p = (T_1 - T_0) \cdot \frac{1}{2}$.
Notes: We adapted an approach of Ruffoni that uses an RO to measure path delays (explain in short). The period is twice the propagation delay. Therefore a path can be added to the ring, and measuring the two periods gives $d_p$.
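The relation between the two ring oscillator periods on this slide can be written as a few lines of Python. This is only a minimal sketch of the arithmetic; the function name `path_delay_ns` and the example periods are hypothetical and not taken from the talk.

```python
def path_delay_ns(t0_ns: float, t1_ns: float) -> float:
    """Propagation delay of the added path from two RO periods.

    t0_ns: period of the bare ring, t1_ns: period with the path added.
    The RO period is twice the ring delay, so the added path accounts
    for half of the difference between the two periods.
    """
    if t1_ns < t0_ns:
        raise ValueError("extended period must not be shorter than base period")
    return (t1_ns - t0_ns) / 2.0


# Hypothetical example periods in ns (not measured values).
print(path_delay_ns(10.0, 21.0))  # 5.5
```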

Approach: Measurement Components
Measurement component: an RO plus a multiplexer. The multiplexer adds the measurement path to the RO ring. The component counts the RO ticks and the ticks of a 50 MHz clock. Period in ns: $T = 1000 \cdot \frac{1}{\frac{\#(\text{RO ticks})}{\#(f\ \text{ticks})} \cdot f[\text{MHz}]}$, where $f$ is the frequency of the reference clock (50 MHz here).
ReRouter component: connects the input to the output.
Notes: The adaptation is a component with an RO and a multiplexer; the multiplexer can add a path to the RO loop. Two counters are used, one for the RO ticks and one for the ticks of the 50 MHz clock. The period is calculated for both cases, the base ring and the extended path. The ReRouter is simple.
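The period calculation from the two counters can be sketched as follows. It assumes that both counters run over the same time window; the function name `ro_period_ns` and the tick counts in the example are hypothetical.

```python
def ro_period_ns(ro_ticks: int, ref_ticks: int, ref_mhz: float = 50.0) -> float:
    """RO period in ns from two tick counts taken over the same window.

    ro_ticks:  ticks counted on the ring oscillator
    ref_ticks: ticks counted on the reference clock (50 MHz on the slide)
    """
    if ro_ticks <= 0 or ref_ticks <= 0:
        raise ValueError("tick counts must be positive")
    ro_mhz = ro_ticks / ref_ticks * ref_mhz  # RO frequency in MHz
    return 1000.0 / ro_mhz                   # MHz -> period in ns


# Hypothetical counts: the RO ticks twice as often as the 50 MHz clock.
print(ro_period_ns(2000, 1000))  # 10.0
```

Calling the function once for the base ring and once for the extended ring yields the two periods needed for the path delay.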

Experiments: Measurement Setup
Figure: the Multicore Reconfiguration Platform (MRP) as measurement environment (FPGA reconfiguration platform with reconfiguration module, ICAP, IOBs, and uplink and downlink via Ethernet/UART).
Measured path delays: δ(Path1) = ,6ns; δ(Path) = 5,61ns; δ(Path) = 5,7ns; δ(Path) = 8,5ns; δ(Path) = 8,4ns.
Legend: M = component for latency measurement; RR = ReRouter, routes signals back.
Notes: The MRP is used as measurement environment because of its structure (if time permits, explain which structure). Shown is a part of the MRP: the switches are crossbars that connect every input to every output. Next, explain the configuration of the components and the measurement of all paths.

Experiments: Measurement Setup (animation builds)
Figure: the four switches CSN 0, CSN 1, CSN 2 and CSN 3 with the measurement component M0 and the ReRouter RR 1 attached; the builds add Path 1 and the further measured paths one at a time. Path delays, legend and notes are the same as on the first build of this slide.

Experiments: Measurement Results
Table: Propagation delay matrix for all CEBs (CEB = Configurable Entity Block; RM = reconfigurable module).
CEB: 0-0 0-1 0-2 0-3 1-0 1-1 1-2 1-3 2-0 2-1 2-2 2-3 3-0 3-1 3-2 3-3
0-0: .6 5.61 5.4 5.07 7.75 8.58 9.61 10.8 8.70.41 9.41 8.5 10. 9.8 9.65 9.65
0-1: 5.7.90 7.7 6. 9.8 10.11 11.14 1.5 9.74 11.45 10.45 9.9 11.7 10.85 10.69 11.7
0-2: 5. 7..07 5.8 7.46 8.9 9. 10.5 8.4 9.94 8.94 7.78 9.86 9.5 9.19 9.86
0-3: 5.05 6.19 5.85.1 9.1 9.95 10.98 1.18 8.4 10.14 9.14 7.97 10.06 9.54 9.8 10.05
1-0: 7.57 9.6 7.9 8.19 1.8 4.15 5.5 5.46 10.4 1.1 11.1 9.96 9.6 9.1 8.89 9.70
1-1: 8,6 10.6 8.65 9.45 4.60.9 1.86 1.91 11.68 1.8 1.8 11. 10. 9.8 9.60.40
1-2: 10.01.80 9.8 10.6 5.50 6.65.15 6.05 1.85 14.56 1.56 1.40.76 10.7 10.04 10.84
1-3: 10.68 1.47 1.47 11.0 5.40 6.48 5.74.7.5 15. 14.4 1.07 10.50.01 9.78 10.58
2-0: 8.86 9.79 8.4 8.60.7 11.51 1.9.9.87 5. 6.04 4.6 9.45 8.78 8. 9.5
2-1: 10.56 11.49 10.1 10.1 1.4 1. 14.65.61 5..01 6.16 5.45 10.04 9.8 8.91 9.85
2-2: 9.8 10.1 8.95 9.1 11.4 1..4 14.4 5.86 5.99.44 1. 9.08 8.4 7.95 8.89
2-3: 8.4 9.6 7.90 8.08 10.0.99 1.8 1.8 4.55 5.8 6.07.6 9.8 8.7 8.5 9.19
3-0: .06 10.99 9.6 9.80 9.96 10.1 10.86 10.9 9.50.09 9.1 9.51.4 6.19 6.10 6.0
3-1: 9.46 10.9 9.0 9.1 9.54 9.79 10,4 10.50 8.91 9.50 8.7 8.9 6.6.00 4.67 5.84
3-2: 8.60 9.5 8.17 8.5 8.9 9.17 9.8 9.88 8.04 8.6 7.85 8.05 5.78 4.8.17 4.67
3-3: 9.81 10.74 9.8 9.55 10.00.4 10.89 10.96 9.5 9.84 9.06 9.6 5.98 5.7 4.95.70
Notes: Explain the table (more if time permits). The path to a component and the path from it are different paths, hence the different times. The diagonal represents measurements from a component to the switch.

Experiments: Clock Rates
Table: Maximum clock speeds within one switch. Columns: CSN, Clk_s (MHz), Clk_c (MHz).
5 67
1 150 75
16 81
159 79
Legend: Clk_s = clock using sequential circuits only; Clk_c = clock using combinational circuits.
Notes: From the delay matrix we can calculate the maximum clock rate for the network. Present the values. The experiments ran on a Xilinx Virtex-5 FPGA. Most good designs without PR achieve 00 MHz.
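How a maximum clock rate follows from measured delays can be illustrated with a small calculation. This is a sketch under a simplified timing model, not necessarily the calculation used for the table: it assumes that the clock period only has to cover the slowest path plus an optional combinational logic delay, and it ignores setup time and clock skew. The function name and the example delays are hypothetical.

```python
def max_clock_mhz(path_delays_ns, logic_delay_ns=0.0):
    """Upper bound on the clock in MHz for a set of path delays.

    Assumption: the period must cover the slowest path plus any
    combinational logic delay; setup time and skew are ignored.
    """
    worst_ns = max(path_delays_ns) + logic_delay_ns
    if worst_ns <= 0:
        raise ValueError("delays must be positive")
    return 1000.0 / worst_ns  # period in ns -> frequency in MHz


# Hypothetical delays in ns, not taken from the measured matrix.
delays = [5.0, 8.0, 6.25]
print(max_clock_mhz(delays))       # 125.0
print(max_clock_mhz(delays, 2.0))  # 100.0
```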

Experiments: Example System
Placement of a simple processor core according to the propagation delay matrix.
Figure: the components Fetch, Control, ALU, Decode and RegFile placed at the switches CSN 0 to CSN 3.
Legend: Fetch = loads instructions from RAM; Ctrl = control unit of the processor; RegF = register file; Dec = decodes instructions; ALU = arithmetic logic unit.
Notes: We wanted to know whether it is possible to place a complex component, here a simple processor core (Fetch, Decode, RegisterFile, Control, ALU). Show the placement. It runs at 5MHz/50MHz. There is no performance measurement because the CPU is not optimized for speed. Components can be placed differently as long as the requirements are met.
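Placing components according to a propagation delay matrix can be sketched as a small search problem. The brute-force search below is an illustrative stand-in, not the procedure used in the talk; the slot names, delays and clock requirements are hypothetical.

```python
from itertools import permutations


def place(components, slots, delay_ns, links):
    """Find a placement whose path delays satisfy every link.

    delay_ns: {(src_slot, dst_slot): delay in ns}, direction matters
    links:    {(src_component, dst_component): required clock in MHz}
    Returns a {component: slot} dict, or None if no placement fits.
    """
    for perm in permutations(slots, len(components)):
        placement = dict(zip(components, perm))
        if all(
            delay_ns[(placement[a], placement[b])] <= 1000.0 / mhz
            for (a, b), mhz in links.items()
        ):
            return placement
    return None


# Hypothetical delay matrix for three slots (asymmetric on purpose).
delay_ns = {
    ("s0", "s1"): 12.0, ("s1", "s0"): 9.0,
    ("s0", "s2"): 6.0,  ("s2", "s0"): 7.0,
    ("s1", "s2"): 5.0,  ("s2", "s1"): 11.0,
}
links = {("Fetch", "Decode"): 100.0, ("Decode", "ALU"): 150.0}
result = place(["Fetch", "Decode", "ALU"], ["s0", "s1", "s2"], delay_ns, links)
print(result)  # {'Fetch': 's1', 'Decode': 's0', 'ALU': 's2'}
```

An exhaustive search is only practical for a handful of components; it is used here because it makes the constraint check easy to follow.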

Experiments: Floorplan
Figure: floorplan of the FPGA. Colour legend: yellow = CSN 0; red = CSN 1; green = CSN 2; lilac = CSN 3; light blue = used FPGA area.
Notes: If time permits, explain the placement and highlight why some paths have awkward measurements: routing is a random process.

Conclusion
Presented: a method to measure path delays between runtime reconfigurable modules, a propagation delay matrix for the MRP, and the placement of a small processor core according to this matrix.
Notes: We presented a method to measure path delays in PRRS and evaluated it by creating the propagation delay matrix of the MRP and placing a simple processor core according to it. Thank you very much for your attention. Questions?

Questions?

Bibliography
[1] D. Göhringer et al. Runtime adaptive multi-processor system-on-chip: RAMPSoC. In: Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on. Apr. 2008, pp. 1-7.
[2] Dominik Meyer. Multicore Reconfiguration Platform - A Research and Evaluation FPGA Framework for Runtime Reconfigurable Systems. PhD thesis. Helmut-Schmidt-University Hamburg, Germany, 2015.
[3] M. Ruffoni and A. Bogliolo. Direct Measures of Path Delays on Commercial FPGA Chips. In: Signal Propagation on Interconnects, 6th IEEE Workshop on. Proceedings. May 2002, pp. 157-159.
[4] Xilinx, Inc. Partial Reconfiguration User Guide. http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/ug70.pdf. Apr. 01.