Clock Speed Optimization of Runtime Reconfigurable Systems by Signal Latency Measurement


Meyer, Haase, Eckert, Klauer
Department of Electrical Engineering, Computer Engineering
Helmut Schmidt University, Hamburg (University of the Federal Armed Forces of Germany)
IECON 2015, November 2015

Table of Contents
1 Motivation
2 Approach
3 Experiments
4 Conclusion

Motivation: Partial Runtime Reconfigurable Systems
Figure: Example partitioning of an FPGA for use with the Xilinx PR design flow [4]. The FPGA contains static logic and the reconfigurable regions RM0 and RM1; each region has its own set of partial bitstreams (e.g. RM00.bit, RM01.bit for RM0 and RM10.bit, RM11.bit for RM1).
Reconfiguration of parts of an FPGA while other parts are still active.
Notes: What are PRRS? In a normal reconfigurable system (RS), the FPGA is changed as a whole. PRRS use a different design flow to support ... Explain the figure; in this case it shows Partial Reconfiguration from Xilinx. Available on modern FPGAs, including Xilinx Virtex, Artix, and Kintex.

Motivation: Example System
Figure: Multicore Reconfiguration Platform [2]. An FPGA reconfiguration platform with a reconfiguration module attached to the ICAP, and uplink and downlink interfaces (Ethernet/UART) connected through IOBs.
Notes: Example system: the Multicore Reconfiguration Platform (MRP). Describe the figure; a CEB is just an RM. The NoC is circuit switched. The Runtime adaptive multiprocessor system-on-chip (RampSoC) comes next.

Motivation: RampSoC
Figure: Runtime adaptive multiprocessor system-on-chip (RampSoC) [1]. An FPGA containing several processing elements of different types.
Notes: Another example system. Processors and accelerators are reconfigurable. Different NoCs are possible, e.g. a bus or a circuit-switched network.

Motivation: General
A Partial Runtime Reconfigurable System (PRRS) consists of:
- a static part
- some partially reconfigurable parts
- some Networks-on-Chip (NoCs)

Motivation: Problem
Design flow: the design (size: 80% of the FPGA) and the user constraints (optimize for speed) are fed into synthesis, placement, and routing. The result is either a long run time or a design that is not routable.
Notes: Feed in the design, in our case 80% of the FPGA area, together with constraints for speed optimization (static part and reconfigurable part). Output: a long time for the design flow, or no result at all.

Approach: Main Idea
- Reduce routing times through relaxed user constraints
- Measure signal latencies after configuration
- Place components according to their clock speed requirements
Figure (simplified): an FPGA with a static part, external I/O, an RO component, a ReRouter component, reconfigurable modules (RM), and the clocks Clk 0, Clk 1, and Clk 2.
Notes: Speed up the design flow by reducing the constraints for the reconfigurable part. The obvious consequence is that parts of the design do not meet the required clock speed. Meeting it everywhere is not always necessary, because components have different requirements. Place components according to their requirements, or set the clock speed according to the requirement. One component measures, the other reroutes paths.

Approach: Signal Latency Measurement
- Adapted the Ring Oscillator (RO) approach of Ruffoni and Bogliolo [3]
- An RO generates a frequency through a path connected as a ring
- The period of the RO is twice the propagation delay of its ring
- Adding a path to the ring extends the propagation delay of the ring: $T_1 = T_0 + 2 d_p$
- Propagation delay of the added path: $d_p = \frac{1}{2}\,(T_1 - T_0)$
Notes: Adapted an approach of Ruffoni that uses an RO to measure path delays; explain it briefly. The period is twice the propagation delay. Therefore, a path can be added to the ring, two periods are measured, and the delay follows from $d_p = \frac{1}{2}\,(T_1 - T_0)$.
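The delay relation on this slide can be sketched in a few lines of Python. This is only an illustration, assuming both RO periods are already available in nanoseconds; the function name `path_delay_ns` and the sample periods are hypothetical and do not come from the slides.

```python
def path_delay_ns(t0_ns: float, t1_ns: float) -> float:
    """Propagation delay d_p of the added path.

    t0_ns: period T0 of the ring oscillator alone.
    t1_ns: period T1 of the ring oscillator with the path inserted.
    The RO period is twice the ring's propagation delay, so inserting a
    path with delay d_p lengthens the period by 2 * d_p.
    """
    if t1_ns < t0_ns:
        raise ValueError("extended period must not be shorter than the base period")
    return (t1_ns - t0_ns) / 2.0


# Hypothetical periods, for illustration only.
base_period = 10.0       # T0 in ns
extended_period = 21.0   # T1 in ns
print(path_delay_ns(base_period, extended_period))
```

The guard against `t1_ns < t0_ns` is a design choice of this sketch: a shorter extended period would indicate a measurement error rather than a negative delay.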

Approach: Measurement Components
Measurement Component:
- RO + multiplexer
- the multiplexer adds the measurement path to the RO ring
- counts RO ticks and the ticks of a 50 MHz clock
- period in ns: $T = 1000 \cdot \dfrac{1}{\dfrac{\#(\text{RO ticks})}{\#(f_{[\mathrm{MHz}]}\ \text{ticks})} \cdot f_{[\mathrm{MHz}]}}$
ReRouter Component:
- connects the input to the output
Notes: The adaptation is a component with an RO and a multiplexer; the multiplexer can add a path to the RO loop. Two counters are used: one for RO ticks and one for ticks of the 50 MHz clock. The period is calculated for both cases, the base ring and the extended path. The ReRouter is simple.
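The period formula can be tried out with a short Python sketch. It assumes that both counters run over the same measurement window and that the reference clock frequency is given in MHz; the function name `ro_period_ns` and the counter values are hypothetical.

```python
def ro_period_ns(ro_ticks: int, ref_ticks: int, ref_mhz: float = 50.0) -> float:
    """RO period in ns from two counters that ran over the same window.

    ro_ticks:  number of ring oscillator ticks counted.
    ref_ticks: number of reference clock ticks counted.
    ref_mhz:   reference clock frequency in MHz (50 MHz on the slide).
    """
    if ro_ticks <= 0 or ref_ticks <= 0:
        raise ValueError("both counters must be positive")
    ro_mhz = ro_ticks / ref_ticks * ref_mhz  # RO frequency in MHz
    return 1000.0 / ro_mhz                   # 1 / MHz = microseconds, so * 1000 gives ns


# Hypothetical counter values for a 1 ms window at 50 MHz (50,000 reference ticks).
t0 = ro_period_ns(ro_ticks=100_000, ref_ticks=50_000)  # base ring
t1 = ro_period_ns(ro_ticks=50_000, ref_ticks=50_000)   # ring with the path added
print(t0, t1, (t1 - t0) / 2.0)
```

The last printed value applies the half-difference of the two periods to obtain the delay of the added path.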

Experiments: Measurement Setup
Figure: measurement setup on the MRP (FPGA reconfiguration platform with reconfiguration module, ICAP, IOBs, and Ethernet/UART uplink and downlink). A measurement component (M) and ReRouter components (RR) are attached to the switches, and the measured paths run through the switches.
Path delays shown in the figure: δ(Path1) = .6 ns; the other four paths have 5.61 ns, 5.7 ns, 8.5 ns, and 8.4 ns.
M: component for latency measurement
RR: ReRouter, routes signals back
Notes: The MRP was used as the measurement environment because of its structure (if time permits, explain which structure). The part of the MRP shown consists of switches built as crossbars that connect every input to every output. Next, explain the configuration of the components and the measurement of all paths.


Experiments: Measurement Results
Table: Propagation delay matrix for all CEBs (values in ns)

        0-0    0-1    0-2    0-3    1-0    1-1    1-2    1-3    2-0    2-1    2-2    2-3    3-0    3-1    3-2    3-3
0-0      .6   5.61    5.4   5.07   7.75   8.58   9.61   10.8   8.70    .41   9.41    8.5    10.    9.8   9.65   9.65
0-1     5.7    .90    7.7     6.    9.8  10.11  11.14    1.5   9.74  11.45  10.45    9.9   11.7  10.85  10.69   11.7
0-2      5.     7.    .07    5.8   7.46    8.9     9.   10.5    8.4   9.94   8.94   7.78   9.86    9.5   9.19   9.86
0-3    5.05   6.19   5.85     .1    9.1   9.95  10.98   1.18    8.4  10.14   9.14   7.97  10.06   9.54    9.8  10.05
1-0    7.57    9.6    7.9   8.19    1.8   4.15    5.5   5.46   10.4    1.1   11.1   9.96    9.6    9.1   8.89   9.70
1-1     8.6   10.6   8.65   9.45   4.60     .9   1.86   1.91  11.68    1.8    1.8    11.    10.    9.8   9.60    .40
1-2   10.01    .80    9.8   10.6   5.50   6.65    .15   6.05   1.85  14.56   1.56   1.40    .76   10.7  10.04  10.84
1-3   10.68   1.47   1.47   11.0   5.40   6.48   5.74     .7     .5    15.   14.4   1.07  10.50    .01   9.78  10.58
2-0    8.86   9.79    8.4   8.60     .7  11.51    1.9     .9    .87     5.   6.04    4.6   9.45   8.78     8.    9.5
2-1   10.56  11.49   10.1   10.1    1.4     1.  14.65    .61     5.    .01   6.16   5.45  10.04    9.8   8.91   9.85
2-2     9.8   10.1   8.95    9.1   11.4     1.     .4   14.4   5.86   5.99    .44     1.   9.08    8.4   7.95   8.89
2-3     8.4    9.6   7.90   8.08   10.0    .99    1.8    1.8   4.55    5.8   6.07     .6    9.8    8.7    8.5   9.19
3-0     .06  10.99    9.6   9.80   9.96   10.1  10.86   10.9   9.50    .09    9.1   9.51     .4   6.19   6.10    6.0
3-1    9.46   10.9    9.0    9.1   9.54   9.79   10.4  10.50   8.91   9.50    8.7    8.9    6.6    .00   4.67   5.84
3-2    8.60    9.5   8.17    8.5    8.9   9.17    9.8   9.88   8.04    8.6   7.85   8.05   5.78    4.8    .17   4.67
3-3    9.81  10.74    9.8   9.55  10.00     .4  10.89  10.96    9.5   9.84   9.06    9.6   5.98    5.7   4.95    .70

CEB: Configurable Entity Block
RM: reconfigurable module
Notes: Explain the table, more if time permits. The paths to and from a component are different paths, hence the different times. The diagonal represents measurements from a component to the switch.

Experiments: Clockrates
Table: Maximum clock speeds within one switch

CSN   Clk_s (MHz)   Clk_c (MHz)
0     5             67
1     150           75
2     16            81
3     159           79

Clk_s: clock using sequential circuits only
Clk_c: clock using combinational circuits
Notes: The maximum clock rate for the network can be calculated from the measured delays. Present the values. The experiments ran on a Xilinx Virtex-5 FPGA. Most good designs without PR achieve 00 MHz.
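A common way to turn a measured delay into a clock rate is to take the reciprocal of the slowest path. The Python sketch below does that under simplifying assumptions: a signal must cross the slowest path within one clock period, and setup time, clock skew, and register delays are ignored. The slides do not state how their values were computed, so this is not necessarily the authors' calculation; `max_clock_mhz` and the sample delays are hypothetical.

```python
from typing import Iterable


def max_clock_mhz(delays_ns: Iterable[float]) -> float:
    """Upper bound on the clock frequency for a set of path delays.

    Assumes the slowest path has to fit into one clock period and
    ignores setup time, skew, and register delays.
    """
    worst_ns = max(delays_ns)
    if worst_ns <= 0:
        raise ValueError("delays must be positive")
    return 1000.0 / worst_ns  # period in ns -> frequency in MHz


# Hypothetical delays of the paths inside one switch, in ns.
print(max_clock_mhz([4.0, 5.0, 8.0]))
```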

Experiments: Example System
Figure: placement of the processor components (Fetch, Control, ALU, Decode, RegFile) at the CSNs.
Placement of a simple processor core according to the propagation delay matrix.
Fetch: loads instructions from RAM
Ctrl: control unit of the processor
RegF: register file
Dec: decodes instructions
ALU: arithmetic logic unit
Notes: We wanted to know whether it is possible to place a complex component, here a simple processor core (Fetch, Decode, Register File, Control, ALU). Show the placement. It runs at 5 MHz/50 MHz. There is no performance measurement because the CPU is not optimized for speed. Components can be placed differently as long as the requirements are met.
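Checking a placement against a delay matrix can be sketched as follows. This is an illustrative Python example, not the authors' tool: the function `timing_violations`, the slot names, the links between components, and all delay values are hypothetical, and a link is assumed to meet timing when its measured delay fits into one clock period.

```python
from typing import Dict, List, Tuple


def timing_violations(
    delay_ns: Dict[Tuple[str, str], float],
    placement: Dict[str, str],
    links: List[Tuple[str, str]],
    clock_mhz: float,
) -> List[Tuple[str, str, float]]:
    """Return the links whose measured delay exceeds one clock period.

    delay_ns:  measured delay from one slot to another, in ns.
    placement: component name -> slot it is placed in.
    links:     directed (source, destination) pairs of communicating components.
    """
    period_ns = 1000.0 / clock_mhz
    violations = []
    for src, dst in links:
        delay = delay_ns[(placement[src], placement[dst])]
        if delay > period_ns:
            violations.append((src, dst, delay))
    return violations


# Hypothetical excerpt of a delay matrix; to and from delays differ.
delay_ns = {
    ("0-0", "0-1"): 5.0,
    ("0-1", "0-0"): 6.0,
    ("0-1", "1-0"): 12.0,
    ("1-0", "0-1"): 11.0,
}
placement = {"Fetch": "0-0", "Decode": "0-1", "ALU": "1-0"}
links = [("Fetch", "Decode"), ("Decode", "ALU")]

print(timing_violations(delay_ns, placement, links, clock_mhz=100.0))
print(timing_violations(delay_ns, placement, links, clock_mhz=50.0))
```

With these made-up numbers, the link from Decode to ALU is too slow for 100 MHz, while every link fits at 50 MHz. This mirrors the two options named in the approach: move the component to a better slot or lower the clock for it.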

Experiments: Floorplan
Figure: floorplan of the FPGA with the four CSNs. Color legend: yellow = CSN 0, red = CSN 1, green = CSN 2, lilac = CSN 3, light blue = used FPGA area.
Notes: If time permits, explain the placement and highlight why some paths have awkward measurements: routing is a random process.

Conclusion
Presented:
- a method to measure path delays between runtime reconfigurable modules
- the propagation delay matrix for the MRP
- the placement of a small processor core according to this matrix
Notes: Presented a method to measure path delays in PRRS, evaluated by creating the propagation delay matrix of the MRP and placing a simple processor core according to it. Thank you very much for your attention.

Questions?

Bibliography
[1] D. Göhringer et al. Runtime adaptive multi-processor system-on-chip: RAMPSoC. In: Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on. Apr. 2008, pp. 1-7.
[2] Dominik Meyer. Multicore Reconfiguration Platform - A Research and Evaluation FPGA Framework for Runtime Reconfigurable Systems. PhD thesis. Helmut-Schmidt-University Hamburg, Germany, 2015.
[3] M. Ruffoni and A. Bogliolo. Direct Measures of Path Delays on Commercial FPGA Chips. In: Signal Propagation on Interconnects, 6th IEEE Workshop on. Proceedings. May 2002, pp. 157-159.
[4] Xilinx, Inc. Partial Reconfiguration User Guide. http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_7/ug702.pdf. Apr. 2013.