Development and synthesis of adaptive multi-grained reconfigurable hardware architecture for dynamic function patterns (AMURHA)
1 Development and synthesis of adaptive multi-grained reconfigurable hardware architecture for dynamic function patterns (AMURHA)
Alexander Thomas
Institut für Technik der Informationsverarbeitung (ITIV), Universität Karlsruhe (TH)
Prof. Dr.-Ing. Klaus Müller-Glaser, Prof. Dr.-Ing. Jürgen Becker
2 AMURHA Project Overview
Main goal: development and implementation of a new reconfigurable array-based hardware architecture, the HoneyComb architecture
Hardware goals:
- Hardware exploration: integration of adaptive switching circuits
- Resulting architecture implements a set of new features, such as adaptive online routing, multi-context datapaths, and programmable I/O interfaces
- Highly parametrizable hardware description (RTL)
- Synthesis and layout, resulting in a chip prototype (tape-out expected Oct. 2009)
- Specification of the final system demonstrator
Software goals:
- Programming model for the new architecture
  - Programming language specification
  - Compiler design
- Visualization and simulation tools
- Configuration manager / SuperConfigurator
- Runtime environment (not finished)
- Applications for architecture demonstration
3 HoneyComb Platform
[Figure: HoneyComb platform overview. Design Environment: porting of applications / algorithms, HCL description, Offline Mapper and Configuration-Template-Generator (MCTG), assembler code, transformation rules, Super-Configuration-Generator, Assembler, configuration template, executable code, RTL configuration. Runtime Environment: Dynamic Allocation and Distribution Manager (DADM), configuration control, parameters. Debugging and Simulation Environment: HCViewer, HCSim, simulation data. Target: HoneyComb architecture. Legend: implemented path / open path; implemented modules and tools / open implementation.]
4 HoneyComb Array-based Reconfigurable Architecture
Main features:
- Hexagonal cell structure
- Three different cell types:
  - Datapath-HoneyComb-Cell
  - Memory-HoneyComb-Cell (MEMHC)
  - Input/Output-HoneyComb-Cell (IOHC)
- Multi-grained data types / multi-context datapath cells
- Programmable I/O interfaces (IOHC)
- Hardware-supported online routing
- Fully synchronized communication network
- Two clock domains
- Two-level clock gating
Unified cell structure containing:
- Routing Unit
  - Part of the communication network
  - Connects FU outputs and inputs within the array
- Functional Module
  - Specifies the cell type: datapath, MEMHC, or IOHC
[Figure: HoneyComb array of hexagonal cells with the IOHC and MEMHC cells marked; honeycomb cell structure with Routing Unit, Functional Module, and CG, MG, and CG&MG links]
5 HoneyComb-Architecture Cell Types
Cell types are defined by their functional modules.
Datapath Cell:
- Integrates ALUs, LUTs, CG/FG registers / FIFOs
- Coarse-/fine-grained data types
- Multi-context features
- Register control functions
- Highly parametrizable at RTL regarding interconnections, registers, operations, LUT parameters, etc.
Memory Cell (MEMHC):
- Storage functionality like RAM, FIFO, LIFO
- Supports all data types
- Complex FSM programming is possible
- Adaptable at RTL (module count / size, interconnect, registers)
Input / Output Cell (IOHC):
- System interface / programmable microcontroller
- Configuration sequencing
- Conditional control of the array
[Figure: honeycomb cell structure with Routing Unit and Functional Module; datapath module and memory module]
6 HoneyComb-Architecture Hardware supported online routing
Routing Unit (RU):
- Main component of the routing network
- Each cell integrates an RU
- Parametrizable at RTL (position, neighbors, CG/MG connects / direction)
- Instruction-based control of the point-to-point routing
Four instructions have been defined:
- CG Routing Instruction (CGRI)
- MG Routing Instruction (MG1RI, MG2RI; 2 words)
- End Packet Instruction (EPI)
RU controller algorithm:
1) Store incoming instructions in the input registers (InReg)
2) InRegs forward pending requests to the Routing Unit (RU)
3) RU selects the next request (round robin)
4) If the current cell is the destination -> acknowledge route, continue with 5; otherwise -> calculate new route, continue with 6
5) Establish connection to the Functional Module, continue with 1
6) Establish connection to the next cell, continue with 1
[Figure: honeycomb cell structure with input registers InReg0 to InRegN, Routing Unit, and Functional Module]
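The RU controller steps on slide 6 can be sketched in software. The following is a minimal Python model, not the RTL: the class name `RoutingUnit`, the queue-based input registers, and the action strings are hypothetical and only illustrate the round-robin selection (step 3) and the destination check (step 4).

```python
from collections import deque


class RoutingUnit:
    """Toy model of one cell's Routing Unit (names are illustrative only)."""

    def __init__(self, coord, num_inregs):
        self.coord = coord                                 # (x, y) of this cell
        self.inregs = [deque() for _ in range(num_inregs)] # step 1: input registers
        self.next_port = 0                                 # round-robin pointer

    def receive(self, port, dest):
        """Step 1: store an incoming routing request in an input register."""
        self.inregs[port].append(dest)

    def select(self):
        """Steps 2-3: pick the next pending request in round-robin order."""
        n = len(self.inregs)
        for offset in range(n):
            port = (self.next_port + offset) % n
            if self.inregs[port]:
                self.next_port = (port + 1) % n
                return port, self.inregs[port].popleft()
        return None

    def step(self):
        """Steps 4-6: connect locally if this cell is the destination, else forward."""
        picked = self.select()
        if picked is None:
            return None
        port, dest = picked
        if dest == self.coord:
            return port, "connect_functional_module"  # step 5
        return port, "forward_to_next_cell"           # step 6


ru = RoutingUnit(coord=(1, 1), num_inregs=3)
ru.receive(0, (1, 1))
ru.receive(2, (3, 3))
ru.receive(0, (2, 2))
results = [ru.step() for _ in range(4)]
```

Port 0 is served first, then port 2 before the second request on port 0, which is the fairness property round-robin selection is meant to give.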
7 HoneyComb-Architecture Hardware supported online routing
Routing network:
- Coordinates-based
- Depth-first search strategy
- Backtracking algorithm
- Routing performance: 4 cycles per cell
  - 3 cycles for getting to the next cell
  - 1 cycle for acknowledgement
- Establishes point-to-point connections between ports of functional modules
- Option to force shortest-path routing (Optimum-Bit)
- Support for multi-grained data types:
  - Coarse-grained
  - Multi-grained: 1 to n bits
- Transports configurations as well as application data
[Figure: routing without obstacles across the HoneyComb array, with each hop annotated with 3 cycles (forward) and 1 cycle (acknowledge); routing path establishment: 20 cycles; communication latency: 5 cycles]
8 HoneyComb-Architecture - Hardware supported online routing
Routing network: same properties as listed on slide 7.
[Figure: routing with an obstacle in the HoneyComb array; routing path establishment: 24 cycles; communication latency: 6 cycles]
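The cycle counts on slides 7 and 8 are consistent with a simple per-hop cost model. The sketch below assumes that path establishment costs 3 + 1 cycles for every cell visited and that an established path adds one cycle of latency per hop; the second assumption is inferred from the 5-cycle and 6-cycle figures and is not stated on the slides. Function names are illustrative.

```python
CYCLES_TO_NEXT_CELL = 3  # from the slides: 3 cycles for getting to the next cell
CYCLES_ACKNOWLEDGE = 1   # from the slides: 1 cycle for acknowledgement


def establishment_cycles(cells_visited: int) -> int:
    """Cycles needed to set up a route that visits the given number of cells."""
    return cells_visited * (CYCLES_TO_NEXT_CELL + CYCLES_ACKNOWLEDGE)


def latency_cycles(hops: int) -> int:
    """Assumed data latency over an established path: one cycle per hop."""
    return hops


no_obstacle = (establishment_cycles(5), latency_cycles(5))    # slide 7 figures
with_obstacle = (establishment_cycles(6), latency_cycles(6))  # slide 8 figures
```

Under this model the obstacle costs one extra hop: 24 instead of 20 setup cycles and 6 instead of 5 cycles of latency.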
9 HoneyComb-Architecture - Hardware supported online routing
Optimal path routing:
- Optimal paths map: shows all possible shortest paths
- Decision whether a direction is optimal: determined through direction spots
Algorithm within each cell:
1. Wait for incoming requests
2. Check all directions
3. Found one: take the one with the smallest utilization; else: go back to the previous cell, go to step 1
4. Forward to the selected direction and wait for the response
5. Positive: acknowledge and reserve the path; negative: continue with step 2
6. Continue with step 1
[Figure: routing with obstacle and Optimum-Bit, showing the possible paths to get the shortest path; routing path establishment: 24 cycles; communication latency: 5 cycles]
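The per-cell algorithm on slide 9 amounts to a depth-first search with backtracking. The Python sketch below is a software analogy under several assumptions: axial hexagonal coordinates, a distance-then-utilization ordering of candidate directions, and the names `route`, `hex_distance`, and `optimum_bit` are all hypothetical. The slides do not specify the coordinate scheme or how the "direction spots" are encoded.

```python
# Six neighbour offsets of a hexagonal cell in axial coordinates (assumed layout).
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_distance(a, b):
    """Number of hops between two cells on an unobstructed hexagonal grid."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def route(src, dst, cells, blocked, optimum_bit=False, utilization=None):
    """Depth-first search with backtracking from src to dst.

    With optimum_bit set, only directions that reduce the distance to the
    target are tried, so any path found is a shortest path.
    """
    utilization = utilization or {}
    path, visited = [src], {src}

    def search(cell):
        if cell == dst:
            return True
        candidates = []
        for dq, dr in HEX_DIRS:                       # step 2: check all directions
            nxt = (cell[0] + dq, cell[1] + dr)
            if nxt not in cells or nxt in blocked or nxt in visited:
                continue
            if optimum_bit and hex_distance(nxt, dst) >= hex_distance(cell, dst):
                continue
            candidates.append(nxt)
        # step 3: prefer directions toward the target, then low utilization
        candidates.sort(key=lambda c: (hex_distance(c, dst), utilization.get(c, 0)))
        for nxt in candidates:
            visited.add(nxt)
            path.append(nxt)                          # step 4: forward the request
            if search(nxt):
                return True                           # step 5: positive response
            path.pop()                                # negative: try next direction
        return False                                  # go back to previous cell

    return path if search(src) else None


cells = {(q, r) for q in range(4) for r in range(4)}
free_path = route((0, 0), (2, 1), cells, blocked=set())
detour = route((0, 0), (2, 1), cells, blocked={(1, 0)}, optimum_bit=True)
```

In the second call the blocked cell forces a different first hop, yet the route still has three hops because another shortest path exists, which mirrors the behaviour the Optimum-Bit figure illustrates.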
10 HoneyComb-Architecture - Using Online Routing for RePlacement
Configuration technique:
- Establish a configuration path to the target cell
- Target is specified by X,Y coordinates
- Transport configuration data to the target
- Configuration data is position independent
- Cell configuration must meet the configuration data requirements (RTL compatibility)
Online placement:
- By changing the target coordinates X,Y, the hardware establishes a configuration path to the new target; the replacement is done
- Explicit handling of the data streams is necessary
- Replacement can be done by the runtime environment
[Figure: routing with obstacle and Optimum-Bit; original placement at (x, y), new placement at (x+Δx, y+Δy)]
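Because the configuration data is position independent, re-placement reduces to rewriting target coordinates. A minimal sketch of that idea follows; the `Configuration` record and the `relocate` helper are hypothetical and do not reflect the actual configuration format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Configuration:
    """Hypothetical record: a target cell plus position-independent data."""
    target: tuple[int, int]
    data: bytes


def relocate(configs, dx, dy):
    """Shift every target by (dx, dy); the configuration data stays untouched."""
    return [
        replace(cfg, target=(cfg.target[0] + dx, cfg.target[1] + dy))
        for cfg in configs
    ]


original = [Configuration((1, 2), b"\x01\x02"), Configuration((2, 2), b"\x03")]
moved = relocate(original, dx=2, dy=-1)
```

A runtime environment could apply such a shift before sending the configuration packets, leaving the routing hardware to establish the paths to the new targets.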
11 RTL Configuration Manager
Problem:
- Highly parametrizable architecture description (RTL)
- High count of parameters ([…] parameters)
  - Input / output definitions of cells
  - Data width / granularity
  - Number of ALUs / LUTs / registers
  - Interconnection between modules
- How can this kind of complexity be managed?
Approach:
- Easy representation of parameters in a table, e.g. MS Excel
- Scripting-based consistency checks (Excel VBA)
- Generation of the complete HoneyComb array, including VHDL and compiler/viewer configuration files
- HoneyComb Assembler is part of the application
12 [Figure labels: pre-defined template list, control buttons, currently defined array]
13 HoneyComb Architecture Programming
HoneyComb Assembler (HCA):
- Low-level programming language
- Highly dependent on the RTL configuration
- Quite complex / not easy to understand
- Structural programming
HoneyComb Language (HCL):
- Abstraction from strict structural programming
- Functional description on cell level
  - Partitioning is done by the programmer
  - Use of high-level constructs, like if-then-else
- Management of configurations (IOHC)
  - Conditional/unconditional I/O control
  - Configuration sequencing
- RTL-independent code
  - Dependency is still selectable by the programmer
- Utilization of the given hardware parallelism
- Process-based, VHDL-like parallel language
14 HoneyComb-Language (HCL) Functional Process

    CELL CounterExample
      // Definition part
      IN Start#1, Stop#1, Range;
      OUT CounterOut, Finish#1;
      ALIAS S1 = 1#1, S2 = 0#1;
      VAR State#1, Counter;
      // Initial part
      INIT
        SET State = S1;
      // Functional statements
      BEGIN
        State <= State;
        Finish = 0;
        IF (State = S1) THEN
          IF (Start) THEN
            State <= S2;
            Counter <= Range;
          END IF
        ELSE // State = S2
          Counter <= Counter - 1;
          IF (Stop OR ALU(Counter).zf) THEN
            State <= S1;
            Finish = 1;
          END IF;
        END IF;
        CounterOut = Counter;
    END CELL;
15 HoneyComb-Language (HCL) Functional Process
Same CounterExample listing as on slide 14, shown next to its state diagram:
- S1 -> S2 on Start: Counter = Range, Finish = 0
- S1, ELSE: Finish = 0
- S2 -> S1 on Stop OR Counter = 0: Finish = 1
- S2, ELSE: Counter = Counter - 1, Finish = 0
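To make the CounterExample behaviour concrete, here is a cycle-level Python model of the state machine. It is a sketch under stated assumptions: `<=` is treated as a registered assignment that takes effect on the next cycle, `=` as a combinational output, and the zero flag `ALU(Counter).zf` is assumed to refer to the result of the decrement. The class `CounterCell` is hypothetical and is not produced by any HoneyComb tool.

```python
from dataclasses import dataclass

S1, S2 = 1, 0  # ALIAS values from the HCL listing


@dataclass
class CounterCell:
    state: int = S1   # INIT: SET State = S1
    counter: int = 0

    def step(self, start: bool, stop: bool, range_value: int):
        """Advance one clock cycle and return (CounterOut, Finish)."""
        next_state, next_counter = self.state, self.counter  # State <= State
        finish = 0
        if self.state == S1:
            if start:
                next_state, next_counter = S2, range_value
        else:  # State = S2
            next_counter = self.counter - 1
            # Assumption: zf is the zero flag of the decrement result.
            if stop or next_counter == 0:
                next_state = S1
                finish = 1
        counter_out = self.counter  # CounterOut = Counter (current register value)
        self.state, self.counter = next_state, next_counter
        return counter_out, finish


cell = CounterCell()
inputs = [(True, False, 3)] + [(False, False, 0)] * 4
trace = [cell.step(*args) for args in inputs]
```

With a range of 3, the model loads the counter, counts 3, 2, 1 on the output, raises Finish in the cycle where the decrement reaches zero, and returns to S1.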
16 HoneyComb-Language (HCL) Programming Methodology
Applications:
- Functions / procedures
- Input / output
- DFG / CFG
- Example flow: Start Application -> Function: Read Data -> Procedure: Data preprocessing -> Function: Calculation Loop -> Procedure: Data postprocessing -> Function: Write Data -> Exit
Manual partitioning:
- Break down to single cells
- Consider communication
- Goal: cell descriptions in HCL
- Example: the function Calculation Loop is broken down into proc1 to proc6
Definition of sub-configurations:
- Process/cell instantiations
- Optional location predefinition
- Interconnection description: inter/intra sub-configuration
- Reuse of predefined sub-cfgs; libraries are imaginable
- Example: the sub-configuration Calculation Loop contains proc1 to proc6; similar procedure for the remaining functions
Scheduling (application configuration):
- Configuration sequencing
- Load/delete of sub-cfgs
- Conditional flow control
- Parallel/sequential execution of sub-cfgs
- Task of the main processes
[Figure: timeline of SubCfg 1 to 5 over t1 to t3, illustrating parallel and sequential execution]
17 HoneyComb Architecture Design Flow for the specification of the RTL-Configuration
HoneyComb RTL template generation (template library, Excel generator):
- Cells contain all user-predefined elements
- Initial point: overloaded RTL configuration with all allowed elements
MCTG:
- Compilation of the chosen reference applications (HCL applications A to F)
- The RTL template is used as target
- Result: set of RTL-dependent configurations for best-possible application execution
SuperCfg Generator:
- Creates a superset for the given RTL descriptions and reduces the template
- Result: reduced HoneyComb configuration (Super RTL Configuration)
- Further iterative steps are possible
18 Extraction of the Super-Configurations (simplified presentation)
- Create an empty cell template for the generation
- Application-dependent analysis of the given operations, source units, and target units for each unit
- Incremental adding of the required resources to the current unit
- Quit when all applications are satisfied
[Figure: units of Application A and Application B merged into the resulting RTL configuration]
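The extraction steps on slide 18 can be read as an incremental union over the resource needs of all applications. The sketch below illustrates that reading only; the dictionary layout, the unit name `alu0`, and the function `extract_super_configuration` are hypothetical and not taken from the SuperCfg Generator.

```python
def extract_super_configuration(applications):
    """Merge per-unit requirements of all applications into one template.

    Each application maps a unit name to the operations it uses and to the
    source and target units it connects to.
    """
    template = {}  # start from an empty cell template
    for app in applications:
        for unit, needs in app.items():
            slot = template.setdefault(
                unit, {"operations": set(), "sources": set(), "targets": set()}
            )
            for key in slot:
                # add only the resources that are not yet in the template
                slot[key] |= set(needs.get(key, ()))
    return template


app_a = {"alu0": {"operations": {"+"}, "sources": {"a", "b"}, "targets": {"z"}}}
app_b = {"alu0": {"operations": {"+", "-"}, "sources": {"b", "c"}, "targets": {"y"}}}
super_cfg = extract_super_configuration([app_a, app_b])
```

The merged unit supports both applications while shared resources, such as the `+` operation and source `b`, appear only once.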
19 Extraction of the Super-Configurations: Homogeneous Arrays
Mapping of all the application cells onto one single cell yields a homogeneous array configuration: every cell has all required characteristics.
Advantages:
- Highest flexibility for the runtime mapping
- Simplified application development
Disadvantage:
- Considering the application structure: non-optimal utilization in the peripheral area
[Figure: cells of Application A and Application B merged into one cell with all required characteristics, replicated across the HoneyComb array between the IOHCs]
20 Application examples (1)
1024-point FFT:
- Radix-2 butterfly implementation
- Precision: fixed point
- Single butterfly version requires 5 cells
  - Butterfly: 1 cell (8 ALUs, 1 LUT), 2 cycles / operation
  - Controller: 2 cells (11 ALUs, 7 LUTs)
  - Interleaver: 1 cell (4 ALUs, 2 LUTs)
  - Memory: 1 cell (4 HCMEMs, 4x4 Kbytes)
- Performance: 2 cycles / operation, 5120 butterfly operations = […] cycles / operation + store / load time of 1024 cycles
Wavelet transformation:
- Frequency filter for JPEG2000
- Works on the whole image
- Single wavelet filter implementation:
  - High-pass filter: 1 cell (5 ALUs, 2 LUTs)
  - Low-pass filter: 1 cell (6 ALUs, 2 LUTs)
- Performance: 1 pixel / cycle
- Easy performance increase through parallel execution
[Figure: FFT-1024 mapping with Controller, Memory, Interleaver, and Butterfly cells]
21 Application examples (1)
Same FFT and wavelet figures as on slide 20.
[Figure: Wavelet 3x mapping with Memory, High-pass Filter, and Low-pass Filter cells]
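The FFT figures can be checked with a short calculation. A radix-2 FFT of N points has (N/2)·log2(N) butterflies, which gives the 5120 operations quoted for N = 1024. The total below simply combines the slide's 2 cycles per butterfly with the 1024-cycle store/load time; it is a derived estimate, and the function names are illustrative.

```python
def radix2_butterflies(n: int) -> int:
    """Butterfly count of a radix-2 FFT; n must be a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * stages


def fft_cycles(n: int, cycles_per_butterfly: int = 2, store_load_cycles: int = 1024) -> int:
    """Estimated cycles for a single-butterfly implementation."""
    return radix2_butterflies(n) * cycles_per_butterfly + store_load_cycles


butterflies = radix2_butterflies(1024)
total_cycles = fft_cycles(1024)
```

With a single butterfly cell the estimate is dominated by the butterfly loop, which is why adding parallel butterflies is the obvious way to scale performance.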
22 Application examples (2)
AES-256 (Advanced Encryption Standard):
- Block-based algorithm
- The HC implementation processes 4 bytes at once in each functional block
- Requires 13 cells (complete prototype size)
  - 11 datapath cells: 69 ALUs, 7 LUTs
  - 2 MEMHCs: 16 x 4 kbytes
- Performance: 25.6 MB/s encryption speed
imdct:
- Application: MP3/OggVorbis decoder
- Uses the recursive approach due to Nikolajevic/Fettweis
- Single finger version requires 4 cells
  - Finger: 1 cell (7 ALUs, 1 LUT)
  - Controller: 1 cell (4 ALUs, 6 LUTs)
  - Interleaver: 1 cell (5 ALUs)
  - Memory: 1 cell (8 HCMEMs, 8 x 4 kbytes)
- Performance using the OggVorbis specification transformation and one finger: 47.6 blocks/s, 1 block = 2048 samples; 43 blocks/s are required
23 Application examples (2)
Same AES-256 and imdct figures as on slide 22; the imdct cell listed there as Interleaver is labeled Multiplexing here.
[Figure: imdct mapping with Controller, Multiplexing, Memory, and imdct cells]
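Two quick sanity checks put the quoted AES and imdct numbers in perspective. They assume the 100 MHz clock stated with the application results on slide 24 and decimal megabytes (10^6 bytes); both are assumptions, and the helper names are illustrative.

```python
CLOCK_HZ = 100_000_000  # assumption: application results are quoted at 100 MHz


def cycles_per_byte(throughput_mb_s: float, clock_hz: int = CLOCK_HZ) -> float:
    """Clock cycles spent per processed byte at the given throughput."""
    return clock_hz / (throughput_mb_s * 1_000_000)


def realtime_margin(achieved: float, required: float) -> float:
    """Relative headroom of the achieved rate over the required rate."""
    return achieved / required - 1.0


aes_cpb = cycles_per_byte(25.6)            # AES-256: 25.6 MB/s
aes_cycles_per_block = aes_cpb * 16        # one AES block is 16 bytes
imdct_margin = realtime_margin(47.6, 43.0) # imdct: 47.6 vs 43 blocks/s
```

Under these assumptions AES takes roughly 3.9 cycles per byte (about 62.5 cycles per 16-byte block), and the single-finger imdct runs with roughly 10% headroom over the real-time requirement.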
24 Prototype Configuration - Synthesis
RTL configuration:
- Generated based on the given application set (AES, imdct, FFT, Wavelet)
  - 11 datapath cells
  - 2 MEMHCs
  - 2 IOHCs
- Additional functionality will be added if some area can be spared during the layout process
Synthesis:
- Performed with Synopsys Design Compiler
- Target technology: TSMC 90 nm standard cell technology
- Maximum possible frequencies:
  - IOHCs: up to 400 MHz
  - Array: around 166 MHz
Application results (@ 100 MHz):
Application | Configuration time | Performance
AES | […] µs | 25.6 MB/s
imdct | […] µs | 47.6 blocks/s
FFT | […] µs | […] blocks/s
Wavelet 3x | […] µs | 0.6 cycles/pixel
Synthesis results (TSMC 90 nm standard cell technology): area (mm²) and static / dynamic power (mW) per application: AES […], imdct […], FFT […],02, Wavelet 3x […],84
[Figure: ASIC prototype array with cell coordinates (0,0) to (4,4) and the MEMHC and IOHC positions marked]
25 Prototype Configuration - Layout
26 Prototype Configuration and PCB Integration
Multi-chip approach:
- Due to limited budget and limited available area
- Technology: 90 nm TSMC standard cell
- Cell area: […] mm²
- Maximum available area: 16 mm²
- Current area of the array: 11.5 mm²
- Host system implementation on FPGA
HoneyComb Controller:
- Additional device to control the HC array
- Receives information from:
  - IOHC pipeline modules
  - IOHC FIFO states / activity
  - Routing unit status
- Controls:
  - IOHC pipelines (starting, resetting, ...)
  - Routing Units (disable if faulty)
Flexible peripheral interfaces: RS232, SPI, I²C, USB 2.0, Ethernet, SATA, VGA
[Figure: PCB-level integration with DDR SDRAM, FPGA (HC Controller, SoC interface), and the on-die HoneyComb prototype with its MEMHC and IOHC cells]
27 HoneyComb architecture Discussion and Perspectives
Advantages:
- Runtime adaptive routing technique
- Hexagonal cell shape
- Programmable I/O interfaces
- Multi-context / multi-grained functions
- Hardware template characterization
- Array-based approach: bandwidth advantage
- Local memories
- Clock gating
- Lower frequencies
Disadvantages:
- Synchronization protocol
  - Additional hardware overhead
- Application specialization
  - The adapted array is highly application dependent
- Array-based approach: no real DMA available
- Von Neumann flexibility is practically gone
- Programming is hard work
  - Structural programming is harder than it seems
Future work / improvements:
- Optimization of the multiplexing structures (GFTs, crossbar networks; over 50% savings possible)
- Move away from synchronized networks (architecture generalization)
- Adding debugging functions (currently very rudimentary)
- Runtime environment: the only way to exploit all given features
- C/C++ compiler development
28 Thank you for your attention
Embedded Soft Core Processors Coarse Grained Architecture. The programmable gate array (PGA) has provided the opportunity for the design and implementation of a soft core processor in embedded design.
More informationCOE 561 Digital System Design & Synthesis Introduction
1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design
More informationSection III. Transport and Communication
Section III. Transport and Communication This section describes communication and transport peripherals provided for SOPC Builder systems. This section includes the following chapters: Chapter 16, SPI
More informationECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego
Advanced Digital Winter, 2009 ECE Department UC San Diego dey@ece.ucsd.edu http://esdat.ucsd.edu Winter 2009 Advanced Digital Objective: of a hardware-software embedded system using advanced design methodologies
More informationCost-and Power Optimized FPGA based System Integration: Methodologies and Integration of a Lo
Cost-and Power Optimized FPGA based System Integration: Methodologies and Integration of a Low-Power Capacity- based Measurement Application on Xilinx FPGAs Abstract The application of Field Programmable
More informationSection 6. Memory Components Chapter 5.7, 5.8 Physical Implementations Chapter 7 Programmable Processors Chapter 8
Section 6 Memory Components Chapter 5.7, 5.8 Physical Implementations Chapter 7 Programmable Processors Chapter 8 Types of memory Two major types of memory Volatile When power to the device is removed
More informationReconfigurable Computing. Design and Implementation. Chapter 4.1
Design and Implementation Chapter 4.1 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design In System Integration System Integration Rapid Prototyping Reconfigurable devices (RD) are usually
More informationRECONFIGURABLE SPI DRIVER FOR MIPS SOFT-CORE PROCESSOR USING FPGA
RECONFIGURABLE SPI DRIVER FOR MIPS SOFT-CORE PROCESSOR USING FPGA 1 HESHAM ALOBAISI, 2 SAIM MOHAMMED, 3 MOHAMMAD AWEDH 1,2,3 Department of Electrical and Computer Engineering, King Abdulaziz University
More informationPlace Your Logo Here. K. Charles Janac
Place Your Logo Here K. Charles Janac President and CEO Arteris is the Leading Network on Chip IP Provider Multiple Traffic Classes Low Low cost cost Control Control CPU DSP DMA Multiple Interconnect Types
More information100M Gate Designs in FPGAs
100M Gate Designs in FPGAs Fact or Fiction? NMI FPGA Network 11 th October 2016 Jonathan Meadowcroft, Cadence Design Systems Why in the world, would I do that? ASIC replacement? Probably not! Cost prohibitive
More informationECE 448 Lecture 15. Overview of Embedded SoC Systems
ECE 448 Lecture 15 Overview of Embedded SoC Systems ECE 448 FPGA and ASIC Design with VHDL George Mason University Required Reading P. Chu, FPGA Prototyping by VHDL Examples Chapter 8, Overview of Embedded
More informationA Hardware Filesystem Implementation for High-Speed Secondary Storage
A Hardware Filesystem Implementation for High-Speed Secondary Storage Dr.Ashwin A. Mendon, Dr.Ron Sass Electrical & Computer Engineering Department University of North Carolina at Charlotte Presented by:
More informationReconfigurable Computing. On-line communication strategies. Chapter 7
On-line communication strategies Chapter 7 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design On-line connection - Motivation Routing-conscious temporal placement algorithms consider
More informationRe-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs
This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs
More informationNISC Application and Advantages
NISC Application and Advantages Daniel D. Gajski Mehrdad Reshadi Center for Embedded Computer Systems University of California, Irvine Irvine, CA 92697-3425, USA {gajski, reshadi}@cecs.uci.edu CECS Technical
More informationASIC Design of Shared Vector Accelerators for Multicore Processors
26 th International Symposium on Computer Architecture and High Performance Computing 2014 ASIC Design of Shared Vector Accelerators for Multicore Processors Spiridon F. Beldianu & Sotirios G. Ziavras
More informationIntelop. *As new IP blocks become available, please contact the factory for the latest updated info.
A FPGA based development platform as part of an EDK is available to target intelop provided IPs or other standard IPs. The platform with Virtex-4 FX12 Evaluation Kit provides a complete hardware environment
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More informationSystem Debugging Tools Overview
9 QII53027 Subscribe About Altera System Debugging Tools The Altera system debugging tools help you verify your FPGA designs. As your product requirements continue to increase in complexity, the time you
More informationArchitecture of Computers and Parallel Systems Part 2: Communication with Devices
Architecture of Computers and Parallel Systems Part 2: Communication with Devices Ing. Petr Olivka petr.olivka@vsb.cz Department of Computer Science FEI VSB-TUO Architecture of Computers and Parallel Systems
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More information8. Migrating Stratix II Device Resources to HardCopy II Devices
8. Migrating Stratix II Device Resources to HardCopy II Devices H51024-1.3 Introduction Altera HardCopy II devices and Stratix II devices are both manufactured on a 1.2-V, 90-nm process technology and
More informationEE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination
1 Student name: Date: June 26, 2008 General requirements for the exam: 1. This is CLOSED BOOK examination; 2. No questions allowed within the examination period; 3. If something is not clear in question
More informationAscenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005
Ascenium: A Continuously Reconfigurable Architecture Robert Mykland Founder/CTO robert@ascenium.com August, 2005 Ascenium: A Continuously Reconfigurable Processor Continuously reconfigurable approach provides:
More informationPS2 VGA Peripheral Based Arithmetic Application Using Micro Blaze Processor
PS2 VGA Peripheral Based Arithmetic Application Using Micro Blaze Processor K.Rani Rudramma 1, B.Murali Krihna 2 1 Assosiate Professor,Dept of E.C.E, Lakireddy Bali Reddy Engineering College, Mylavaram
More informationSEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010
SEMICON Solutions Bus Structure Created by: Duong Dang Date: 20 th Oct,2010 Introduction Buses are the simplest and most widely used interconnection networks A number of modules is connected via a single
More informationComputer Architecture: Dataflow/Systolic Arrays
Data Flow Computer Architecture: Dataflow/Systolic Arrays he models we have examined all assumed Instructions are fetched and retired in sequential, control flow order his is part of the Von-Neumann model
More informationAn Introduction to Parallel Programming
An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe
More informationReNoC: A Network-on-Chip Architecture with Reconfigurable Topology
1 ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology Mikkel B. Stensgaard and Jens Sparsø Technical University of Denmark Technical University of Denmark Outline 2 Motivation ReNoC Basic
More informationDigital Systems Design. System on a Programmable Chip
Digital Systems Design Introduction to System on a Programmable Chip Dr. D. J. Jackson Lecture 11-1 System on a Programmable Chip Generally involves utilization of a large FPGA Large number of logic elements
More informationEMBEDDED SOPC DESIGN WITH NIOS II PROCESSOR AND VHDL EXAMPLES
EMBEDDED SOPC DESIGN WITH NIOS II PROCESSOR AND VHDL EXAMPLES Pong P. Chu Cleveland State University A JOHN WILEY & SONS, INC., PUBLICATION PREFACE An SoC (system on a chip) integrates a processor, memory
More informationEmbedded Systems. "System On Programmable Chip" NIOS II Avalon Bus. René Beuchat. Laboratoire d'architecture des Processeurs.
Embedded Systems "System On Programmable Chip" NIOS II Avalon Bus René Beuchat Laboratoire d'architecture des Processeurs rene.beuchat@epfl.ch 3 Embedded system on Altera FPGA Goal : To understand the
More informationComputer Systems Colloquium (EE380) Wednesday, 4:15-5:30PM 5:30PM in Gates B01
Adapting Systems by Evolving Hardware Computer Systems Colloquium (EE380) Wednesday, 4:15-5:30PM 5:30PM in Gates B01 Jim Torresen Group Department of Informatics University of Oslo, Norway E-mail: jimtoer@ifi.uio.no
More informationRapidly Developing Embedded Systems Using Configurable Processors
Class 413 Rapidly Developing Embedded Systems Using Configurable Processors Steven Knapp (sknapp@triscend.com) (Booth 160) Triscend Corporation www.triscend.com Copyright 1998-99, Triscend Corporation.
More informationQsys and IP Core Integration
Qsys and IP Core Integration Stephen A. Edwards (after David Lariviere) Columbia University Spring 2016 IP Cores Altera s IP Core Integration Tools Connecting IP Cores IP Cores Cyclone V SoC: A Mix of
More informationReconfigurable Computing. Design and implementation. Chapter 4.1
Reconfigurable Computing Design and implementation Chapter 4.1 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software Software-Co-Design Reconfigurable Computing In System Integration Reconfigurable
More informationSimplifying FPGA Design for SDR with a Network on Chip Architecture
Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen
More informationFPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011
FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level
More informationBuses. Maurizio Palesi. Maurizio Palesi 1
Buses Maurizio Palesi Maurizio Palesi 1 Introduction Buses are the simplest and most widely used interconnection networks A number of modules is connected via a single shared channel Microcontroller Microcontroller
More informationFPGA: What? Why? Marco D. Santambrogio
FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much
More informationBibliography. Measuring Software Reuse, Jeffrey S. Poulin, Addison-Wesley, Practical Software Reuse, Donald J. Reifer, Wiley, 1997.
Bibliography Books on software reuse: 1. 2. Measuring Software Reuse, Jeffrey S. Poulin, Addison-Wesley, 1997. Practical Software Reuse, Donald J. Reifer, Wiley, 1997. Formal specification and verification:
More informationAgenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs
New Directions in Programming FPGAs for DSP Dr. Jim Hwang Xilinx, Inc. Agenda Introduction FPGA DSP platforms Design challenges New programming models for FPGAs System Generator Getting your math into
More informationHigh Performance Embedded Applications. Raja Pillai Applications Engineering Specialist
High Performance Embedded Applications Raja Pillai Applications Engineering Specialist Agenda What is High Performance Embedded? NI s History in HPE FlexRIO Overview System architecture Adapter modules
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationDataflow Architectures. Karin Strauss
Dataflow Architectures Karin Strauss Introduction Dataflow machines: programmable computers with hardware optimized for fine grain data-driven parallel computation fine grain: at the instruction granularity
More informationMOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden
High Level Synthesis with Catapult MOJTABA MAHDAVI 1 Outline High Level Synthesis HLS Design Flow in Catapult Data Types Project Creation Design Setup Data Flow Analysis Resource Allocation Scheduling
More informationQuality-of-Service for a High-Radix Switch
Quality-of-Service for a High-Radix Switch Nilmini Abeyratne, Supreet Jeloka, Yiping Kang, David Blaauw, Ronald G. Dreslinski, Reetuparna Das, and Trevor Mudge University of Michigan 51 st DAC 06/05/2014
More information