Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware. Objective/Approach/Process


Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware
John Zaino, Eric Pauer, Ken Smith, Paul Fiore, Jairam Ramanathan, Cory Myers
{john.c.zaino, ken.smith, paul.d.fiore,
Fourth Biennial Ptolemy Miniconference, March 2001

Objective/Approach/Process
Reconfigurable computing technology offers significant performance gains, e.g. 10X ops per watt and/or ops per cubic inch, over general-purpose programmable solutions without the need to develop custom hardware. Today, however, developing a working implementation requires hardware design expertise, and producing a good implementation requires many slow iterations between an algorithm designer and a hardware developer.
- Objective: reduce the design time for an initial implementation to hours, and for an optimized implementation to days, for a range of signal processing applications
- Approach: provide the algorithm developer with tools to help analyze algorithms, understand their implications for hardware, and rapidly implement their chosen solutions
- In the process, isolate the algorithm developer from the hardware designer through a set of library elements that provide well-defined interfaces to both communities
- Direct mapping of algorithm to adaptive computing system implementation
- Automatic implementation
03/07/01 Page - 2

Technical Attributes
- Development of Adaptive Computing Systems domain under Ptolemy Classic
- Allows alternative implementations from same dataflow graph
- Provides floating point simulation, fixed point simulation, C code generation and VHDL code generation
- Released first three versions of ACS domain in Ptolemy Classic
- End-to-end capability to map signal processing dataflow graph to working reconfigurable computing implementation
- Design space exploration automated
- Bit width optimization theory (Markovian modeling) developed for algorithm analysis
- Bit width optimization tool implemented to trade signal to noise ratio versus hardware complexity
- Pipeline alignment and scheduling algorithms implemented
- Automatically generate algorithm-specific sequencer and memory control logic
- Uni-rate and multi-rate signal processing
- Single and multi-FPGA implementations
- Smart Generators: parameterizable algorithmic blocks

Analysis and Mapping in ACS Environment
[Figure: flow from a Dataflow Graph to an Adaptive Computing Resource around the Common Database in Ptolemy. Labels: Algorithm Analysis (Bit Width Analysis, Noise Distribution Analysis, Precision Analysis, Floating Point Simulation, Fixed Point Simulation, Algorithm Rearrangement, Alternative Implementations; SNR analysis, alternative implementations, functional approximations); Algorithm Mapping (Automatic Scheduling, Performance Metrics, Performance Modeling, Partitioning and Mapping, Allocated Functions; timing and sizing estimation, scheduling, partitioning across multiple FPGAs); Smart Generators (Generator Selection; device program, interface program); Device Programming; VHDL Interface Libraries.]

Design Approach
[Figure: design-time and run-time flow. Design Time labels: Signal Processing Algorithm; Represent in Dataflow; Ptolemy Environment; Analysis and Simulation; Dataflow Graph; Signal Flow Graph; Common Database in Ptolemy; Algorithm Analysis (Bit Width Analysis, Noise Distribution Analysis, Precision Analysis, Floating Point Simulation, Fixed Point Simulation, Algorithm Rearrangement, Alternative Implementations); Algorithm Mapping (Automatic Scheduling, Performance Metrics, Performance Modeling, Partitioning & Mapping, Allocated Functions); Smart Generators (Generator Selection); Hardware Configuration Library; Application Interface Generation; VHDL Interface Libraries; Logic Generation; Floorplanner; Routing; Device Program. Run Time labels: Application Software; Compute Libraries; Run-Time Manager; Operating System; Device Driver; Host Processor; Reconfigurable Hardware. Legend: Enhanced or New Capability; Existing Tool or Hardware.]

Program Progress
Algorithm analysis
- Representation for alternative implementations was incorporated as part of Ptolemy integration
- Side-by-side simulation capability incorporated as part of Ptolemy integration
- Developed bit width optimization theory for algorithm analysis and extended to include multiple devices and constraints
- Implemented wordlength optimization tool
Algorithm mapping
- Cost analysis included as part of wordlength analysis
- Implemented uni-rate and multi-rate pipeline alignment and scheduling algorithm for signal processing dataflow graphs
- One-to-one and one-to-many mapping of functions to blocks supported

Program Progress
Smart generators
- Implemented portable logic synthesis methodology with VHDL as first target
- Integrated Xilinx Core Generators (4000 series) capability within VHDL code generation
- Implemented smart generators for state machine sequencer and memory control (address generator)
Ptolemy integration
- Released initial version of Adaptive Computing Systems domain in Ptolemy in June 1998. Second release April 1999. Third release in August
- ACS domain supports alternative implementations from a common interface: floating point simulation, fixed point simulation, C code generation, and VHDL code generation
Demonstration
- Selected Annapolis Micro Systems Wildforce™ board for demonstrations
- Established ACS demonstration environment for Solaris
- Integrated Wildforce™ board under Ptolemy
- Demonstrated Winograd-based FSK receiver and FFT-based signal detector
- Procured and installed Annapolis Micro Systems Wildstar™ board under Solaris
- SHARP/HRR (High Range Resolution Radar ATR) algorithm modeled; hardware development and testing nearing completion

ACS Domain
- Determined that extending old domains could not be justified
- New paradigm for Ptolemy, e.g. multiple implementations of a single star

ACS Domain
- New ACS domain to facilitate movement among simulation and code/design generation
- Corona contains interface specification
- Core contains an implementation
- ACS stars are composed of one corona and multiple cores
- Core selection via targeting defines implementation
[Figure: a corona with its cores, selected by target: Floating_Point Simulation Core, Fixed_Point Simulation Core, C Code Generation Core, FPGA Design Generation Core.]

Selecting Among Alternative Implementations
- Alternative implementations are represented as targets with cores for each star/functional block
- Targets can have parameters
- Floating point simulation, fixed point simulation, C code generation, and FPGA design generation are available
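The corona/core split described on the ACS Domain slides can be illustrated with a small sketch. This is not Ptolemy Classic code (which is C++); the class names `Corona` and `Star`, the target strings, and the `to_fixed` helper are all hypothetical, chosen only to show one interface with several implementations picked by target.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Corona:
    """Interface specification shared by every implementation of a star."""
    name: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class Star:
    """One corona plus one core per target (hypothetical structure)."""
    corona: Corona
    cores: Dict[str, Callable] = field(default_factory=dict)

    def add_core(self, target, implementation):
        self.cores[target] = implementation

    def select(self, target):
        # Core selection via targeting: the target name picks the implementation.
        if target not in self.cores:
            raise KeyError(f"star {self.corona.name!r} has no core for {target!r}")
        return self.cores[target]


def to_fixed(x, frac_bits=8):
    """Round to a fixed-point grid with frac_bits fractional bits (assumed format)."""
    return round(x * (1 << frac_bits)) / (1 << frac_bits)


adder = Star(Corona("Add", inputs=["a", "b"], outputs=["sum"]))
adder.add_core("floating_point_sim", lambda a, b: a + b)
adder.add_core("fixed_point_sim", lambda a, b: to_fixed(to_fixed(a) + to_fixed(b)))
adder.add_core("c_codegen", lambda a, b: f"sum = {a} + {b};")

for target in ("floating_point_sim", "fixed_point_sim", "c_codegen"):
    print(target, "->", adder.select(target)(0.1, 0.2))
```

The same dataflow graph can then be run against any target by changing only the string passed to `select`, which is the property the slides attribute to the ACS domain.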

Algorithm Analysis
[Figure: multiple representations of an algorithm and the implementation trades among them. Labels: I, Q, Angle; Loop Filter; Quantize; P(); Freq. Est.; Scaling; N delays versus N/3 delays and 2N/3 delays; N mults, N adds versus 7 adds; Basic FIR; Systolic; Bit Serial; Low Power; Coeffs; Data; FA cells; Acc1, Acc2, Acc3; Reduce Taplength; Reduce Wordlength; Multirate; Implementation Trades.]
Three-tap FIR forms shown in the figure:
Y_n = a_0 X_n + a_1 X_{n-1} + a_2 X_{n-2}
Y_{n+1} = a_0 X_{n+1} + a_1 X_n + a_2 X_{n-1}
Y_{n+2} = a_0 X_{n+2} + a_1 X_{n+1} + a_2 X_n
Perform trade-offs
- Precision (float vs. fixed, wordlengths)
- Speed
- Size/Area
- Latency

Wordlength Optimization
[Figure: Wordlength Optimization Analysis relating Dynamic Range, Quantization Noise (SNR) and Hardware Cost to Optimal Design Choices.]
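As an illustration of the precision trade-off named on these slides, the sketch below runs the three-tap FIR in floating point and in a simple fixed-point model, then reports output SNR for several wordlengths. It is not the ACS wordlength optimization tool: the coefficient values, the rounding-only quantizer (no saturation), and the use of fractional bits as a stand-in for hardware cost are all assumptions.

```python
import math
import random


def fir(x, coeffs, quantize=lambda v: v):
    """Direct-form FIR: y[n] = sum_k a[k] * x[n-k], with optional quantization."""
    a = [quantize(c) for c in coeffs]
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, ak in enumerate(a):
            if n - k >= 0:
                acc += ak * quantize(x[n - k])
        y.append(quantize(acc))
    return y


def make_quantizer(frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (no saturation modeled)."""
    scale = 1 << frac_bits
    return lambda v: round(v * scale) / scale


def snr_db(reference, approx):
    """Signal-to-noise ratio of approx against the floating-point reference."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2000)]
coeffs = [0.25, 0.5, 0.25]  # illustrative a0, a1, a2
y_float = fir(x, coeffs)

results = {}
for frac_bits in (4, 8, 12, 16):
    y_fixed = fir(x, coeffs, make_quantizer(frac_bits))
    results[frac_bits] = snr_db(y_float, y_fixed)
    print(f"{frac_bits:2d} fractional bits: SNR = {results[frac_bits]:.1f} dB")
```

Each added bit should buy roughly 6 dB of SNR in a model like this, so a designer can stop adding bits once the SNR target is met and keep the datapath as narrow as possible.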

Algorithm Mapping
Objectives
- Performance Modeling: provide feedback on utilization, throughput, efficiency, etc. Feedback should be used by algorithm analysis capabilities.
- Partitioning and Mapping: break large dataflow graphs into groups and map those groups across multiple devices and across time
- Automatic Scheduling: automatically determine firing sequence, optimal mappings and sequence of configurations
Progress
- Cost analysis included as part of wordlength analysis
- Implemented uni-rate and multi-rate pipeline alignment and scheduling algorithms
- Memory allocation support
[Figure: Signal Flow Graph, Performance Modeling, Partitioning and Mapping, Automatic Scheduling, Performance Metrics and Allocated Functions around the Common Database in Ptolemy.]

Automatic Scheduling
Pipeline alignment and schedule determination required for logic synthesis.
[Figure panels: The Algorithm Dataflow Graph; Modified Algorithm Dataflow Graph (memory added to netlist by sequencer generator); FPGA Datapath and Variable Locations (RAM bank 1: A, B, C; RAM bank 2: D, E); Final Algorithm Schedule (node activation sequence for N1 to N9 with SEL, LD1 to LD3, PORT1 and PORT2). Legend: I = Instance, N = Node, P = Pipeline Delays.]
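Pipeline alignment can be sketched as a longest-path computation over the dataflow graph: each node starts when its slowest input is ready, and faster inputs are padded with delay registers. The code below is a minimal illustration under that assumption, not the ACS scheduler; the node names and latencies are made up.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def align_pipelines(latency, edges):
    """Return (ready, padding) for a dataflow DAG.

    latency: {node: pipeline delay in clock cycles}
    edges:   list of (producer, consumer) pairs
    ready[n] is the cycle on which node n's output becomes valid;
    padding[(u, v)] is the number of delay registers to insert on edge u->v
    so that every input of v arrives on the same cycle.
    """
    preds = {n: [] for n in latency}
    for u, v in edges:
        preds[v].append(u)

    ready = {}
    padding = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        start = max((ready[p] for p in preds[node]), default=0)
        for p in preds[node]:
            padding[(p, node)] = start - ready[p]
        ready[node] = start + latency[node]
    return ready, padding


# Hypothetical five-node graph with per-node pipeline delays P.
latency = {"A": 2, "B": 1, "C": 1, "D": 3, "E": 1}
edges = [("A", "C"), ("B", "C"), ("A", "D"), ("C", "E"), ("D", "E")]
ready, padding = align_pipelines(latency, edges)
for edge, regs in padding.items():
    print(edge, "needs", regs, "delay register(s)")
print("output valid at cycle", ready["E"])
```

Because every path into a node is padded to the same length, a new sample can enter the datapath on every clock, which is the sense in which the slides tie alignment to throughput.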

Processing Model
- Well-matched to Ptolemy Synchronous Dataflow (SDF) Domain
- Unit or block token produce and consume amounts
- Netlist structure determines execution order constraints
- Pipeline delay information required to determine absolute timing
- Delays are set to align pipelines for maximum throughput
- Delay can be automatically determined from block parameters
- Combination of fully synchronous model and tagged synchronous models
- No handshaking or tags but data is not always valid
- Data validity is implicit in timing of latch signals
- Memory access fits same model
- Data from common memory demuxed into separate streams running at lower rate
- Data to common memory multiplexed to a single port
- Multiple FPGAs introduce additional pipeline delays
- Multi-rate parameterized execution

Smart Generators
Objectives
- Parameterized libraries: generate node implementations for specified bit widths and parameter values
- Hierarchical representations: provide generators that can recursively call other generators
- Interface generation: automatically generate software to move data between general-purpose processor and reconfigurable platform and to manage sequences of configurations
- General synthesis: provide device independent representation of implementation
Progress
- Implemented portable logic synthesis methodology with VHDL as first target
- Integrated Xilinx Core Generators (4000 Series) capability within VHDL code generation
- Implemented smart generators for state machine and memory control
- Hierarchical generation
[Figure: Allocated Functions, Generator Selection, VHDL Interface Libraries and Device Programming around the Common Database in Ptolemy, feeding the Adaptive Computing Resource.]
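The fixed produce and consume amounts of the SDF model determine how often each block must fire per schedule period. The sketch below solves the standard SDF balance equations (firings of producer times tokens produced equals firings of consumer times tokens consumed) for a small multi-rate graph. It is a generic textbook construction, not code from the ACS domain; the actor names and rates are invented to mirror the "memory demuxed into lower-rate streams" case.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetitions(actors, edges):
    """Smallest positive integer firing counts for a connected SDF graph.

    edges: list of (src, produced, dst, consumed) token rates per firing.
    """
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    progress = True
    while pending and progress:
        progress = False
        for edge in list(pending):
            src, produced, dst, consumed = edge
            if src in rate and dst in rate:
                if rate[src] * produced != rate[dst] * consumed:
                    raise ValueError(f"inconsistent rates on {src}->{dst}")
            elif src in rate:
                rate[dst] = rate[src] * produced / consumed
            elif dst in rate:
                rate[src] = rate[dst] * consumed / produced
            else:
                continue  # neither end reached yet; retry on a later pass
            pending.remove(edge)
            progress = True
    if pending or len(rate) != len(actors):
        raise ValueError("graph must be connected")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(r * scale) for actor, r in rate.items()}


# Memory emits one word per firing; the demux consumes two words and sends
# one to each of two half-rate filter streams (hypothetical example).
actors = ["mem", "demux", "filtA", "filtB"]
edges = [("mem", 1, "demux", 2), ("demux", 1, "filtA", 1), ("demux", 1, "filtB", 1)]
firings = repetitions(actors, edges)
print(firings)
```

Here the memory fires twice for every single firing of the demux and of each filter, which matches the half-rate streams in the example.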

Multi-FPGA Capability
Design Generation for Single or Multiple FPGAs
[Figure: Single FPGA Implementation and Multi-FPGA Implementation, each shown as FPGA logic and FPGA routing views; the multi-FPGA case shows routing for FPGA 1 through FPGA 4.]

Winograd DFT-Based FSK Communications Receiver
[Figure: FPGA Implementation.]

Results from FPGA-target / Back-end Tools
[Figure: Generated VHDL; Generated Schedule; FPGA Design.]

Hardware-in-the-Loop
[Figure: SDF Galaxy.]
SDF Wildforce™ star executes complete FPGA design in hardware on Annapolis Wildforce FPGA board.

Processing Results
[Figure: processing results.]

SHARP*/HRR Algorithm
* System-oriented High Range Resolution (HRR) Automatic Recognition Program
[Figure: Test Vector and Template Vector (one per target, per azimuth, per elevation) pass through Non-Linearity, Shift and Least Squares Fit to give Modeling Error.]
Can be done with correlation if templates are suitably pre-processed.
Algorithm
- Given test vector
- For each template, for each shift: compute least squares error
- Select template with minimum error
Complexity
- 70 data points per vector
- Number of shifts = (in range)
- Number templates = 3,600/class
- 86 sec/class for shifts on a Sun Ultra 5 (360 MHz) workstation
- Expect 30x improvement
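The template search listed on the SHARP/HRR slide can be written out directly. The sketch below is a plain software rendering of the loop (for each template, for each shift, compute the squared error, keep the minimum); it omits the non-linearity stage, and it assumes a circular shift, plain sum of squared differences, and toy eight-point vectors, none of which are specified in the slides.

```python
def squared_error(test, template, shift):
    """Sum of squared differences with the test vector circularly shifted."""
    n = len(test)
    return sum((test[(i + shift) % n] - template[i]) ** 2 for i in range(n))


def classify(test, templates, max_shift):
    """Return (label, shift, error) of the best match over all templates and shifts."""
    best = None
    for label, template in templates.items():
        for shift in range(-max_shift, max_shift + 1):
            err = squared_error(test, template, shift)
            if best is None or err < best[2]:
                best = (label, shift, err)
    return best


# Toy data: the test vector is template "A" delayed by two samples plus a
# small perturbation (illustrative values only).
templates = {
    "A": [0, 1, 4, 1, 0, 0, 0, 0],
    "B": [0, 0, 3, 3, 0, 0, 0, 0],
}
test = [0, 0, 0.1, 1, 3.9, 1, 0, 0]

label, shift, err = classify(test, templates, max_shift=3)
print(label, shift, round(err, 4))
```

Expanding the squared error gives the energy of the test vector plus the energy of the template minus twice their correlation. With circular shifts the test-vector energy is constant, so once template energies are normalized or precomputed the search reduces to maximizing a correlation, which is consistent with the slide's remark about pre-processed templates.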

SHARP/HRR Algorithm
[Figure: schedule and FPGA design for the NORMALIZATION and CORRELATION stages; Normalization Results ( vs. SW); Correlation Results Across Range Shifts (typical expected).]

ACS Tools - Facts and Figures
- 23 functional blocks (ACS stars) developed
- ~100 lines of code needed for new block/star
- Two ACS architectures supported: Wildforce™ (4062XL) and Wildstar™ (XCV1000) (in progress)
- ~6,000 lines of C++ code developed
- ~5 min to generate VHDL for five FPGA design
- Explore ~1000 bit-width combinations in 1 minute
- Ptolemy Classic runs under Solaris and Linux


More information

ESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder

ESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder ESE532: System-on-a-Chip Architecture Day 8: September 26, 2018 Spatial Computations Today Graph Cycles (from Day 7) Accelerator Pipelines FPGAs Zynq Computational Capacity 1 2 Message Custom accelerators

More information

EE178 Spring 2018 Lecture Module 4. Eric Crabill

EE178 Spring 2018 Lecture Module 4. Eric Crabill EE178 Spring 2018 Lecture Module 4 Eric Crabill Goals Implementation tradeoffs Design variables: throughput, latency, area Pipelining for throughput Retiming for throughput and latency Interleaving for

More information

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes MSEE Curriculum All MSEE students are required to take the following two core courses: 3531-571 Linear systems 3531-507 Probability and Random Processes The course requirements for students majoring in

More information

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN Xiaoying Li 1 Fuming Sun 2 Enhua Wu 1, 3 1 University of Macau, Macao, China 2 University of Science and Technology Beijing, Beijing, China

More information

ADSL Transmitter Modeling and Simulation. Department of Electrical and Computer Engineering University of Texas at Austin. Kripa Venkatachalam.

ADSL Transmitter Modeling and Simulation. Department of Electrical and Computer Engineering University of Texas at Austin. Kripa Venkatachalam. ADSL Transmitter Modeling and Simulation Department of Electrical and Computer Engineering University of Texas at Austin Kripa Venkatachalam Qiu Wu EE382C: Embedded Software Systems May 10, 2000 Abstract

More information

VLSI Signal Processing

VLSI Signal Processing VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface

More information

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics Yojana Jadhav 1, A.P. Hatkar 2 PG Student [VLSI & Embedded system], Dept. of ECE, S.V.I.T Engineering College, Chincholi,

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

Reducing the cost of FPGA/ASIC Verification with MATLAB and Simulink

Reducing the cost of FPGA/ASIC Verification with MATLAB and Simulink Reducing the cost of FPGA/ASIC Verification with MATLAB and Simulink Graham Reith Industry Manager Communications, Electronics and Semiconductors MathWorks Graham.Reith@mathworks.co.uk 2015 The MathWorks,

More information

Introduction to High level. Synthesis

Introduction to High level. Synthesis Introduction to High level Synthesis LISHA/UFSC Prof. Dr. Antônio Augusto Fröhlich Tiago Rogério Mück http://www.lisha.ufsc.br/~guto June 2007 http://www.lisha.ufsc.br/ 1 What is HLS? Example: High level

More information

FPGA Based Digital Signal Processing Applications & Techniques. Nathan Eddy Fermilab BIW12 Tutorial

FPGA Based Digital Signal Processing Applications & Techniques. Nathan Eddy Fermilab BIW12 Tutorial FPGA Based Digital Signal Processing Applications & Techniques BIW12 Tutorial Outline Digital Signal Processing Basics Modern FPGA Overview Instrumentation Examples Advantages of Digital Signal Processing

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

White Paper Assessing FPGA DSP Benchmarks at 40 nm

White Paper Assessing FPGA DSP Benchmarks at 40 nm White Paper Assessing FPGA DSP Benchmarks at 40 nm Introduction Benchmarking the performance of algorithms, devices, and programming methodologies is a well-worn topic among developers and research of

More information

Meta-Data-Enabled Reuse of Dataflow Intellectual Property for FPGAs

Meta-Data-Enabled Reuse of Dataflow Intellectual Property for FPGAs Meta-Data-Enabled Reuse of Dataflow Intellectual Property for FPGAs Adam Arnesen NSF Center for High-Performance Reconfigurable Computing (CHREC) Dept. of Electrical and Computer Engineering Brigham Young

More information

Design and Verification of FPGA and ASIC Applications Graham Reith MathWorks

Design and Verification of FPGA and ASIC Applications Graham Reith MathWorks Design and Verification of FPGA and ASIC Applications Graham Reith MathWorks 2014 The MathWorks, Inc. 1 Agenda -Based Design for FPGA and ASIC Generating HDL Code from MATLAB and Simulink For prototyping

More information

ESE532: System-on-a-Chip Architecture. Today. Message. Clock Cycle BRAM

ESE532: System-on-a-Chip Architecture. Today. Message. Clock Cycle BRAM ESE532: System-on-a-Chip Architecture Day 20: April 3, 2017 Pipelining, Frequency, Dataflow Today What drives cycle times Pipelining in Vivado HLS C Avoiding bottlenecks feeding data in Vivado HLS C Penn

More information

An Ultra Low-Power WOLA Filterbank Implementation in Deep Submicron Technology

An Ultra Low-Power WOLA Filterbank Implementation in Deep Submicron Technology An Ultra ow-power WOA Filterbank Implementation in Deep Submicron Technology R. Brennan, T. Schneider Dspfactory td 611 Kumpf Drive, Unit 2 Waterloo, Ontario, Canada N2V 1K8 Abstract The availability of

More information

Simulink Design Environment

Simulink Design Environment EE219A Spring 2008 Special Topics in Circuits and Signal Processing Lecture 4 Simulink Design Environment Dejan Markovic dejan@ee.ucla.edu Announcements Class wiki Material being constantly updated Please

More information

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany 2013 The MathWorks, Inc. 1 Agenda Model-Based Design of embedded Systems Software Implementation

More information

RTL Coding General Concepts

RTL Coding General Concepts RTL Coding General Concepts Typical Digital System 2 Components of a Digital System Printed circuit board (PCB) Embedded d software microprocessor microcontroller digital signal processor (DSP) ASIC Programmable

More information

VHDL. VHDL History. Why VHDL? Introduction to Structured VLSI Design. Very High Speed Integrated Circuit (VHSIC) Hardware Description Language

VHDL. VHDL History. Why VHDL? Introduction to Structured VLSI Design. Very High Speed Integrated Circuit (VHSIC) Hardware Description Language VHDL Introduction to Structured VLSI Design VHDL I Very High Speed Integrated Circuit (VHSIC) Hardware Description Language Joachim Rodrigues A Technology Independent, Standard Hardware description Language

More information

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression Divakara.S.S, Research Scholar, J.S.S. Research Foundation, Mysore Cyril Prasanna Raj P Dean(R&D), MSEC, Bangalore Thejas

More information

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE Reiner W. Hartenstein, Rainer Kress, Helmut Reinig University of Kaiserslautern Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany

More information

A Hardware/Software Co-design Flow and IP Library Based on Simulink

A Hardware/Software Co-design Flow and IP Library Based on Simulink A Hardware/Software Co-design Flow and IP Library Based on Simulink L.M.Reyneri, F.Cucinotta, A.Serra Dipartimento di Elettronica Politecnico di Torino, Italy email:reyneri@polito.it L.Lavagno DIEGM Università

More information

HDL Cosimulation August 2005

HDL Cosimulation August 2005 HDL Cosimulation August 2005 Notice The information contained in this document is subject to change without notice. Agilent Technologies makes no warranty of any kind with regard to this material, including,

More information

The CompSOC Design Flow for Virtual Execution Platforms

The CompSOC Design Flow for Virtual Execution Platforms NEST COBRA CA104 The CompSOC Design Flow for Virtual Execution Platforms FPGAWorld 10-09-2013 Sven Goossens*, Benny Akesson*, Martijn Koedam*, Ashkan Beyranvand Nejad, Andrew Nelson, Kees Goossens* * Introduction

More information

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013

Introduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013 Introduction to FPGA Design with Vivado High-Level Synthesis Notice of Disclaimer The information disclosed to you hereunder (the Materials ) is provided solely for the selection and use of Xilinx products.

More information

Exercises in DSP Design 2016 & Exam from Exam from

Exercises in DSP Design 2016 & Exam from Exam from Exercises in SP esign 2016 & Exam from 2005-12-12 Exam from 2004-12-13 ept. of Electrical and Information Technology Some helpful equations Retiming: Folding: ω r (e) = ω(e)+r(v) r(u) F (U V) = Nw(e) P

More information

A Stream Compiler for Communication-Exposed Architectures

A Stream Compiler for Communication-Exposed Architectures A Stream Compiler for Communication-Exposed Architectures Michael Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali Meli, Andrew Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Floating-Point Bitwidth Analysis via Automatic Differentiation

Floating-Point Bitwidth Analysis via Automatic Differentiation Floating-Point Bitwidth Analysis via Automatic Differentiation Altaf Abdul Gaffar 1, Oskar Mencer 2, Wayne Luk 1, Peter Y.K. Cheung 3 and Nabeel Shirazi 4 1 Department of Computing, Imperial College, London

More information

SoC Design for the New Millennium Daniel D. Gajski

SoC Design for the New Millennium Daniel D. Gajski SoC Design for the New Millennium Daniel D. Gajski Center for Embedded Computer Systems University of California, Irvine www.cecs.uci.edu/~gajski Outline System gap Design flow Model algebra System environment

More information

Xilinx System Generator v Xilinx Blockset Reference Guide. for Simulink. Introduction. Xilinx Blockset Overview.

Xilinx System Generator v Xilinx Blockset Reference Guide. for Simulink. Introduction. Xilinx Blockset Overview. Xilinx System Generator v1.0.1 for Simulink Introduction Xilinx Blockset Overview Blockset Elements Xilinx Blockset Reference Guide Printed in U.S.A. Xilinx System Generator v1.0.1 Reference Guide About

More information

FPGA Architecture Overview. Generic FPGA Architecture (1) FPGA Architecture

FPGA Architecture Overview. Generic FPGA Architecture (1) FPGA Architecture FPGA Architecture Overview dr chris dick dsp chief architect wireless and signal processing group xilinx inc. Generic FPGA Architecture () Generic FPGA architecture consists of an array of logic tiles

More information

IMPLICIT+EXPLICIT Architecture

IMPLICIT+EXPLICIT Architecture IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 8 HW/SW Co-Design Sources: Prof. Margarida Jacome, UT Austin Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu

More information

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing Mingyu Gao and Christos Kozyrakis Stanford University http://mast.stanford.edu HPCA March 14, 2016 PIM is Coming Back End of Dennard

More information

HDL Cosimulation May 2007

HDL Cosimulation May 2007 HDL Cosimulation May 2007 Notice The information contained in this document is subject to change without notice. Agilent Technologies makes no warranty of any kind with regard to this material, including,

More information

Advanced Design System DSP Synthesis

Advanced Design System DSP Synthesis Advanced Design System 2002 DSP Synthesis February 2002 Notice The information contained in this document is subject to change without notice. Agilent Technologies makes no warranty of any kind with regard

More information

A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO

A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO 2402 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016 A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO Antony Xavier Glittas,

More information