Platform-Based Behavior-Level and System-Level Synthesis. Prof. Jason Cong UCLA Computer Science Department


1 Platform-Based Behavior-Level and System-Level Synthesis Prof. Jason Cong UCLA Computer Science Department

2 Outline: Motivation; xpilot system framework; Behavior-level synthesis in xpilot (advantages of behavioral synthesis, scheduling, resource binding); System-level synthesis in xpilot (synthesis for ASIP platforms, design exploration for heterogeneous MPSoCs); Conclusions.

3 ASICs SOC Example: Philips Nexperia. Philips Nexperia SoC platform for high-end digital video. General-purpose scalable RISC processor (MIPS): 50 to 300+ MHz, 32-bit or 64-bit. Scalable VLIW media processor (TriMedia): 100 to 300+ MHz, 32-bit or 64-bit. Library of device IP blocks: image coprocessors, DSPs, UART, 1394, USB. Nexperia system buses: … bit. [Figure: Nexperia DVP system silicon, with a MIPS CPU (PRxxxx) and a TriMedia CPU (TM-xxxx), each with I$ and D$ and its own device IP blocks on a PI bus (including MPEG, video, MSP, and access-control blocks), connected through the DVP memory bus and the MMI to SDRAM.] Courtesy Philips

4 Field-Programmable SOC Example: Xilinx Virtex-4 FPGA. Soft-core µproc MicroBlaze: 180 MHz, < ~1300 LUTs, 166 DMIPS. PowerPC 405 (PPC405) core: 450 MHz, 700+ DMIPS, RISC core (32-bit Harvard architecture). [Figure: MicroBlaze and IP blocks, including H.264/AVC hardware blocks, attached to the IBM CoreConnect bus.] Courtesy Xilinx

5 Needs for Electronic System-Level (ESL) Design Automation Need executable models for system-level specification Need common specification for SW/HW co-design Need better complexity management

6 ESL Landscape. Modeling: SystemC (open source); SystemVerilog. Simulation and verification: behavior-level simulation & verification; system-level simulation & verification. SystemC provides behavior-level and system-level synthesis capabilities for free and is rapidly gaining popularity. Synthesis: behavior-level synthesis, from a behavior specification (e.g., C, SystemC, or Matlab) to RTL or netlists; system-level synthesis, from system specification to system implementation.

7 xpilot: Platform-Based Synthesis System. Inputs: SystemC/C; platform description & constraints. Flow: xpilot front end → SSDM (System-Level Synthesis Data Model) → profiling, analysis, and mapping → processor & architecture synthesis (processor cores + executables), interface synthesis (drivers + glue logic), and behavioral synthesis (custom logic) → embedded SoC. Uniqueness of xpilot: platform-based synthesis and optimization; communication-centric synthesis with interconnect optimization.

8 Outline: Motivation; xpilot system framework; Behavior-level synthesis in xpilot (advantages of behavioral synthesis, scheduling, resource binding); System-level synthesis in xpilot (synthesis for ASIP platforms, design exploration for heterogeneous MPSoCs); Conclusions.

9 xpilot: Behavioral-to-RTL Synthesis Flow. Inputs: behavioral spec. in C/SystemC; platform description. Frontend compiler → SSDM. Presynthesis optimizations: loop unrolling/shifting; strength reduction / tree height reduction; bitwidth analysis; memory analysis. Core synthesis optimizations: scheduling; resource binding, e.g., functional unit binding and register/port binding. µarch-generation & RTL/constraints generation: Verilog/VHDL/SystemC. Output: RTL + constraints for FPGAs (Altera, Xilinx) and ASICs (Magma, Synopsys, …).

10 xpilot Advantages. Advanced algorithms for platform-based, communication-centric optimization, e.g., a versatile scheduling engine based on solving a system of difference constraints (SDC). Platform-based behavior and system synthesis, e.g., resource binding based on a distributed register architecture. Communication/interconnect-centric approach, e.g., behavior and communication co-optimization. Complete validation through final P&R on FPGAs.

11 Advanced Behavior System Algorithms. Example: Versatile Scheduling Algorithm Based on SDC. The scheduling problem in behavioral synthesis is NP-complete under general design constraints. ILP-based solutions are versatile but very inefficient (exponential time complexity). Our solution: an efficient and versatile scheduler based on SDC (system of difference constraints). Applicable to a broad spectrum of applications: computation/data-intensive, control-intensive, memory-intensive, partially timed. Scalable to large designs (finishes in a few seconds). Amenable to a rich set of scheduling constraints: resource constraints, latency constraints, frequency constraints, relative IO timing constraints. Capable of a variety of synthesis optimizations: operation chaining, pipelining, multi-cycle communication, incremental scheduling, etc. [Figure: example schedule of additions and multiplications across control steps CS0 and CS1.]

12 Scheduling: Our Approach. Overall approach (current objective: high performance): use a system of integer difference constraints to express all kinds of scheduling constraints, and represent the design objective as a linear function. Example. Platform characterization: adder (+/−): 2 ns; multiplier (*): 5 ns. Target cycle time: 10 ns. Resource constraint: only ONE multiplier is available. With $x_i$ denoting the control step of operation $v_i$: Dependency constraints: $v_1 \to v_3$: $x_3 - x_1 \ge 0$; $v_2 \to v_3$: $x_3 - x_2 \ge 0$; $v_3 \to v_5$: $x_4 - x_3 \ge 0$; $v_4 \to v_5$: $x_5 - x_4 \ge 0$. Frequency constraint $\langle v_2, v_5 \rangle$: $x_5 - x_2 \ge 1$. Resource constraint $\langle v_2, v_3 \rangle$: $x_3 - x_2 \ge 1$. In matrix form over $x = (x_1, x_2, x_3, x_4, x_5)^T$, with each inequality rewritten as a difference bounded from above: $A x \le b$. $A$ is a totally unimodular matrix, which guarantees integral solutions.
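As a minimal sketch of how such a system of difference constraints can be solved, the following Python fragment applies Bellman-Ford-style relaxation to the example on this slide. This is a standard textbook technique and not necessarily the solver used in xpilot. It assumes each constraint reads x_v − x_u ≥ w, copies the variable indices as they appear on the slide, and computes only the earliest feasible schedule instead of optimizing a general linear objective. The names `solve_sdc` and `slide_constraints` are hypothetical.

```python
def solve_sdc(num_vars, constraints):
    """Solve a system of difference constraints.

    constraints: list of (u, v, w), each meaning x[v] - x[u] >= w,
    with variables numbered 1..num_vars.
    Returns the earliest non-negative integer solution as a list,
    or None if the constraints are contradictory.
    """
    x = [0] * (num_vars + 1)          # index 0 unused; every step starts at 0
    for _ in range(num_vars):         # longest-path relaxation passes
        changed = False
        for u, v, w in constraints:
            if x[u] + w > x[v]:
                x[v] = x[u] + w
                changed = True
        if not changed:               # fixed point reached: all constraints hold
            return x[1:]
    return None                       # still changing: positive cycle, infeasible


# Constraints as read from the slide (assumed reading: x_v - x_u >= w).
slide_constraints = [
    (1, 3, 0), (2, 3, 0), (3, 4, 0), (4, 5, 0),  # dependency constraints
    (2, 5, 1),                                   # frequency constraint <v2, v5>
    (2, 3, 1),                                   # resource constraint <v2, v3>
]

schedule = solve_sdc(5, slide_constraints)
print(schedule)  # [0, 0, 1, 1, 1]
```

Under these assumptions the two multiplications land in different control steps, which is what the single-multiplier resource constraint requires.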

13 Platform Modeling & Characterization. Target platform specification: a high-level resource library with delay/latency/area/power curves for various input/bitwidth configurations. Functional units: adders, ALUs, multipliers, comparators, etc. Connectors: mux, demux, etc. Memories: registers, synchronous memories, etc. Chip layout description: on-chip resource distributions; on-chip interconnect delay/power estimation. Two binding solutions for the same behavior: which one is better? The answer is platform-dependent: how large/fast are the MUX and ALU? [Figures: two bindings built from MUXes and ALUs; 3x3 delay matrix for Stratix EP1S40.]

14 Communication- and Interconnect-Centric Synthesis. Example: Use of Distributed Register-File Architectures. For a scheduled DFG with register binding indicated on each variable (assuming a one-functional-unit constraint), compare binding using discrete registers with binding using a register file: the register file gives a more efficient design. Distributed register-file micro-architecture: efficiently uses on-chip embedded memories; fully explores operation and data-transfer parallelism. [Figure: islands A, B, and C, each with a local register file, data-routing logic, input buffers, a MUX, and a functional unit pool (FUP) containing ALU, MUL, and ALU units.]

15 Distributed Register-File Microarchitecture. [Figure: islands A, B, and C, each with a local register file built on on-chip memory blocks, data-routing logic, input buffers, a MUX, and a functional unit pool (FUP) containing ALU, MUL, and ALU units.] [Table: on-chip RAM resources on Virtex-II FP-SoC devices (e.g., Xilinx XC2V2000), with columns for the number of 18Kb BRAMs and distributed RAM (Kb); values not preserved in this transcription.]

16 Resource Binding for DRF-Microarchitecture. Facts under simplified assumptions: operations bound onto an island form a chain in the given scheduled DFG; inter-chain data transfers may share a physical inter-island connection; the number of inter-island connections (IIC) is crucial to the QoR of a DRFM instance. Example: operations v1 to v10 are bound to four islands (chains) A, B, C, and D, with both intra-island and inter-island transfers. Inter-island connections = 5: (A,B) = (A,D) = 1; (A,C) = 1, where two data transfers share one connection; (C,D) = 2.

17 Example: Behavior and Communication Co-Optimization in Platform-Based Interface Synthesis. Focus on sequential communication media (SCM): FIFOs (e.g., Xilinx FSLs), buses (e.g., Xilinx CoreConnect, Altera Avalon, etc.). Transmission order may have a dramatic impact on performance: the best order should guarantee that no data transmission on the critical path is delayed by a non-critical transmission. Interface synthesis for SCM: consider both behavior and communication to determine the optimal transmission order. DCT example: process P1 (custom logic 1, PE1) produces data[8] in C with for (int i=0; i<8; i++) { S1: data[i] = ...; } and sends it over a FIFO to process P2 (custom logic 2, PE2), which computes int s07 = data[0] + data[7]; int s16 = data[1] + data[6]; ...
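The effect of transmission order in the DCT example can be illustrated with a toy latency model. This is only a sketch under stated assumptions, not the SCOOP algorithm: the FIFO delivers one element per cycle, the consumer has a single adder with one-cycle additions, and the sums after s07 and s16 (elided on the slide) are assumed to continue the same pairing pattern. The function `total_latency` and all variable names are hypothetical.

```python
def total_latency(send_order, sums, add_cycles=1):
    """Cycle at which the last sum finishes in the toy model.

    send_order: indices of data[] in the order the producer sends them;
                the element at position p arrives at cycle p + 1.
    sums: (a, b) index pairs the consumer adds, in program order,
          on a single adder.
    """
    arrival = {idx: pos + 1 for pos, idx in enumerate(send_order)}
    adder_free = 0
    for a, b in sums:
        # An addition starts once both operands arrived and the adder is idle.
        start = max(arrival[a], arrival[b], adder_free)
        adder_free = start + add_cycles
    return adder_free


# Assumed consumer behavior: s07, s16, s25, s34 (the last two are a guess).
dct_sums = [(0, 7), (1, 6), (2, 5), (3, 4)]
in_order = list(range(8))              # data[0], data[1], ..., data[7]
reordered = [0, 7, 1, 6, 2, 5, 3, 4]   # operands of each sum sent back to back

print(total_latency(in_order, dct_sums))   # 12
print(total_latency(reordered, dct_sums))  # 9
```

In this model the in-order stream stalls the first addition until data[7] arrives, while the reordered stream lets each addition start as soon as its pair has been received.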

18 Proposed SCM Co-Optimization Design Flow. Inputs: process network; platform description & constraints. Front end → System-Level Synthesis Data Model → SCOOP (SCM Co-Optimization): communication order detection; code transformation and interface generation; indices compression for loop reordering. Outputs: drivers + glue logic; process behavior.

19 Initial Results of Interface Synthesis. Targets sequential communication channels, in particular FSL on Virtex-II. Considers two communicating processes. [Table: total latency in cycles (traditional vs. SCOOP, with reduction) and RAs Compress (before/after) for designs DCT, Haar, DWT, Mat_mul, DCT, Masking, and Dot; numeric values not preserved in this transcription.] An average of 26% improvement in total latency can be achieved.

20 SystemC/C-to-RTL Design Flow. SystemC/C specification → xpilot behavioral synthesis: front-end compiler → SSDM/CDFG → behavioral synthesis → SSDM/FSMD → RTL generation, guided by the platform description & constraints (SSDM: System-Level Synthesis Data Model). Outputs: FSM with datapath in VHDL; floorplan and/or multi-cycle path constraints. These feed RTL synthesis for the ASIC/FPGA platform.

21 Preliminary Results of xpilot: Shorter Simulation/Verification Cycle. From other projects: simulation speed on a behavior model is 100X faster than the RTL-based method [NEC, ASPDAC04]. Our experience, on the motion-compensation module in an MPEG-4 decoder: behavior-level (C language) simulation takes less than 1 second per frame; RTL SystemC simulation takes about 310 seconds per frame.

22 Preliminary Results of xpilot: Better Complexity Management. Significant code size reduction, RTL design vs. behavioral design: 10x code size reduction. [Figure: VHDL code generated by UCLA xpilot targeting the Altera Stratix platform.]

23 Preliminary Results of xpilot: Rapid System Exploration. Quick evaluation of different hardware/software boundaries. Example: Motion-JPEG implementation. All-HW implementation; all-SW implementation (using embedded processors); SW/HW co-design: optimal partitioning? Repeated manual RTL coding is not a solution!

24 Preliminary Results on Motion-JPEG Example. Pipeline: RAW images → Preprocess → DCT → Quant → Huffman (with Table Modification) → encoded JPEG images; alternatively, the DCT stage is replaced by HW-DCT. Model #1: 5 MicroBlazes with FSL-based communication. Model #2: 4 MicroBlazes + DCT on FPGA fabric. Platform: Xilinx XUP board. [Table: system cycle count, Fmax (MHz), execution time (ms), and area (slices) for Model #1 and Model #2; the only value preserved in this transcription is (-38%), and its column is not recoverable.]

25 Preliminary Results of xpilot: Better QoR (Comparison with UCI/UCSD SPARK). Device setting: Xilinx Virtex-II Pro (xc2v4000-6). Target frequency: 200 MHz. [Table: for designs PR, WANG, LEE, MCM, and DIR, plus the average ratio: resource usage (slices, LUTs, FFs, DSP) and Fmax (MHz) for SPARK and for xpilot, and the xpilot/SPARK delay ratio; values not preserved in this transcription.]

26 Outline: Motivation; xpilot system framework; Behavior-level synthesis in xpilot (advantages of behavioral synthesis, scheduling, resource binding); System-level synthesis in xpilot (synthesis for ASIP platforms, design exploration for heterogeneous MPSoCs); Conclusions.

27 Design Exploration for Heterogeneous MPSoC Platforms. Heterogeneous MPSoC exploration covers: processors (heterogeneous vs. homogeneous; general-purpose vs. application-specific); on-chip communication architecture (OCA): bus (e.g., AMBA, CoreConnect) or packet-switching network (e.g., Alpha 21364); memory hierarchy. [Figure: microprocessors (µp) running tasks, OS, and drivers, together with IP, FPGA, and DSP blocks, each attached through a network interface to a communication network.]

28 Configurable SoC Platforms. General-purpose processor cores + programmable fabric. Tight integration using extended instructions (ASIPs); example: Altera Nios / Nios II. Loose integration using FIFOs/buses for communication; example: Xilinx MicroBlaze, etc. [Figures: custom instruction logic for Nios II [source: …]; Xilinx MicroBlaze [source: …].]

29 ASIP Compilation: Problem Statement. Given: a CDFG G(V, E); the basic instruction set I; pattern constraints: number of inputs $PI(p_i) \le N_{in}$, number of outputs $PO(p_i) = 1$, and total area $\sum_{1 \le i \le N} area(p_i) < A$. Objective: generate a pattern library P and map G to the extended instruction set $I \cup P$ so that the total execution time is minimized. Example (multiplication: 2 clock cycles; addition: 1 clock cycle): t1 = a * b; t2 = b * c; t3 = d * e; t4 = t1 + t2; t5 = t2 + t3; t6 = t5 + t4; With extended instructions ext-inst1 (MAC1: 2 cycles) and ext-inst2 (MAC2: 2 cycles): t4 = ext-inst1(a, b, c); t5 = ext-inst2(b, c, d, e); t6 = t4 + t5; Performance speedup = 9 / 5 = 1.8X.
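The 1.8X figure on this slide can be checked with a small cycle-counting sketch. It assumes a single-issue core, so cycle counts simply add up, and it models ext-inst1 as a*b + b*c and ext-inst2 as b*c + d*e, which is how the slide's mapped code uses them. The function names are hypothetical.

```python
MUL_CYCLES = 2   # multiplication latency from the slide
ADD_CYCLES = 1   # addition latency from the slide
EXT_CYCLES = 2   # each extended MAC instruction takes 2 cycles


def run_basic(a, b, c, d, e):
    """Execute the example with the basic instruction set only."""
    t1 = a * b
    t2 = b * c
    t3 = d * e
    t4 = t1 + t2
    t5 = t2 + t3
    t6 = t5 + t4
    cycles = 3 * MUL_CYCLES + 3 * ADD_CYCLES
    return t6, cycles


def run_extended(a, b, c, d, e):
    """Execute the example with the two extended instructions."""
    t4 = a * b + b * c      # ext-inst1(a, b, c)
    t5 = b * c + d * e      # ext-inst2(b, c, d, e)
    t6 = t4 + t5
    cycles = 2 * EXT_CYCLES + ADD_CYCLES
    return t6, cycles


basic_result, basic_cycles = run_basic(1, 2, 3, 4, 5)
ext_result, ext_cycles = run_extended(1, 2, 3, 4, 5)
print(basic_cycles, ext_cycles, basic_cycles / ext_cycles)  # 9 5 1.8
```

Both versions compute the same t6, so the mapping preserves behavior while cutting the cycle count from 9 to 5.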

30 Target Core Processor Model. Core processor model: classic single-issue pipelined RISC core (fetch / decode / execute / mem / write-back). The number of input and output operands of an instruction is predetermined. An instruction reads the core register file during the execute stage and commits the result during the write-back stage. [Figure: core processor pipeline with PC, adder, instruction cache, IF/ID, register file (RS1, RS2), ID/EX, ALU (OP1, OP2), EX/MEM, memory, MEM/WB, and MUX, with attached custom logic producing a result.]

31 ASIP Compilation Flow. C code → front-end compilation → CDFG → (1) pattern generation → (2) pattern selection → pattern library → (3) application mapping & graph covering → optimized CDFG → backend compilation → optimized assembly, subject to the µarch constraint. Pattern generation: satisfy input/output constraints. Pattern selection: select a subset to maximize the potential speedup while satisfying the resource constraint. Application mapping: graph covering to minimize the total execution time.
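The pattern selection step resembles a 0/1 knapsack problem if one assumes that each pattern has a fixed area and an independent, additive gain. The sketch below uses textbook dynamic programming under that simplification; it is not the selection algorithm from the talk, and the pattern names, areas, gains, and the inclusive area budget are all made-up illustrative values.

```python
def select_patterns(patterns, area_budget):
    """Pick patterns maximizing total gain within an area budget.

    patterns: list of (name, area, gain) with integer areas.
    Returns (best_gain, chosen_names). Gains are assumed additive,
    i.e. overlap between patterns is ignored.
    """
    # best[a] = (gain, names) achievable with total area <= a
    best = [(0, [])] * (area_budget + 1)
    for name, area, gain in patterns:
        # Iterate budgets downward so each pattern is used at most once.
        for a in range(area_budget, area - 1, -1):
            candidate = best[a - area][0] + gain
            if candidate > best[a][0]:
                best[a] = (candidate, best[a - area][1] + [name])
    return best[area_budget]


# Hypothetical candidate patterns: (name, area units, cycles saved).
candidates = [("mac1", 4, 3), ("mac2", 5, 3), ("addsub", 3, 2), ("mul3", 7, 5)]
gain, chosen = select_patterns(candidates, area_budget=9)
print(gain, chosen)  # 6 ['mac1', 'mac2']
```

A real selector would also have to account for patterns that overlap in the CDFG, since covering the same operations twice does not double the gain.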

32 Experimental Results on Altera Nios. Altera Nios is used for ASIP implementation: 5 extended instruction formats, up to 2048 instructions for each format. Small DSP applications are taken as benchmarks. [Table: for benchmarks fft_br, iir, fir, pr, dir, and mcm, plus the average: number of extended instructions, speedup (estimation and on Nios), and resource overhead (LEs, memory, DSP blocks); values not fully preserved in this transcription.]

33 Architecture Extension for ASIPs. Data bandwidth problem: limited register file bandwidth (two read ports, one write port); ~40% of the ideal performance speedup will be lost. Shadow-register-based architectural extension: core registers are augmented by an extra set of shadow registers, conditionally written during the write-back stage; low power/area overhead; novel shadow-register binding algorithms are developed. [Figure: the core processor pipeline extended with a hashing unit, k = hash(j), that selects among shadow registers SR1 to SRK feeding the custom logic.]

34 Ongoing Work: Mapping for Heterogeneous Integration with Multiple Processing Cores. Given: a library of processing cores P and a communication library C; a task graph G(V, E); for each v in V, the execution time t(v, p_i) on p_i; for each (u, v) in E, the communication data size s(u, v); a throughput constraint. Problem: select and instantiate the processing elements and communication channels from P and C respectively, and map the tasks onto the processing elements and the communications onto the channels, so that the optimal latency is achieved subject to the throughput constraint and the implementation cost is minimized.

35 MPEG-4 Simple Profile Decoder: Architecture Profiling. C specification overview (module name: original C source file, original C line count where preserved): Copy Controller: copycontrol.c, 287; Display Controller: displaycontrol.c, 358; Motion Comp.: Motion-Compensation.c; Parser/VLD: parser.c, texture_vld.c; Texture/IDCT: texture_idct.c; Texture Update: textureupdate.c. Runtime profiling (PowerPC/XUP board): Parser/VLD 59.0%; Texture/IDCT 18.1%; Motion Comp. 15.7%; Copy Controller 3.6%.

36 MPEG-4 Simple Profile Decoder: Hybrid HW/SW Implementation. HW block integrated with a PowerPC single-process design; software blocks run on the PowerPC. 15% speed improvement.

37 MPEG-4 Simple Profile Decoder: Alternate Implementations. [Table: throughput (frames per second) and improvement (%) for four implementations: single uBlaze, 7-uBlaze, single PowerPC, and single PowerPC w/ HW Motion Comp.; values not preserved in this transcription.] xpilot synthesis report of HW blocks. [Table: C line counts, RTL SystemC and RTL VHDL line counts, slices (FFs, LUTs), MUL, clock period (ns), and latency (cycles); preserved (FFs, LUTs) values: Motion Comp. (1111, 1017); Block IDCT (2376, 2438); Texture Update (1696, 1931).]

38 Conclusions. xpilot has fairly mature and advanced behavior synthesis capability, from C or SystemC to RTL code with the necessary design constraints. xpilot advantages include: platform-based behavior and system synthesis; a communication/interconnect-centric approach; advanced algorithms for platform-based, communication-centric optimization; promising results demonstrated on available FPGAs. xpilot system synthesis capabilities: performance simulation of multi-processor systems; exploration of the efficient use of (multiple) on-chip processors; compilation and optimization for reconfigurable processors.

39 Acknowledgements. We would like to thank the following for their support: Gigascale Systems Research Center (GSRC); National Science Foundation (NSF); Semiconductor Research Corporation (SRC); industrial sponsors under the California MICRO programs (Altera, Xilinx). Team members: Yiping Fan, Guoling Han, Wei Jiang, Zhiru Zhang.

Prof. Jason Cong UCLA Computer Science Department. Advantages of behavioral synthesis Scheduling Resource binding

Prof. Jason Cong UCLA Computer Science Department. Advantages of behavioral synthesis Scheduling Resource binding xpilot: A Platform-Based System-Level Synthesis for Reconfigurable SOCs Prof. Jason Cong cong@cs.ucla.edu UCLA Computer Science Department Outline Motivation xpilot system framework Behavior-level synthesis

More information

xpilot: A Platform-Based Behavioral Synthesis System

xpilot: A Platform-Based Behavioral Synthesis System xpilot: A Platform-Based Behavioral Synthesis System Deming Chen, Jason Cong, Yiping Fan, Guoling Han, Wei Jiang, Zhiru Zhang University of California, Los Angeles Email: {demingc, cong, fanyp, leohgl,

More information

Pilot: A Platform-based HW/SW Synthesis System

Pilot: A Platform-based HW/SW Synthesis System Pilot: A Platform-based HW/SW Synthesis System SOC Group, VLSI CAD Lab, UCLA Led by Jason Cong Zhong Chen, Yiping Fan, Xun Yang, Zhiru Zhang ICSOC Workshop, Beijing August 20, 2002 Outline Overview The

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Towards Layout-Friendly High-Level Synthesis

Towards Layout-Friendly High-Level Synthesis Towards Layout-Friendly High-Level Synthesis Jason Cong Bin Liu Guojie Luo Raghu Prabhakar UCLA UCLA Peking University UCLA Outline High-level synthesis and layout-friendly architecture Evaluation of the

More information

Towards Optimal Custom Instruction Processors

Towards Optimal Custom Instruction Processors Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors

More information

asoc: : A Scalable On-Chip Communication Architecture

asoc: : A Scalable On-Chip Communication Architecture asoc: : A Scalable On-Chip Communication Architecture Russell Tessier, Jian Liang,, Andrew Laffely,, and Wayne Burleson University of Massachusetts, Amherst Reconfigurable Computing Group Supported by

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives

More information

Co-synthesis and Accelerator based Embedded System Design

Co-synthesis and Accelerator based Embedded System Design Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

NISC Application and Advantages

NISC Application and Advantages NISC Application and Advantages Daniel D. Gajski Mehrdad Reshadi Center for Embedded Computer Systems University of California, Irvine Irvine, CA 92697-3425, USA {gajski, reshadi}@cecs.uci.edu CECS Technical

More information

ORCA FPGA- Optimized VectorBlox Computing Inc.

ORCA FPGA- Optimized VectorBlox Computing Inc. ORCA FPGA- Optimized 2016 VectorBlox Computing Inc. 1 ORCA FPGA- Optimized Tiny, Low-Power FPGA 3,500 LUT4s 4 MUL16s < $5.00 ISA: RV32IM hw multiply, sw divider < 2,000 LUTs ~ 20MHz What is ORCA? Family

More information

Architecture-Level Synthesis for Automatic Interconnect Pipelining

Architecture-Level Synthesis for Automatic Interconnect Pipelining Architecture-Level Synthesis for Automatic Interconnect Pipelining Jason Cong, Yiping Fan, Zhiru Zhang Computer Science Department University of California, Los Angeles, CA 90095 {cong, fanyp, zhiruz}@cs.ucla.edu

More information

Performance Verification for ESL Design Methodology from AADL Models

Performance Verification for ESL Design Methodology from AADL Models Performance Verification for ESL Design Methodology from AADL Models Hugues Jérome Institut Supérieur de l'aéronautique et de l'espace (ISAE-SUPAERO) Université de Toulouse 31055 TOULOUSE Cedex 4 Jerome.huges@isae.fr

More information

Hardware Design. University of Pannonia Dept. Of Electrical Engineering and Information Systems. MicroBlaze v.8.10 / v.8.20

Hardware Design. University of Pannonia Dept. Of Electrical Engineering and Information Systems. MicroBlaze v.8.10 / v.8.20 University of Pannonia Dept. Of Electrical Engineering and Information Systems Hardware Design MicroBlaze v.8.10 / v.8.20 Instructor: Zsolt Vörösházi, PhD. This material exempt per Department of Commerce

More information

Hardware Software Co-design and SoC. Neeraj Goel IIT Delhi

Hardware Software Co-design and SoC. Neeraj Goel IIT Delhi Hardware Software Co-design and SoC Neeraj Goel IIT Delhi Introduction What is hardware software co-design Some part of application in hardware and some part in software Mpeg2 decoder example Prediction

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 10 Task Partitioning Sources: Prof. Margarida Jacome, UT Austin Prof. Lothar Thiele, ETH Zürich Andreas Gerstlauer Electrical and Computer Engineering University

More information

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific

More information

SoC Design for the New Millennium Daniel D. Gajski

SoC Design for the New Millennium Daniel D. Gajski SoC Design for the New Millennium Daniel D. Gajski Center for Embedded Computer Systems University of California, Irvine www.cecs.uci.edu/~gajski Outline System gap Design flow Model algebra System environment

More information

FPGA Polyphase Filter Bank Study & Implementation

FPGA Polyphase Filter Bank Study & Implementation FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image Communications/. Electrical Engineering Dept. UCLA 1 Introduction This document describes

More information

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011 FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level

More information

Platform-based Design

Platform-based Design Platform-based Design The New System Design Paradigm IEEE1394 Software Content CPU Core DSP Core Glue Logic Memory Hardware BlueTooth I/O Block-Based Design Memory Orthogonalization of concerns: the separation

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design 1 Reconfigurable Computing Platforms 2 The Von Neumann Computer Principle In 1945, the

More information

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination 1 Student name: Date: June 26, 2008 General requirements for the exam: 1. This is CLOSED BOOK examination; 2. No questions allowed within the examination period; 3. If something is not clear in question

More information

Design Space Exploration Using Parameterized Cores

Design Space Exploration Using Parameterized Cores RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR Design Space Exploration Using Parameterized Cores Ian D. L. Anderson M.A.Sc. Candidate March 31, 2006 Supervisor: Dr. M. Khalid 1 OUTLINE

More information

NANOMETER process technologies allow billions of transistors

NANOMETER process technologies allow billions of transistors 550 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 4, APRIL 2004 Architecture and Synthesis for On-Chip Multicycle Communication Jason Cong, Fellow, IEEE, Yiping

More information

Digital Integrated Circuits

Digital Integrated Circuits Digital Integrated Circuits Lecture 9 Jaeyong Chung Robust Systems Laboratory Incheon National University DIGITAL DESIGN FLOW Chung EPC6055 2 FPGA vs. ASIC FPGA (A programmable Logic Device) Faster time-to-market

More information

Early Performance-Cost Estimation of Application-Specific Data Path Pipelining

Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Jelena Trajkovic Computer Science Department École Polytechnique de Montréal, Canada Email: jelena.trajkovic@polymtl.ca Daniel

More information

Hardware/Software Co-design

Hardware/Software Co-design Hardware/Software Co-design Zebo Peng, Department of Computer and Information Science (IDA) Linköping University Course page: http://www.ida.liu.se/~petel/codesign/ 1 of 52 Lecture 1/2: Outline : an Introduction

More information

Anand Raghunathan

Anand Raghunathan ECE 695R: SYSTEM-ON-CHIP DESIGN Module 2: HW/SW Partitioning Lecture 2.15: ASIP: Approaches to Design Anand Raghunathan raghunathan@purdue.edu ECE 695R: System-on-Chip Design, Fall 2014 Fall 2014, ME 1052,

More information

A New Design Methodology for Composing Complex Digital Systems

A New Design Methodology for Composing Complex Digital Systems A New Design Methodology for Composing Complex Digital Systems S. L. Chu* 1, M. J. Lo 2 1,2 Department of Information and Computer Engineering Chung Yuan Christian University Chung Li, 32023, Taiwan *slchu@cycu.edu.tw

More information

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 The Next Generation 65-nm FPGA Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 Hot Chips, 2006 Structure of the talk 65nm technology going towards 32nm Virtex-5 family Improved I/O Benchmarking

More information

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices 3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific

More information

A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms. Kees Vissers.

A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms. Kees Vissers. A qualitative analysis of the benefits of LUTs, Processors, embedded memory and interconnect in MPSoC platforms Xilinx Research OUTLINE Historical Perspective Conventional FPGAs Applications and Programming

More information

Architecture and Synthesis for Multi-Cycle Communication

Architecture and Synthesis for Multi-Cycle Communication Architecture and Synthesis for Multi-Cycle Communication Jason Cong, Yiping Fan, Xun Yang, Zhiru Zhang Computer Science Department University of California, Los Angeles Los Angeles CA 90095 USA {cong,

More information
