Platform-Based Behavior-Level and System-Level Synthesis. Prof. Jason Cong UCLA Computer Science Department
1 Platform-Based Behavior-Level and System-Level Synthesis Prof. Jason Cong UCLA Computer Science Department
2 Outline
Motivation
xpilot system framework
Behavior-level synthesis in xpilot
  Advantages of behavioral synthesis
  Scheduling
  Resource binding
System-level synthesis in xpilot
  Synthesis for ASIP platforms
  Design exploration for heterogeneous MPSoCs
Conclusions
3 ASICs SOC Example: Philips Nexperia
Philips Nexperia SoC platform for high-end digital video (courtesy Philips)
General-purpose scalable RISC processor (MIPS): 50 to 300+ MHz, 32-bit or 64-bit
Scalable VLIW media processor (TriMedia): 100 to 300+ MHz, 32-bit or 64-bit
Library of device IP blocks: image coprocessors, DSPs, UART, 1394, USB
Nexperia system buses
[Block diagram: a MIPS CPU (PRxxxx, I$/D$) and a TriMedia CPU (TM-xxxx, I$/D$), each with device IP blocks on a PI bus, connected to SDRAM through the MMI and the DVP memory bus]
4 Field-Programmable SOC Example: Xilinx Virtex-4 FPGA
Soft-core µproc MicroBlaze: 180 MHz, < ~1300 LUTs, 166 DMIPS
PowerPC 405 (PPC405) core: 450 MHz, 700+ DMIPS, RISC core (32-bit Harvard architecture)
IP blocks on the IBM CoreConnect bus
H.264/AVC hardware blocks
Courtesy Xilinx
5 Needs for Electronic System-Level (ESL) Design Automation
Need executable models for system-level specification
Need a common specification for SW/HW co-design
Need better complexity management
6 ESL Landscape
Modeling
  SystemC (open source)
  SystemVerilog
Simulation and verification
  Behavior-level simulation & verification
  System-level simulation & verification
  SystemC provides behavior-level and system-level synthesis capabilities for free and is rapidly gaining popularity
Synthesis
  Behavior-level synthesis: from a behavior specification (e.g., C, SystemC, or Matlab) to RTL or netlists
  System-level synthesis: from system specification to system implementation
7 xpilot: Platform-Based Synthesis System
Inputs: SystemC/C; platform description & constraints
xpilot
  xpilot front end
  SSDM (System-Level Synthesis Data Model)
  Profiling, analysis, mapping
  Processor & architecture synthesis -> processor cores + executables
  Interface synthesis -> drivers + glue logic
  Behavioral synthesis -> custom logic
Output: embedded SoC
Uniqueness of xpilot
  Platform-based synthesis and optimization
  Communication-centric synthesis with interconnect optimization
9 xpilot: Behavioral-to-RTL Synthesis Flow
Inputs: behavioral spec. in C/SystemC; platform description
Frontend compiler -> SSDM
Presynthesis optimizations
  Loop unrolling/shifting
  Strength reduction / tree height reduction
  Bitwidth analysis
  Memory analysis
Core synthesis optimizations
  Scheduling
  Resource binding, e.g., functional unit binding, register/port binding
µarch generation & RTL/constraints generation
  Verilog/VHDL/SystemC
  FPGAs: Altera, Xilinx
  ASICs: Magma, Synopsys, ...
Output: RTL + constraints for FPGAs/ASICs
10 xpilot Advantages
Advanced algorithms for platform-based, communication-centric optimization
  E.g., a versatile scheduling engine based on solving a system of difference constraints (SDC)
Platform-based behavior and system synthesis
  E.g., resource binding based on a distributed register architecture
Communication/interconnect-centric approach
  E.g., behavior and communication co-optimization
Complete validation through final P&R on FPGAs
11 Advanced Behavior System Algorithms: Example: Versatile Scheduling Algorithm Based on SDC
The scheduling problem in behavioral synthesis is NP-complete under general design constraints
ILP-based solutions are versatile but very inefficient
  Exponential time complexity
Our solution: an efficient and versatile scheduler based on SDC (system of difference constraints)
  Applicable to a broad spectrum of applications
    Computation/data-intensive, control-intensive, memory-intensive, partially timed
  Scalable to large designs (finishes in a few seconds)
  Amenable to a rich set of scheduling constraints
    Resource constraints, latency constraints, frequency constraints, relative I/O timing constraints
  Capable of a variety of synthesis optimizations
    Operation chaining, pipelining, multi-cycle communication, incremental scheduling, etc.
[Figure: example schedule of multiplications and additions across control steps CS0 and CS1]
12 Scheduling: Our Approach
Overall approach
  Current objective: high performance
  Use a system of integer difference constraints to express all kinds of scheduling constraints
  Represent the design objective as a linear function
Example
  Platform characterization: adder (+/−): 2 ns; multiplier (*): 5 ns
  Target cycle time: 10 ns
  Resource constraint: only ONE multiplier is available
  [Figure: data-flow graph of operations v1 to v5 (multiplications and additions) with scheduling variables x1 to x5]
  Dependency constraints
    v1 -> v3: $x_3 - x_1 \ge 0$
    v2 -> v3: $x_3 - x_2 \ge 0$
    v3 -> v5: $x_4 - x_3 \ge 0$
    v4 -> v5: $x_5 - x_4 \ge 0$
  Frequency constraint
    <v2, v5>: $x_5 - x_2 \ge 1$
  Resource constraint
    <v2, v3>: $x_3 - x_2 \ge 1$
  In matrix form (constraint matrix A, right-hand side b), A is totally unimodular, which guarantees integral solutions
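To make the difference-constraint formulation concrete, here is a minimal sketch that solves a small system of constraints of the form x[j] - x[i] >= c. It is not the xpilot scheduler: the slides describe a linear-programming formulation with an objective function, whereas this sketch swaps in plain Bellman-Ford-style longest-path relaxation, a standard way to find a feasible (earliest) solution of a difference-constraint system. The function name `solve_sdc` and the five-operation constraint set are assumptions made for illustration, loosely modeled on the slide.

```python
def solve_sdc(num_vars, constraints):
    """Return the earliest non-negative integer schedule, or None if infeasible.

    Each constraint is a tuple (i, j, c) meaning x[j] - x[i] >= c.
    This is longest-path relaxation (Bellman-Ford), not an LP solver.
    """
    x = [0] * num_vars  # every operation may start at control step 0
    for _ in range(num_vars):
        changed = False
        for i, j, c in constraints:
            if x[i] + c > x[j]:  # constraint violated: push x[j] later
                x[j] = x[i] + c
                changed = True
        if not changed:
            return x  # all constraints hold
    return None  # still changing after num_vars passes: contradictory constraints


# Hypothetical example with operations v1..v5 stored at indices 0..4.
constraints = [
    (0, 2, 0),  # dependency: v3 no earlier than v1
    (1, 2, 0),  # dependency: v3 no earlier than v2
    (2, 4, 0),  # dependency: v5 no earlier than v3
    (3, 4, 0),  # dependency: v5 no earlier than v4
    (1, 4, 1),  # frequency: v5 at least one step after v2
    (1, 2, 1),  # resource: v2 and v3 cannot share a control step
]
schedule = solve_sdc(5, constraints)
print(schedule)
```

Because every bound is an integer, the relaxation only ever produces integer control steps, which mirrors the integrality guarantee the slide attributes to the totally unimodular constraint matrix.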
13 Platform Modeling & Characterization
Target platform specification
  High-level resource library with delay/latency/area/power curves for various input/bitwidth configurations
    Functional units: adders, ALUs, multipliers, comparators, etc.
    Connectors: mux, demux, etc.
    Memories: registers, synchronous memories, etc.
  Chip layout description
    On-chip resource distributions
    On-chip interconnect delay/power estimation
Two binding solutions for the same behavior: which one is better?
  The answer is platform-dependent: how large/fast are the MUX and the ALU?
[Figure: two binding solutions built from MUXes and ALUs]
[Figure: 3x3 delay matrix for Stratix EP1S40]
14 Communication- and Interconnect-Centric Synthesis: Example: Use of Distributed Register-File Architectures
[Figure: a scheduled DFG with register binding indicated on each variable (assuming a one-functional-unit constraint)]
Binding using discrete registers
Binding using a register file: more efficient design!
Distributed register-file micro-architecture:
  Efficiently use on-chip embedded memories
  Fully explore operation and data-transfer parallelism
[Figure: islands A, B, and C, each with a local register file, data-routing logic, input buffers, and a functional unit pool (FUP) of ALUs and multipliers behind a MUX]
15 Distributed Register-File Microarchitecture
[Figure: islands A, B, and C, each with a local register file, data-routing logic, input buffers, and a functional unit pool (FUP) of ALUs and multipliers behind a MUX]
On-chip memory blocks: on-chip RAM resources on Virtex-II (FP-SoC), e.g., Xilinx XC2V2000
[Table: #18Kb BRAM and distributed RAM (Kb)]
16 Resource Binding for DRF-Microarchitecture
Facts under simplified assumptions
  Operations bound onto an island form a chain in the given scheduled DFG
  Inter-chain data transfers may share a physical inter-island connection
  The number of inter-island connections (IIC) is crucial to the QoR of a DRFM instance
Example: operations v1 to v10 bound onto islands (chains) A, B, C, and D, with intra-island and inter-island transfers
  Inter-island connections = 5
  (A,B) = (A,D) = 1
  (A,C) = 1: two data transfers share one connection
  (C,D) = 2
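The inter-island connection count can be illustrated with a short sketch. It assumes, beyond what the slide states, that two transfers between the same pair of islands can share one physical connection only when they occur in different control steps, so the number of connections a pair needs is the largest number of simultaneous transfers between them. The function name and the transfer list are hypothetical; the list is chosen so that the totals match the slide's example.

```python
from collections import Counter


def inter_island_connections(transfers):
    """Map each (src, dst) island pair to the number of connections it needs.

    transfers: iterable of (src_island, dst_island, control_step).
    Assumption: transfers in the same step cannot share a connection.
    """
    per_step = Counter(transfers)  # (src, dst, step) -> simultaneous transfers
    needed = {}
    for (src, dst, _step), count in per_step.items():
        if src == dst:
            continue  # intra-island transfers use the local register file
        pair = (src, dst)
        needed[pair] = max(needed.get(pair, 0), count)
    return needed


# Hypothetical transfer list reproducing the counts on the slide.
transfers = [
    ("A", "B", 1),
    ("A", "D", 2),
    ("A", "C", 1), ("A", "C", 3),  # different steps: share one connection
    ("C", "D", 2), ("C", "D", 2),  # same step: two connections
]
needed = inter_island_connections(transfers)
total_iic = sum(needed.values())
print(needed, total_iic)
```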
17 Example: Behavior and Communication Co-Optimization in Platform-Based Interface Synthesis
Focus on sequential communication media (SCM)
  FIFOs (e.g., Xilinx FSLs), buses (e.g., Xilinx CoreConnect, Altera Avalon, etc.)
  Order may have a dramatic impact on performance
  The best order should guarantee that no data transmission on the critical path is delayed by a non-critical transmission
Interface synthesis for SCM
  Consider both behavior and communication to determine the optimal transmission order
DCT example: process P1 (custom logic 1, PE1) sends data[8] over a FIFO to process P2 (custom logic 2, PE2)
  P1:
    for (int i = 0; i < 8; i++) {
      S1: data[i] = ...;
    }
  P2:
    int s07 = data[0] + data[7];
    int s16 = data[1] + data[6];
    ...
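The effect of transmission order in the DCT example can be sketched as follows. The sketch assumes one data item arrives per cycle over the FIFO and that the consumer continues the pattern with data[2] + data[5] and data[3] + data[4]; the slide elides those lines, so that continuation and both function names are illustrative assumptions, not the SCOOP algorithm itself.

```python
def first_use_order(uses):
    """Order items by when the consumer first needs them."""
    order, seen = [], set()
    for group in uses:
        for index in group:
            if index not in seen:
                seen.add(index)
                order.append(index)
    return order


def ready_times(order, uses):
    """Cycle at which each consumer operation has all of its operands.

    Assumes the FIFO delivers exactly one item per cycle, starting at cycle 1.
    """
    arrival = {index: cycle for cycle, index in enumerate(order, start=1)}
    return [max(arrival[index] for index in group) for group in uses]


# Operand pairs read by the consumer; the last two pairs are assumed.
uses = [(0, 7), (1, 6), (2, 5), (3, 4)]

producer_order = list(range(8))      # order in which the producer loop writes data[i]
reordered = first_use_order(uses)    # order driven by the consumer's needs

print(ready_times(producer_order, uses))
print(reordered, ready_times(reordered, uses))
```

With the producer's natural order the first sum cannot start until all eight items have arrived, while the reordered stream lets it start after two.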
18 Proposed SCM Co-Optimization Design Flow
Inputs: process network; platform description & constraints
Front end
System-Level Synthesis Data Model
SCOOP (SCM Co-Optimization)
  Communication order detection
  Code transformation and interface generation
  Indices compression for loop reordering
Outputs: drivers + glue logic; process behavior
19 Initial Results of Interface Synthesis
Target: sequential communication channels, in particular FSL in Virtex-II
Consider two communicating processes
[Table: total latency (cycle#) for Trad. vs. SCOOP with reduction (%), and RAs compress (before, after), for the designs DCT, Haar, DWT, Mat_mul, DCT, Masking, and Dot; surviving row fragments: DCT % 0 0; Haar % 0 0; DWT % 0 0]
An average of 26% improvement in total latency can be achieved.
20 SystemC/C-to-RTL Design Flow
Inputs: SystemC/C specification; platform description & constraints
xpilot behavioral synthesis
  Front-end compiler -> SSDM/CDFG (SSDM: System-Level Synthesis Data Model)
  Behavioral synthesis -> SSDM/FSMD
  RTL generation -> FSM with datapath in VHDL; floorplan and/or multi-cycle path constraints
RTL synthesis
ASICs/FPGAs platform
21 Preliminary Results of xpilot: Shorter Simulation/Verification Cycle
From other projects: simulation on a behavior model is 100X faster than the RTL-based method [NEC, ASPDAC04]
Our experience: motion-compensation module in an MPEG-4 decoder
  Behavior-level (C language) simulation: less than 1 second per frame
  RTL SystemC simulation: about 310 seconds per frame
22 Preliminary Results of xpilot: Better Complexity Management
Significant code size reduction
  RTL design -> behavioral design: 10x code size reduction
[Figure: VHDL code generated by UCLA xpilot targeting the Altera Stratix platform]
23 Preliminary Results of xpilot: Rapid System Exploration
Quick evaluation of different hardware/software boundaries
Example: Motion-JPEG implementation
  All-HW implementation
  All-SW implementation (using embedded processors)
  SW/HW co-design: optimal partitioning?
  Repeated manual RTL coding is not a solution!
24 Preliminary Results on Motion-JPEG Example
Pipeline: RAW images -> Preprocess -> DCT (or HW-DCT) -> Quant -> Huffman -> encoded JPEG images, with table modification
Model #1: 5 MicroBlazes, FSL-based communication
Model #2: 4 MicroBlazes + DCT on FPGA fabric
Platform: Xilinx XUP board
[Table: system cycle#, Fmax (MHz), execution time (ms), and area (slice#) for Model #1 and Model #2, with a (-38%) annotation]
25 Preliminary Result of xpilot: Better QoR (Comparison with UCI/UCSD SPARK)
Device setting: Xilinx Virtex-II Pro (xc2v4000-6)
Target frequency: 200 MHz
[Table: for SPARK and xpilot, resource usage (slices, LUTs, FFs, DSPs) and Fmax (MHz), plus the xpilot/SPARK delay ratio, for the designs PR, WANG, LEE, MCM, and DIR, with an average-ratio row]
27 Design Exploration for Heterogeneous MPSoC Platforms
Heterogeneous MPSoC exploration
  Processors
    Heterogeneous vs. homogeneous
    General-purpose vs. application-specific
  On-chip communication architecture (OCA)
    Bus (e.g., AMBA, CoreConnect), packet-switching network (e.g., Alpha 21364)
  Memory hierarchy
[Figure: tasks, OS, and drivers running on µPs, alongside IP, FPGA, and DSP blocks, each attached through a network interface to a communication network]
28 Configurable SoC Platforms
General-purpose processor cores + programmable fabric
  Tight integration using extended instructions (ASIPs)
    Example: Altera Nios / Nios II
  Loose integration using FIFOs/buses for communication
    Example: Xilinx MicroBlaze, etc.
[Figures: custom instruction logic for Nios II; Xilinx MicroBlaze]
29 ASIP Compilation: Problem Statement
Given:
  CDFG G(V, E)
  The basic instruction set I
  Pattern constraints:
    Number of inputs: $PI(p_i) \le N_{in}$
    Number of outputs: $PO(p_i) = 1$
    Total area: $\sum_{1 \le i \le N} \mathrm{area}(p_i) < A$
Objective:
  Generate a pattern library P
  Map G to the extended instruction set $I \cup P$, so that the total execution time is minimized
Example (multiplication: 2 clock cycles; addition: 1 clock cycle):
  Original code:
    t1 = a * b; t2 = b * c; t3 = d * e;
    t4 = t1 + t2; t5 = t2 + t3; t6 = t5 + t4;
  With extended instructions ext-inst1 (MAC1: 2 cycles) and ext-inst2 (MAC2: 2 cycles):
    t4 = ext-inst1(a, b, c);
    t5 = ext-inst2(b, c, d, e);
    t6 = t4 + t5;
  Performance speedup = 9 / 5 = 1.8X
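The 1.8X figure follows directly from the per-operation latencies on the slide, as this small sketch shows. The dictionary layout and variable names are illustrative assumptions; the latencies (multiplication 2 cycles, addition 1 cycle, each extended instruction 2 cycles) are taken from the slide, and the cycle counts assume a single-issue core that executes one instruction at a time.

```python
# Latencies in clock cycles, as given on the slide.
LATENCY = {"mul": 2, "add": 1, "ext-inst1": 2, "ext-inst2": 2}

# Original code: three multiplications (t1..t3) and three additions (t4..t6).
baseline = ["mul", "mul", "mul", "add", "add", "add"]

# After mapping: two extended instructions plus the final addition.
extended = ["ext-inst1", "ext-inst2", "add"]

baseline_cycles = sum(LATENCY[op] for op in baseline)
extended_cycles = sum(LATENCY[op] for op in extended)
speedup = baseline_cycles / extended_cycles

print(baseline_cycles, extended_cycles, speedup)
```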
30 Target Core Processor Model
Core processor model
  Classic single-issue pipelined RISC core (fetch / decode / execute / mem / write-back)
  The number of input and output operands of an instruction is predetermined
  An instruction reads the core register file during the execute stage and commits the result during the write-back stage
[Figure: core processor pipeline (PC, instruction cache, IF/ID, register file, ID/EX, ALU, EX/MEM, memory, MEM/WB, result MUX) with attached custom logic]
31 ASIP Compilation Flow
C code -> front-end compilation -> CDFG
1. Pattern generation (under µarch constraints)
   Satisfying input/output constraints
2. Pattern selection (produces the pattern library)
   Select a subset to maximize the potential speedup while satisfying the resource constraint
3. Application mapping & graph covering (produces the optimized CDFG)
   Graph covering to minimize the total execution time
Backend compilation -> optimized assembly
32 Experimental Results on Altera Nios
Altera Nios is used for ASIP implementation
  5 extended instruction formats
  Up to 2048 instructions for each format
Small DSP applications are taken as benchmarks
[Table: extended instruction#, speedup (estimation, Nios), and resource overhead (LE, memory, DSP block) for the benchmarks fft_br, iir, fir, pr, dir, and mcm, with an average row; surviving row fragments: fft_br % 65, % 16; iir % 4, % 40; fir % 1, % 8; pr % % 14; dir % % 16; mcm % % 56; Average % % -]
33 Architecture Extension for ASIPs
Data bandwidth problem
  Limited register file bandwidth (two read ports, one write port)
  ~40% of the ideal performance speedup will be lost
Shadow-register-based architectural extension
  Core registers are augmented by an extra set of shadow registers
    Conditionally written during the write-back stage
    Low power/area overhead
  Novel shadow-register binding algorithms are developed
[Figure: the core processor pipeline extended with a hashing unit (k = hash(j)) and shadow registers SR1 to SRK feeding the custom logic]
34 Ongoing Work: Mapping for Heterogeneous Integration with Multiple Processing Cores
Given:
  A library of processing cores P and a communication library C
  Task graph G(V, E)
  For each v in V, the execution time t(v, p_i) on p_i
  For each (u, v) in E, the communication data size s(u, v)
  Throughput constraint
Problem:
  Select and instantiate the processing elements and communication channels from P and C, respectively
  Map the tasks onto the processing elements and the communications onto the channels so that
    The optimal latency is achieved subject to the throughput constraint
    The implementation cost is minimized
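A heavily simplified version of this mapping problem can be explored by brute force, which may help make the formulation concrete. The sketch below is not the authors' method and leaves out much of the problem statement: it ignores communication channels and latency, allows at most one instance of each core type, and models the throughput constraint as a bound on the busiest core's total execution time per iteration. All task names, core names, execution times, and costs are hypothetical.

```python
from itertools import product


def explore_mappings(exec_time, core_cost, max_period):
    """Return (cost, period, mapping) for the cheapest feasible mapping.

    exec_time[task][core] is the execution time of a task on a core type.
    A mapping is feasible if the busiest core's load is <= max_period.
    Returns None when no mapping meets the throughput bound.
    """
    tasks = list(exec_time)
    cores = list(core_cost)
    best = None
    for choice in product(cores, repeat=len(tasks)):
        load = {}
        for task, core in zip(tasks, choice):
            load[core] = load.get(core, 0) + exec_time[task][core]
        period = max(load.values())
        if period > max_period:
            continue  # violates the throughput constraint
        cost = sum(core_cost[core] for core in load)  # pay only for cores used
        candidate = (cost, period, dict(zip(tasks, choice)))
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    return best


# Hypothetical numbers for three decoder-like tasks and two core types.
exec_time = {
    "parser": {"cpu": 6, "hw": 9},
    "idct": {"cpu": 4, "hw": 1},
    "motion_comp": {"cpu": 5, "hw": 1},
}
core_cost = {"cpu": 1, "hw": 3}
best = explore_mappings(exec_time, core_cost, max_period=8)
print(best)
```

Exhaustive enumeration grows exponentially with the number of tasks, so it is only useful for illustrating the objective and constraint on very small task graphs.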
35 MPEG-4 Simple Profile Decoder: Architecture Profiling
C specification overview (module name: original C source file, original C line #)
  Copy Controller: copycontrol.c, 287
  Display Controller: displaycontrol.c, 358
  Motion Comp.: Motion-Compensation.c
  Parser/VLD: parser.c, texture_vld.c
  Texture/IDCT: texture_idct.c
  Texture Update: textureupdate.c
Runtime profiling (PowerPC/XUP board)
  Parser/VLD: 59.0%
  Texture/IDCT: 18.1%
  Motion Comp.: 15.7%
  Copy Controller: 3.6%
36 MPEG-4 Simple Profile Decoder: Hybrid HW/SW Implementation
HW block integrated with PowerPC
Single-process design: software blocks running on PowerPC
15% speed improvement
37 MPEG-4 Simple Profile Decoder: Alternate Implementations
[Table: throughput (frames per second) and improvement (%) for single uBlaze, 7-uBlaze, single PowerPC, and single PowerPC w/ HW motion comp.]
xpilot synthesis report of HW blocks
[Table: C line count, RTL SystemC, RTL VHDL, slices (FFs, LUTs), MUL, clock period (ns), and latency (cycles) per block]
  Motion Comp: (FFs, LUTs) = (1111, 1017)
  Block IDCT: (FFs, LUTs) = (2376, 2438)
  Texture Update: (FFs, LUTs) = (1696, 1931)
38 Conclusions
xpilot has a fairly mature and advanced behavior synthesis capability, from C or SystemC to RTL code with the necessary design constraints
xpilot advantages include
  Platform-based behavior and system synthesis
  Communication/interconnect-centric approach
  Advanced algorithms for platform-based, communication-centric optimization
  Promising results demonstrated on available FPGAs
xpilot system synthesis capabilities
  Performance simulation of multi-processor systems
  Exploration of the efficient use of (multiple) on-chip processors
  Compilation and optimization for reconfigurable processors
39 Acknowledgements
We would like to acknowledge the support of
  Gigascale Systems Research Center (GSRC)
  National Science Foundation (NSF)
  Semiconductor Research Corporation (SRC)
  Industrial sponsors under the California MICRO programs (Altera, Xilinx)
Team members: Yiping Fan, Guoling Han, Wei Jiang, Zhiru Zhang
CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA
More informationESL design with the Agility Compiler for SystemC
ESL design with the Agility Compiler for SystemC SystemC behavioral design & synthesis Steve Chappell & Chris Sullivan Celoxica ESL design portfolio Complete ESL design environment Streaming Video Processing
More informationStorage I/O Summary. Lecture 16: Multimedia and DSP Architectures
Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal