Warp Processors: Dynamic HW/SW Partitioning


Roman Lysecky
Department of Electrical and Computer Engineering, University of Arizona

Introduction: Warp Processors - Dynamic HW/SW Partitioning
1. Initially execute the application in software only.
2. Profile the application to determine critical regions.
3. Partition critical regions to hardware.
4. Program the configurable logic and update the software binary.
5. The partitioned application executes faster with lower energy consumption.
[Figure: warp processor block diagram including the Dynamic Part. Module (DPM); time and energy bars for SW Only versus HW/SW execution]
US Patent Pending, 2004

Introduction: Applications
[Figure: fingerprint detection against a fingerprint DB, comparing SW-only execution on a processor with a warp processor that moves from SW/profiling through dynamic partitioning to HW/SW execution]

Introduction: Standard binary - Separating Function and Architecture
- Software binaries of the past reflected the specific language of the underlying architecture, which limited portability.
- Current standard binary concept: separate function from detailed architecture.
- Develop new architectures for existing applications.
- Trend towards dynamic translation and optimization.
- Expansion: ideally, improve performance by simply adding additional [...], similar to adding memory.
[Figure: SW compiled by a standard compiler with profiling into a standard binary; execution time (s)]

Introduction: Why configurable logic (FPGAs)?
C code for bit reversal:
    x = (x >> 16) | (x << 16);
    x = ((x >> 8) & 0x00ff00ff) | ((x << 8) & 0xff00ff00);
    x = ((x >> 4) & 0x0f0f0f0f) | ((x << 4) & 0xf0f0f0f0);
    x = ((x >> 2) & 0x33333333) | ((x << 2) & 0xcccccccc);
    x = ((x >> 1) & 0x55555555) | ((x << 1) & 0xaaaaaaaa);
Compiled binary:
    sll $v[],$v[],x srl $v[],$v[],x or $v[],$v[],$v[] srl $v[],$v[],x8 and $v[],$v[],$t5[] sll $v[],$v[],x8 and $v[],$v[],$t4[] or $v[],$v[],$v[] srl $v[],$v[],x4 and $v[],$v[],$t[] sll $v[],$v[],x4 and $v[],$v[],$t[] ...
- Processor: requires between [...] and 8 cycles.
- Hardware for bit reversal: each bit of the original X value is wired directly to its position in the bit-reversed X value. Requires only 1 cycle (speedup of [...]x to 8x).

Introduction: Dynamic HW/SW Partitioning
[Figure: SW, standard compiler, and profiling flow; traditional partitioning is done in the compilation flow, while dynamic HW/SW partitioning uses profiling and CAD tools alongside the processor]

Introduction: Dynamic HW/SW Partitioning Enabler
Synthesis from Binaries [Stitt & Vahid, 2005][Stitt & Vahid]
Advantages:
- Does not require any special compilers.
- Completely transparent.
- Provides separation of function and architecture for architectures incorporating FPGAs.
- Avoids the complexities of supporting different FPGAs.
- Opens additional market segments (i.e., all software developers) that otherwise would not use FPGAs and CAD.
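The bit-reversal example from the "Why configurable logic" slide can be written out as a complete C function. The sketch below assumes a 32-bit word; the function names `bitrev_swap` and `bitrev_naive` are illustrative and do not come from the slides. The second function is a plain bit-by-bit reference used only to check the five-stage version.

```c
#include <stdint.h>

/* Five swap stages: halves, bytes, nibbles, bit pairs, single bits.
 * Every stage costs the processor several shift/mask/or instructions. */
uint32_t bitrev_swap(uint32_t x)
{
    x = (x >> 16) | (x << 16);
    x = ((x >> 8) & 0x00ff00ffu) | ((x << 8) & 0xff00ff00u);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x << 4) & 0xf0f0f0f0u);
    x = ((x >> 2) & 0x33333333u) | ((x << 2) & 0xccccccccu);
    x = ((x >> 1) & 0x55555555u) | ((x << 1) & 0xaaaaaaaau);
    return x;
}

/* Reference version: move one bit per iteration from the low end of x
 * into the low end of r, so the first bit read ends up on top. */
uint32_t bitrev_naive(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```

In hardware the same function needs no logic at all, only wires that cross over, which is why the slide contrasts many processor cycles with a single cycle.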

Warp Processor Tools (CAD)
[Figure: on-chip CAD flow with stages including decompilation, partitioning, RT synthesis, and the updater, producing the HW bitstream; * marks my research focus]

Existing FPGAs Not Suitable for [...]
- Existing FPGAs require extremely complex CAD tools.
- Designed to handle large arbitrary circuits, ASIC prototyping, etc.
- Require long execution times and very large memory usage.
- Not suitable for dynamic on-chip execution.
[Figure: execution time (minutes) and memory usage (MB) of each CAD stage]

CAD-Oriented FPGA
- Solution: develop a custom CAD-oriented FPGA.
- Careful simultaneous design of FPGA and CAD.
- FPGA features evaluated for impact on CAD.
- Add architecture features for SW kernels.
- Enables development of fast, lean compilation tools.
[Figure: the same CAD flow annotated with execution time (seconds) and memory usage (MB) of the lean tools]

Warp Configurable Logic Architecture (WCLA)
- Need a fast, efficient coprocessor interface.
- Analyzed digital signal processors (DSP) and existing coprocessors.
- Data address generators (DADG) and loop control hardware (LCH):
  - Provide fast loop execution.
  - Support memory accesses with regular access patterns.
- Integrated [...]-bit multiplier-accumulator (MAC):
  - Frequently found within critical SW kernels.
  - Fast, single-cycle multipliers are large and require many interconnections.
[Figure: WCLA with ARM, DADG & LCH, registers, MAC, and configurable logic fabric]
A Configurable Logic Fabric for Dynamic Hardware/Software Partitioning, DATE 2004

WCLA - Configurable Logic Fabric (CLF)
- Hundreds of existing commercial and research FPGA fabrics.
- Most are designed to balance circuit density and speed.
- Analyzed FPGA features to determine their impact on CAD.
- Designed our CLF in conjunction with the compilation tools.
- Array of configurable logic blocks (CLBs) surrounded by switch matrices (SMs).
- Each CLB is directly connected to an SM.
- Along with [...] design, allows for design of lean JIT routing.

WCLA - Combinational Logic Block
- FPGAs: flexibility/density. Large CLBs, various internal routing resources.
- Combinational logic block: simplicity. Limited internal routing reduces on-chip CAD complexity.
- Incorporates two 3-input, 2-output LUTs:
  - Equivalent to four 3-input LUTs with fixed internal routing.
  - Allows for good-quality circuits while reducing JIT technology mapping complexity.
- Provides routing resources between adjacent CLBs to support carry chains:
  - Reduces the number of nets we need to route.
[Figure: CLB with inputs a-f, two LUTs, four outputs, and connections to the adjacent CLBs]
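To make the LUT-based logic block concrete, the following is a minimal software model of one 3-input, 2-output LUT, configured as a full adder and chained through the carry signal the way adjacent blocks would be. It is only a sketch under assumptions: the type `lut3x2_t`, the function names, and the choice of a full adder as the configuration are illustrative and are not taken from the slides.

```c
#include <stdint.h>

/* One 3-input, 2-output LUT: an 8-entry truth table per output,
 * stored as the bits of a byte (bit i = output for input index i). */
typedef struct {
    uint8_t table[2];
} lut3x2_t;

/* Look up output `out` (0 or 1) for the input bits a, b, c. */
unsigned lut3x2_eval(const lut3x2_t *lut, unsigned out,
                     unsigned a, unsigned b, unsigned c)
{
    unsigned idx = (a & 1u) | ((b & 1u) << 1) | ((c & 1u) << 2);
    return (lut->table[out & 1u] >> idx) & 1u;
}

/* Configuration for a full adder: output 0 = sum (parity of a, b, c),
 * output 1 = carry (majority of a, b, c). */
lut3x2_t lut3x2_full_adder(void)
{
    lut3x2_t lut = { { 0x96u, 0xE8u } };
    return lut;
}

/* Ripple-carry adder built from `width` LUTs (1 <= width <= 31), with the
 * carry passed from one stage to the next like a carry chain between
 * adjacent blocks. The carry-out is returned in bit `width`. */
unsigned clb_ripple_add(unsigned x, unsigned y, unsigned width)
{
    const lut3x2_t fa = lut3x2_full_adder();
    unsigned carry = 0, sum = 0;
    for (unsigned i = 0; i < width; i++) {
        unsigned a = (x >> i) & 1u, b = (y >> i) & 1u;
        sum |= lut3x2_eval(&fa, 0, a, b, carry) << i;
        carry = lut3x2_eval(&fa, 1, a, b, carry);
    }
    return sum | (carry << width);
}
```

Because sum and carry share the same three inputs, one 2-output LUT covers a whole adder bit, and the carry never has to enter the general routing network.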

WCLA - Switch Matrix
- FPGAs: flexibility/speed. Large routing resources, various routing options.
- Switch matrix: simplicity. Allows for design of a fast, lean routing algorithm.
- All nets are routed using only a single pair of channels throughout the configurable logic fabric.
- Each short channel is associated with a single long channel.
- Designed for fast, lean routing.
[Figure: switch matrix with short channels and long (L) channels on each side]
A Configurable Logic Fabric for Dynamic Hardware/Software Partitioning, DATE 2004

ROCM - Riverside On-Chip Minimizer
- Two-level minimization tool.
- Uses a combination of approaches from Espresso-II [Brayton, et al., 1984][Hassoun & Sasao] and Presto [Svoboda & White, 1979].
- Uses a single expand phase instead of multiple iterations.
- Eliminates the need to compute the off-set, to reduce memory usage.
- On average only [...]% larger than the optimal solution.
[Figure: expand, reduce, and irredundant steps operating on the on-set, dc-set, and off-set]
[Figure: CAD flow annotated with execution time and memory usage per stage]
On-Chip Logic Minimization, DAC 2003
A Codesigned On-Chip Logic Minimizer, CODES+ISSS 2003

ROCTM - Riverside On-Chip Technology Mapper
Technology mapping/packing:
- Decompose the hardware circuit into a DAG. Nodes correspond to basic 2-input logic gates (AND, OR, XOR, etc.).
- Hierarchical bottom-up graph clustering algorithm:
  - Breadth-first traversal combining nodes to form single-output LUTs.
  - Combine LUTs with common inputs to form the final 2-output LUTs.
  - Pack LUTs in which the output from one LUT is the input to a second LUT.
[Figure: CAD flow annotated with execution time and memory usage per stage]
Dynamic Hardware/Software Partitioning: A First Approach, DAC 2003
A Configurable Logic Fabric for Dynamic Hardware/Software Partitioning, DATE 2004
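The idea of a single expand phase that never builds the off-set can be illustrated with a toy sketch. It is not ROCM's actual algorithm: it enumerates minterms exhaustively, which is only workable for a handful of variables, and the names `cube_t` and `expand_once` are assumptions made for this example. A cube is enlarged by dropping one literal at a time, and the larger cube is kept only if every minterm it contains is still inside the on-set plus dc-set cover.

```c
#include <stddef.h>

/* A cube over a small number of variables: a set bit in `care` means the
 * variable appears as a literal; the same bit in `val` gives its polarity. */
typedef struct { unsigned care, val; } cube_t;

/* Does cube c contain minterm m? */
static int cube_has(cube_t c, unsigned m)
{
    return (m & c.care) == c.val;
}

/* Is minterm m inside any cube of the cover (on-set plus dc-set)? */
static int cover_has(const cube_t *cover, size_t n, unsigned m)
{
    for (size_t i = 0; i < n; i++)
        if (cube_has(cover[i], m))
            return 1;
    return 0;
}

/* One expand pass over cube c: try to drop each literal once. The check is
 * made against the cover itself, so no off-set is ever computed. */
cube_t expand_once(cube_t c, const cube_t *cover, size_t n, unsigned nvars)
{
    for (unsigned v = 0; v < nvars; v++) {
        unsigned bit = 1u << v;
        if (!(c.care & bit))
            continue;                       /* variable already dropped */
        cube_t t = { c.care & ~bit, c.val & ~bit };
        int ok = 1;
        for (unsigned m = 0; m < (1u << nvars) && ok; m++)
            if (cube_has(t, m) && !cover_has(cover, n, m))
                ok = 0;                     /* t would leave the cover */
        if (ok)
            c = t;
    }
    return c;
}
```

For the two-variable function with on-set cubes a'b and ab, expanding a'b drops the literal a and yields the single cube b; the second cube then becomes redundant, which is what an irredundant step would remove.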

ROCPLACE - Riverside On-Chip Placer
- Dependency-based positional placement algorithm.
- Identify the critical path, placing critical nodes in the center of the CLF.
- Use dependencies between the remaining CLBs to determine placement.
- Attempt to use adjacent-CLB routing whenever possible.
[Figure: CAD flow annotated with execution time and memory usage per stage]
Dynamic Hardware/Software Partitioning: A First Approach, DAC 2003
A Configurable Logic Fabric for Dynamic Hardware/Software Partitioning, DATE 2004

Routing
Find a path within the FPGA to connect the source and sinks of each net within our hardware circuit.
- Pathfinder [Ebeling, et al., 1995]:
  - Introduced negotiated congestion.
  - During each routing iteration, route nets using the shortest path.
  - Allows overuse (congestion) of resources.
  - If congestion exists (illegal routing): update the cost of congested resources, rip up all routes, and reroute all nets.
- VPR [Betz, et al., 1997]:
  - Increased performance over Pathfinder.
  - Routability-driven: use the fewest tracks possible.
  - Timing-driven: optimize circuit speed.
  - Many techniques are used in commercial FPGA CAD tools.
[Figure: flowchart: route; if illegal (congestion), rip up and repeat; otherwise done]

ROCR - Riverside On-Chip Router
- Resource graph:
  - Nodes correspond to SMs.
  - Edges correspond to channels between SMs.
  - The capacity of an edge equals the number of wires within the channel.
- Requires much less memory than VPR because the resource graph is smaller.
- Produces circuits with a critical path [...]% shorter than VPR (RD).
[Figure: flowchart: route on the resource graph; if illegal, rip up and repeat; otherwise done]
Dynamic for Just-in-Time, DAC 2004

ROCR - Memory Usage
- VPR requires over 5MB of memory, with an average of over [...] MB.
- ROCR requires at most [...].6 MB.
- VPR requires up to 6X more memory.
[Figure: memory usage (KB) of VPR (RD), VPR (TD), and ROCR per benchmark, with average]

ROCR - Algorithm Performance
- ROCR is on average [...]X faster than VPR (TD).
- Up to [...]X faster for ex5p.
[Figure: execution time (s) of VPR (TD) and ROCR per benchmark, with average]
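The negotiated-congestion loop described on the routing slides can be sketched on a small resource graph whose edges carry a wire capacity. This is a simplified illustration, not ROCR or Pathfinder as published: nets have a single source and sink, the cost function and the constant penalty of 3 for a full channel are arbitrary choices, and all names (`rgraph_t`, `route_all`, and so on) are hypothetical.

```c
#include <string.h>

#define RG_NODES 8
#define RG_INF   1000000

typedef struct {
    int cap[RG_NODES][RG_NODES];   /* wires per channel, 0 = no channel */
    int use[RG_NODES][RG_NODES];   /* nets using the channel this pass */
    int hist[RG_NODES][RG_NODES];  /* congestion seen in earlier passes */
} rgraph_t;

typedef struct { int src, dst; } net_t;

void rg_add_channel(rgraph_t *g, int u, int v, int cap)
{
    g->cap[u][v] = g->cap[v][u] = cap;
}

/* Cost grows with past congestion and when the channel is already full. */
static int edge_cost(const rgraph_t *g, int u, int v)
{
    int present = (g->use[u][v] >= g->cap[u][v]) ? 3 : 1;
    return (1 + g->hist[u][v]) * present;
}

/* Shortest-path route for one net (O(n^2) Dijkstra); records the usage. */
static int route_net(rgraph_t *g, int src, int dst)
{
    int dist[RG_NODES], prev[RG_NODES], done[RG_NODES] = {0};
    for (int i = 0; i < RG_NODES; i++) { dist[i] = RG_INF; prev[i] = -1; }
    dist[src] = 0;
    for (int k = 0; k < RG_NODES; k++) {
        int u = -1;
        for (int i = 0; i < RG_NODES; i++)
            if (!done[i] && (u < 0 || dist[i] < dist[u])) u = i;
        if (dist[u] == RG_INF) break;      /* the rest is unreachable */
        done[u] = 1;
        for (int v = 0; v < RG_NODES; v++) {
            if (g->cap[u][v] == 0) continue;
            int d = dist[u] + edge_cost(g, u, v);
            if (d < dist[v]) { dist[v] = d; prev[v] = u; }
        }
    }
    if (dist[dst] == RG_INF) return 0;
    for (int v = dst; v != src; v = prev[v]) {
        g->use[prev[v]][v]++;
        g->use[v][prev[v]]++;
    }
    return 1;
}

/* Returns the number of passes needed for a legal routing, 0 if none was
 * found within max_iter passes, or -1 if some net cannot be connected. */
int route_all(rgraph_t *g, const net_t *nets, int n_nets, int max_iter)
{
    for (int it = 1; it <= max_iter; it++) {
        memset(g->use, 0, sizeof g->use);  /* rip up all routes */
        for (int n = 0; n < n_nets; n++)
            if (!route_net(g, nets[n].src, nets[n].dst)) return -1;
        int illegal = 0;
        for (int u = 0; u < RG_NODES; u++)
            for (int v = 0; v < RG_NODES; v++)
                if (g->use[u][v] > g->cap[u][v]) { g->hist[u][v]++; illegal = 1; }
        if (!illegal) return it;
    }
    return 0;
}
```

With a one-wire channel between nodes 0 and 1 and a detour through node 2, two nets from 0 to 1 end up on different paths: the second net sees the direct channel as full and prefers the detour.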

Results
[Figure: CAD flow annotated with execution time and memory usage per stage, for the desktop tools and the on-chip tools]
Dynamic for Just-in-Time, DAC 2004

Experimental Setup
- Warp processor:
  - [...] MHz ARM7 processor.
  - Configurable logic fabric with a maximum frequency of 5 MHz.
  - Used dynamic on-chip CAD tools to map the critical region to hardware.
  - Requires less than [...] seconds to perform synthesis and compilation.
- Traditional HW/SW partitioning:
  - [...] MHz ARM7 processor.
  - Xilinx Virtex-E (executing at maximum possible speed).
  - Manually partitioned software using VHDL.
  - VHDL synthesized using Xilinx ISE 4.[...] on a desktop.

Performance Speedup (Critical Region, Single Kernel)
- Average critical region speedup of 4 vs. [...] for Virtex-E.
- Simplicity results in faster HW circuits.
[Figure: critical-region speedup over SW-only execution, warp processor vs. Xilinx Virtex-E, for brev, gfax, url, rocm, pktflow, canrdr, bitmnp, tblook, ttsprk, matrix, idct, g7, mpeg, fir, matmul, and the average]

Performance Speedup (Overall, Multiple Kernels)
- Average speedup of 7.4.
- Energy reduction of 8% - 94%.
[Figure: overall speedup of the warp processor over SW-only execution for the same benchmarks and the average]

Results
[Figure: execution time and memory usage of the on-chip CAD tools (75MHz ARM7) compared with Xilinx ISE]

Conclusions
- Developed [...]: dynamically and transparently re-implements SW kernels as HW, implemented using on-chip [...].
- Developed the Warp Configurable Logic Architecture, designed specifically to allow development of lean on-chip CAD tools.
- Developed fast, lean on-chip compilation tools that require an order of magnitude less memory and execution time and are capable of on-chip execution.
- Speedups are significant:
  - Average speedups of 7.4X.
  - Speedup of over [...]X possible for many applications.
  - Energy reduction of 8% to 94%.
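The gap between critical-region speedup and overall speedup follows from Amdahl's law: only the fraction of execution time spent in the kernels moved to hardware gets faster. The helper functions below are a small sketch of that arithmetic; their names are illustrative, and the figures in the note after the code are made-up inputs, not results from the slides.

```c
#include <stddef.h>

/* Overall speedup when a fraction f (0..1) of the software execution time
 * is moved to hardware that runs that region s times faster. */
double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Same idea for several kernels: f[i] is the fraction of the original
 * execution time spent in kernel i, s[i] the speedup of that kernel. */
double overall_speedup_multi(const double *f, const double *s, size_t n)
{
    double rest = 1.0;    /* time that stays in software */
    double accel = 0.0;   /* time of the kernels after acceleration */
    for (size_t i = 0; i < n; i++) {
        rest -= f[i];
        accel += f[i] / s[i];
    }
    return 1.0 / (rest + accel);
}
```

For example, a kernel that accounts for 90% of the run time and is accelerated 10 times gives an overall speedup of only about 5.3, because the remaining 10% still runs in software.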

Future Directions
- Extend to desktop/server/PDA domains.
- Increase parallelism within [...].
- Efficient memory/data reuse methods.
- Development of a standard HW binary.
- Support more complex architectures.
- High-performance HW/SW partitioning:
  - Operating-system-aware HW/SW partitioning.
  - HW/SW partitioning must be tightly integrated with the OS.
  - What OS support is required for HW/SW partitioning and warp processing?
- Low-power design:
  - Dynamic power management within FPGAs.
  - Requires development of new architectures and CAD tools.

Patents & Publications
Patents
- F. Vahid, R. Lysecky, G. Stitt. Warp Processor for Dynamic Hardware/Software Partitioning. US Patent Pending, 2004.
Publications
- R. Lysecky, F. Vahid, S. Tan. A Study of the Scalability of On-Chip Routing for Just-in-Time FPGA Compilation. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April 2005.
- R. Lysecky, F. Vahid. A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning. Design Automation and Test in Europe Conference (DATE), 2005.
- R. Lysecky, F. Vahid, S. Tan. Dynamic FPGA Routing for Just-in-Time FPGA Compilation. Design Automation Conference (DAC), 2004.
- R. Lysecky, F. Vahid. A Configurable Logic Architecture for Dynamic Hardware/Software Partitioning. Design Automation and Test in Europe Conference (DATE), 2004.
- R. Lysecky, F. Vahid. On-Chip Logic Minimization. Design Automation Conference (DAC), 2003.
- G. Stitt, R. Lysecky, F. Vahid. Dynamic Hardware/Software Partitioning: A First Approach. Design Automation Conference (DAC), 2003.


More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation

Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation Cross-layer Optimized Placement and Routing for FPGA Soft Error Mitigation Keheng Huang Yu Hu Xiaowei Li Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy

More information

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs Harrys Sidiropoulos, Kostas Siozios and Dimitrios Soudris School of Electrical & Computer Engineering National

More information

Challenges of FPGA Physical Design

Challenges of FPGA Physical Design Challenges of FPGA Physical Design Larry McMurchie 1 and Jovanka Ciric Vujkovic 2 1 Principal Engineer, Solutions Group, Synopsys, Inc., Mountain View, CA, USA 2 R&D Manager, Solutions Group, Synopsys,

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

Stratix vs. Virtex-II Pro FPGA Performance Analysis

Stratix vs. Virtex-II Pro FPGA Performance Analysis White Paper Stratix vs. Virtex-II Pro FPGA Performance Analysis The Stratix TM and Stratix II architecture provides outstanding performance for the high performance design segment, providing clear performance

More information

mrfpga: A Novel FPGA Architecture with Memristor-Based Reconfiguration

mrfpga: A Novel FPGA Architecture with Memristor-Based Reconfiguration mrfpga: A Novel FPGA Architecture with Memristor-Based Reconfiguration Jason Cong Bingjun Xiao Department of Computer Science University of California, Los Angeles {cong, xiao}@cs.ucla.edu Abstract In

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

IMPROVING LOGIC DENSITY THROUGH SYNTHESIS-INSPIRED ARCHITECTURE Jason H. Anderson

IMPROVING LOGIC DENSITY THROUGH SYNTHESIS-INSPIRED ARCHITECTURE Jason H. Anderson IMPROVING LOGIC DENITY THROUGH YNTHEI-INPIRED ARCHITECTURE Jason H. Anderson Dept. of ECE, Univ. of Toronto Toronto, ON Canada email: janders@eecg.toronto.edu ABTRACT We leverage properties of the logic

More information

FPGA: What? Why? Marco D. Santambrogio

FPGA: What? Why? Marco D. Santambrogio FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much

More information

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Michalis D. Galanis, Gregory Dimitroulakos, and Costas E. Goutis VLSI Design Laboratory, Electrical and Computer Engineering

More information

An LP-based Methodology for Improved Timing-Driven Placement

An LP-based Methodology for Improved Timing-Driven Placement An LP-based Methodology for Improved Timing-Driven Placement Qingzhou (Ben) Wang, John Lillis and Shubhankar Sanyal Department of Computer Science University of Illinois at Chicago Chicago, IL 60607 {qwang,

More information

According to the Moore s law, the number of transistors. Parallel FPGA Router using Sub-Gradient method. Steiner tree.

According to the Moore s law, the number of transistors. Parallel FPGA Router using Sub-Gradient method. Steiner tree. 1 Parallel FPGA Router using Sub-Gradient method and Steiner tree Rohit Agrawal, Chin Hau Hoo, Kapil Ahuja, and Akash Kumar arxiv:1803.03885v2 [cs.dc] 19 Aug 2018 Abstract In the FPGA (Field Programmable

More information

A Deterministic Flow Combining Virtual Platforms, Emulation, and Hardware Prototypes

A Deterministic Flow Combining Virtual Platforms, Emulation, and Hardware Prototypes A Deterministic Flow Combining Virtual Platforms, Emulation, and Hardware Prototypes Presented at Design Automation Conference (DAC) San Francisco, CA, June 4, 2012. Presented by Chuck Cruse FPGA Hardware

More information

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA A Path Based Algorithm for Timing Driven Logic Replication in FPGA By Giancarlo Beraudo B.S., Politecnico di Torino, Torino, 2001 THESIS Submitted as partial fulfillment of the requirements for the degree

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

Design of Transport Triggered Architecture Processor for Discrete Cosine Transform

Design of Transport Triggered Architecture Processor for Discrete Cosine Transform Design of Transport Triggered Architecture Processor for Discrete Cosine Transform by J. Heikkinen, J. Sertamo, T. Rautiainen,and J. Takala Presented by Aki Happonen Table of Content Introduction Transport

More information

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this

More information

Reduce FPGA Power With Automatic Optimization & Power-Efficient Design. Vaughn Betz & Sanjay Rajput

Reduce FPGA Power With Automatic Optimization & Power-Efficient Design. Vaughn Betz & Sanjay Rajput Reduce FPGA Power With Automatic Optimization & Power-Efficient Design Vaughn Betz & Sanjay Rajput Previous Power Net Seminar Silicon vs. Software Comparison 100% 80% 60% 40% 20% 0% 20% -40% Percent Error

More information

On pin-to-wire routing in FPGAs. Niyati Shah

On pin-to-wire routing in FPGAs. Niyati Shah On pin-to-wire routing in FPGAs by Niyati Shah A thesis submitted in conformity with the requirements for the degree of Master of Applied Science and Engineering Graduate Department of Electrical & Computer

More information

Fault-Free: A Framework for Supporting Fault Tolerance in FPGAs

Fault-Free: A Framework for Supporting Fault Tolerance in FPGAs Fault-Free: A Framework for Supporting Fault Tolerance in FPGAs Kostas Siozios 1, Dimitrios Soudris 1 and Dionisios Pnevmatikatos 2 1 School of Electrical & Computer Engineering, National Technical University

More information

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany 2013 The MathWorks, Inc. 1 Agenda Model-Based Design of embedded Systems Software Implementation

More information

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,

More information

An Efficient Chip-level Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction

An Efficient Chip-level Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction An Efficient Chip-level Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction Yan Lin 1, Yu Hu 1, Lei He 1 and Vijay Raghunat 2 Electrical Engineering Dept., UCLA, Los Angeles, CA 1 Purdue

More information

Pricing of Derivatives by Fast, Hardware-Based Monte-Carlo Simulation

Pricing of Derivatives by Fast, Hardware-Based Monte-Carlo Simulation Pricing of Derivatives by Fast, Hardware-Based Monte-Carlo Simulation Prof. Dr. Joachim K. Anlauf Universität Bonn Institut für Informatik II Technische Informatik Römerstr. 164 53117 Bonn E-Mail: anlauf@informatik.uni-bonn.de

More information

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing

More information

Fault Grading FPGA Interconnect Test Configurations

Fault Grading FPGA Interconnect Test Configurations * Fault Grading FPGA Interconnect Test Configurations Mehdi Baradaran Tahoori Subhasish Mitra* Shahin Toutounchi Edward J. McCluskey Center for Reliable Computing Stanford University http://crc.stanford.edu

More information

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Politecnico di Milano & EPFL A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Vincenzo Rana, Ivan Beretta, Donatella Sciuto Donatella Sciuto sciuto@elet.polimi.it Introduction

More information

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT THE FREE AND OPEN RISC INSTRUCTION SET ARCHITECTURE Codasip is the leading provider of RISC-V processor IP Codasip Bk: A portfolio of RISC-V processors Uniquely

More information

FPGA design with National Instuments

FPGA design with National Instuments FPGA design with National Instuments Rémi DA SILVA Systems Engineer - Embedded and Data Acquisition Systems - MED Region ni.com The NI Approach to Flexible Hardware Processor Real-time OS Application software

More information

Verilog. What is Verilog? VHDL vs. Verilog. Hardware description language: Two major languages. Many EDA tools support HDL-based design

Verilog. What is Verilog? VHDL vs. Verilog. Hardware description language: Two major languages. Many EDA tools support HDL-based design Verilog What is Verilog? Hardware description language: Are used to describe digital system in text form Used for modeling, simulation, design Two major languages Verilog (IEEE 1364), latest version is

More information

Measuring and Utilizing the Correlation Between Signal Connectivity and Signal Positioning for FPGAs Containing Multi-Bit Building Blocks

Measuring and Utilizing the Correlation Between Signal Connectivity and Signal Positioning for FPGAs Containing Multi-Bit Building Blocks Measuring and Utilizing the Correlation Between Signal Connectivity and Signal Positioning for FPGAs Containing Multi-Bit Building Blocks Andy Ye and Jonathan Rose The Edward S. Rogers Sr. Department of

More information

An efficient FPGA priority queue implementation with application to the routing problem

An efficient FPGA priority queue implementation with application to the routing problem An efficient FPGA priority queue implementation with application to the routing problem Joseph Rios Technical Report ucsc-crl-07-01 Department of Computer Engineering School of Engineering University of

More information

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx High Capacity and High Performance 20nm FPGAs Steve Young, Dinesh Gaitonde August 2014 Not a Complete Product Overview Page 2 Outline Page 3 Petabytes per month Increasing Bandwidth Global IP Traffic Growth

More information

An Efficient FPGA Overlay for Portable Custom Instruction Set Extensions

An Efficient FPGA Overlay for Portable Custom Instruction Set Extensions An Efficient FPGA Overlay for Portable Custom Instruction Set Extensions Dirk Koch,, Christian Beckhoff, and Guy G. F. Lemieux Dept. of Informatics, University of Oslo, Norway, Dept. of ECE, UBC Vancouver,

More information

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011 FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level

More information

State of the art. 2.1 Introduction to FPGAs

State of the art. 2.1 Introduction to FPGAs 2 State of the art 2.1 Introduction to FPGAs A Field programmable Gate Array (FPGA) is an integrated circuit that is designed to be configured after manufacturing. FPGAs can be used to implement any logic

More information

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering FPGA Fabrics Reference Wayne Wolf, FPGA-Based System Design Pearson Education, 2004 Logic Design Process Combinational logic networks Functionality. Other requirements: Size. Power. Primary inputs Performance.

More information

System-on Solution from Altera and Xilinx

System-on Solution from Altera and Xilinx System-on on-a-programmable-chip Solution from Altera and Xilinx Xun Yang VLSI CAD Lab, Computer Science Department, UCLA FPGAs with Embedded Microprocessors Combination of embedded processors and programmable

More information

Fast Timing-driven Partitioning-based Placement for Island Style FPGAs

Fast Timing-driven Partitioning-based Placement for Island Style FPGAs .1 Fast Timing-driven Partitioning-based Placement for Island Style FPGAs Pongstorn Maidee Cristinel Ababei Kia Bazargan Electrical and Computer Engineering Department University of Minnesota, Minneapolis,

More information

THE COARSE-GRAINED / FINE-GRAINED LOGIC INTERFACE IN FPGAS WITH EMBEDDED FLOATING-POINT ARITHMETIC UNITS

THE COARSE-GRAINED / FINE-GRAINED LOGIC INTERFACE IN FPGAS WITH EMBEDDED FLOATING-POINT ARITHMETIC UNITS THE COARSE-GRAINED / FINE-GRAINED LOGIC INTERFACE IN FPGAS WITH EMBEDDED FLOATING-POINT ARITHMETIC UNITS Chi Wai Yu 1, Julien Lamoureux 2, Steven J.E. Wilton 2, Philip H.W. Leong 3, Wayne Luk 1 1 Dept

More information

HW SW Partitioning. Reading. Hardware/software partitioning. Hardware/Software Codesign. CS4272: HW SW Codesign

HW SW Partitioning. Reading. Hardware/software partitioning. Hardware/Software Codesign. CS4272: HW SW Codesign CS4272: HW SW Codesign HW SW Partitioning Abhik Roychoudhury School of Computing National University of Singapore Reading Section 5.3 of textbook Embedded System Design Peter Marwedel Also must read Hardware/software

More information

Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion

Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion Minghua Shen and Guojie Luo Peking University FPGA-February 23, 2017 1 Contents Motivation Background Search Space Reduction for

More information