Abstract. 1 Introduction. Reconfigurable Logic and Hardware Software Codesign. Class EEC282 Author Marty Nicholes Date 12/06/2003


Title: Reconfigurable Logic and Hardware Software Codesign
Class: EEC282
Author: Marty Nicholes
Date: 12/06/2003

Abstract

This is a review paper covering various aspects of reconfigurable logic. The focus is on hardware that assists a general-purpose processor. Some of the solutions discussed reconfigure dynamically, while others are more static in nature. Dynamic reconfiguration is the ultimate goal in this field, because it would allow hardware to be customized on the fly for the task to be performed. Static reconfiguration is also discussed, because that work provides the foundational algorithms needed to configure the hardware efficiently for the task. The main paper referenced is [a], which describes Programmable Active Memories (PAMs). The work it describes has become the basis for much subsequent research.

1 Introduction

Reconfigurable logic is central to hardware/software codesign, because this logic is used either to prototype an ASIC solution or to implement the final hardware-assist solution.

Reconfigurable logic spans a variety of implementation techniques. One scale for comparing the techniques is the frequency of reconfiguration. [f] describes a fanciful handheld device that is able to reconfigure on the fly for new network protocols or security algorithms, and even to deconfigure unused logic to save power. Although the article does not describe any unique research, it does provide a vivid picture of future uses of reconfigurable logic.

At the low-frequency end of the scale is the traditional use of the FPGA as assist hardware, where reconfiguration occurs only to fix bugs or slightly enhance functionality. Next comes hardware that is configured for a particular program and is dedicated to that application. Use of the FPGA as a prototyping vehicle for ASIC development falls in this category as well.
The next area is fault-tolerant hardware, which can be reconfigured during operation to replace a defective part of the hardware. This is also called embryonics, because the hardware is structured like living cells [e]. [a] reviews the PAM hardware system, which can be reconfigured for each application. The paper asserts that a PAM could be time-shared among 12 applications, but it offers no supporting evidence for this claim.

The final stage is reached with evolvable systems, where the hardware can be reconfigured while it is in use. This will require many aspects of the various hardware systems described in this paper to work perfectly. This final stage exhibits the most important benefits of reconfigurable logic, including flexibility in how the available hardware resources are used. This flexibility allows the system to adjust the hardware at runtime to trade off the critical factors of performance and power consumption. As [f] describes, the move is to mobile computing, and reconfigurable logic has much to offer in this area.

The rest of the paper is organized as follows. Section 2 provides a quick overview of the FPGA. Section 3 covers the various techniques used to provide reconfigurable logic. Section 4 describes some of the applications for these techniques. Section 5 raises the issues preventing faster progress in this field. Section 6 is the conclusion.

2 FPGA Basic Building Block

The field-programmable gate array (FPGA) is the hardware basis for most of the papers reviewed. This chip contains logic that can be wired up on the fly to implement a design. The work in this area uses FPGAs in a variety of ways within the associated configurable hardware subsystem. FPGAs have good characteristics for this application: they are flexible and, of course, reprogrammable. Their drawbacks are that they can be slow to reconfigure, they are expensive, and they require a complex tool chain to calculate the bitstream needed to reconfigure them. The solutions discussed in the next section have interesting ways to work around these limitations.

3 Reconfiguration Techniques

Reconfigurable logic requires some common steps: 1) profile the software, 2) find interesting code sections, 3) implement interesting code in hardware, 4) modify the software to use the hardware, and 5) run the partitioned system. Of course, many decisions must also be made about the amount of hardware resources to make available for reconfiguration.

In [a], the PAM prototype P1 uses a driver to provide access to hardware reconfiguration. A 1.5 Mb bitstream reconfigures the hardware. The hardware consists of 23 Xilinx FPGAs (5 switch FPGAs, 2 controller FPGAs, and 16 FPGAs in a matrix), 4 blocks of SRAM of 1 MB each, and 2 FIFOs. Figure 1 shows the structure of the P1 design.
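The five common steps listed at the start of this section can be illustrated with a small sketch of steps 1 to 3: take a profile, pick the regions worth moving to hardware, and estimate the runtime of the partitioned system. This is only a minimal illustration under stated assumptions. The region names, cycle counts, speedups, logic costs, and the greedy selection rule are all hypothetical and are not taken from any of the reviewed papers.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str          # hypothetical label for a code region, e.g. a loop
    cycles: int        # cycles spent in the region during profiling (step 1)
    hw_speedup: float  # assumed speedup if the region is moved to hardware
    hw_cost: int       # assumed logic cost in arbitrary "cells"


def cycles_saved(region):
    """Cycles removed from the software run if the region moves to hardware."""
    return region.cycles * (1.0 - 1.0 / region.hw_speedup)


def choose_regions(profile, budget):
    """Step 2 (illustrative greedy rule): prefer regions that save the most
    cycles per cell, and stop adding regions once the logic budget is full."""
    chosen, used = [], 0
    ranked = sorted(profile, key=lambda r: cycles_saved(r) / r.hw_cost, reverse=True)
    for region in ranked:
        if used + region.hw_cost <= budget:
            chosen.append(region)
            used += region.hw_cost
    return chosen


def partitioned_cycles(profile, chosen):
    """Estimated cycles for the partitioned system (steps 3 to 5)."""
    in_hw = {r.name for r in chosen}
    return sum(r.cycles / r.hw_speedup if r.name in in_hw else r.cycles
               for r in profile)


# Hypothetical profile data, for illustration only.
profile = [
    Region("inner_loop", 8000, 10.0, 400),
    Region("checksum", 1500, 4.0, 300),
    Region("setup", 500, 1.5, 350),
]
chosen = choose_regions(profile, budget=700)
total_sw = sum(r.cycles for r in profile)
total_hw = partitioned_cycles(profile, chosen)
overall_speedup = total_sw / total_hw
print([r.name for r in chosen], round(overall_speedup, 2))
```

With these made-up numbers the two hottest regions fit in the budget and the third stays in software, which mirrors the guideline of moving the inner loop to hardware and leaving the rest to the processor.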
Figure 1 P1 PAM Design

The programming language chosen was C++ with enhancements to describe nets. A simulation environment was also available. Results showed that non-EE students could use the toolchain successfully within a few weeks, whereas comparable results with ASICs require highly skilled engineers. The main design guideline described in [a] is: cast the inner loop in PAM hardware; let the software handle the rest!

Figure 2 Dynamic HW/SW Partitioning System Architecture [c]

Figure 3 Dynamic Partitioning and Configurable Logic Module Detail [c]

[c] describes a self-contained system, a processor module that contains a general-purpose microprocessor, memory, configurable logic, and a dynamic partitioning module. Figure 2 shows the overall structure of the processor module, while Figure 3 shows details of the special logic. This early prototype design attempts at runtime to determine the location of candidate code loops by snooping instruction fetches from main memory. The system then disassembles the code and creates control and data flow graphs. Using this information, a bitfile for hardware reconfiguration is created, and the software binary is patched to trigger the hardware. While the reconfigured hardware is executing, the processor transitions to a low-power state. This prototype has many limitations: 1) it supports 1-cycle loops only, 2) memory accesses must be sequential, 3) it provides only basic hardware logic, and 4) it requires manual binary patching. However, this avenue is promising because of the possibility of conserving power while improving performance on various algorithms.

More promising in the runtime reconfiguration space is the design discussed in [d]. The PipeRench design consists of processing elements (PEs) attached to a reconfigurable data path. Figure 4 shows how the hardware reconfigures as needed, taking only 1 clock cycle for each stripe, which is the basic building block. Figure 5 shows the internal structure of a stripe. The big advantage of the PipeRench design is that the hardware is abstracted from the software, allowing software to be ported between different PipeRench hardware implementations. The performance and energy efficiency are very impressive. [d] compares PipeRench running at 120 MHz with an 800 MHz Pentium III processor. The algorithm is for encryption, and the PipeRench hardware outperforms the processor by a factor of 5.
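The PipeRench comparison becomes more striking once the clock rates are normalized. The short calculation below is a sketch that assumes the reported factor of 5 is an end-to-end throughput ratio, so that dividing out the clock ratio gives the relative work completed per clock cycle. The function name is illustrative and not from [d].

```python
def per_cycle_advantage(speedup, fast_clock_mhz, slow_clock_mhz):
    """Work-per-clock ratio of the slower-clocked device, assuming `speedup`
    is the end-to-end throughput ratio in favor of that device."""
    return speedup * (fast_clock_mhz / slow_clock_mhz)


clock_ratio = 800 / 120                        # Pentium III clock vs. PipeRench clock
advantage = per_cycle_advantage(5, 800, 120)   # figures quoted from [d] above
print(f"clock ratio {clock_ratio:.2f}x, per-cycle advantage {advantage:.1f}x")
```

Under that assumption, PipeRench completes roughly 33 times as much of the encryption workload per clock cycle as the processor, which is consistent with the emphasis on parallel, pipelined datapaths.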
Figure 4 PipeRench Overview

[b] describes related work that assists with software analysis to determine which code should be implemented in hardware. It describes a tool (LOOAN) that detects critical loops in the software. The loop code is then recoded in a special C language called SA-C (single-assignment C). The toolchain flow is shown in Figure 6. The target architecture for this work is a processor and FPGA connected on a memory bus. One interesting aspect of the work in [b] is the result achieved in energy improvement. The combined FPGA/processor system was capable of an average speedup of 1.6 while achieving an average energy savings of 25%. This is very promising, since it allows not only a lower-power solution but also a slower processor to be designed into the system, which saves design cost. However, the cost of the FPGA must be factored into the full analysis.

Figure 5 PipeRench Dynamic Reconfiguration

Figure 6 Design Flow for Hardware/software Partitioning [b]

4 Applications

[a] describes various PAM applications. RSA encryption and decryption ran faster than any previous implementation by an order of magnitude. Genetic applications such as DNA matching are another use: a company called Compugen sells a PAM that speeds up biological searches, and it looks like a co-processor to the host.

Applications like the heat and Laplace equations are perfect for PAM. For example, a PAM at 20 MHz can achieve 5 G operations (add and shift) each second. [a] compares this result with a supercomputer, which would have to operate at 20 B instructions per second to match it. This illustrates the power of parallel operations.

Further PAM examples are similar. A Boltzmann algorithm to minimize quadratic equations (used in circuit placement) is again a formulation that allows a high degree of parallelism. Similarly, video compression is highly suited to pipelining, since it operates on small squares of video frames [a]. In high-energy physics, where images from particle collisions must be evaluated, the algorithms, like video compression, operate on small images, which lend themselves to pipelining. In physics, 20x20x32 b images must be processed every 10 microseconds.

[a] describes one interesting application that lays out the speed difference between the various implementations used in correlating pairs of images for stereo vision. Software performs the operation in 59 seconds on a SPARC-Station II, a hardware design using 4 DSPs takes 9.6 seconds, and the P1 takes 0.28 seconds. It would be interesting to compare the power used in each of these three implementations.
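The figures quoted in this section can be turned into ratios with a few lines of arithmetic. The sketch below uses only numbers stated above (the stereo vision timings, and the 5 G operations per second at 20 MHz). The variable names are illustrative.

```python
# Stereo vision correlation times quoted from [a], in seconds.
times_s = {
    "SPARC-Station II software": 59.0,
    "4-DSP hardware": 9.6,
    "P1 PAM": 0.28,
}

# Speedup of each implementation relative to the software baseline.
baseline = times_s["SPARC-Station II software"]
speedups = {name: baseline / t for name, t in times_s.items()}

# Speedup of the P1 over the DSP design.
p1_vs_dsp = times_s["4-DSP hardware"] / times_s["P1 PAM"]

# Operations completed per clock cycle for the heat/Laplace example:
# 5 G operations per second at a 20 MHz clock.
ops_per_cycle = 5e9 / 20e6

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x vs. software")
print(f"P1 vs. 4-DSP design: {p1_vs_dsp:.1f}x")
print(f"operations per cycle at 20 MHz: {ops_per_cycle:.0f}")
```

The P1 is roughly 211 times faster than the software version and about 34 times faster than the DSP design, and the heat/Laplace example implies about 250 add-and-shift operations completing in every clock cycle.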
[a] continues with more examples: sound synthesis and, finally, a Viterbi encoder/decoder with large constraint-length codes. So, what is the common thread in all the examples? The algorithms operate on large quantities of data in a repetitive fashion. Larger speedups are possible when the operation, such as multiplication or division, is computationally expensive for a general-purpose processor.

5 Issues

One critical issue is the lack of a transparent tool chain to support development for a platform with a reconfigurable hardware subsystem. Both [c] and [d] make some progress in this area. The approach used in [c] is to place a very simple tool chain in the hardware itself. As hardware density continues to increase, this may become a more viable solution. [d] limits runtime configuration to the datapath connections, and so is able to achieve single-cycle reconfiguration. This simplification also allows the hardware to be customized in the application code. [d] also makes progress in the most critical area: the application is built on an application programming interface that abstracts the hardware implementation from the software code.

[a] states that a PAM could be time-shared among 12 applications, but does not discuss how this would be implemented. This is the main problem with trying to combine a general-purpose machine running many applications with reconfigurable hardware: how can that hardware be shared when jobs are being swapped in and out? The time to reconfigure the hardware will be a large issue, as will the ability to swap out the internal state of the reconfigurable logic.

Finally, [f] covers some of the hardware issues that are limiting this work. PLDs use expensive SRAM, which raises the price of the parts. In addition, these devices use more power than ASICs and run at slower speeds.

6 Conclusion

It would be interesting to combine the work of [c] with [b]. The architecture targeted with [b] is the same architecture used in [c].
The biggest issue would be to take the more extensive loop analyzer and logic synthesis capabilities and place them into hardware. Both designs have the processor in a low-power state while the hardware assist is operating.

The field of reconfigurable logic is very exciting. This will be a critical area for continued research and development as vendors seek to increase system performance while keeping clock speed and power constrained. Reconfigurable logic may be the answer.

7 References

[a] J. Vuillemin, P. Bertin, et al., Programmable Active Memories: Reconfigurable Systems Come of Age, IEEE Transactions on VLSI Systems, March 1996
[b] J. Villareal, D. Suresh, G. Stitt, F. Vahid, W. Najjar, Improving Software Performance with Configurable Logic, Design Automation for Embedded Systems, 2002
[c] G. Stitt, R. Lysecky, F. Vahid, Dynamic Hardware/Software Partitioning: A First Approach, DAC, June 2003
[d] H. Schmit, D. Whelihan, A. Tsai, M. Moe, B. Levin, R. Taylor, PipeRench: A Virtualized Programmable Datapath in 0.18 Micron Technology, CICC, 2002
[e] G. De Micheli, R. Ernst, W. Wolf, Readings in Hardware/Software Co-Design, Morgan Kaufmann Publishers, 2002
[f] N. Tredennick, B. Shimamato, Go Reconfigure: Programmable logic devices will give us a handheld that does everything-well, IEEE Spectrum, 10/01/2003
