Reconfigurable Computing. Introduction

Similar documents
ECEN 449 Microprocessor System Design. FPGAs and Reconfigurable Computing

Workspace for '4-FPGA' Page 1 (row 1, column 1)

Design of Reusable Context Pipelining for Coarse Grained Reconfigurable Architecture

FPGA Power and Timing Optimization: Architecture, Process, and CAD

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

Coarse Grained Reconfigurable Architecture

INTRODUCTION TO FPGA ARCHITECTURE

The S6000 Family of Processors

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design

Introduction to reconfigurable systems

EN2911X: Reconfigurable Computing Topic 01: Programmable Logic

A Reconfigurable Multifunction Computing Cache Architecture

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks

EN2911X: Reconfigurable Computing Lecture 01: Introduction

Re-configurable VLIW processor for streaming data

Design methodology for multi processor systems design on regular platforms

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Coarse Grain Reconfigurable Arrays are Signal Processing Engines!

Digital Integrated Circuits

What is Configurable Computing?

Abstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE

Lecture 41: Introduction to Reconfigurable Computing

Hardware/Software Codesign

INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto

Implementation of DSP Algorithms

Memory Systems IRAM. Principle of IRAM

ESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder

R.W. Hartenstein, et al.: A Reconfigurable Arithmetic Datapath Architecture; GI/ITG-Workshop, Schloß Dagstuhl, Bericht 303, pp.

Towards Optimal Custom Instruction Processors

The Nios II Family of Configurable Soft-core Processors

Co-synthesis and Accelerator based Embedded System Design

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

High performance, power-efficient DSPs based on the TI C64x

Interfacing a High Speed Crypto Accelerator to an Embedded CPU

NEW APPROACHES TO HARDWARE ACCELERATION USING ULTRA LOW DENSITY FPGAs

Embedded Systems. 7. System Components

Programmable Logic Devices UNIT II DIGITAL SYSTEM DESIGN

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

Flexible Architecture Research Machine (FARM)

FPGA Provides Speedy Data Compression for Hyperspectral Imagery

What is Configurable Computing?

Hardware-Software Codesign. 1. Introduction

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto

Dataflow Supercomputers

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

Field Program mable Gate Arrays

Parameterized System Design

FABRICATION TECHNOLOGIES

RFU Based Computational Unit Design For Reconfigurable Processors

Hardware/Software Co-design

Chapter 5. Introduction ARM Cortex series

XPU A Programmable FPGA Accelerator for Diverse Workloads

Atmel s s AT94K Series Field Programmable System Level Integrated Circuit (FPSLIC)

Secure Function Evaluation using an FPGA Overlay Architecture

Abstract. 1 Introduction. Reconfigurable Logic and Hardware Software Codesign. Class EEC282 Author Marty Nicholes Date 12/06/2003

FPGA BASED ACCELERATION OF THE LINPACK BENCHMARK: A HIGH LEVEL CODE TRANSFORMATION APPROACH

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.

Specializing Hardware for Image Processing

The MorphoSys Parallel Reconfigurable System

High Level, high speed FPGA programming

The extreme Adaptive DSP Solution to Sensor Data Processing

The Xilinx XC6200 chip, the software tools and the board development tools

QUKU: A Fast Run Time Reconfigurable Platform for Image Edge Detection

LECTURE 10: Improving Memory Access: Direct and Spatial caches

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements

Performance of computer systems

Multi processor systems with configurable hardware acceleration

Design Space Exploration for Memory Subsystems of VLIW Architectures

An Introduction to Parallel Programming

Energy scalability and the RESUME scalable video codec

FPGA architecture and design technology

Embedded Controller Design. CompE 270 Digital Systems - 5. Objective. Application Specific Chips. User Programmable Logic. Copyright 1998 Ken Arnold 1

A Process Model suitable for defining and programming MpSoCs

Automatic Compilation to a Coarse-Grained Reconfigurable System-opn-Chip

EE219A Spring 2008 Special Topics in Circuits and Signal Processing. Lecture 9. FPGA Architecture. Ranier Yap, Mohamed Ali.

RISC Processors and Parallel Processing. Section and 3.3.6

Computing System Element Choices

Automatic Compilation to a Coarse-grained Reconfigurable System-on-Chip

IBM Cell Processor. Gilbert Hendry Mark Kretschmann

Enhancing Energy Efficiency of Processor-Based Embedded Systems thorough Post-Fabrication ISA Extension

Introduction to Parallel Programming

Computer Architecture

Design Methodologies. Full-Custom Design

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

System-on Solution from Altera and Xilinx

Field Programmable Gate Array (FPGA)

A software platform to support dynamically reconfigurable Systems-on-Chip under the GNU/Linux operating system

Reconfigurable Computing

Embedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.

"On the Capability and Achievable Performance of FPGAs for HPC Applications"

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

EITF35: Introduction to Structured VLSI Design

Overview of ROCCC 2.0

Transcription:

Reconfigurable Computing Tony Givargis and Nikil Dutt Introduction! Reconfigurable computing, a new paradigm for system design Post fabrication software personalization for hardware computation Traditionally based on FPGA technology! Underlying technology enables swapping digital circuits to create custom logic: In real-time, i.e., matter of seconds or milliseconds In circuit During execution time, i.e., on-the-fly! Digital hardware (custom logic) used for speedup Data parallel systems (roots in array processing) DSP, multimedia Long integer arithmetic! Rapid prototyping of high performance systems (application specific)! Computing structure granularity varies Fine grained (FPGA) to coarse grained (array processors) 2

Why is Custom Logic Faster Than Software?! Spatial vs. Temporal Computation Processors divide computation across time, dedicated logic divides across space 2 y = Ax + Bx + C Temporal Computation Spatial Computation t1 = x t2 = t1 * A t2 = t2 + B t2 = t2 * t1 y = t2 + C t1 t2 A B x A * * * + B C C + Y 3 Why is Custom Logic Faster Than Software?! Specialization Instruction set may not provide the operations your program needs Processors provide hardware that may not be useful in every program or in every cycle of a given program Multipliers Dividers! Instruction Memory Processors need lots of memory to hold the instructions that make up a program and to hold intermediate results.! Bit Width Mismatches In general, processors have a fixed bit width, and all computations are performed on that many bits Multimedia vector instructions (MMX) a response to this 4

Good Applications for Reconfigurable Computing! Relatively small application graph FPGAs have limited capacity Simple control flow helps a lot! Data Parallelism Execute same computations on many independent data elements Pipeline computations through the hardware! Small and/or varying bit widths Take advantage of the ability to customize the size of operators 5 Field Programmable Gate Array (FPGA)! Programmable logic blocks (PLB)! Programmable connections between logic blocks (PIC)! Programmable I/O (PIO)! Digital only! Low/medium/high performance (50/100/300 MHz)! Low/medium/high density (20K/200K/2M gates)! Bit stream programmable: SRAM, EEROM, Flash, Anti-fuse, etc.! Execute/programming mode 6

FPGA Internals! Traditionally 3 components: PI/O, PIC, PLB! Look up table (LUT) form the basic PLBs for logic generation 3 to 4 inputs wide Multiple LUTs / cell! Programmable communication grid (PIC)! Recent trends: Incorporate memory and processor on chip LUT LUT 7 Why Compute With FPGAs?! Huge performance gap between software and handdesigned hardware systems Often 100-to-1 ratio of performance or performance/area! Hardware systems not so good for general computing Big design, cost barriers to implementation Not practical to buy a new machine every time you want to run a different program! Reconfigurable systems offer best-of-both-worlds Run-time programmability Hardware-level performance 8

Why Haven t FPGA s Defeated CPUs?! Capacity: Instructions are very dense representation, logic blocks aren t! Tools: Compilers for reconfigurable logic aren t very good Some operations are hard to implement on FPGAs! One approach to capacity is to exploit the 90-10 rule of software Run the 90% of code that takes 10% of execution time on a conventional processor Run the 10% of code that takes 90% of execution time on reconfigurable logic! Programmable-reconfigurable processors 9 Reconfigurable Computing Architecture! Processor implements most of the function! Reconfigurable fabric (e.g., FPGA) implements compute/data intensive! Shared memory (dual port or DMA driven)! Shared bus bottleneck Reconfig Fabric Processor M E M O R Y 10

Reconfigurable Computing Alternate Architecture! One or more processors on FPGA chip! One or more memory blocks on FPGA chip up MEM FPGA Fabric up! FPGA fabric used to design the communication and storage infrastructure up MEM up 11 Introduction! Reconfigurable computing, a new paradigm for system design Post fabrication software personalization for hardware computation Traditionally based on FPGA technology! Underlying technology enables swapping digital circuits to create custom logic: In real-time, i.e., matter of seconds or milliseconds In circuit During execution time, i.e., on-the-fly! Digital hardware (custom logic) used for speedup Data parallel systems (roots in array processing) DSP, multimedia Long integer arithmetic! Rapid prototyping of high performance systems (application specific)! Computing structure granularity varies Fine grained (FPGA) to coarse grained (array processors) Let s look at some examples of how they re used 12

Fine-Grained System: CHIMERAE Treat reconfigurable array as ALU within superscalar Array implements some number of custom instructions for each program Register file is interface between programmable and reconfigurable 13 Programmed in C Instruction combining Control localization SIMD Within a Register Simulation Studies CHIMERAE Example applications only require 8 RFUOPs in the reconfigurable array Equivalent to 32 rows in RFU Performance Results Vary strongly from application to application Also dependent on model used for RFU delay Average speedup of 20-30%, one application sees >2x improvement 14

Coarse-Grained System: Garp! Small programmable processor with large reconfigurable array Interface through memory system 15 Again, Programmed in C Garp Compiler attempts to map loop nests onto the reconfigurable array Data Encryption Standard Estimate 24x speedup over UltraSPARC Image Dithering 9x Speedup Sorting 2x Speedup 16

Mid-Grain: Amalgam 17 Amalgam Performance 24 24 22 20 18 16 N-Queens 2D IDCT Rijndael Encryption Image Dithering DNA Pattern Match 22 20 18 16 14 14 12 12 10 10 8 8 6 6 4 4 2 2 0 1 2 3 4 5 6 7 8 Number of Clusters 0 18

Computing Abstraction! One or more FPGA contexts (bit streams) pre-synthesized B 0, B 1, B 2 B n! Cost for reconfiguration C reconfig! FPGA holds one context at a time Entire computation/data can not be configured onto a single FPGA! Notion of scheduling Processing stops during reconfiguration Next context being configured while current context is executing (double buffering) NP complete problem (similar to partitioning/scheduling in codesign) 19 State of the Art in Reconfigurable Computing! Some see RC as what will replace ASIC or even system design! But: Conventional FPGAs aren t really designed for reconfigurable computing Design methodology not standard or automated Architecture Computation model Compiler challenges! A hot research topic 20

Some Reconfigurable Computing Successes! RSA Decryption Programmable-Active-Memory machine set record for decryption of RSA-encrypted data! DNA Sequence Matching Reconfigurable hardware has achieved 100x better performance than contemporary supercomputers! Signal Processing FPGA-based filters often get 10x better performance than DSP chips Benefit from customization of hardware to the application! Emulation Use reconfigurable logic to simulate new processors at high speeds! Cryptographic Attacks High-performance low-cost implementations for breaking encryption algorithms 21