Lecture 41: Introduction to Reconfigurable Computing

Similar documents
FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

INTRODUCTION TO FPGA ARCHITECTURE

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Introduction to reconfigurable systems

Readings: Storage unit. Can hold an n-bit value Composed of a group of n flip-flops. Each flip-flop stores 1 bit of information.

Field Programmable Gate Array (FPGA)

Field Program mable Gate Arrays

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

The Xilinx XC6200 chip, the software tools and the board development tools

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

FPGA architecture and design technology

Outline. EECS Components and Design Techniques for Digital Systems. Lec 11 Putting it all together Where are we now?

A Time-Multiplexed FPGA

ECE 331 Digital System Design

FABRICATION TECHNOLOGIES

HW / SW Implementation Overview (0A) Young Won Lim 7/16/16

Spiral 2-8. Cell Layout

FPGA: What? Why? Marco D. Santambrogio

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas

ECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I

EECS150 - Digital Design Lecture 17 Memory 2

Design Methodologies

BUILDING BLOCKS OF A BASIC MICROPROCESSOR. Part 1 PowerPoint Format of Lecture 3 of Book

www-inst.eecs.berkeley.edu/~cs61c/

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

CS310 Embedded Computer Systems. Maeng

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

EE219A Spring 2008 Special Topics in Circuits and Signal Processing. Lecture 9. FPGA Architecture. Ranier Yap, Mohamed Ali.

discrete logic do not

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Design Methodologies. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

Memory. Lecture 22 CS301

EECS150 - Digital Design Lecture 11 SRAM (II), Caches. Announcements

Field Programmable Gate Array

HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

Programmable Logic. Any other approaches?

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design

EITF35: Introduction to Structured VLSI Design

CS61C : Machine Structures

EECS150 - Digital Design Lecture 09 - Parallelism

Introduction to Field Programmable Gate Arrays

Flexible Architecture Research Machine (FARM)

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

DINI Group. FPGA-based Cluster computing with Spartan-6. Mike Dini Sept 2010

Monolithic 3D IC Design for Deep Neural Networks

VLSI Programming 2016: Lecture 3

Handouts. FPGA-related documents

Computer Architecture. R. Poss

Motivation for Lecture. Market for Memories. Example: FFT Design. Sequential Circuits & D flip-flop. Latches and Registers.

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1

Review. EECS Components and Design Techniques for Digital Systems. Lec 03 Field Programmable Gate Arrays

EECS150 - Digital Design Lecture 3 - Field Programmable Gate Arrays (FPGAs) FPGA Overview

Field Programmable Gate Array (FPGA) Devices

Reconfigurable Computing. Introduction

Handouts. 1. Project Guidelines and DSP Function Generator Design Specifications. (We ll discuss the project at the beginning of lab on Wednesday)

EECS150 - Digital Design Lecture 16 - Memory

More Course Information

UCB CS61C : Machine Structures

PLAs & PALs. Programmable Logic Devices (PLDs) PLAs and PALs

ECE 4514 Digital Design II. Spring Lecture 22: Design Economics: FPGAs, ASICs, Full Custom

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

RiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner

A CAD Framework for MALIBU: An FPGA with Time-multiplexed Coarse-Grained Elements. David Grant

Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)"

EN2911X: Reconfigurable Computing Lecture 01: Introduction

EITF35: Introduction to Structured VLSI Design

DE2 Board & Quartus II Software

EECS Components and Design Techniques for Digital Systems. Lec 07 PLAs and FSMs 9/ Big Idea: boolean functions <> gates.

Laboratory Exercise 3

An Introduction to Programmable Logic

L2: Design Representations

ECEN 449 Microprocessor System Design. FPGAs and Reconfigurable Computing

Calendar Description

FPGA Implementations

Memory in Digital Systems

CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design

Workspace for '4-FPGA' Page 1 (row 1, column 1)

Achieving Breakthrough Performance with Virtex-4, the World s Fastest FPGA

Stratix vs. Virtex-II Pro FPGA Performance Analysis

UC Berkeley CS61C : Machine Structures

Lecture 7. Standard ICs FPGA (Field Programmable Gate Array) VHDL (Very-high-speed integrated circuits. Hardware Description Language)

High-Tech-Marketing. Selecting an FPGA. By Paul Dillien

Basic FPGA Architecture Xilinx, Inc. All Rights Reserved

In Silico Vox: Towards Speech Recognition in Silicon

CS61C : Machine Structures

ELCT 501: Digital System Design

1" 0" d" c" b" a" ...! ! Benefits of Larger Block Size. " Very applicable with Stored-Program Concept " Works well for sequential array accesses

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Energy scalability and the RESUME scalable video codec

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

Lecture 33 Caches III What to do on a write hit? Block Size Tradeoff (1/3) Benefits of Larger Block Size

FYSE420 DIGITAL ELECTRONICS. Lecture 7

ECE 448 Lecture 5. FPGA Devices

How Much Logic Should Go in an FPGA Logic Block?

Reconfigurable Computing

Transcription:

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA

Following the tech news tradition NeuroSky of San Jose, CA aims to add more realistic elements to video games by using brain wave-reading technology to help game developers make gaming more realistic. http://news.yahoo.com/s/ap/20070430/ap_on_hi_te/mind_reading_toys 2

Outline Computing What does it mean? Processor vs ASIC FPGA-based Reconfigurable Computing Real stuff 3

Back to basics What does the word computer mean to you? Your $700 box sitting under your desk at home? The $2000 laptop you are using to check email right now? The 5-stage pipeline processor? 4

Informal Definition A computer is a machine that computes add, subtract, logical operations, decisions What have we learned about computing in this semester? 5

Calculating Class Grades* grade = 0.1 mt1 + 0.2 mt2 +0.3 hw + 0.4 proj; grade = 0; tmp = 0.1 mt1; grade = grade + tmp; tmp = 0.2 mt2; grade = grade + tmp; tmp = 0.3 hw; grade = grade + tmp; tmp = 0.4 proj; grade = grade + tmp; Time *This is not how we are going to calculate your grades 6

Computing Final Grade (2) 0.1 mt1 0.2 mt2 0.3 hw 0.4 proj + + + Time grade SPACE 7

2 Ways to Compute 0.1 0.1 0.2 0.2 0.4 0.4 mt1 tmp mt1 tmp + mt2 tmp mt2 tmp proj + + tmp proj tmp grade grade grade grade clock cycle 1 clock cycle 2 clock cycle 3 TIME + + + Processor (CS61C) grade clock cycle n grade Application Specific Integrated Circuit ASIC (EE141) 8

Processor vs ASIC Take longer to compute slow Flexible Need instructions to determine what to do on each cycle Space is bounded Temporal Computing Take shorter time to compute fast Not Flexible No instruction Same calculation every cycle Space unbounded Branches? Spatial Computing 9

Visualizing Spatial Computing AMD Opteron 64-bit processor 1MB L2 Cache 193 mm sq 0.18 micron CMOS 89W @ 1.8GHz ~3 Op / cycle (int op) Actual computation Full Custom ASIC 4x4 Single Value Decomposition 3.5 mm sq 90nm CMOS 34mW @ 100 MHz clock 70 GOPS = 700 Op / cycle 10

Between Temporal & Spatial Computing Temporal Single Processor? ASIC Spatial Slow Flexible Reconfigurable Computing Fast Inflexible 11

Reconfigurable Computing No standard definition Computing via a post-fabrication and spatially programmed connection of processing elements. -John Wawrzynek Sp04 A computer that can RE-configure itself to perform computation spatially as needed How often do we RE-configure? Coarse-grain? Fine-grain? Example: FPGA 12

Introduction to the FPGA Field Programmable Gate Array Began as ASIC replacements ASIC that can be configured in the field At power up, configuration is loaded onto the chip Chip acts as an ASIC until power down Modern FPGA more like computers Exploit dynamic, partial reconfiguration Embedded processors Xilinx, Altera are 2 major market leaders 13

The LUT LUT: Lookup Table A direct implementation of a truth table Recall a TT uniquely defines a circuit A B C D Q 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 An n-lut implements any n-input combinational logic Depends on LUT configuration 14

Making a 2-LUT 2 from Truth Table A B FALSE AND OR NAND TRUE 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 0 1 16 possible ways 4 configuration bits 15

2-LUT: CL and MUX Based cfg3 cfg2 cfg1 cfg0 A B Q 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 cfg3 cfg2 cfg1 cfg0 1 0 1 0 1 0 Q 1 1 1 1 1 0 1 1 1 1 1 1 1 1 B A 16

LUTs in Real Life 3-LUT and 4-LUT are most common SRAM based Learn, and use, them a lot in CS150 17

Sequential logic Connecting multiple LUTs gives us ANY combinational logic we want to implement We need Flip-Flop to build sequential circuits CL CL clk FF are so important that they are included natively on FPGAs next to each LUT LUT + FF + = LB (Logic Block) 18

Logic Block D3 D2 D1 D0 LUT FF 1 0 OUT LUTCFG CLK MUXCFG Can build any 4-input circuit Synchronous OR Asynchronous Combining Logic Blocks => ANY synchronous digital circuit How to we build bigger circuit? 19

Routing of FPGA LB LB LB LB LB LB LB LB LB LB LB LB With enough smartness in placement and routing, we can implement any synchronous digital circuits! 20

Example: Xilinx Virtex2pro xc2vp70 74,448 Logic Cells (LB) 2 PowerPC cores 328 18x18 bits multipliers 5904 Kbytes on chip memory 8 Digital Clock Managers 996 I/O pins 16 high speed serial I/O ports 21

Die Photo of a FPGA Entire Chip is for Computation Spartan-3 90nm CMOS 22

Real Stuff: BEE2 Developed at Berkeley Berkeley Wireless Research Center 5 Xilinx xc2vp70 40Gbytes DDR2 memory Used for research in: Wireless Astro-Physics (SETI) Bioinformatics Speech Recognition 23

Conclusion The Processor is NOT the only way to do computation Reconfigurable computers allows different tradeoffs among speed, flexibility, cost, power, etc FPGA offers fine-grain reconfigurability 24