Using FPGAs in Supercomputing Reconfigurable Supercomputing

Similar documents
NCBI BLAST accelerated on the Mitrion Virtual Processor

NCBI BLAST accelerated on the Mitrion Virtual Processor

Introduction to reconfigurable systems

"On the Capability and Achievable Performance of FPGAs for HPC Applications"

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

Introduction to Field Programmable Gate Arrays

FPGA: What? Why? Marco D. Santambrogio

FPGA architecture and design technology

INTRODUCTION TO FPGA ARCHITECTURE

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

CHAPTER 1 Introduction

Chapter 5: ASICs Vs. PLDs

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 13

CSCI-375 Operating Systems

Top500 Supercomputer list

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Experts in Application Acceleration Synective Labs AB

Functional Programming in Hardware Design

Von Neumann architecture. The first computers used a single fixed program (like a numeric calculator).

Chapter 1: Introduction to Parallel Computing

Overview of Computer Organization. Chapter 1 S. Dandamudi

CC312: Computer Organization

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

Tools for Reconfigurable Supercomputing. Kris Gaj George Mason University

Overview of Computer Organization. Outline

Metropolitan Road Traffic Simulation on FPGAs

New development within the FPGA area with focus on soft processors

Embedded Computing Platform. Architecture and Instruction Set

EMBEDDED SOPC DESIGN WITH NIOS II PROCESSOR AND VHDL EXAMPLES

6.1 Multiprocessor Computing Environment

Lecture (01) Introducing Embedded Systems and the Microcontrollers By: Dr. Ahmed ElShafee

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.

FABRICATION TECHNOLOGIES

Full Linux on FPGA. Sven Gregori

378: Machine Organization and Assembly Language

Early Models in Silicon with SystemC synthesis

Research Challenges for FPGAs

Announcement. Computer Architecture (CSC-3501) Lecture 25 (24 April 2008) Chapter 9 Objectives. 9.2 RISC Machines

ECE 4514 Digital Design II. Spring Lecture 22: Design Economics: FPGAs, ASICs, Full Custom

High-Performance Linear Algebra Processor using FPGA

Overview of Microcontroller and Embedded Systems

The extreme Adaptive DSP Solution to Sensor Data Processing

Field Program mable Gate Arrays

Design of Digital Circuits

Parallel Computing Basics, Semantics

Configurable Processors for SOC Design. Contents crafted by Technology Evangelist Steve Leibson Tensilica, Inc.

Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization

Reduced Instruction Set Computers

RISC Processors and Parallel Processing. Section and 3.3.6

Parallel Architectures

CS 101, Mock Computer Architecture

CS Computer Architecture

The Nios II Family of Configurable Soft-core Processors

Computer Architecture s Changing Definition

Announcement. Computer Architecture (CSC-3501) Lecture 20 (08 April 2008) Chapter 6 Objectives. 6.1 Introduction. 6.

Computers: Inside and Out

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

XPU A Programmable FPGA Accelerator for Diverse Workloads

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

Final Exam Review. b) Using only algebra, prove or disprove the following:

FPGA for Software Engineers

Higher Level Programming Abstractions for FPGAs using OpenCL

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

FPGA Design Flow 1. All About FPGA

CSCI-GA Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it!

AL8253 Core Application Note

CPU Architecture system clock

Support for Programming Reconfigurable Supercomputers

Scalability and Classifications

VHDL for Synthesis. Course Description. Course Duration. Goals

PINE TRAINING ACADEMY

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

Addressing Heterogeneity in Manycore Applications

Embedded Systems: Hardware Components (part I) Todor Stefanov

COSC 6385 Computer Architecture - Multi Processor Systems

ECE 8823: GPU Architectures. Objectives

New Software-Designed Instruments

Understanding the Routing Requirements for FPGA Array Computing Platform. Hayden So EE228a Project Presentation Dec 2 nd, 2003

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS

Digital Integrated Circuits

High-Level Parallel Programming of Computation-Intensive Algorithms on. Fine-Grained Architecture

Fundamentals of Quantitative Design and Analysis

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS

Microprocessor Soft-Cores: An Evaluation of Design Methods and Concepts on FPGAs

Requirements for Scalable Application Specific Processing in Commercial HPEC

CS 475: Parallel Programming Introduction

VLSI Design Automation

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Introduction to parallel computing

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Chapter 9: Integration of Full ASIP and its FPGA Implementation

EE 3170 Microcontroller Applications

Field Programmable Gate Array

Performance potential for simulating spin models on GPU

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

AN OCM BASED SHARED MEMORY CONTROLLER FOR VIRTEX 4. Bas Breijer, Filipa Duarte, and Stephan Wong

Transcription:

Using FPGAs in Supercomputing Reconfigurable Supercomputing

Why FPGAs? FPGAs are 10 100x faster than a modern Itanium or Opteron Performance gap is likely to grow further in the future Several major vendors now have FPGA modules Cray XD1 in 2004, soon on the XT3/XT4 Silicon Graphics launched RASC in 2005 Linux Networx announced in June, launching in a few months Others in progress (2)

A typical FPGA Module FPGA to Memory I/O 3.2GB/s System Bus 3.2GB/s FPGA Xilinx Virtex4 LX 200 0.5TB/s 336 512x36bit RAM 1.6GB/s 1.6GB/s 1.6GB/s 1.6GB/s 1.6GB/s 1.6GB/s 1.6GB/s 1.6GB/s 1.6GB/s 1.6GB/s RAM memory RAM memory RAM memory RAM memory RAM memory (3)

FPGAs Empty re configurable silicon surface Compared to fixed silicon at the same process technology (90nm): ~10 times slower clock frequency ~100 times larger area used per gate Compared to CPUs 10 100 times faster (4)

Programming a Field Programmable Gate Array Without a circuit design, an FPGA is just an empty silicon surface What is meant by Programmable in the acronym FPGA is a circuit design can be loaded Designing a circuit is not programming from a software developer s point of view Not suitable for supercomputing application developers (biologists, astronomers, etc) (5)

Reconfigurable Supercomputing: Three new Directions

Three new directions: Libraries Hand crafted full FPGA circuits for specific tasks Called as library routines don t touch legacy source code Bandwidth limitations Sending and receiving arguments Reconfiguration of the FPGA Very hard to overlap transfers automatically Circuit design is difficult No libraries yet (7)

Three new directions: Bottom Up Raise the abstraction level of classic circuit design methods Behavioral Synthesis The Holy Grail of circuit design has been researched for over 20 years Several circuit design tools have come a long way Hard to get good performance (8)

Three new directions: Top Down Create a circuit design that can run software Processors Micro blaze and Nios won t cut it Configurable processors FPGAs require a new Processor Architecture Must be automatically adaptable for the program it will run Must be massively parallel (9)

The Mitrion Processor Architecture

The von Neumann processor architecture int:48<30> main() { int:48 prev = 1; int:48 fib = 1; int:48<30> fibonnacci = for(i in <1..30>) { fib = fib+prev; prev = fib; } <>fib; } fibonnacci; Program mov r4,1 add r1,r0,r4 mov r5,3 mul r2,r1,r5 Registers RAM [] + * / The von Neumann architecture is based on a State Machine. It is characterized by operating on one instruction at a time from an instruction stream. + Easily programmable + Executes programs of any size Sequential; works on an instruction stream Low silicon utilization I/O intensive Needs very high clock frequency (11)

The Purpose Of A Processor Architecture int:48<30> main() { int:48 prev = 1; int:48 fib = 1; int:48<30> fibonnacci = for(i in <1..30>) { fib = fib+prev; prev = fib; } <>fib; } fibonnacci;? A processor is an abstraction layer A machine, built in hardware that performs your program, written in software Completely separates hardware from software ISA: branches, arithmetic operations, indirect indexing, status registers, etc, etc Circuits: gates, flip flops, clocks, PLLs, UCFs, etc, etc (12)

Processor Architecture: A Cluster On A Chip Traditional Clusters: A set of nodes connected in a fixed network Network latency on the order of many thousands of clock cycles Each node needs to run a block of code to mask network latency (13)

Processor Architecture: A Cluster On A Chip Network topology specific for algorithm Ideal properties Very Fine Grain Parallelism A single instruction on each node Each node adapted to run its instruction Fully non von Neumann architecture The architecture replaces sequential instruction scheduling with parallel packet switching (14)

Mitrion Tools Do Not Allow Circuit Design The Mitrion tools can only program and configure the Mitrion Virtual Processor architecture The tools can not be used to create circuit designs Since circuits can not be designed, no errors regarding circuit design can be made The Mitrion Processor lets you run software in your FPGA (15)

Mitrion Offers Portability And Scalability You program the processor, you do not design a circuit for a specific FPGA platform Just configure a processor for the new platform from your old source code. Easy upgrades to the next generation of performance available. Exchange code with colleagues on other platforms (16)

The Mitrion Platform 1) The Mitrion Virtual Processor A configurable processor design for a fine grain massively parallel, soft core processor 10 30 times faster than traditional CPUs 2) The Mitrion C programming language An intrinsically parallel C family language 3) The Mitrion Software Development Kit Compiler Debugger/Simulator Processor configurator (17)

Running Software in FPGAs

Mitrion C The Mitrion Processor needs a fine grain, fully parallel programming language Parallel at the level of individual instructions A C family language but not ANSI C! Steeper learning curve, but better in the long run The automatically parallelising C or Fortran compiler doesn t exist anyway The Holy Grail of parallel computing for 20 years (19)

Mitrion C Does not describe Order of Execution. Instead it describes Data Dependencies Syntactic support to make design decisions regarding parallelism salient Designed to preserve parallelism in the algorithm Designed to reveal parallelism in the algorithm Allows a complete data dependency graph to be created of the algorithm (20)

Applying the Mitrion Platform 1) Identify the time critical portion of the program 2) Write it as code in Mitrion C 3) Perform a function call to run your Mitrionprogram Most of your code is unchanged (21)

Compiling A Mitrion Program Mitrion C Source code Compiler Processor Machine code Mitrion Software Development Kit Processor Architecture Simulator & Debugger Processor Configurator Processor Circuit Design (VHDL IP Core) (22) FPGA

Compiler, Simulator And Debugger (23)

Contact Information Mitrionics AB Ideon Science Park SE 223 70 Lund www.mitrionics.com stefan@mitrionics.com (24)