NCBI BLAST accelerated on the Mitrion Virtual Processor

Similar documents
NCBI BLAST accelerated on the Mitrion Virtual Processor

Using FPGAs in Supercomputing Reconfigurable Supercomputing

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

INTRODUCTION TO FPGA ARCHITECTURE

"On the Capability and Achievable Performance of FPGAs for HPC Applications"

Higher Level Programming Abstractions for FPGAs using OpenCL

Requirements for Scalable Application Specific Processing in Commercial HPEC

Altera SDK for OpenCL

Extending the Power of FPGAs to Software Developers:

Metropolitan Road Traffic Simulation on FPGAs

The Cray Rainier System: Integrated Scalar/Vector Computing

Highly-Scalable Reconfigurable Computing Abstract 1. Introduction 2. SGI System Architecture

Cray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe:

Reconfigurable Computing - (RC)

International Journal of Engineering and Scientific Research

End User Update: High-Performance Reconfigurable Computing

FPGA architecture and design technology

6.1 Multiprocessor Computing Environment

Support for Programming Reconfigurable Supercomputers

Latency Masking Threads on FPGAs

Parallel graph traversal for FPGA

Field Program mable Gate Arrays

Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Trends and Challenges in Multicore Programming

NCSA Reconfigurable Systems Summer Institute July, Michael Babst DSPlogic, Inc x705. DSPlogic Proprietary

Programming Soft Processors in High Performance Reconfigurable Computing

Reconfigurable Computing. Design and Implementation. Chapter 4.1

An NVMe-based Offload Engine for Storage Acceleration Sean Gibb, Eideticom Stephen Bates, Raithlin

Tools for Reconfigurable Supercomputing. Kris Gaj George Mason University

Yet Another Implementation of CoRAM Memory

Building supercomputers from embedded technologies

Reconfigurable Computing. Design and implementation. Chapter 4.1

Introduction to Field Programmable Gate Arrays

FPGP: Graph Processing Framework on FPGA

Shared Object-Based Storage and the HPC Data Center

Advances of parallel computing. Kirill Bogachev May 2016

FPGA: What? Why? Marco D. Santambrogio

Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens

A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology

Functional Programming in Hardware Design

The DSP Primer 8. FPGA Technology. DSPprimer Home. DSPprimer Notes. August 2005, University of Strathclyde, Scotland, UK

The Mobile Advantage. Erik Noreke Independent Standardization Consultant Chair, OpenSL ES. Copyright Khronos Group, Page 1

Early Experiences with the Naval Research Laboratory XD1. Wendell Anderson Dr. Robert Rosenberg Dr. Marco Lanzagorta Dr.

40Gbps+ Full Line Rate, Programmable Network Accelerators for Low Latency Applications SAAHPC 19 th July 2011

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory II

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Overview of ROCCC 2.0

Digital Integrated Circuits

Parallel Programming with MPI

A Framework to Improve IP Portability on Reconfigurable Computers

DINI Group. FPGA-based Cluster computing with Spartan-6. Mike Dini Sept 2010

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

FPGA 101. Field programmable gate arrays in action

Hardware Modelling. Design Flow Overview. ECS Group, TU Wien

CSCI-GA Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it!

Accelerated Biological Meta- Data Generation and Indexing on the Cray XD1

The Red Storm System: Architecture, System Update and Performance Analysis

Embedded Controller Design. CompE 270 Digital Systems - 5. Objective. Application Specific Chips. User Programmable Logic. Copyright 1998 Ken Arnold 1

EN2911X: Reconfigurable Computing Lecture 01: Introduction

Flexible Architecture Research Machine (FARM)

EN105 : Computer architecture. Course overview J. CRENNE 2015/2016

INTRODUCTION TO BIOINFORMATICS

Microprocessor Soft-Cores: An Evaluation of Design Methods and Concepts on FPGAs

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

VXS-610 Dual FPGA and PowerPC VXS Multiprocessor

FABRICATION TECHNOLOGIES

Design of Digital Circuits

Maintaining Cache Coherency with AMD Opteron Processors using FPGA s. Parag Beeraka February 11, 2009

Parallel Programming of High-Performance Reconfigurable Computing Systems with Unified Parallel C

C-Based Hardware Design

88X + PERFORMANCE GAINS USING IBM DB2 WITH BLU ACCELERATION ON INTEL TECHNOLOGY

Ascenium: A Continuously Reconfigurable Architecture. Robert Mykland Founder/CTO August, 2005

All Programmable: from Silicon to System

Extending the Power of FPGAs

Chapter 5: ASICs Vs. PLDs

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Tutorial on VHDL and Verilog Applications

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

Design and Evaluation of a BLAST Ungapped Extension Accelerator, Master's Thesis

Co-Design and Co-Verification using a Synchronous Language. Satnam Singh Xilinx Research Labs

Why FPGAs will win the Accelerator Battle: Building Computers that Minimize Data Movement

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs

Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries

LOW LATENCY DATA DISTRIBUTION IN CAPITAL MARKETS: GETTING IT RIGHT

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

ECE 448 Lecture 15. Overview of Embedded SoC Systems

XPU A Programmable FPGA Accelerator for Diverse Workloads

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

High-Performance Reconfigurable Computing the View from Edinburgh

Outline of Presentation

High-Speed Context Switching on FPGAs

Accelerating Real-Time Big Data. Breaking the limitations of captive NVMe storage

Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

Hardware/Software Codesign

The Xilinx XC6200 chip, the software tools and the board development tools

Transcription:

NCBI BLAST accelerated on the Mitrion Virtual Processor

Why FPGAs? FPGAs are 10-30x faster than a modern Opteron or Itanium Performance gap is likely to grow further in the future Full performance at low power, uses around 25W per Chip Several major vendors now have FPGA modules Cray XD1, soon on the XT3/4 Other names that I won t mention at CUG (2)

FPGAs An empty re-configurable silicon surface Allows any circuit design to be configured into it Reconfigurable in ~100ms Reconfigurable any number of times (3)

FPGAs How a circuit is formed on an FPGA Configurable computing element (LUT) A look-up table for a simple Boolean operation Configurable interconnect element Configured to connect any LUT with any other LUT (4)

Circuit Design vs Programming The difference between using VHDL and C is much more than syntax Circuit design works in the physical world of real electronics Continual trade-offs and judgments regarding what will and what will not work Software programming works in the abstract world of pure logic Perfect and exact execution according to the code (even bugs!) (5)

Executing software To achieve the abstract world of software, a device that physically embodies a model of execution is required The device operates in the real world Its operations form the pure abstract world of software This is the essence of a computer processor (6)

The Mitrion Platform 1) The Mitrion Virtual Processor (MVP) A configurable processor design for a fine-grain massively parallel, soft-core processor 10-30 times faster than traditional CPUs 2) The Mitrion-C programming language An intrinsically parallel C-family language 3) The Mitrion Software Development Kit Compiler Debugger/Simulator Processor configurator IDE (7)

The Mitrion Platform (8)

Compiling A Mitrion-C Program Mitrion-C Source code Compiler Processor Machine-code Mitrion Software Development Kit Processor Architecture Simulator & Debugger Processor Configurator Processor Circuit Design (VHDL IP Core) (9) FPGA

Mitrion Offers Portability And Scalability You program the processor, you do not design a circuit for a specific FPGA platform Just configure a processor for the new platform from your old source-code. Easy upgrades to the next generation of performance available. Exchange code with colleagues on other platforms (10)

FPGA Memory Bandwidth An FPGA module typically has several different bandwidth regions: System memory: 3-6 GB/s Attached memory: 10-20 GB/s Internal memory: 0.5 TB/s (!) Apart from parallelism and adaptability, the third main reason for the high performance of FPGAs (11)

NCBI BLAST accelerated on the MVP

Mitrion-Accelerated NCBI BLAST Built on the NCBI blastall application Identical interface and output options BLASTN algorithm is accelerated: Critical parts ported Runs on the Mitrion Virtual Processor in FPGA hardware (13)

NCBI BLAST accelerated on the MVP Mitrion BLAST is open-source, released under GPL Requires at least MVP on Virtex4-LX200 Not a Black Box, anyone can add to it Total Mitrion-C code is less than 1300 lines Free to download Currently in Beta 17x-20x performance increase vs standard NCBI BLAST run on the SGI Altix 4700 (14)

NCBI BLAST accelerated on the MVP Main limitations: BLAST-N only BLAST-P in development Currently limited to less than ~100K queries (unlimited database sizes) Gapped extension not run on MVP Small percentage of run-time for most queries (15)

BLASTN Overview Three-stage deployment of BLAST (16)

Mitrion BLASTN Stage 1 (17)

Mitrion BLASTN Stage 2 (18)

Mitrion BLASTN Performance (19)

The Mitrion-C Open Bio Project Open source development of key Bioinformatics applications for the Mitrion Virtual Processor at Sourceforge.net: http://mitc-openbio.sourceforge.net/ Currently BLAST, others to come (20)

Contact Information Mitrionics AB Ideon Science Park SE-223 70 Lund www.mitrionics.com stefan.mohl@mitrionics.com (21)