Requirements for Scalable Application Specific Processing in Commercial HPEC

Similar documents
CASE STUDY: Using Field Programmable Gate Arrays in a Beowulf Cluster

An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs

Parallel Matlab: RTExpress on 64-bit SGI Altix with SCSL and MPT

High-Assurance Security/Safety on HPEC Systems: an Oxymoron?

Setting the Standard for Real-Time Digital Signal Processing Pentek Seminar Series. Digital IF Standardization

Distributed Real-Time Embedded Video Processing

The extreme Adaptive DSP Solution to Sensor Data Processing

An Update on CORBA Performance for HPEC Algorithms. Bill Beckwith Objective Interface Systems, Inc.

Multi-Modal Communication

Sparse Linear Solver for Power System Analyis using FPGA

Dana Sinno MIT Lincoln Laboratory 244 Wood Street Lexington, MA phone:

High-Performance Linear Algebra Processor using FPGA

Washington University

DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience

4. Lessons Learned in Introducing MBSE: 2009 to 2012

Using Model-Theoretic Invariants for Semantic Integration. Michael Gruninger NIST / Institute for Systems Research University of Maryland

ARINC653 AADL Annex Update

DoD Common Access Card Information Brief. Smart Card Project Managers Group

Service Level Agreements: An Approach to Software Lifecycle Management. CDR Leonard Gaines Naval Supply Systems Command 29 January 2003

CENTER FOR ADVANCED ENERGY SYSTEM Rutgers University. Field Management for Industrial Assessment Centers Appointed By USDOE

Running CyberCIEGE on Linux without Windows

Kathleen Fisher Program Manager, Information Innovation Office

MODELING AND SIMULATION OF LIQUID MOLDING PROCESSES. Pavel Simacek Center for Composite Materials University of Delaware

The C2 Workstation and Data Replication over Disadvantaged Tactical Communication Links

FUDSChem. Brian Jordan With the assistance of Deb Walker. Formerly Used Defense Site Chemistry Database. USACE-Albuquerque District.

Lessons Learned in Adapting a Software System to a Micro Computer

Data Reorganization Interface

ASPECTS OF USE OF CFD FOR UAV CONFIGURATION DESIGN

Empirically Based Analysis: The DDoS Case

Considerations for Algorithm Selection and C Programming Style for the SRC-6E Reconfigurable Computer

73rd MORSS CD Cover Page UNCLASSIFIED DISCLOSURE FORM CD Presentation

Information, Decision, & Complex Networks AFOSR/RTC Overview

Topology Control from Bottom to Top

New Dimensions in Microarchitecture Harnessing 3D Integration Technologies

ATCCIS Replication Mechanism (ARM)

Architecting for Resiliency Army s Common Operating Environment (COE) SERC

Balancing Transport and Physical Layers in Wireless Ad Hoc Networks: Jointly Optimal Congestion Control and Power Control

73rd MORSS CD Cover Page UNCLASSIFIED DISCLOSURE FORM CD Presentation

C2-Simulation Interoperability in NATO

A Distributed Parallel Processing System for Command and Control Imagery

2013 US State of Cybercrime Survey

SINOVIA An open approach for heterogeneous ISR systems inter-operability

Use of the Polarized Radiance Distribution Camera System in the RADYO Program

COMPUTATIONAL FLUID DYNAMICS (CFD) ANALYSIS AND DEVELOPMENT OF HALON- REPLACEMENT FIRE EXTINGUISHING SYSTEMS (PHASE II)

Corrosion Prevention and Control Database. Bob Barbin 07 February 2011 ASETSDefense 2011

Vision Protection Army Technology Objective (ATO) Overview for GVSET VIP Day. Sensors from Laser Weapons Date: 17 Jul 09 UNCLASSIFIED

Concept of Operations Discussion Summary

Data Warehousing. HCAT Program Review Park City, UT July Keith Legg,

Technological Advances In Emergency Management

LARGE AREA, REAL TIME INSPECTION OF ROCKET MOTORS USING A NOVEL HANDHELD ULTRASOUND CAMERA

Towards a Formal Pedigree Ontology for Level-One Sensor Fusion

ENVIRONMENTAL MANAGEMENT SYSTEM WEB SITE (EMSWeb)

A Review of the 2007 Air Force Inaugural Sustainability Report

FPGA Acceleration of Information Management Services

Parallelization of a Electromagnetic Analysis Tool

Wireless Connectivity of Swarms in Presence of Obstacles

ASSESSMENT OF A BAYESIAN MODEL AND TEST VALIDATION METHOD

NEW FINITE ELEMENT / MULTIBODY SYSTEM ALGORITHM FOR MODELING FLEXIBLE TRACKED VEHICLES

Accuracy of Computed Water Surface Profiles

Web Site update. 21st HCAT Program Review Toronto, September 26, Keith Legg

Development and Test of a Millimetre-Wave ISAR Images ATR System

U.S. Army Research, Development and Engineering Command (IDAS) Briefer: Jason Morse ARMED Team Leader Ground System Survivability, TARDEC

COTS Multicore Processors in Avionics Systems: Challenges and Solutions

SURVIVABILITY ENHANCED RUN-FLAT

75th Air Base Wing. Effective Data Stewarding Measures in Support of EESOH-MIS

Space and Missile Systems Center

Dr. ing. Rune Aasgaard SINTEF Applied Mathematics PO Box 124 Blindern N-0314 Oslo NORWAY

Introducing I 3 CON. The Information Interpretation and Integration Conference

SETTING UP AN NTP SERVER AT THE ROYAL OBSERVATORY OF BELGIUM

Title: An Integrated Design Environment to Evaluate Power/Performance Tradeoffs for Sensor Network Applications 1

Dr. Stuart Dickinson Dr. Donald H. Steinbrecher Naval Undersea Warfare Center, Newport, RI May 10, 2011

Exploring the Query Expansion Methods for Concept Based Representation

Advanced Numerical Methods for Numerical Weather Prediction

PEO C4I Remarks for NPS Acquisition Research Symposium

Using Templates to Support Crisis Action Mission Planning

Edwards Air Force Base Accelerates Flight Test Data Analysis Using MATLAB and Math Works. John Bourgeois EDWARDS AFB, CA. PRESENTED ON: 10 June 2010

Rockwell Science Center (RSC) Technologies for WDM Components

SCIENTIFIC VISUALIZATION OF SOIL LIQUEFACTION

Monte Carlo Techniques for Estimating Power in Aircraft T&E Tests. Todd Remund Dr. William Kitto EDWARDS AFB, CA. July 2011

A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud

AFRL-ML-WP-TM

Computer Aided Munitions Storage Planning

Shallow Ocean Bottom BRDF Prediction, Modeling, and Inversion via Simulation with Surface/Volume Data Derived from X-Ray Tomography

The Evaluation of GPU-Based Programming Environments for Knowledge Discovery

Space and Missile Systems Center

75 TH MORSS CD Cover Page. If you would like your presentation included in the 75 th MORSS Final Report CD it must :

BUPT at TREC 2009: Entity Track

REPORT DOCUMENTATION PAGE

Using the SORASCS Prototype Web Portal

Advanced Numerical Methods for Numerical Weather Prediction

ENHANCED FRAGMENTATION MODELING

Col Jaap de Die Chairman Steering Committee CEPA-11. NMSG, October

Ross Lazarus MB,BS MPH

INTEGRATING LOCAL AND GLOBAL NAVIGATION IN UNMANNED GROUND VEHICLES

Polarized Downwelling Radiance Distribution Camera System

2011 NNI Environment, Health, and Safety Research Strategy

DEVELOPMENT OF A NOVEL MICROWAVE RADAR SYSTEM USING ANGULAR CORRELATION FOR THE DETECTION OF BURIED OBJECTS IN SANDY SOILS

32nd Annual Precise Time and Time Interval (PTTI) Meeting. Ed Butterline Symmetricom San Jose, CA, USA. Abstract

Secure FAST: Security Enhancement in the NATO Time Sensitive Targeting Tool

Open Systems Development Initiative (OSDI) Open Systems Project Engineering Conference (OSPEC) FY 98 Status Review 29 April - 1 May 1998

Transcription:

Requirements for Scalable Application Specific Processing in Commercial HPEC Steven Miller Silicon Graphics, Inc. Phone: 650-933-1899 Email Address: scm@sgi.com Abstract: More and more High Performance Embedded Computing (HPEC) leverages technology from commercial high performance computing systems. To date, HPEC has only tapped the lower end of commercial high performance computing technology. As more of the advanced commercial technology moves into the embedded space, this presents a unique opportunity to change the fundamentals of how HPEC solutions are addressed. Within HPEC, two types of application specific processing elements, reconfigurable and custom are being used. Reconfigurable elements are comprised of Field Programmable Gate Array (FPGA) technology and custom elements are comprised of various devices as Digital Signal Processors (DSP), DARPA Polymorphic Computing Architectures (PCA) [1] and others. The integration of these devices presents significant challenges both to the system architecture and to the programming models. This presentation will describe a set of system requirements and methods to not only include these application specific processing devices but to allow effecting scaling of application specific processing devices. Introduction and System Architectural Review SGI s ccnuma (cache coherent non-uniform memory architecture) [2] global shared memory system architecture is the basis for our general purpose Origin and Altix HPC systems. The presentation will explore the architectural features used within both Origin and Altix systems which allows scaling in excess of one thousand commercial high performance processors. The architecture of SGI s systems which included 192 custom application specific processors (Tensor Processor Units) and 128 general-purpose processors will also be described. System Architectural Features for Scalability High-performance FPGAs represent an important architectural tool to provide total system performance while keeping both power and space requirements to a minimum. The use of FPGA elements have been limited to only the most computationally dense algorithms by the amount of data which can be pass through the device. But even with these limitations, significant performance and power improvements have been demonstrated [3]. This presentation will describe methods to increase the bandwidth to these devices and the communication primitives required to scale to hundreds of devices with out sacrificing performance.

Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 01 FEB 2005 2. REPORT TYPE N/A 3. DATES COVERED - 4. TITLE AND SUBTITLE Requirements for Scalable Application Specific Processing in Commercial HPEC 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Silicon Graphics, Inc. 8. PERFORMING ORGANIZATION REPORT NUMBER 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR S ACRONYM(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release, distribution unlimited 11. SPONSOR/MONITOR S REPORT NUMBER(S) 13. SUPPLEMENTARY NOTES See also ADM00001742, HPEC-7 Volume 1, Proceedings of the Eighth Annual High Performance Embedded Computing (HPEC) Workshops, 28-30 September 2004 Volume 1., The original document contains color images. 14. ABSTRACT 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT UU a. REPORT unclassified b. ABSTRACT unclassified c. THIS PAGE unclassified 18. NUMBER OF PAGES 17 19a. NAME OF RESPONSIBLE PERSON Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18

Systems that include hundreds of application specific processing devices pose a significant software challenge. This presentation will describe how to effectively allocate, manage, and decommission under changing workloads various application specific processing elements. Both software methods and APIs will be presented on ways to interface application specific processing targeted at HPEC applications. References [ 1 ] http://www.darpa.mil/ipto/programs/pca/index.htm [ 2 ] http://www.sgi.com/servers/altix/ [ 3 ] W.D. Smith and A.R. Schnore. Towards an RCC-based Accelerator for Computational Fluid Dynamics Applications. In T.P. Plaks, eds., Proceedings of the International Conference of Engineering of Reconfigurable Systems and Algorithms, pp. 222-231. CSREA Press, Las Vegas, Nevada, 2003.

Requirements for Scalable Application Specific Processing in Commercial HPEC Steve Miller Chief Engineer scm 9/28/2004

The 3 Single-Paradigm Architectures Scalar Intel Itanium SGI MIPS IBM Power Sun SPARC HP PA Vector Cray X1 NEC SX App-Specific Graphics - GPU Signals - DSP Prog ble - FPGA Other ASICs

Paradigms to Applications Low Compute high Intensity Vector Application-specific Application-specific Scalar Low Data locality High

Architectural Challenges Hardware Bandwidth to/from System Scalability Software Compliers/Languages Debuggers APIs

Multi-Paradigm Computing UltraViolet Terascale to Petascale Data Set : Bring Function to Data Vector Vector Scalar Scalar Scalable Shared Memory. Globally addressable. Thousands of ports. Flat & high bandwidth. Flexible & configurable IO IO FPGA DSP Reconfigurable Graphics

Software Provide for HDL modules Integrated environment with debugger Highest performance Leverage 3 rd Party Std Language Tools Celoxia, Impulse Acceleration, Mitrion, Mentor Graphics Developed an FPGA aware version of GDB Capable of debugging the FPGA and System Software Capable of multiple CPUs and multiple FPGAs Developed RASC Abstraction Layer (RASCAL)

Software Overview Debugger (GDB) Download Utilities Application Abstraction Layer Library Device Manager User Space Algorithm Device Driver Download Driver Linux Kernel COP (TIO, Algorithm FPGA, Memory, Download FPGA) Hardware

Abstraction Layer: Algorithm API The Abstraction Layer s algorithm API mirrors the COP API with a few additions that enable wide scaling, Input Data Algorithm COP Output Data Application COP COP and deep scaling. Input Data Algorithm Output Data Application COP COP

Hardware Direct Connection to NUMAlink4 6.4GB/s/connection Fast System Level Reprogramming of FPGA Atomic Memory Operations Same set as System CPUs Hardware Barriers Configurations to 8191 NUMA/FPGA connections

MOATB Block Diagram NUMAlink 12.8 GB/s SSP 6.4 GB/s QDR SRAM 9.6GB/s 3 reads @ 1.6GB/s 3 writes @ 1.6GB/s Addr & Ctrl 2MB QDR SRAM Addr & Ctrl 36 36 2MB QDR SRAM 36 36 Algorithm FPGA Addr & Ctrl 36 36 2MB QDR SRAM Loader FPGA Select Map Programming Interface 72 72 SSP PCI 66MHz TIO NUMAlink Connectors

Requirements for Scalable Application Specific Processing in Commercial HPEC Presented by: Steve Miller Chief Engineer Silicon Graphics, Inc. SGI Proprietary

The 3 Single-Paradigm Architectures Scalar Intel Itanium SGI MIPS IBM Power Sun SPARC HP PA Vector Cray X1 NEC SX App-Specific Graphics - GPU Signals - DSP Prog ble - FPGA Other ASICs SGI Proprietary 9/20/2004 Page 2

Paradigms to Applications Low Compute high Intensity Vector Application-specific Application-specific Scalar Low Data locality High SGI Proprietary 9/20/2004 Page 3

Microprocessors & Heat Thermal Density SGI Proprietary 9/20/2004 Page 4

Architectural Challenges Ease of Use Languages Compilers Debuggers APIs Performance Bandwidth to/from System Scalability SGI Proprietary 9/20/2004 Page 5