HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

Similar documents
GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS

GPU for HPC. October 2010

General Purpose GPU Computing in Partial Wave Analysis

Building NVLink for Developers

High Performance Computing with Accelerators

Tesla GPU Computing A Revolution in High Performance Computing

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

What does Heterogeneity bring?

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS

GPU Clouds IGT Cloud Computing Summit Mordechai Butrashvily, CEO 2009 (c) All rights reserved

Pedraforca: a First ARM + GPU Cluster for HPC

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

Lecture 1: Gentle Introduction to GPUs

FPGA-based Supercomputing: New Opportunities and Challenges

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Numerical Algorithms on Multi-GPU Architectures

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Tutorial. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers

Advances of parallel computing. Kirill Bogachev May 2016

Accelerating CFD with Graphics Hardware

Massively Parallel Architectures

GPU Computing with NVIDIA s new Kepler Architecture

HIGH-PERFORMANCE COMPUTING

Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain)

GPGPU, 4th Meeting Mordechai Butrashvily, CEO GASS Company for Advanced Supercomputing Solutions

Parallel Programming on Ranger and Stampede

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

The Stampede is Coming: A New Petascale Resource for the Open Science Community

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions

Preparing GPU-Accelerated Applications for the Summit Supercomputer

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

GPUs and GPGPUs. Greg Blanton John T. Lubia

Administrivia. Administrivia. Administrivia. CIS 565: GPU Programming and Architecture. Meeting

CUDA. Matthew Joyner, Jeremy Williams

An approach to provide remote access to GPU computational power

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

Current Trends in Computer Graphics Hardware

Accelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing

Large scale Imaging on Current Many- Core Platforms

Introduction to CELL B.E. and GPU Programming. Agenda

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Tesla Architecture, CUDA and Optimization Strategies

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

Selecting the right Tesla/GTX GPU from a Drunken Baker's Dozen

CUDA Programming Model

CS427 Multicore Architecture and Parallel Computing

How GPUs can find your next hit: Accelerating virtual screening with OpenCL. Simon Krige

How to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić

Experts in Application Acceleration Synective Labs AB

Lecture 1: Introduction and Computational Thinking

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

GPU Debugging Made Easy. David Lecomber CTO, Allinea Software

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS

Parallel Programming

Tesla GPU Computing A Revolution in High Performance Computing

Advanced CUDA Optimization 1. Introduction

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES

CME 213 S PRING Eric Darve

Accelerating High Performance Computing.

Computing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency

CUDA Conference. Walter Mundt-Blum March 6th, 2008

The Many-Core Revolution Understanding Change. Alejandro Cabrera January 29, 2009

Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures

To hear the audio, please be sure to dial in: ID#

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters

GPU ARCHITECTURE Chris Schultz, June 2017

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.

The Future of GPU Computing

Thinking Outside of the Tera-Scale Box. Piotr Luszczek

PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort

When MPPDB Meets GPU:

ECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA)

n N c CIni.o ewsrg.au

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

Trends and Challenges in Multicore Programming

GPU. OpenMP. OMPCUDA OpenMP. forall. Omni CUDA 3) Global Memory OMPCUDA. GPU Thread. Block GPU Thread. Vol.2012-HPC-133 No.

High performance Computing and O&G Challenges

GPUs and Emerging Architectures

GPU Computing fuer rechenintensive Anwendungen. Axel Koehler NVIDIA

TSUBAME-KFC : Ultra Green Supercomputing Testbed

Two-Phase flows on massively parallel multi-gpu clusters

Software and Performance Engineering for numerical codes on GPU clusters

ENDURING DIFFERENTIATION Timothy Lanfear

ENDURING DIFFERENTIATION. Timothy Lanfear

IBM CORAL HPC System Solution

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT

Particle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA

Lecture 1. Introduction Course Overview

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.

INSPUR and HPC Innovation

Mathematical computations with GPUs

High Performance Computing (HPC) Introduction

Transcription:

HPC with GPU and its applications from Inspur Haibo Xie, Ph.D xiehb@inspur.com

2 Agenda I. HPC with GPU II. YITIAN solution and application

3 New Moore s Law

4 HPC? HPC stands for High Heterogeneous Performance Parallel Computing Computing Hybrid/Cooperative Computing System Includes general and specialized processors/cores The future of HPC? TOP500: Roadrunner, TSUBAME, incoming supercomputer at ORNL

5 Enter Heterogeneous Parallel Computing Chip-level Desktop-level Cluster-level YITIAN with GPU HPC system

6 Why GPU Many core with massive hardware thread Dedicated to computing 1.3G * 240 * 3 1TFLOPS peak (GT200) 120 100 GPU CPU G80 Ultra Memory bandwidth (GB/s) 80 60 40 NV40 G71 G80 20 0 NV30 Hapertown Prescott EE Woodcrest Northwood 2003 2004 2005 2006 2007

7 GPU for HPC, History, Opportunities and Challenges GPGPU Opportunities Initiative GPU for general purpose computing Terrible programmability Resident narrowly as desktop level application Revolutionary architecture & programming model Brook+? CUDA Ecosystem building-up is essential Programmer wake up for chargeable lunch (Free lunch is over) Low price, easy to get Better perf/watt, perf/dollar Tremendous R&D activities not only by academic, but only by commercial Targeting TOP500 Live product, rank 41, building by GSIC Center, Tokyo Institute of Technology Challenges Programmability does matter! Mature ecosystem is essential Various architecture, lack of standard opencl, way to fix that?

8 Hardware - GPU as computation device GPU devotes more transistors to data processing

9 Programmability - Enter CUDA CUDA: Compute Unified Device Architecture Programming Model Nvidia GPU based Mapping between the hardware and the software SDK Tool-chain (compiler, debugger, profiler) Runtime Language support Library (BLAS, FFT, etc.) Thoughts about Heterogeneous/Hybrid computing CPU+GPU GPU as co-processor

10 Heterogeneous Computing with CUDA CPU + GPU MPI + CUDA? OpenMP + CUDA? Pthread + CUDA? Cooperative computing CPU for general purpose workloads GPU for offloaded specialized computational-sensitive workloads to GPU

11 CUDA application speedup GeForce 8800 GTX vs. 2.2GHz Opteron 248 10 speedup in a kernel is typical, as long as the kernel can occupy enough parallel threads 25 to 400 speedup if the function s data requirements and control flow suit the GPU and the application is optimized

12 Agenda I. Introduction II. YITIAN solution and application

13 YITIAN Solution Personal HPC Solution YITIAN desktop supercomputing Industry Solution TIANSUO 10000 with YITIAN Desktop HPC or YITIAN Tesla HPC cluster

14 YITIAN desktop HPC Personal HPC Targeting personal HPC application Cooperating computing CPU + GPU Hybrid computing Software stack Mature ecosystem including R&D, software and hardware Silencing and cooling Emphasis on silencing and cooling

15 More power, lower price YITIAN desktop HPC YITIAN personal HPC v.s. traditional cluster YITIAN personal HPC Traditional X86 Cluster Performance Tera-level Tera-level Size Regular desktop 42U rack Room requirement None, regular office Server room Cooling requirement None Air conditioning Power consumption 700W peak 7000w Noise 22~45db Above 60db Price RMB 50,000 RMB 500,000

16 Need even more power? - YITIAN Tesla HPC cluster TIANSUO 10000 Flexible cluster system Consist of YITIAN Tesla HPC cluster Scalability with GPU acceleration YITIAN Tesla HPC cluster S1070 with 4 C1060 4TFLOPS,16GB memory Industry solution Supply more power as the foundation of your application

17 YITIAN market Market Life science Graphics Finance Engineering science Computational Fluid Dynamic Defense Medical Imaging Oil & Gas Virtualization Electronic design automation Application Field YITIAN solution Industry Research Academic Chemicals & petroleum Finance

18 Applications of YITIAN Customer Applications Description State key laboratory of information security, CAS Computer network information center, CAS Information security, cryptography Network security Port application from CPU to GPU Typical 10x, up to 100x times speedup Multi-pattern matching algorithm for network security. 2.5 times speed up State key laboratory of space weather, CAS meteorology N/A National University of defense technology Beijing Institute of Genomics, CAS Radar system and electromagnetic countermeasure Bio-information Application porting on-going BLASTN algorithm acceleration, Joint development with Inspur, paper published at HPCChina 2009 XI AN jiaotong University CFD N/A JINAN University 54-th institute, CETC Bio-information N/A

19 Inspur offers Outsourcing Consultant Hardware Total HPC Solution Training Software

20 Thanks!