GPGPU, 4th Meeting Mordechai Butrashvily, CEO GASS Company for Advanced Supercomputing Solutions

GPGPU, 4th Meeting Mordechai Butrashvily, CEO moti@gass-ltd.co.il GASS Company for Advanced Supercomputing Solutions

Agenda
- 3rd meeting
- 4th meeting
- Future meetings
- Activities
All rights reserved (c) 2008 - Mordechai

3rd meeting
- Dr. Avi Mendelson presented the Intel Larrabee architecture
- Covered hardware details and design information

4th meeting
- GPU computing with AMD (ATI)
- StreamComputing programming
- CAL.NET
- FireStream platform
- GPGPU for IT
- Questions

Future meetings
- Software stacks and frameworks by NVIDIA and ATI:
  o CUDA
  o StreamComputing
- Upcoming OpenCL standard
- Developments and general talks about programming and hardware issues
- More advanced topics
- Looking for ideas

Activities
- Basis for a platform to exchange knowledge, ideas and information
- Cooperation and collaboration between parties in the Israeli industry
- Representing parties in dealings with commercial and international companies
- Training, courses and meetings with leading companies

AMD Hardware GPU Computing for programmers

AMD Hardware
                 HD3870    HD4870    HD4870 X2  FirePro V8700  FireStream 9250
Cores            320       800       1600       800            800
TFLOPS           0.5       1.2       2.4        1.2            1.2
Core frequency   775 MHz   750 MHz   750 MHz    750 MHz        750 MHz
Memory           0.5 GB    1 GB      2 GB       1 GB           2 GB
Bandwidth        72 GB/s   115 GB/s  115 GB/s   108 GB/s       108 GB/s
Power            110 W     184 W     200 W      180 W          180 W
Price            $150      $300      $550       $2000          $1000
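To compare the cards in the table on cost and power efficiency, the figures can be divided out directly. The following is only a sketch: the numbers are copied from the table, while the names `CARDS` and `efficiency` are introduced here for illustration and are not part of any AMD tool.

```python
# Peak GFLOPS, board power (W) and price (USD), copied from the hardware table.
# CARDS and efficiency() are illustrative names, not part of any SDK.
CARDS = {
    "HD3870": (500, 110, 150),
    "HD4870": (1200, 184, 300),
    "HD4870 X2": (2400, 200, 550),
    "FirePro V8700": (1200, 180, 2000),
    "FireStream 9250": (1200, 180, 1000),
}


def efficiency(name):
    """Return (GFLOPS per dollar, GFLOPS per watt) for one card."""
    gflops, watts, usd = CARDS[name]
    return gflops / usd, gflops / watts


if __name__ == "__main__":
    for name in CARDS:
        per_dollar, per_watt = efficiency(name)
        print(f"{name:16s} {per_dollar:5.2f} GFLOPS/$  {per_watt:5.2f} GFLOPS/W")
```

These are theoretical peak figures only; sustained performance depends on the workload.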

Stream Processor
For example, the HD3870 has 320 cores:
- 4 SIMD engines
- 16 thread processors in each engine
- 5 stream cores per thread processor
- 4 * 16 * 5 = 320 stream cores

Stream Core

GPU Performance
ATI formula, for the HD3870:
- 320 cores
- Each runs at 775 MHz
- 1 MAD (multiply-add) per cycle, counted as 2 floating-point operations
FLOPS = Cores * Freq. * FLOPS_CYCLE
FLOPS = 320 * 775e6 * 2
FLOPS = 496 GFLOPS
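The formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `peak_gflops` is introduced here for illustration, and the default of 2 FLOPs per cycle assumes one multiply-add per core per cycle, as stated on the slide.

```python
def peak_gflops(cores, freq_hz, flops_per_cycle=2):
    """Theoretical peak in GFLOPS: cores * frequency * FLOPs per core per cycle.

    flops_per_cycle defaults to 2 because one MAD counts as a multiply plus an add.
    """
    return cores * freq_hz * flops_per_cycle / 1e9


if __name__ == "__main__":
    print(peak_gflops(320, 775e6))  # HD3870: 320 cores at 775 MHz
    print(peak_gflops(800, 750e6))  # HD4870: 800 cores at 750 MHz
```

For the HD3870 this gives 496 GFLOPS. The same formula with 800 cores at 750 MHz gives 1200 GFLOPS, which matches the 1.2 TFLOPS listed for the HD4870 in the hardware table.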

AMD software stack GPU Computing for programmers

Agenda
Software stack for GPU computing (top to bottom):
- StreamComputing SDK: Brook+, compiler, assembler, basic examples, documentation
- CAL
- Driver
- Hardware

CAL Driver
- CAL (Compute Abstraction Layer), the successor to ATI's Close to Metal (CTM)
- Allows direct communication with the GPU hardware, without using a graphics API
- Located on top of the display driver
- Functionality very similar to the CUDA API
- History: ATI was the first to introduce a low-level interface to the GPU

StreamComputing SDK
- Provides the runtime required to run Stream-based solutions
- Supports all GPUs starting from RV600 (Radeon HD2x00 series)

Cont.
Includes:
- Brook+ compiler (brcc, an extension to Brook)
  o C-based syntax, can integrate into existing applications
- Assembler for the IL language
- Documentation
- Runtime library

Cont.
Supported platforms:
- Windows XP 32/64 bit
- Windows Vista 32/64 bit
- Linux 32/64 bit

StreamComputing Programming Syntax, capabilities etc.

Agenda
- What is StreamComputing?
- Why is it good?
- What can be done with it?
- Summary of capabilities

What is StreamComputing?
- Can be considered another shader language for GPUs
- Provides low-level access to the hardware
- Requires no knowledge of a graphics API (DX, GL)
- A framework that provides:
  o Development tools
  o Runtime
- Defines a language

Why is it good?
- Provides low-level access to the GPU hardware
- Much faster than traditional graphics APIs
- A language specific to computing, without graphics terms
- C/C++ based syntax
- Porting existing code isn't that difficult

What can be done with it?
- Mostly stream operations
- Using StreamComputing we can:
  o Allocate memory and transfer it between a device and the host
  o Run specific kernels (math computations)
  o Configure the number of cores to utilize
  o Access DirectX resources (texture data) during processing

Short example Matrix multiplication

The code
    kernel void simple_matmult(float Width, float A[][], float B[][], out float result<>)
    {
        // vpos - position in the output matrix, i.e. (x, y)
        float2 vpos = indexof(result).xy;
        // index - coordinates in A and B from which the values are fetched
        float4 index = float4(vpos.x, 0.0f, 0.0f, vpos.y);
        // step - the amount by which index is incremented on each iteration
        float4 step = float4(0.0f, 1.0f, 1.0f, 0.0f);
        // accumulator - running sum of the products
        float accumulator = 0.0f;
        float i0 = Width;
        while (i0 > 0)
        {
            // A[i][k] * B[k][j]
            accumulator += A[index.zw] * B[index.xy];
            index += step;
            i0 = i0 - 1.0f;
        }
        // Writing the result back to the buffer (must happen inside the kernel body)
        result = accumulator;
    }
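For readers without the Brook+ toolchain, the work done for each output element can be mirrored on the CPU. The sketch below is plain Python, not Brook+. It assumes that the first component of a Brook+ 2D index addresses the column and the second the row, so that the kernel computes the sum over k of A[y][k] * B[k][x]. The function names are introduced here for illustration.

```python
def matmul_element(a, b, x, y):
    """CPU reference for one kernel invocation at output position (x, y).

    Mirrors the loop in simple_matmult: walk along row y of A and
    column x of B, accumulating the products.
    """
    width = len(b)  # shared dimension, passed as Width in the kernel
    accumulator = 0.0
    for k in range(width):
        accumulator += a[y][k] * b[k][x]
    return accumulator


def simple_matmult_reference(a, b):
    """Evaluate the kernel once per output element, as the GPU runtime would."""
    rows, cols = len(a), len(b[0])
    return [[matmul_element(a, b, x, y) for x in range(cols)] for y in range(rows)]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(simple_matmult_reference(a, b))
```

On the GPU the outer iteration over (x, y) is implicit: the runtime launches the kernel for every element of the output stream, and `indexof(result)` tells each invocation which element it owns.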

CAL.NET: a .NET library for CAL and StreamComputing

CAL.NET
- A .NET library that provides access to AMD GPU hardware from:
  o Windows XP/Vista
  o Linux
- Manages devices, allocates memory, loads and executes code

CAL.NET
- The future OpenCL standard is gaining popularity
- Several providers for high-level languages:
  o CUDA.NET: CUDA & NVIDIA
  o CAL.NET: StreamComputing & AMD
- Providing a unified interface to GPU hardware for other technologies as well: Java, C++, FORTRAN etc.

FireStream Platform Hardware platform for GPU computing

Agenda
- FireStream, another GPU card
- Current products
- Future products

FireStream, another GPU card
- Not just another card: a distinct class of GPU cards, between gaming (Radeon) and professional (FireGL)
- No screen output, meant for computations only
- The recommended solution for GPU computing!

Some history
- One year ago, AMD was the first to announce a card with double-precision support
- BUT it provided 0.5 TFLOPS and cost $2000

Current products
FireStream 9170
  GPUs         1
  Cores        320
  Memory       1 GB
  Performance  0.5 TFLOPS
  Bandwidth    ~75 GB/s
  Price        $2000

Future products
FireStream 9250
  GPUs         1
  Cores        800
  Memory       2 GB
  Performance  1 TFLOPS
  Bandwidth    108 GB/s
  Price        ~$1000

GPGPU for IT GPU Computing in Organizations

Agenda
- GPU computing solutions
- Implementing GPU environment
- IT services

GPU computing solutions
As covered previously:
- FireStream 9250, a single GPU in a workstation
- Or embedded within a 1U server that has PCIe

Implementing GPU environment
www.grid.org.il
- Organizations usually need to implement a large-scale GPU solution
- What about maintenance?
- And other IT services?
- Training? ...

IT services
- These issues are being solved nowadays, as organizations start to think about GPU solutions
- In the end, these services will help you:
  o Choose the correct hardware
  o Train your IT personnel
  o Know how to manage replacement
  o Monitor GPUs as network resources
- The goal is to give executives solid ground for using GPUs in their solutions!

Cont.
- Hybrid cluster solutions (servers with integrated FireStream) by global vendors
- Support for systems, with replacement parts available immediately

Summary
- GPU computing using AMD solutions is improving
- Providing both hardware and software
- Very cost-effective solutions compared to CPU and GRID

Questions