CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar



2 CONTENTS
- Introduction
- History
- Specifications
- Cray XK6 Architecture
- Performance
- Industry acceptance and applications
- Summary

3 INTRODUCTION
- The Cray XK6 supercomputer is a trifecta of scalar, network and many-core innovation.
- Hybrid supercomputer combining Cray's Gemini interconnect, AMD's leading multi-core scalar processors and NVIDIA's powerful many-core GPU processors.
- Enhanced version of the Cray XE6; uses the same blade architecture.
- Capable of scaling to 500,000 scalar processors and 50 petaflops of hybrid peak performance.


4 HISTORY
- In 1988, Cray Research introduced the Cray Y-MP, the world's first supercomputer to sustain over 1 gigaflop on many applications.
- Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994, with a peak speed of 1.7 gigaflops per processor.
- The Hitachi SR2201 reached a peak performance of 600 gigaflops in 1996 by using 2048 processors.
- The Intel Paragon had 1000 to 4000 Intel i860 processors and was ranked the fastest in the world in 1993.

5 SUPER-COMPUTER STATISTICS

6 COMPARISON WITH THE PRESENT CRAY SUPERCOMPUTERS

7 CRAY XK6 - ARCHITECTURE
- Four nodes per blade
- Adaptive hybrid computing
- Scalable compute nodes and I/O
- Gemini mezzanine
- Plug compatible with the Cray XE6 blade
- Configurable processor, memory and SXM GPU
AMD Opteron 6200 Series processor:
- Highly associative on-chip data cache supports aggressive out-of-order execution
- Integrated memory controller
- Significant performance advantage to algorithms
NVIDIA Tesla 20-series:
- Based on the next-generation CUDA GPU architecture codenamed Fermi

8 NODE - ARCHITECTURE

9 XK6 ACCELERATOR BLADE

10 GEMINI INTERCONNECTION NETWORK

11 GEMINI INTERCONNECTION NETWORKS
- Each node acts as 2 nodes on a 3D torus.
- Each node is provided with a high-radix YARC router to support up to 168 Gbps.
- Parallel electrical and optical paths.
- High bandwidth and lower latency for both long and short messages.
- Low cost of integration.
- Gemini mezzanine card to avoid memory and interconnection network (ICN) bottlenecks.
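
The 3D torus topology named on this slide can be illustrated with a small sketch. It assumes a hypothetical torus of size `dims` and a helper called `torus_neighbors`, neither of which appears in the slides. It only shows how wraparound links give every position six neighbours; it does not model Gemini's actual routing.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbours of coord on a 3D torus with wraparound links."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # wrap around at the torus edge
            neighbors.append(tuple(n))
    return neighbors

# Hypothetical 4 x 4 x 4 torus: even a corner position has six neighbours.
corner_neighbors = torus_neighbors((0, 0, 0), (4, 4, 4))
print(corner_neighbors)
```

Because every axis wraps, no position sits on an "edge" of the machine, which is what keeps hop counts short in both directions along each dimension.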

12 NVIDIA TESLA X2090
- Special embedded version of the Tesla M2090.
- Provides high-performance computing for highly parallel applications.
- 448 cores with 6 GB of GDDR5 memory.
- Can support 600+ GFLOPs.
- High bandwidth to the host.
- Quick master-slave communication.
- CUDA capable for easy programmability.

13 CRAY XK6 CABINETS
- Each cabinet has up to 96 processors.
- Two processors wrapped in the form of a blade (XE6 compatible).
- With 1536 cores, a cabinet can deliver 70+ TFLOPs.
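
The core count on this slide follows from the processor count if each processor carries 16 cores, the top core count in the Opteron 6200 series. The per-processor figure is an assumption here, since the slide does not state it; the arithmetic is sketched below.

```python
PROCESSORS_PER_CABINET = 96  # from the slide
CORES_PER_PROCESSOR = 16     # assumption: 16-core Opteron 6200 parts

cores_per_cabinet = PROCESSORS_PER_CABINET * CORES_PER_PROCESSOR
print(cores_per_cabinet)  # 1536
```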

14 SPECIFICATIONS

15 SPECIFICATIONS

16 PERFORMANCE - LUDWIG
- 10 cabinets of Cray XK6, 936 GPUs (nodes)
- Only 4% deviation from perfect scaling between 8 and 936 GPUs
- Application sustaining 40+ Tflop/s and still scaling
- Strong scaling is also very good, but physicists want to simulate larger systems
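
One common way to express "deviation from perfect scaling" is parallel efficiency in a weak-scaling run, where the work per GPU is fixed and the ideal run time stays constant as GPUs are added. The sketch below assumes that definition and uses made-up timings, since the slide gives neither raw timings nor its exact definition; `weak_scaling_efficiency` is a hypothetical helper.

```python
def weak_scaling_efficiency(t_base, t_scaled):
    """Weak-scaling efficiency: 1.0 means the run time did not grow at all."""
    return t_base / t_scaled

# Hypothetical timings per step: 100 s on 8 GPUs, 104 s on 936 GPUs.
efficiency = weak_scaling_efficiency(100.0, 104.0)
deviation = 1.0 - efficiency
print(f"efficiency {efficiency:.3f}, deviation {deviation:.1%}")
```

With these illustrative numbers the deviation comes out just under 4%, the same order as the figure quoted on the slide.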

17 PERFORMANCE - HIMENO
- Parallel 3D Poisson equation solver benchmark
- Iterative loop evaluating a 19-point stencil
- Co-Array Fortran version of the code
- Fully ported to accelerators using 27 directive pairs
- Strong scaling
- Use asynchronous GPU data transfers and kernel launches to help avoid this
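
To make the 19-point stencil concrete: in three dimensions it covers the centre point, its 6 face neighbours and its 12 edge neighbours, leaving out the 8 corners of the surrounding 3 x 3 x 3 cube. The sketch below is plain Python, not the Co-Array Fortran code from the slide, and it uses hypothetical uniform weights in place of the benchmark's real coefficients; `jacobi_sweep` and `STENCIL_19` are illustrative names.

```python
from itertools import product

# Offsets with at most two non-zero components: centre + 6 faces + 12 edges.
STENCIL_19 = [o for o in product((-1, 0, 1), repeat=3)
              if sum(c != 0 for c in o) <= 2]

def jacobi_sweep(p, weights):
    """One Jacobi-style sweep over the interior of a 3D grid of nested lists."""
    nx, ny, nz = len(p), len(p[0]), len(p[0][0])
    new = [[row[:] for row in plane] for plane in p]  # boundary values stay fixed
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                new[i][j][k] = sum(
                    weights[(di, dj, dk)] * p[i + di][j + dj][k + dk]
                    for di, dj, dk in STENCIL_19
                )
    return new

# Hypothetical uniform weights (an assumption, not the benchmark's coefficients).
uniform = {o: 1.0 / len(STENCIL_19) for o in STENCIL_19}
grid = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
swept = jacobi_sweep(grid, uniform)
print(len(STENCIL_19))
```

Each interior point reads only from the previous iterate, so all points can be updated independently, which is the property that makes this kind of loop a natural fit for GPU offload.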

18 INDUSTRIAL ACCEPTANCE
Oak Ridge National Laboratory Jaguar/Titan
- High computation capacity for scientific research
- 200 cabinets with > nodes
- Estimated PFLOPs
- Currently upgrading from the XT5-based Jaguar system to the XK6-based Titan system with increased performance

19 INDUSTRIAL ACCEPTANCE

20 INDUSTRIAL ACCEPTANCE
CSCS - Swiss National Supercomputing Centre
- Cray XE6: 402 Tflops, 1496 nodes, Gemini interconnects
- Cray XK6: 176 nodes with one AMD and one GPU element each

21 SUMMARY
- Higher supercomputing potential with GPU-accelerated computing.
- Better inter-node communication with the Gemini optical interconnects.
- Backward compatible with XE6 cabinets; can be merged with XE6 systems.
- Highly suited to scientific research computations requiring computational power on the order of hundreds of TFLOPs.

