The Cray Rainier System: Integrated Scalar/Vector Computing

THE SUPERCOMPUTER COMPANY
The Cray Rainier System: Integrated Scalar/Vector Computing
Per Nyberg, 11th ECMWF Workshop on HPC in Meteorology

Topics
- Current Product Overview
- Cray Technology Strengths
- Rainier System: Integrated Scalar/Vector Computing (Overview, Motivation, Benefits)
- Cascade Project Overview
Slide 2

Cray's Family of Supercomputers
- Cray X1: 1 to 50+ TFLOPS, 4 to 4,069 processors. Vector processor for uncompromised sustained performance.
- Cray XT3: 1 to 50+ TFLOPS, 256 to 10,000+ processors. Compute system for large-scale sustained performance.
- Cray XD1: 48 GFLOPS to 2+ TFLOPS, 12 to 288+ processors. Entry/mid-range system optimized for sustained performance.
Purpose-Built High Performance Computers
Slide 3

Growing the Addressable Market
[Chart: HPC addressable market, 2002 through 2007, showing a 5X increase in Cray's addressable market as the X1 is joined by the XT3 and XD1. Segment CAGRs: Enterprise (8.2%), Departmental (7.4%), Divisional (5.3%), Capability (2.4%). Source: IDC 2003.]
Taking the Success Formula to the Broader HPC Market
Slide 4

Recent XD1 Announcements Slide 5

Recent XT3 Announcements Slide 6

Recent X1 Announcements Slide 7

ORNL National Leadership Computing Facility Slide 8

A Wealth of Technology
- XD1: RapidArray and FPGA
- XT3: Compute PE, Red Storm architecture
- X1: Vector node and global address space
- Active management nodes
Slide 9

Cray's Vision: Scalable High-Bandwidth Computing
[Roadmap: the vector line (X1 → X1E in 2004 → BlackWidow in 2006), the Red Storm line (Red Storm in 2004, Mt. Adams in 2005) and the XD1 line (2004 to 2005) converge on Rainier integrated computing, leading to Cascade sustained petaflops in 2010.]
Slide 10

Rainier Integrated Computing
The concept, a single system:
- Common infrastructure and high-performance network.
- Common global address space.
- Common OS, storage and administration.
A variety of compute nodes:
- Follow-on nodes for the vector (X1/X1E) and Opteron (XT3/XD1) lines.
- Opteron-based login, service and I/O nodes.
First customer shipment in 2006.
Slide 11

Integrated Product Concept
- Single hardware infrastructure (cabinet, power, cooling, etc.)
- Common high-speed network
- Heterogeneous compute capabilities
- Service nodes based on COTS processors
- Global address space across the machine (illustrated below)
- Linux-based OS
[Diagram: vector, scalar and other compute nodes plus service nodes on one high-performance network, with an optional FC switch to a high-performance global file system, and service nodes connecting to the customer network/grid.]
Slide 12
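The global address space is the load-bearing feature of this concept: any node can reference memory on any other node directly. The slides do not name a programming interface, but Cray systems of this era exposed such memory through SHMEM; the sketch below uses the OpenSHMEM API (a later standardization of that library) purely to illustrate one-sided access to symmetric memory, and is not a description of the Rainier-specific interface.

/* Minimal sketch of one-sided access to a global address space,
 * written against the OpenSHMEM API. Illustrates the programming
 * model only; not taken from the slides. */
#include <shmem.h>
#include <stdio.h>

int main(void) {
    shmem_init();
    int me   = shmem_my_pe();    /* this processing element's rank */
    int npes = shmem_n_pes();    /* total PEs in the job */

    /* Symmetric allocation: every PE gets a buffer at the same
     * logical address, making remote addresses computable. */
    long *buf = shmem_malloc(sizeof(long));
    *buf = -1;
    shmem_barrier_all();

    /* One-sided put: PE 0 writes directly into every other PE's
     * memory; no matching receive is required on the target. */
    if (me == 0) {
        for (int pe = 1; pe < npes; pe++) {
            long val = 100 + pe;
            shmem_long_put(buf, &val, 1, pe);
        }
    }
    shmem_barrier_all();
    printf("PE %d sees %ld\n", me, *buf);

    shmem_free(buf);
    shmem_finalize();
    return 0;
}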

Motivation
Different algorithms are appropriate for different architectures, with different requirements for:
- flops vs. memory bandwidth (illustrated below)
- local memory size
- mixed MPI/SMP vs. pure MPI
- granularity of computation and communication
- regular vs. irregular memory accesses and communication
- network bandwidth
- global vs. nearest-neighbor communication
- ability to tune the application
- capacity vs. capability
Benchmark suites are often split as to the best platform; one size does not fit all. Hence: design a single system with heterogeneous computing capabilities.
Slide 13
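The flops-versus-bandwidth split in the list above can be made concrete with arithmetic intensity, the ratio of flops performed to bytes moved. The sketch below works through DAXPY under assumed machine numbers (illustrative only, not data for any Cray system) to show why a low-intensity kernel is limited by memory bandwidth no matter how high the peak flop rate is.

/* Back-of-the-envelope sketch: why some kernels need bandwidth and
 * others need flops. All numbers are illustrative assumptions. */
#include <stdio.h>

/* DAXPY: y[i] += a*x[i]. Per element: 2 flops and 24 bytes of
 * traffic (load x, load y, store y, 8 bytes each), so arithmetic
 * intensity = 2/24 = 0.083 flop/byte -> bandwidth-bound. */
void daxpy(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

int main(void) {
    /* Assumed machine balance (hypothetical, not vendor data): */
    double peak_gflops = 10.0;   /* peak floating-point rate, GF/s */
    double peak_gbps   = 30.0;   /* peak memory bandwidth, GB/s   */
    double machine_balance = peak_gflops / peak_gbps; /* flop/byte */

    double daxpy_intensity = 2.0 / 24.0;
    printf("machine balance: %.3f flop/byte\n", machine_balance);
    printf("DAXPY intensity: %.3f flop/byte\n", daxpy_intensity);
    /* DAXPY's intensity is far below the machine balance, so its
     * attainable rate is bandwidth * intensity, not peak flops: */
    printf("DAXPY bound: %.2f GF/s of %.1f GF/s peak\n",
           peak_gbps * daxpy_intensity, peak_gflops);
    return 0;
}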

Benefits of Integrated Computing
Customer:
- Single solution for diverse workloads
- Maintain and foster architectural diversity
- Reduced administration and training costs
- Single, unified user interface and environment
- Better login and pre/post-processing environment for vector machines
- More configuration and upgrade flexibility
- Improved performance by matching the processor to the job
Cray:
- Better focus
- Reduced development costs through commonality
- Reduced manufacturing costs through increased volumes
- Able to support specialized computing (vectors, MTA, FPGAs)
Slide 14

Fit to Earth Science Requirements
The Rainier architecture offers a strong fit for a diverse, complex and evolving workload:
- Heterogeneity ideal for coupled modeling.
- Capability features well suited to the production workload and to advanced research requiring high resolution and complexity (e.g., sub-scale processes, atmospheric chemistry).
- Ability to explore alternative processor architectures (MT, FPGA).
- Architectural flexibility at upgrade points, better leveraging investment.
Slide 15

Increased Modeling Complexity
[Figure: Ruddiman, 2001.]
Slide 16

Increased Multi-Disciplinary Coupling
[Figure: MPI-M SDEM, the Structural Dynamical Economic Model.]
Slide 17

Scalable, Common Infrastructure
[Diagram: an interconnection network linking Opteron-based I/O and service nodes (OSTs), Opteron-based login and service nodes (compile, debug), Opteron-based compute nodes (Adams), vector-based compute nodes (BlackWidow) and other future compute nodes (e.g. FPGA, multithreaded), with a Lustre parallel filesystem.]
Scalability and growth:
- Configurable network according to need, at whatever scale.
- Grow network bandwidth over time to maintain balance.
- Grow scalar and/or vector partitions as need develops.
- Configure the service and I/O partition and grow it as need develops.
Slide 18

Compute Nodes
Adams (Opteron scalar nodes):
- Excellent scalar performance
- Very low memory latency
- Many applications available (Linux + x86-64)
- Potential for both uniprocessor and SMP nodes, single and dual cores
- But requires a high degree of cache effectiveness (contrasted with vector access patterns in the sketch after this slide)
BlackWidow (Cray vector nodes):
- Fast, high-bandwidth processors
- 4-way vector SMP nodes
- Large local memory
- Supports hierarchical parallelism
- Latency tolerance for global and irregular references
- But requires vectorizable code
Other planned future compute capabilities:
- FPGA: direct hardware execution of kernels
- MTA: highly threaded access to global memory
Slide 19
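The two "but" caveats above come down to memory access patterns. As a hedged illustration (hypothetical kernels, not taken from the slides or any Cray benchmark), the first loop below is regular and trivially vectorizable, a natural fit for BlackWidow-class nodes, while the second gathers through an index table: a cache-based scalar node handles it well only when the table stays cache-resident, whereas a vector machine with gather support hides the latency instead.

/* Illustrative kernels only. */

/* Regular stride-1 loop: a vectorizing compiler can turn this into
 * vector loads/stores and multiply-adds, and a vector machine
 * tolerates memory latency with deep pipelines. */
void scale_regular(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i];
}

/* Irregular gather: each access depends on an index table, so
 * addresses are unpredictable. A cache-based scalar node does well
 * only if tbl[] mostly stays resident in cache; a vector node with
 * hardware gather support instead overlaps the remote references. */
void scale_gather(int n, double a, const int *idx,
                  const double *tbl, double *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * tbl[idx[i]];
}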

Reliability and Scaling Features
Fault detection, diagnosis and recovery:
- Enhanced diagnostic error reporting
- Retries for transmission-induced multi-bit errors
- Timeouts and self-cleansing datapaths (no cascading errors)
Hardware firewalls for fault containment:
- Secure, hierarchical boundaries between kernel groups
- Protects the rest of the system even if a kernel is corrupted
Graceful network degradation:
- Auto-degrade rails rather than lose a whole link
- Hot-swappable boards and reconfigurable routing tables
Full node translation tables (NTTs):
- Allow scheduling of parallel jobs across an arbitrary collection of processors, with efficient, scalable address translation (sketched below)
- Much higher system utilization under heavy workloads
Slide 20
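The NTT's job is a level of indirection between a job's logical node numbers and whatever physical nodes the scheduler could assemble. The slides do not describe the hardware table format, so the sketch below is purely conceptual, using hypothetical names (ntt_t, ntt_translate) to show why arbitrary, non-contiguous placement needs no renumbering inside the application.

/* Conceptual sketch of a node translation table (NTT). The real
 * hardware format is not described in the slides; this illustrates
 * only the logical->physical indirection that lets a job run on an
 * arbitrary, non-contiguous set of nodes. */
#include <stdio.h>

#define MAX_JOB_NODES 8

typedef struct {
    int phys[MAX_JOB_NODES]; /* phys[logical] = physical node id */
    int nnodes;              /* nodes actually assigned to the job */
} ntt_t;

/* Translate a job-local node number into the physical node that a
 * remote memory reference must actually be routed to. */
static int ntt_translate(const ntt_t *t, int logical) {
    return t->phys[logical];
}

int main(void) {
    /* The scheduler hands the job whatever nodes are free, in any
     * order; the application still sees nodes 0..nnodes-1. */
    ntt_t job = { .phys = { 17, 3, 42, 8 }, .nnodes = 4 };

    for (int l = 0; l < job.nnodes; l++)
        printf("logical node %d -> physical node %d\n",
               l, ntt_translate(&job, l));
    return 0;
}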

Cascade Project
Partners: Cray Inc., Stanford, Caltech/JPL, Notre Dame.
Slide 21

High Productivity Computing Systems
Goals: provide a new generation of economically viable, high-productivity computing systems for the national security and industrial user community (2007 to 2010).
Impact:
- Performance (efficiency): speed up critical national security applications by a factor of 10X to 40X
- Productivity (time-to-solution)
- Portability (transparency): insulate research and operational application software from the system
- Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults and programming errors
HPCS program focus areas, applications: intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology.
Fill the critical technology and capability gap between today (late-80s HPC technology) and the future (quantum/bio computing).
Slide 22

HPCS Phases
Phase I: Concept Development (Cray, SGI, Sun, HP, IBM). 1 year, 2H 2002 to 1H 2003, $3M/year.
- Forecast available technology
- Propose HPCS hardware/software concepts
- Explore productivity metrics
- Develop research plan for Phase II
Phase II: Concept Validation, focused R&D (Cray, Sun, IBM). 3 years, 2H 2003 to 1H 2006, $17M/year.
- Hardware and software prototyping
- Experimentation and simulation
- Risk assessment and mitigation
Phase III: Full-Scale Product Development (??). 4 years, 2H 2006 to 2010, $?/year.
- Commercially available system by 2010
- Outreach and cooperation in software and applications areas
"The HPCS program lets us explore technologies we otherwise couldn't. A three-year head start on the typical development cycle."
Slide 23

Cray's Approach to HPCS
High system efficiency at scale:
- Bandwidth is the most critical and expensive part of scalability
- Enable very high (but configurable) global bandwidth
- Design the processor and system to use this bandwidth wisely
- Reduce bandwidth demand architecturally
High human productivity and portability:
- Support legacy and emerging languages
- Provide strong compiler, tools and runtime support
- Support a mixed UMA/NUMA programming model
- Develop a higher-level programming language and tools
System robustness:
- Provide excellent fault detection and diagnosis
- Implement automatic reconfiguration and fault containment
- Make all resources virtualized and dynamically reconfigurable
Slide 24

Summary
Cray offers a unique range of science-driven technologies: XD1, XT3 and X1/X1E.
The Rainier architecture offers a strong fit for the diverse, complex and evolving earth-sciences workload.
Cray continues to support sustained innovation to meet the needs of the scientific community:
- Rainier: integrated computing capability in 2006
- Cascade: aggressive research program for 2010
Slide 25

What Do You Need To Know?
Slide 26