Cray Operating System Plans and Status
Charlie Carroll, May 2012

Cray Operating Systems and I/O
- Operating system: Compute Node Linux and the NVIDIA GPU driver on compute nodes; a standard Linux OS on service nodes
- File systems: Lustre, GPFS, Panasas
- Networking: HSN: Gemini and Aries (GNI, DMAPP); IBGNI (IB verbs -> Gemini); TCP/IP
- Batch systems: third-party extensions
- Operating system services: core specialization, Dynamic Shared Library (DSL) support, Cluster Compatibility Mode, DVS (Data Virtualization Service)
- System management: ALPS (Application-Level Placement Scheduler), Node Health Checker (NHC), CMS (Cray Management Services), command interface
- Hardware Supervisory System: handling errors and resiliency, event routing, booting the system

Cray Operating Systems Focus
- Performance
  - Maximize compute cycles delivered to applications while also providing necessary services
  - Lightweight operating system on compute nodes; standard Linux environment on service nodes
  - Optimize network performance through close interaction with the hardware
  - GPU infrastructure to support high performance
- Stability and resiliency
  - Correct defects that impact stability
  - Implement features to increase system and application robustness
- Scalability
  - Scale to large system sizes without sacrificing stability
  - Provide system management tools to manage complicated systems

Accomplishments of the Past Year
- CLE 4.0: UP01, UP01A, UP02, UP03
- Support for AMD's Interlagos, used in our currently shipping systems
- NVIDIA Fermi GPU support
- Cray Sonexion introduction
- DVS performance improvements
- Memory control groups reduced out-of-memory (OOM) issues
- Cluster Compatibility Mode and ISV Application Acceleration
- Quality improvements
- Resiliency features improving system reliability; moving to fewer preventive maintenance (PM) windows and less downtime (Steve Johnson's talk follows this one)
- Aries bring-up
- Programming Environments coordination: resiliency features, GPUs, DSLs (dynamic shared libraries)
- External servers
- Hiring

Demographics (April 2012)

  Year  Release   Systems  Cabinets
  2007  CLE 2.0         3         4
  2008  CLE 2.1         6       118
  2009  CLE 2.2        37       262
  2010  CLE 3.1        27       276
  2011  CLE 4.0        48       720

Cray Software Road Map, 2011 through 2015 (quarterly timeline)
- Cray Linux Environment: CLE 4.0 (2011); Koshi (XE/XK, December 2012); Nile (Cascade GA, early 2013); Ohio (XE and Cascade, 2H13-1H14); Pearl (2014); Rhine (2015)
- Cray Programming Environment: Eagle, Erie, Fremont, Hiawatha, Itasca
- Cray System Management: SMW 6.0, Denali, Olympic, Pecos, Redwood
- Platforms: XE (Gemini), XK (Gemini), Cascade (Aries)

CLE Koshi Features
- Supports Cray XE/XK systems; releases in December 2012
- Kernel features: Compute Unit Affinity, Compute Node Cleanup (CNCU), faster warm boots
- Lustre 1.8.7; Lustre 2.2 client
- Application resiliency features
- Lightweight Log Manager (LLM)
- Accelerator features: NVIDIA Kepler support, soft GPU reset, GPU memory scrub
- CCM/IAA improvements

CLE Nile Features
- Supports Cray Cascade systems; General Availability (GA) in March 2013
- Cascade is based on Intel processors and Cray's new Aries interconnect
- Kernel features: Compute Unit Affinity (Intel Hyper-Threads), Compute Node Cleanup (CNCU)
- Lustre 2.2 client
- Application resiliency features
- Lightweight Log Manager (LLM)
- Aries HSN (high-speed network) features: Deadlock Avoidance (DLA), Aries collective engine
- CCM/IAA improvements
- Power management features

Application Resiliency: Two Steps
- Application Relaunch (a minimal checkpoint/restart sketch follows this list)
  - Current behavior: node dies -> application dies
  - New behavior: the node dies and the application is torn down; if the flag is set, the job is relaunched and the application restarts from its own checkpoint file, with no need to wait again in the job queue
- ALPS Reconnect
  - Current behavior: node dies -> application dies
  - New behavior: the node dies; if the flag is set, ALPS rebuilds its communication tree and passes the new information to PMI, which rebuilds its communication tree; PMI then passes the failure information to the programming model (CHARM++ will be the initial programming model)
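
Application Relaunch assumes the application already writes checkpoints it can restart from; CLE only re-launches the binary, it does not capture application state itself. Below is a minimal sketch of that checkpoint/restart pattern in plain C, with a hypothetical checkpoint file name and a toy compute loop standing in for real work; it illustrates the pattern the feature relies on, not a Cray API.

#include <stdio.h>

#define CKPT_FILE   "app.ckpt"   /* hypothetical checkpoint file name */
#define TOTAL_STEPS 1000000L

/* Try to resume from a previous checkpoint; return the step to start at. */
static long load_checkpoint(double *state) {
    FILE *f = fopen(CKPT_FILE, "rb");
    long step = 0;
    if (f) {
        if (fread(&step, sizeof step, 1, f) != 1 ||
            fread(state, sizeof *state, 1, f) != 1)
            step = 0;                       /* unreadable checkpoint: start over */
        fclose(f);
    }
    return step;
}

/* Persist the current step and state so a relaunched run can resume. */
static void save_checkpoint(long step, double state) {
    FILE *f = fopen(CKPT_FILE, "wb");
    if (!f) return;
    fwrite(&step, sizeof step, 1, f);
    fwrite(&state, sizeof state, 1, f);
    fclose(f);
}

int main(void) {
    double state = 0.0;
    long step = load_checkpoint(&state);    /* 0 on a fresh start */

    for (; step < TOTAL_STEPS; step++) {
        state += 1.0;                        /* stand-in for real work */
        if (step % 10000 == 0)
            save_checkpoint(step, state);    /* periodic checkpoint */
    }
    printf("finished at step %ld, state %.1f\n", step, state);
    return 0;
}

On relaunch, load_checkpoint() finds the last saved step, so the restarted application resumes near where it failed rather than starting over.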

Cascade Power Management
- Enhanced power monitoring: take advantage of power features in the Intel and Aries chips
- Job power profiling: tie power usage data to job data
- Static system power capping; use case: the data center has a hard power limit
- P-state control at job launch: run a particular job in a particular p-state (see the sketch after this list)
- Idle node power conservation: if a node is not needed for a while, turn it off; requires involvement of the batch system software
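
On Linux nodes, p-state selection ultimately surfaces through the standard cpufreq interface. The sketch below assumes the cpufreq sysfs files are exposed on the node (a stripped-down compute node image may hide them) and simply reads the current and maximum scaling frequencies of CPU 0; it illustrates the underlying mechanism, not Cray's launch-time controls.

#include <stdio.h>

/* Read a single cpufreq value (in kHz) from sysfs; returns -1 on failure. */
static long read_khz(const char *path) {
    FILE *f = fopen(path, "r");
    long khz = -1;
    if (f) {
        if (fscanf(f, "%ld", &khz) != 1)
            khz = -1;
        fclose(f);
    }
    return khz;
}

int main(void) {
    /* Standard Linux cpufreq files; availability depends on the node image. */
    long cur = read_khz("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
    long max = read_khz("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq");

    if (cur < 0 || max < 0) {
        fprintf(stderr, "cpufreq interface not available on this node\n");
        return 1;
    }
    printf("cpu0: current %.2f GHz, max %.2f GHz\n", cur / 1e6, max / 1e6);
    return 0;
}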

Lustre Road Map (in brief)
- Cray XE/XK systems with direct-attached Lustre
  - Customers today are running a variety of CLE levels
  - They can stay where they are or move to CLE 4.0 UP03 (with Lustre 1.8.6); in December 2012 they can move to Koshi (with Lustre 1.8.7)
  - Patch support for CLE 4.0 is available through mid-2013; patch support for Koshi is available through mid-2014
- Cray XE/XK systems with external Lustre file systems (esFS)
  - Can run any CLE release through and including Koshi in December 2012
  - Can run later versions of Lustre by upgrading their esFS to ESF running Lustre 2.x, or by upgrading their Cray XE/XK to Koshi and running its Lustre 2.x client
- Cray XE/XK systems with Sonexion devices
  - Can run any release from CLE 4.0 UP03 through the last Ohio update in 2014

External Servers
- Three new products: esLogin, esFS (external Lustre servers), and esMS (the management server for Bright Cluster Manager)
- Synchronicity: esLogin releases are tied to CLE (through OS levels); esFS releases are tied to Lustre releases; esMS releases are tied to BCM releases
- Transition to the new products: esLogin and esFS releases ship in December with Koshi, and new shipments in 2H12 will use the new products

Upcoming CUG Events
- Monday 1:30, Hamburg: Reliability and Resiliency of Cray XE6 and XK6 Systems
- Monday 3:00, Köln: Getting Up and Running with Cluster Compatibility Mode (CCM)
- Tuesday 1:00, Köln: the entire track is devoted to CCM
- Tuesday 1:00, Hamburg: Cray's Lustre Support Model and Road Map
- Tuesday 3:30, Hamburg: Minimizing Lustre Ping Effects at Scale
- Thursday 1:00, Köln: The Year in Review in Cray Security
- Thursday 2:00, Hamburg: Node Health Checker

OS/IO Road Map Summary
- Koshi will release in December 2012 for XE/XK systems
- Nile will release in March 2013 for Cascade systems
- Three new products have been added for external servers: ESL, ESF, and ESM
- Ohio is planned for 2H13 and 1H14, with application resiliency and power management features

Thank You

Dynamic Linking
- Make it possible for Cray customers to link dynamically (done); a runtime dynamic-loading sketch follows this list
- Improve our implementation of dynamic linking (2011): name and install files in standard Linux ways; improve application startup time
- Static or dynamic linking can be configured as the default site by site
- Make dynamic linking the default in CCE in mid-2013
- No longer ship static libraries (TBD)
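
One thing dynamic linking (and DSL support on compute nodes) enables is loading shared objects at run time. The sketch below uses the standard POSIX dlopen/dlsym calls with a hypothetical library name (libsolver.so) and symbol (solve); it illustrates the mechanism only and is not a Cray-specific interface. Link with -ldl on Linux.

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    /* Hypothetical shared library and symbol names, for illustration only. */
    void *handle = dlopen("libsolver.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up a function the library is assumed to export. */
    double (*solve)(double) = (double (*)(double)) dlsym(handle, "solve");
    if (!solve) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    printf("solve(2.0) = %f\n", solve(2.0));
    dlclose(handle);
    return 0;
}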

Power Management Ideas
- Power down unused node resources at job startup; for example, if a job is not using all the cores, turn the unused cores off
- In-band power monitoring (see the sketch after this list)
- Job-based power consumption accounting
- Job-based power capping
- Power profiling tools
- An API for low-latency user-space performance/power tradeoffs
- Auto-tuning for power optimization
- Network power scaling and control
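
As one concrete form of in-band monitoring, Intel nodes that expose the Linux powercap/RAPL interface report cumulative package energy in microjoules; whether a given compute node image exposes these files is an assumption here, not something the slide states. The sketch samples the counter twice, one second apart, and prints the average package power; it is an illustration of the idea, not a Cray power-monitoring interface.

#include <stdio.h>
#include <unistd.h>

/* Assumed RAPL package-0 energy counter (microjoules); may be absent. */
#define ENERGY_FILE "/sys/class/powercap/intel-rapl:0/energy_uj"

static long long read_energy_uj(void) {
    FILE *f = fopen(ENERGY_FILE, "r");
    long long uj = -1;
    if (f) {
        if (fscanf(f, "%lld", &uj) != 1)
            uj = -1;
        fclose(f);
    }
    return uj;
}

int main(void) {
    long long e0 = read_energy_uj();
    sleep(1);                              /* sample interval: 1 second */
    long long e1 = read_energy_uj();

    if (e0 < 0 || e1 < 0 || e1 < e0) {     /* missing file or counter wrap */
        fprintf(stderr, "RAPL energy counter not usable on this node\n");
        return 1;
    }
    /* Energy is in microjoules, so uJ consumed over 1 s / 1e6 gives watts. */
    printf("package 0 average power: %.2f W\n", (e1 - e0) / 1e6);
    return 0;
}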

Cray I/O Models (compute node and I/O node software paths)
- Direct-attach Lustre: Application -> Lustre client (compute node) -> HSN -> Lustre server with ldiskfs (I/O node) -> RAID controller
- External Lustre: Application -> Lustre client -> HSN -> Lustre router (I/O node) -> IB -> external Lustre server -> disk file system -> RAID controller
- Lustre appliance: Application -> Lustre client -> HSN -> Lustre router (I/O node) -> IB -> Lustre server -> disk file system -> RAID controller
- Alternate external file systems (GPFS, Panasas, NFS): Application -> DVS client (compute node) -> HSN -> DVS server and NAS client (I/O node) -> IB/Ethernet -> NAS server -> disk file system -> RAID controller