ANL Site Update. ScicomP/SPXXL Summer Meeting Ray Loy Ben Allen Gordon McPheeters Jack O'Connell William Scullin Tim Williams

Size: px
Start display at page:

Download "ANL Site Update. ScicomP/SPXXL Summer Meeting Ray Loy Ben Allen Gordon McPheeters Jack O'Connell William Scullin Tim Williams"

Transcription

1 ANL Site Update ScicomP/SPXXL Summer Meeting 2016 Ray Loy Ben Allen Gordon McPheeters Jack O'Connell William Scullin Tim Williams

2 Site Update System overview Support topics GPFS AFM Burst Buffer GHI deployment Performance Portability 2

3 ALCF Resources 3

4 ALCF Roadmap CORAL Theta interim system (late 2016) Cray/Intel KNL >=2500 nodes, >= 60 cores/node (8.5PF) Peak comparable to Mira Aurora late 2018 ALCF-3, successor to Mira Cray/Intel KNH >=50K nodes Blue Gene will continue to be a primary resource through the installation of Aurora Both HW and SW considerations for extended life 4

5 Support/Requirements 5

6 GPFS on BG/Q Current: ESS GPFS ; DDNs, BG/Q GPFS 3.5 Cannot upgrade ESS without BG/Q GPFS >=4.1 End-of-service for GPFS 3.5 coming up in 2017 BG/Q will continue in service well beyond that Support for GPFS 4.1 (or 4.2) on BG/Q IONs would be very helpful Concern that fixes for GPFS and AFM at GPFS 4.2 will continue to be back-ported to GPFS in a timely manner ANL supports SPXXL 2016 requirement (Ticket 21) 6

7 Compilers IBM XL Compilers for Blue Gene/Q XL Fortran 14.1 Current BG/Q version August 2015 AIX, Linux last update April/May 2016 XL C/C Current BG/Q version August 2015 AIX, Linux last update April 2016 Hoping for BG/Q update 7

8 PMRs PMR getdents/getdents64 alias conflict with toolchain [Apr 2016] Success - Fixed in V1R2M4 PMR No way to list contents of node's /dev/shmem or /dev/persistent from a CNK program (Can only access from off-node using CDTI) Negotiating with IBM on a solution (stalled on ANL) 8

9 Implementing GPFS AFM as a Burst Buffer 9

10 ALCF Operational File System Configuration There are three main production level GPFS file systems: mira-fs0 19 PiB mira-fs1-7 PiB mira-home 1.1 PiB These are based on DDN 12K-E (Embedded NSD servers) storage. These file systems are mounted to all the BG IO servers, a Cray visualization cluster, Globus DTN nodes and HPSS data movers. symlinks to the project filesets are used to mask the underlying file system from the users. /project contains links to mira-fs0 and mirafs1 filesets. 10

11 The big picture Current Step 1 AFM Step 2 GHI A,B,C,D,E,F,G,H 8PB, 400 GB/s fs2 (30 ESS) A,B,C,D,E,F,G,H 8PB, 400 GB/s fs2 (30 ESS) A,C,D,G,H 21PB,240 GB/s fs0 (16 DDN) B,E,F 7PB, 90 GB/s fs1 (6 DDN) A,C,D,G,H 21PB,240 GB/s fs0 (16 DDN) B,E,F 7PB, 90 GB/s fs1 (6 DDN) A,C,D,G,H 21PB,240 GB/s fs0 (16 DDN) B,E,F 7PB, 90 GB/s fs1 (6 DDN) HPSS via GHI 11

12 ESS 50,000 foot view Terminology: AFM: Active File Management cache fs2, non-permanent location home fs0, fs1, permanent location; not /home, which is confusing fs2 is a cache; Like any other cache, files can be evicted; AFM ensures there is a good copy in home before eviction. Eviction is policy driven; Our policies will probably be fairly simple (high water / lower water and LRU), but most any file system parameter can be checked) AFM is constantly syncing files between the cache and home Default is every 15 seconds, but is alterable Basically replays events from the cache on home (metadata updates, block changes, etc.) We can influence priorities of new writes vs. replication with tuning parameters, but true QoS is not here yet (it is in the roadmap) Fundamental assumption: Our file systems are big enough that recalls will be rare 12

13 Why? Overall, better experience for both the users and the admins No single point of failure We got this by adding fs1 to the existing fs0 file system All users see the same performance There is not uniform bw between fs0 and fs1, but fs2 and AFM give it back A cache miss might cause run to run variation, but all users are equally subject to this and given our cache size, the assumption is this will be rare. Minimal user/admin intervention required Ideally policies will do the right thing at the right time Example: Project data will automatically disappear, via GHI, after a project completes; Storage team doesn t need to do anything. Should trivially (single command) be able to cause the right behavior Probably based on extended attribute and policy Minimal (ideally zero) manual copying of files All handled by the file system 13

14 Implementation Overview Current State: Internal testing 2 ESS node pairs deleted from ESS and used to form a test home cluster. File system called: fm-fs1. Using AFM mode that allows for a GPFS home file system. mira-fs0 and mira-fs1 will be remotely mounted to ESS system mira-home will not be AFM managed Home file system defined as: gpfs:/// Null server list which signifies remotely mounted to cache cluster 14

15 Implementation Challenges This cluster was originally based on the GSS product which was xseries (Intel) based. Cluster was migrated from xseries to pseries (ESS) to address support concerns. Kernel Panics in mpt2sas kernel module Redhat bug: (bug) / (back port to 7.1 zstream) Difficult PD/recreate path. Much of the work was done by ALCF dealing directly with Red Hat and Avago. Up to 3 weeks between recreates. Fixed in RH 7.2 stream but since ESS won t support RH 7.2 in calendar year 2016 we needed to ask Red Hat for a port to RH 7.1 zstream. This fix is under final tests now at Red Hat. As soon as available this will be incorporated in our ESS cluster. Fixed in: kernel el7 15

16 Implementation Challenges Quotas: No communication between home and cache WRT to quota settings. Data is sync ed to home as root and root does not error on over quota Found best to turn off auto-migration on cache filesets. Prevents over-running home cache hard limits Bug found after file evicted and subsequently re-read from cache cluster it was not becoming resident again in cache. Efix under construction. 16

17 HSI deployment 17

18 File System Backup Overview Ø Mira-home is intended to store executable files and configura[on files. Ø User quota limits are enforced. Ø Data is fully protected thru GPFS metadata and data replica[on, GPFS snapshots and nightly backups. Ø Mira-fs0 and mira-fs1 are intended as intermediate-term storage for Mira/Cetus job output such as checkpoint datasets. Ø Project associated fileset quota limits are enforced. Ø Only metadata replica[on is enabled. Ø The user is responsible for archiving their own data. Ø Aaer project expira[on, quota limits are reduced, data is archived and the fileset is removed.

19 Current archiving (fs0, fs1) Users manually archive files via HSI or Globus/GridFTP When user project expires, script invokes HSI Copy to tape, shrink disk quota 90 days after expiration unmount (but still on disk) 180 days delete from disk, copy retained on tape No other system-related archiving 19

20 GHI value added Instead of scripting archiving directly, generate GPFS policy files to tell GHI when/how to migrate expired projects Improved disaster recovery Migrate everything (leave in place on disk) Optional: Implement threshold policy e.g. when fs reaches 90% start migration down to 80% (punch holes in files leaving metadata) GHI image backup create image of the entire fs only as "punched out" files (metadata) Initial restore of entire fs quickly, retrieve remaining contents on demand until caught up 20

21 GHI Deployment Status In progress (~6 mo) GHI installed and enabled on a test fs Some delays due to PMR (snapshots not deleted, deletion times out due to other processes) Other sites report issue resolved Expecting deployment this year 21

22 High Level GHI Integration View GPFS Cluster HPSS GPFS NSD Nodes GHI Session Node Data Mover Data Mover DDN Storage GHI IOM Nodes Disk Cache Tape 1GbE 10GbE QDR IB FC 4 x 6Gb/s SAS

23 Portability/Interoperability 23

24 Coordinating SC Centers Efforts Meetings of cross-labs applications readiness staff Tools and libraries working group (W. Joubert) Cross-lab training committee (F. Foerttner) Shared calendar Shared training events Manage nondisclosure, export control challenges CORAL partners, APEX partners (NERSC & Trinity) Portability March 2014 Mee[ng Apps readiness coordina[on ~15 representa[ves September 2014 Mee[ng Apps Portability Apps readiness coordina[on ~25 representa[ves January 2015 Mee[ng Next-gen hardware & soaware Portable programming ~40 center staff ~10 vendor reps 24

25 OpenMP 4.5 Source Code Portability double x[128], y[128]; #pragma omp target data map(to:x[0:64]) map(tofrom:y[0;64]) { #pragma omp target { // y computed on device } } double x[128], y[128]; #pragma omp for simd aligned(x, y: 32) for (int i=0; i<128; i++ ) { // thread s iterates à SIMD lanes } #pragma omp target data map(to:x) { #pragma omp target map(tofrom:y) #pragma omp teams #pragma omp distribute #pragma omp parallel for for (int i = 0; i < n; ++i) { y[i] = a*x[i] + y[i]; } } Host (CPU) Device (accelerator) GPGPU Intel Mic omp_get_num_devices() map maybe shared location simd standard vectorization #ifdef MANYCORE #else GPU #endif <code> 25

26 Performance Portability When directives-based approach performs: OpenMP 4.5 Use libraries or frameworks to encapsulate node level parallelism OpenMP 4.0 Libraries App Modules CUDA Vector intrinsics shmem Charm++ ADLB PETSc Chombo Trilinos MADNESS Kokkos RAJA 26

27 Effort Continuing Coordination between ESP, CAAR, NESAP. ALCC allocation at ANL, OLCF, NESAP for portability work Shared training including live/videocon Bringing NNSA labs into the discussion HPCOR (9/2015), COEPP (4/2016) SC15 Workshop on Portability Among HPC Architectures for Scientific Applications New effort on kernels/mini-apps: optimize on 2 platforms, then rework with OMP 4.5 or OpenACC, compare NekBone (ALCF), BoxLib (NERSC), DSL-based library for MD (OLCF) 27

28 Training - ATPESC Argonne Training Program on Extreme Scale Computing Intensive 2-week course held off-site Audience: doctoral students, postdocs, and computational scientists Presenters are leaders in all major areas of HPC Planning in progress for the 4 th session in August

29 The End 29

GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations

GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations Argonne National Laboratory Argonne National Laboratory is located on 1,500

More information

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber NERSC Site Update National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory Richard Gerber NERSC Senior Science Advisor High Performance Computing Department Head Cori

More information

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department

More information

NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017

NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017 NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017 Overview The Globally Accessible Data Environment (GLADE) provides centralized file storage for HPC computational, data-analysis,

More information

Storage Supporting DOE Science

Storage Supporting DOE Science Storage Supporting DOE Science Jason Hick jhick@lbl.gov NERSC LBNL http://www.nersc.gov/nusers/systems/hpss/ http://www.nersc.gov/nusers/systems/ngf/ May 12, 2011 The Production Facility for DOE Office

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

The Blue Water s File/Archive System. Data Management Challenges Michelle Butler

The Blue Water s File/Archive System. Data Management Challenges Michelle Butler The Blue Water s File/Archive System Data Management Challenges Michelle Butler (mbutler@ncsa.illinois.edu) NCSA is a World leader in deploying supercomputers and providing scientists with the software

More information

Configuring IBM Spectrum Protect for IBM Spectrum Scale Active File Management

Configuring IBM Spectrum Protect for IBM Spectrum Scale Active File Management IBM Spectrum Protect Configuring IBM Spectrum Protect for IBM Spectrum Scale Active File Management Document version 1.4 Dominic Müller-Wicke IBM Spectrum Protect Development Nils Haustein EMEA Storage

More information

Introduction to High Performance Parallel I/O

Introduction to High Performance Parallel I/O Introduction to High Performance Parallel I/O Richard Gerber Deputy Group Lead NERSC User Services August 30, 2013-1- Some slides from Katie Antypas I/O Needs Getting Bigger All the Time I/O needs growing

More information

HPC Storage Use Cases & Future Trends

HPC Storage Use Cases & Future Trends Oct, 2014 HPC Storage Use Cases & Future Trends Massively-Scalable Platforms and Solutions Engineered for the Big Data and Cloud Era Atul Vidwansa Email: atul@ DDN About Us DDN is a Leader in Massively

More information

HPSS Development Update and Roadmap 2017

HPSS Development Update and Roadmap 2017 HPSS Development Update and Roadmap 2017 User Focused, Forward Looking http://www.hpss-collaboration.org 1 Disclaimer Forward looking information including schedules and future software reflect current

More information

NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU

NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU GPGPU opens the door for co-design HPC, moreover middleware-support embedded system designs to harness the power of GPUaccelerated

More information

Present and Future Leadership Computers at OLCF

Present and Future Leadership Computers at OLCF Present and Future Leadership Computers at OLCF Al Geist ORNL Corporate Fellow DOE Data/Viz PI Meeting January 13-15, 2015 Walnut Creek, CA ORNL is managed by UT-Battelle for the US Department of Energy

More information

Intel Xeon Phi архитектура, модели программирования, оптимизация.

Intel Xeon Phi архитектура, модели программирования, оптимизация. Нижний Новгород, 2017 Intel Xeon Phi архитектура, модели программирования, оптимизация. Дмитрий Прохоров, Дмитрий Рябцев, Intel Agenda What and Why Intel Xeon Phi Top 500 insights, roadmap, architecture

More information

The Stampede is Coming: A New Petascale Resource for the Open Science Community

The Stampede is Coming: A New Petascale Resource for the Open Science Community The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation

More information

Early X1 Experiences at Boeing. Jim Glidewell Information Technology Services Boeing Shared Services Group

Early X1 Experiences at Boeing. Jim Glidewell Information Technology Services Boeing Shared Services Group Early X1 Experiences at Boeing Jim Glidewell Information Technology Services Boeing Shared Services Group Early X1 Experiences at Boeing HPC computing environment X1 configuration Hardware and OS Applications

More information

ALCF Operations Best Practices and Highlights. Mark Fahey The 6 th AICS International Symposium Feb 22-23, 2016 RIKEN AICS, Kobe, Japan

ALCF Operations Best Practices and Highlights. Mark Fahey The 6 th AICS International Symposium Feb 22-23, 2016 RIKEN AICS, Kobe, Japan ALCF Operations Best Practices and Highlights Mark Fahey The 6 th AICS International Symposium Feb 22-23, 2016 RIKEN AICS, Kobe, Japan Overview ALCF Organization and Structure IBM BG/Q Mira Storage/Archive

More information

IBM CORAL HPC System Solution

IBM CORAL HPC System Solution IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy

More information

LLVM and Clang on the Most Powerful Supercomputer in the World

LLVM and Clang on the Most Powerful Supercomputer in the World LLVM and Clang on the Most Powerful Supercomputer in the World Hal Finkel November 7, 2012 The 2012 LLVM Developers Meeting Hal Finkel (Argonne National Laboratory) LLVM and Clang on the BG/Q November

More information

Illinois Proposal Considerations Greg Bauer

Illinois Proposal Considerations Greg Bauer - 2016 Greg Bauer Support model Blue Waters provides traditional Partner Consulting as part of its User Services. Standard service requests for assistance with porting, debugging, allocation issues, and

More information

TSM HSM Explained. Agenda. Oxford University TSM Symposium Introduction. How it works. Best Practices. Futures. References. IBM Software Group

TSM HSM Explained. Agenda. Oxford University TSM Symposium Introduction. How it works. Best Practices. Futures. References. IBM Software Group IBM Software Group TSM HSM Explained Oxford University TSM Symposium 2003 Christian Bolik (bolik@de.ibm.com) IBM Tivoli Storage SW Development 1 Agenda Introduction How it works Best Practices Futures

More information

Do You Know What Your I/O Is Doing? (and how to fix it?) William Gropp

Do You Know What Your I/O Is Doing? (and how to fix it?) William Gropp Do You Know What Your I/O Is Doing? (and how to fix it?) William Gropp www.cs.illinois.edu/~wgropp Messages Current I/O performance is often appallingly poor Even relative to what current systems can achieve

More information

Intel Xeon Phi архитектура, модели программирования, оптимизация.

Intel Xeon Phi архитектура, модели программирования, оптимизация. Нижний Новгород, 2016 Intel Xeon Phi архитектура, модели программирования, оптимизация. Дмитрий Прохоров, Intel Agenda What and Why Intel Xeon Phi Top 500 insights, roadmap, architecture How Programming

More information

Beyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center

Beyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center Beyond Petascale Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center GPFS Research and Development! GPFS product originated at IBM Almaden Research Laboratory! Research continues to

More information

S Comparing OpenACC 2.5 and OpenMP 4.5

S Comparing OpenACC 2.5 and OpenMP 4.5 April 4-7, 2016 Silicon Valley S6410 - Comparing OpenACC 2.5 and OpenMP 4.5 James Beyer, NVIDIA Jeff Larkin, NVIDIA GTC16 April 7, 2016 History of OpenMP & OpenACC AGENDA Philosophical Differences Technical

More information

Experiences with ENZO on the Intel Many Integrated Core Architecture

Experiences with ENZO on the Intel Many Integrated Core Architecture Experiences with ENZO on the Intel Many Integrated Core Architecture Dr. Robert Harkness National Institute for Computational Sciences April 10th, 2012 Overview ENZO applications at petascale ENZO and

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Insights into TSM/HSM for UNIX and Windows

Insights into TSM/HSM for UNIX and Windows IBM Software Group Insights into TSM/HSM for UNIX and Windows Oxford University TSM Symposium 2005 Jens-Peter Akelbein (akelbein@de.ibm.com) IBM Tivoli Storage SW Development 1 IBM Software Group Tivoli

More information

An Introduction to GPFS

An Introduction to GPFS IBM High Performance Computing July 2006 An Introduction to GPFS gpfsintro072506.doc Page 2 Contents Overview 2 What is GPFS? 3 The file system 3 Application interfaces 4 Performance and scalability 4

More information

Transferring a Petabyte in a Day. Raj Kettimuthu, Zhengchun Liu, David Wheeler, Ian Foster, Katrin Heitmann, Franck Cappello

Transferring a Petabyte in a Day. Raj Kettimuthu, Zhengchun Liu, David Wheeler, Ian Foster, Katrin Heitmann, Franck Cappello Transferring a Petabyte in a Day Raj Kettimuthu, Zhengchun Liu, David Wheeler, Ian Foster, Katrin Heitmann, Franck Cappello Huge amount of data from extreme scale simulations and experiments Systems have

More information

Extraordinary HPC file system solutions at KIT

Extraordinary HPC file system solutions at KIT Extraordinary HPC file system solutions at KIT Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State Roland of Baden-Württemberg Laifer Lustre and tools for ldiskfs investigation

More information

GPFS on a Cray XT. Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009

GPFS on a Cray XT. Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009 GPFS on a Cray XT Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009 Outline NERSC Global File System GPFS Overview Comparison of Lustre and GPFS

More information

Data Movement & Tiering with DMF 7

Data Movement & Tiering with DMF 7 Data Movement & Tiering with DMF 7 Kirill Malkin Director of Engineering April 2019 Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive Data in Memory 2 Why

More information

LLVM for the future of Supercomputing

LLVM for the future of Supercomputing LLVM for the future of Supercomputing Hal Finkel hfinkel@anl.gov 2017-03-27 2017 European LLVM Developers' Meeting What is Supercomputing? Computing for large, tightly-coupled problems. Lots of computational

More information

Oak Ridge National Laboratory Computing and Computational Sciences

Oak Ridge National Laboratory Computing and Computational Sciences Oak Ridge National Laboratory Computing and Computational Sciences OFA Update by ORNL Presented by: Pavel Shamis (Pasha) OFA Workshop Mar 17, 2015 Acknowledgments Bernholdt David E. Hill Jason J. Leverman

More information

Parallel File Systems. John White Lawrence Berkeley National Lab

Parallel File Systems. John White Lawrence Berkeley National Lab Parallel File Systems John White Lawrence Berkeley National Lab Topics Defining a File System Our Specific Case for File Systems Parallel File Systems A Survey of Current Parallel File Systems Implementation

More information

Lustre usages and experiences

Lustre usages and experiences Lustre usages and experiences at German Climate Computing Centre in Hamburg Carsten Beyer High Performance Computing Center Exclusively for the German Climate Research Limited Company, non-profit Staff:

More information

Addressing Heterogeneity in Manycore Applications

Addressing Heterogeneity in Manycore Applications Addressing Heterogeneity in Manycore Applications RTM Simulation Use Case stephane.bihan@caps-entreprise.com Oil&Gas HPC Workshop Rice University, Houston, March 2008 www.caps-entreprise.com Introduction

More information

SONAS Best Practices and options for CIFS Scalability

SONAS Best Practices and options for CIFS Scalability COMMON INTERNET FILE SYSTEM (CIFS) FILE SERVING...2 MAXIMUM NUMBER OF ACTIVE CONCURRENT CIFS CONNECTIONS...2 SONAS SYSTEM CONFIGURATION...4 SONAS Best Practices and options for CIFS Scalability A guide

More information

IBM Spectrum Scale Archiving Policies

IBM Spectrum Scale Archiving Policies IBM Spectrum Scale Archiving Policies An introduction to GPFS policies for file archiving with Spectrum Archive Enterprise Edition Version 8 (07/31/2017) Nils Haustein Executive IT Specialist EMEA Storage

More information

AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT

AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT By Joshua Kwedar Sr. Systems Engineer By Steve Horan Cloud Architect ATS Innovation Center, Malvern, PA Dates: Oct December 2017 INTRODUCTION

More information

Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment

Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment Brett Bode, Michelle Butler, Sean Stevens, Jim Glasgow National Center for Supercomputing Applications/University

More information

Improved Solutions for I/O Provisioning and Application Acceleration

Improved Solutions for I/O Provisioning and Application Acceleration 1 Improved Solutions for I/O Provisioning and Application Acceleration August 11, 2015 Jeff Sisilli Sr. Director Product Marketing jsisilli@ddn.com 2 Why Burst Buffer? The Supercomputing Tug-of-War A supercomputer

More information

DDN s Vision for the Future of Lustre LUG2015 Robert Triendl

DDN s Vision for the Future of Lustre LUG2015 Robert Triendl DDN s Vision for the Future of Lustre LUG2015 Robert Triendl 3 Topics 1. The Changing Markets for Lustre 2. A Vision for Lustre that isn t Exascale 3. Building Lustre for the Future 4. Peak vs. Operational

More information

Data storage services at KEK/CRC -- status and plan

Data storage services at KEK/CRC -- status and plan Data storage services at KEK/CRC -- status and plan KEK/CRC Hiroyuki Matsunaga Most of the slides are prepared by Koichi Murakami and Go Iwai KEKCC System Overview KEKCC (Central Computing System) The

More information

A GPFS Primer October 2005

A GPFS Primer October 2005 A Primer October 2005 Overview This paper describes (General Parallel File System) Version 2, Release 3 for AIX 5L and Linux. It provides an overview of key concepts which should be understood by those

More information

Computation for Beyond Standard Model Physics

Computation for Beyond Standard Model Physics Computation for Beyond Standard Model Physics Xiao-Yong Jin Argonne National Laboratory Lattice for BSM Physics 2018 Boulder, Colorado April 6, 2018 2 My PhD years at Columbia Lattice gauge theory Large

More information

GPFS for Life Sciences at NERSC

GPFS for Life Sciences at NERSC GPFS for Life Sciences at NERSC A NERSC & JGI collaborative effort Jason Hick, Rei Lee, Ravi Cheema, and Kjiersten Fagnan GPFS User Group meeting May 20, 2015-1 - Overview of Bioinformatics - 2 - A High-level

More information

HPC at UZH: status and plans

HPC at UZH: status and plans HPC at UZH: status and plans Dec. 4, 2013 This presentation s purpose Meet the sysadmin team. Update on what s coming soon in Schroedinger s HW. Review old and new usage policies. Discussion (later on).

More information

HPC future trends from a science perspective

HPC future trends from a science perspective HPC future trends from a science perspective Simon McIntosh-Smith University of Bristol HPC Research Group simonm@cs.bris.ac.uk 1 Business as usual? We've all got used to new machines being relatively

More information

IBM Spectrum Scale Archiving Policies

IBM Spectrum Scale Archiving Policies IBM Spectrum Scale Archiving Policies An introduction to GPFS policies for file archiving with Linear Tape File System Enterprise Edition Version 4 Nils Haustein Executive IT Specialist EMEA Storage Competence

More information

AFM Migration: The Road To Perdition

AFM Migration: The Road To Perdition AFM Migration: The Road To Perdition Spectrum Scale Users Group UK Meeting 9 th -10 th May 2017 Mark Roberts (AWE) Laurence Horrocks-Barlow (OCF) British Crown Owned Copyright [2017]/AWE GPFS Systems Legacy

More information

ARCHER/RDF Overview. How do they fit together? Andy Turner, EPCC

ARCHER/RDF Overview. How do they fit together? Andy Turner, EPCC ARCHER/RDF Overview How do they fit together? Andy Turner, EPCC a.turner@epcc.ed.ac.uk www.epcc.ed.ac.uk www.archer.ac.uk Outline ARCHER/RDF Layout Available file systems Compute resources ARCHER Compute

More information

Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright. lbl.gov

Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright. lbl.gov Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright njwright @ lbl.gov NERSC- National Energy Research Scientific Computing Center Mission: Accelerate the pace of scientific discovery

More information

DVS, GPFS and External Lustre at NERSC How It s Working on Hopper. Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011

DVS, GPFS and External Lustre at NERSC How It s Working on Hopper. Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011 DVS, GPFS and External Lustre at NERSC How It s Working on Hopper Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011 1 NERSC is the Primary Computing Center for DOE Office of Science NERSC serves

More information

Guillimin HPC Users Meeting February 11, McGill University / Calcul Québec / Compute Canada Montréal, QC Canada

Guillimin HPC Users Meeting February 11, McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Guillimin HPC Users Meeting February 11, 2016 guillimin@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Compute Canada News Scheduler Updates Software Updates Training

More information

Parallel File Systems Compared

Parallel File Systems Compared Parallel File Systems Compared Computing Centre (SSCK) University of Karlsruhe, Germany Laifer@rz.uni-karlsruhe.de page 1 Outline» Parallel file systems (PFS) Design and typical usage Important features

More information

Dynamic SIMD Scheduling

Dynamic SIMD Scheduling Dynamic SIMD Scheduling Florian Wende SC15 MIC Tuning BoF November 18 th, 2015 Zuse Institute Berlin time Dynamic Work Assignment: The Idea Irregular SIMD execution Caused by branching: control flow varies

More information

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands Unleash Your Data Center s Hidden Power September 16, 2014 Molly Rector CMO, EVP Product Management & WW Marketing

More information

Blue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft

Blue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft Blue Gene/Q Hardware Overview 02.02.2015 Michael Stephan Blue Gene/Q: Design goals System-on-Chip (SoC) design Processor comprises both processing cores and network Optimal performance / watt ratio Small

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP p. 1/?? Introduction to OpenMP Simple SPMD etc. Nick Maclaren nmm1@cam.ac.uk September 2017 Introduction to OpenMP p. 2/?? Terminology I am badly abusing the term SPMD tough The

More information

Hybrid Implementation of 3D Kirchhoff Migration

Hybrid Implementation of 3D Kirchhoff Migration Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation

More information

SuperMike-II Launch Workshop. System Overview and Allocations

SuperMike-II Launch Workshop. System Overview and Allocations : System Overview and Allocations Dr Jim Lupo CCT Computational Enablement jalupo@cct.lsu.edu SuperMike-II: Serious Heterogeneous Computing Power System Hardware SuperMike provides 442 nodes, 221TB of

More information

ALCF Argonne Leadership Computing Facility

ALCF Argonne Leadership Computing Facility ALCF Argonne Leadership Computing Facility ALCF Data Analytics and Visualization Resources William (Bill) Allcock Leadership Computing Facility Argonne Leadership Computing Facility Established 2006. Dedicated

More information

Early Experiences Writing Performance Portable OpenMP 4 Codes

Early Experiences Writing Performance Portable OpenMP 4 Codes Early Experiences Writing Performance Portable OpenMP 4 Codes Verónica G. Vergara Larrea Wayne Joubert M. Graham Lopez Oscar Hernandez Oak Ridge National Laboratory Problem statement APU FPGA neuromorphic

More information

Portability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17

Portability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17 Portability of OpenMP Offload Directives Jeff Larkin, OpenMP Booth Talk SC17 11/27/2017 Background Many developers choose OpenMP in hopes of having a single source code that runs effectively anywhere (performance

More information

Performance of deal.ii on a node

Performance of deal.ii on a node Performance of deal.ii on a node Bruno Turcksin Texas A&M University, Dept. of Mathematics Bruno Turcksin Deal.II on a node 1/37 Outline 1 Introduction 2 Architecture 3 Paralution 4 Other Libraries 5 Conclusions

More information

Lessons learned from Lustre file system operation

Lessons learned from Lustre file system operation Lessons learned from Lustre file system operation Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association

More information

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete 1 DDN Who We Are 2 We Design, Deploy and Optimize Storage Systems Which Solve HPC, Big Data and Cloud Business

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

More information

Scalability Testing of DNE2 in Lustre 2.7 and Metadata Performance using Virtual Machines Tom Crowe, Nathan Lavender, Stephen Simms

Scalability Testing of DNE2 in Lustre 2.7 and Metadata Performance using Virtual Machines Tom Crowe, Nathan Lavender, Stephen Simms Scalability Testing of DNE2 in Lustre 2.7 and Metadata Performance using Virtual Machines Tom Crowe, Nathan Lavender, Stephen Simms Research Technologies High Performance File Systems hpfs-admin@iu.edu

More information

Scalability issues : HPC Applications & Performance Tools

Scalability issues : HPC Applications & Performance Tools High Performance Computing Systems and Technology Group Scalability issues : HPC Applications & Performance Tools Chiranjib Sur HPC @ India Systems and Technology Lab chiranjib.sur@in.ibm.com Top 500 :

More information

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 The New Generation of Supercomputers Hybrid

More information

Challenges in making Lustre systems reliable

Challenges in making Lustre systems reliable Challenges in making Lustre systems reliable Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State Roland of Baden-Württemberg Laifer Challenges and in making Lustre systems reliable

More information

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015

PERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015 PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability

More information

Benoit DELAUNAY Benoit DELAUNAY 1

Benoit DELAUNAY Benoit DELAUNAY 1 Benoit DELAUNAY 20091023 Benoit DELAUNAY 1 CC-IN2P3 provides computing and storage for the 4 LHC experiments and many others (astro particles...) A long history of service sharing between experiments Some

More information

Distributed & Heterogeneous Programming in C++ for HPC at SC17

Distributed & Heterogeneous Programming in C++ for HPC at SC17 Distributed & Heterogeneous Programming in C++ for HPC at SC17 Michael Wong (Codeplay), Hal Finkel DHPCC++ 2018 1 The Panel 2 Ben Sanders (AMD, HCC, HiP, HSA) Carter Edwards (SNL, Kokkos, ISO C++) CJ Newburn

More information

Storage for HPC, HPDA and Machine Learning (ML)

Storage for HPC, HPDA and Machine Learning (ML) for HPC, HPDA and Machine Learning (ML) Frank Kraemer, IBM Systems Architect mailto:kraemerf@de.ibm.com IBM Data Management for Autonomous Driving (AD) significantly increase development efficiency by

More information

ACCRE High Performance Compute Cluster

ACCRE High Performance Compute Cluster 6 중 1 2010-05-16 오후 1:44 Enabling Researcher-Driven Innovation and Exploration Mission / Services Research Publications User Support Education / Outreach A - Z Index Our Mission History Governance Services

More information

Enterprise2014. GPFS with Flash840 on PureFlex and Power8 (AIX & Linux)

Enterprise2014. GPFS with Flash840 on PureFlex and Power8 (AIX & Linux) Chris Churchey Principal ATS Group, LLC churchey@theatsgroup.com (610-574-0207) October 2014 GPFS with Flash840 on PureFlex and Power8 (AIX & Linux) Why Monitor? (Clusters, Servers, Storage, Net, etc.)

More information

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec

PROGRAMOVÁNÍ V C++ CVIČENÍ. Michal Brabec PROGRAMOVÁNÍ V C++ CVIČENÍ Michal Brabec PARALLELISM CATEGORIES CPU? SSE Multiprocessor SIMT - GPU 2 / 17 PARALLELISM V C++ Weak support in the language itself, powerful libraries Many different parallelization

More information

Fast Forward I/O & Storage

Fast Forward I/O & Storage Fast Forward I/O & Storage Eric Barton Lead Architect 1 Department of Energy - Fast Forward Challenge FastForward RFP provided US Government funding for exascale research and development Sponsored by 7

More information

Oak Ridge Leadership Computing Facility: Summit and Beyond

Oak Ridge Leadership Computing Facility: Summit and Beyond Oak Ridge Leadership Computing Facility: Summit and Beyond Justin L. Whitt OLCF-4 Deputy Project Director, Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory March 2017 ORNL is managed

More information

Application Performance on IME

Application Performance on IME Application Performance on IME Toine Beckers, DDN Marco Grossi, ICHEC Burst Buffer Designs Introduce fast buffer layer Layer between memory and persistent storage Pre-stage application data Buffer writes

More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems Ed Hinkel Senior Sales Engineer Agenda Overview - Rogue Wave & TotalView GPU Debugging with TotalView Nvdia CUDA Intel Phi 2

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

Experiences in Optimizing a $250K Cluster for High- Performance Computing Applications

Experiences in Optimizing a $250K Cluster for High- Performance Computing Applications Experiences in Optimizing a $250K Cluster for High- Performance Computing Applications Kevin Brandstatter Dan Gordon Jason DiBabbo Ben Walters Alex Ballmer Lauren Ribordy Ioan Raicu Illinois Institute

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

Parallel Programming. Libraries and Implementations

Parallel Programming. Libraries and Implementations Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

TACC s Stampede Project: Intel MIC for Simulation and Data-Intensive Computing

TACC s Stampede Project: Intel MIC for Simulation and Data-Intensive Computing TACC s Stampede Project: Intel MIC for Simulation and Data-Intensive Computing Jay Boisseau, Director April 17, 2012 TACC Vision & Strategy Provide the most powerful, capable computing technologies and

More information

Outline. March 5, 2012 CIRMMT - McGill University 2

Outline. March 5, 2012 CIRMMT - McGill University 2 Outline CLUMEQ, Calcul Quebec and Compute Canada Research Support Objectives and Focal Points CLUMEQ Site at McGill ETS Key Specifications and Status CLUMEQ HPC Support Staff at McGill Getting Started

More information

Community Release Update

Community Release Update Community Release Update LAD 2017 Peter Jones HPDD, Intel OpenSFS Lustre Working Group OpenSFS Lustre Working Group Lead by Peter Jones (Intel) and Dustin Leverman (ORNL) Single forum for all Lustre development

More information

TS7700 Technical Update TS7720 Tape Attach Deep Dive

TS7700 Technical Update TS7720 Tape Attach Deep Dive TS7700 Technical Update TS7720 Tape Attach Deep Dive Ralph Beeston TS7700 Architecture IBM Session objectives Brief Overview TS7700 Quick background of TS7700 TS7720T Overview TS7720T Deep Dive TS7720T

More information

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory Recall: Address Space Map 13: Memory Management Biggest Virtual Address Stack (Space for local variables etc. For each nested procedure call) Sometimes Reserved for OS Stack Pointer Last Modified: 6/21/2004

More information

DDN About Us Solving Large Enterprise and Web Scale Challenges

DDN About Us Solving Large Enterprise and Web Scale Challenges 1 DDN About Us Solving Large Enterprise and Web Scale Challenges History Founded in 98 World s Largest Private Storage Company Growing, Profitable, Self Funded Headquarters: Santa Clara and Chatsworth,

More information

Hitachi Data Instance Manager Software Version Release Notes

Hitachi Data Instance Manager Software Version Release Notes Hitachi Data Instance Manager Software Version 4.2.3 Release Notes Contents Contents... 1 About this document... 2 Intended audience... 2 Getting help... 2 About this release... 2 Product package contents...

More information

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Torben Kling-Petersen, PhD Presenter s Name Principle Field Title andengineer Division HPC &Cloud LoB SunComputing Microsystems

More information

Mapping MPI+X Applications to Multi-GPU Architectures

Mapping MPI+X Applications to Multi-GPU Architectures Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under

More information