
1 Creating an Exascale Ecosystem for Science
Presented to: HPC Saudi 2017
Jeffrey A. Nichols, Associate Laboratory Director, Computing and Computational Sciences
March 14, 2017
ORNL is managed by UT-Battelle for the US Department of Energy

2 Our vision: sustain leadership and scientific impact in computing and computational sciences
- Provide the world's most powerful open resources for scalable computing and simulation, data and analytics at any scale, and scalable infrastructure for science
- Follow a well-defined path for maintaining world leadership in these critical areas
- Attract the brightest talent and partnerships from all over the world
- Deliver leading-edge science relevant to missions of DOE and key federal and state agencies
- Invest in cross-cutting partnerships with industry
- Provide unique opportunity for innovation based on multiagency collaboration
- Invest in education and training

3 Oak Ridge Leadership Computing Facility (OLCF) is one of the world's most powerful computing facilities
- Titan: peak performance 27 PF/s; memory 710 TB; disk bandwidth 240 GB/s; 5,000 square feet; 8.8 MW power
- Gaea: peak performance 1.1 PF/s; memory 240 TB; disk bandwidth 104 GB/s; 1,600 square feet; 2.2 MW power
- Beacon: peak performance 210 TF/s; memory 12 TB; disk bandwidth 56 GB/s
- Darter: memory 22.6 TB; disk bandwidth 30 GB/s
- Data storage: Spider file system (40 PB capacity, >1 TB/s bandwidth); HPSS archive (240 PB capacity, 6 tape libraries)
- Data analytics/visualization: LENS cluster, Ewok cluster, EVEREST visualization facility, Urika data appliance
- Networks: ESnet (100 Gbps), Internet2 (100 Gbps), and private dark fibre

4 Our Compute and Data Environment for Science (CADES) provides a shared infrastructure to help solve big science problems
CADES links major facilities and projects, including the Spallation Neutron Source, the Leadership Computing Facility, Atmospheric Radiation Measurement, Basic Energy Sciences and the Center for Nanophase Materials Sciences, UT, and ALICE. Its shared resources include graph analytics (Cray GX), the XK7, condos, clusters, and hybrid clouds, shared-memory Ultraviolet (UV) systems, common infrastructure (file systems, networking, etc.), and future technologies looking beyond Moore's law.

5 CADES connects the modes of discovery
- In silico investigation: (1) Model, capturing the physical processes; (2) Formulation, mapping to solvers and computing; (3) Execution, the forward process in simulation or iteration for convergence
- Empirical/experimental investigation: (A) Experiment design, synthesis or control; (B) Alignment, data capture into staged structures; (C) Analytics at scale, machine learning

6 CADES deployment
CADES resources include ~10,000 OIC cores (hosting several ORNL projects and other smaller projects), ~5,000 and ~6,000 cores of integrated condos on InfiniBand, ~5,000 cores of hybrid, expandable cloud, an attested PHI enclave integrated with UCAMS and XCAMS, SGI UV and Urika-GD/XA (GX) systems, 5+ PB of high-speed storage, and ~3,000 cores of XK7. These span the OIC, CADES Moderate, and CADES Open environments and provide Cray condos, a hybrid cloud, unique heterogeneous platforms, a PHI enclave, an object store, large-scale storage, and high-speed interconnects.

7 Big Compute + Analytics (OLCF and CADES) coupled to big science data
The BEAM web and data tier (HTTPS access, storage, MySQL database, data/artifacts) connects the beamline user tier to CADES cluster computing and, via local high-speed secure data transfer, to the supercomputing tier. The scientific instrument tier includes Scanning Transmission Electron Microscopy (STEM), Scanning Tunneling Microscopy (STM), Scanning Probe Microscopy (SPM), and IFIR/CNMS resources.
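To make the data flow concrete, the following minimal C sketch shows the kind of HTTPS transfer the web/data tier performs, using libcurl to upload one instrument artifact. This is an illustrative assumption, not BEAM's actual API: the endpoint URL and file name are hypothetical placeholders.

```c
/* Minimal sketch: push one data artifact over HTTPS with libcurl.
 * The endpoint URL and file name are hypothetical placeholders;
 * the real BEAM service and its API are not described in the slides. */
#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    const char *path = "scan_0001.h5";          /* hypothetical instrument output */
    FILE *fp = fopen(path, "rb");
    if (!fp) { perror("fopen"); return 1; }

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) { fclose(fp); return 1; }

    /* HTTP PUT of the file to the (placeholder) data-tier endpoint over HTTPS. */
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://beam.example.ornl.gov/upload/scan_0001.h5");
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
    curl_easy_setopt(curl, CURLOPT_READDATA, fp);

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "upload failed: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    fclose(fp);
    return rc == CURLE_OK ? 0 : 1;
}
```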

8 Distributed cloud-based architecture
A scanning probe microscope feeds a CADES VM web/data server, which connects to CADES compute clusters, CADES data storage, and the DOE HPC cloud (Titan, Edison, and Hopper).

9 ORNL's computing ecosystem must integrate data analysis and simulation capabilities
Simulation and data are both critical to DOE, and both need more computing capability. They have similar hardware technology requirements: high bandwidth to memory, efficient processing, and very fast I/O, although different machine balance may be required. Experiment, theory, and computing reinforce one another. Big data means analyzing and managing large, complex data sets from experiments, observation, or simulation and sharing them with a community; simulation is used to implement theory and helps with understanding and prediction.
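"High bandwidth to memory" is a measurable quantity. The short C sketch below is a generic, STREAM-triad-style microbenchmark of the kind used to quantify it; the array size and timing choices here are arbitrary assumptions, not values from the presentation.

```c
/* Minimal sketch of a memory-bandwidth microbenchmark (STREAM-triad style).
 * Illustrative only; array size and timing approach are arbitrary choices.
 * Compile e.g. with: cc -O2 triad_sketch.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 26)   /* ~64M doubles per array, ~512 MB each */

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)          /* triad: a = b + s*c */
        a[i] = b[i] + 3.0 * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double gbytes = 3.0 * N * sizeof(double) / 1e9;   /* traffic: read b, read c, write a */
    printf("triad: %.2f GB in %.3f s -> %.1f GB/s (a[N-1]=%g)\n",
           gbytes, secs, gbytes / secs, a[N - 1]);

    free(a); free(b); free(c);
    return 0;
}
```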

10 2017 OLCF leadership system: the smartest supercomputer on the planet
- Hybrid CPU/GPU architecture; vendors: IBM (prime), NVIDIA, and Mellanox Technologies
- At least 5x Titan's application performance
- Total system memory >6 PB (HBM, DDR, and non-volatile)
- Dual-rail Mellanox InfiniBand full, non-blocking fat-tree interconnect
- IBM Elastic Storage (GPFS): 2.5 TB/s I/O and 250 PB disk capacity
- Approximately 4,600 nodes, each with: multiple IBM POWER9 CPUs and multiple NVIDIA Tesla GPUs using the NVIDIA Volta architecture; CPUs and GPUs connected with high-speed NVLink; large coherent memory of over 512 GB (HBM + DDR4), all directly addressable from the CPUs and GPUs; an additional 800 GB of NVRAM, configurable as either a burst buffer or extended memory; over 40 TF peak performance

11 Summit will replace Titan as the OLCF's leadership supercomputer
Summit has many fewer but much more powerful nodes, much more memory per node and in total, a faster interconnect, much higher bandwidth between CPUs and GPUs, and a much larger and faster file system.

Feature | Titan | Summit
Application performance | Baseline | 5-10x Titan
Number of nodes | 18,688 | ~4,600
Node performance | 1.4 TF | >40 TF
Memory per node | 38 GB DDR3 + 6 GB GDDR5 | 512 GB DDR4 + HBM
NV memory per node | 0 GB | 800 GB
Total system memory | 710 TB | >6 PB DDR4 + HBM + non-volatile
System interconnect (node injection bandwidth) | Gemini (6.4 GB/s) | Dual-rail EDR-IB (23 GB/s) or dual-rail HDR-IB (48 GB/s)
Interconnect topology | 3D torus | Non-blocking fat tree
Processors | 1 AMD Opteron, 1 NVIDIA Kepler | 2 IBM POWER9, 6 NVIDIA Volta
File system | 32 PB, 1 TB/s, Lustre | 250 PB, 2.5 TB/s, GPFS
Peak power consumption | 9 MW | 13 MW
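A rough consistency check (not from the slides): ~4,600 nodes at >40 TF per node gives 4,600 x 40 TF = 184,000 TF, i.e., roughly 184+ PF of aggregate peak, which is in line with the ~200 PF figure quoted for Summit later in this deck.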

12 ECP aims to transform the HPC ecosystem and make major contributions to the nation
- Develop applications that will tackle a broad spectrum of mission-critical problems of unprecedented complexity with unprecedented performance
- Contribute to the economic competitiveness of the nation
- Support national security
- Develop a software stack, in collaboration with vendors, that is exascale-capable and is usable on smaller systems by industry and academia
- Train a large cadre of computational scientists, engineers, and computer scientists who will be an asset to the nation long after the end of ECP
- Partner with vendors to develop computer architectures that support exascale applications
- Revitalize the US HPC vendor industry
- Demonstrate the value of comprehensive co-design

13 The ECP plan of record
- A 7-year project that follows the holistic/co-design approach and runs through 2023 (including 12 months of schedule contingency)
- Enable an initial exascale system, based on an advanced architecture, delivered in 2021
- Enable capable exascale systems, based on ECP R&D, delivered in 2022 and deployed in 2023 as part of NNSA and SC facility upgrades
- Acquisition of the exascale systems is outside the ECP scope and will be carried out by the DOE-SC and NNSA-ASC supercomputing facilities

14 Transition to a higher trajectory with advanced architecture
[Chart: computing capability versus time. The first exascale advanced-architecture system and the subsequent capable exascale systems lift the capability curve roughly 5x-10x above the prior trajectory.]

15 Reaching the elevated trajectory will require advanced and innovative architectures
To reach the elevated trajectory, advanced architectures must be developed that make a big leap in parallelism, memory and storage, reliability, and energy consumption. These exascale advanced-architecture developments benefit all future U.S. systems on the higher trajectory. In addition, the exascale advanced architecture will need to solve emerging data science and machine learning problems alongside traditional modeling and simulation applications.

16 ECP follows a holistic approach that uses co-design and integration to achieve capable exascale
The project spans four integrated elements linked by co-design: Application Development (science and mission applications), Software Technology (a scalable and productive software stack: programming models, development environments and runtimes, resource management, threading, scheduling, monitoring and control, node OS, math libraries and frameworks, tools, data management, I/O and file systems, data analysis, visualization, correctness, resilience, and workflows), Hardware Technology (hardware technology elements such as memory, burst buffer, and hardware interfaces), and Exascale Systems (integrated exascale supercomputers).
ECP's work encompasses applications, system software, hardware technologies and architectures, and workforce development.

17 Planned outcomes of the ECP
- Important applications running at exascale in 2021, producing useful results
- A full suite of mission and science applications ready to run on the 2023 capable exascale systems
- A large cadre of computational scientists, engineers, and computer scientists with deep expertise in exascale computing, who will be an asset to the nation long after the end of ECP
- An integrated software stack that supports exascale applications
- Results of PathForward R&D contracts with vendors that are integrated into exascale systems and are in vendors' product roadmaps
- Industry and mission-critical applications prepared for a more diverse and sophisticated set of computing technologies, carrying U.S. supercomputing well into the future

18 The Oak Ridge Leadership Computing Facility is on a well-defined path to exascale
Since clock-rate scaling ended in 2003, HPC performance has been achieved through increased parallelism. Jaguar scaled to 300,000 cores. Titan and beyond rely on hierarchical parallelism with very powerful nodes: MPI plus thread-level parallelism through OpenACC or OpenMP, plus vectors (see the sketch below).
System roadmap: Jaguar, 2.3 PF, multi-core CPU, 7 MW; Titan, 27 PF, hybrid GPU/CPU, 9 MW; Summit, 200 PF (5-10x Titan), hybrid GPU/CPU, 13 MW; a follow-on OLCF exascale system at roughly 20-50 MW.
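The hierarchical model named above (MPI across nodes, directive-based parallelism within a node) can be illustrated with a minimal sketch. This is not code from the presentation: it is a generic, assumed example that splits a vector reduction across MPI ranks and offloads each rank's loop with an OpenACC directive (an OpenMP target directive would play the same role).

```c
/* Minimal sketch of hierarchical parallelism: MPI between nodes/ranks,
 * OpenACC (or OpenMP offload) within a node. Illustrative only; not taken
 * from the presentation. Compile e.g. with an OpenACC-capable compiler:
 * mpicc -acc sketch.c */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_PER_RANK (1 << 20)

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double *x = malloc(N_PER_RANK * sizeof(double));
    for (long i = 0; i < N_PER_RANK; i++)
        x[i] = rank + 1.0;

    double local_sum = 0.0;
    /* Each rank offloads its local loop to the node's accelerator; without
     * an accelerator the directive simply falls back to the host. */
    #pragma acc parallel loop reduction(+:local_sum) copyin(x[0:N_PER_RANK])
    for (long i = 0; i < N_PER_RANK; i++)
        local_sum += 2.0 * x[i];

    /* MPI provides the outer level of parallelism across nodes. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum over %d ranks = %.1f\n", nranks, global_sum);

    free(x);
    MPI_Finalize();
    return 0;
}
```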

19 Summary
ORNL has a long history in high-performance computing for science, delivering many first-of-a-kind systems that were among the world's most powerful computers. We will continue this as a core competency of the laboratory, delivering an ecosystem focused on the integration of computing and data into instruments of science and engineering. This ecosystem delivers important, time-critical science with enormous impacts.

20 Questions?
