Oak Ridge National Laboratory Computing and Computational Sciences


1 Preparing OpenSHMEM for Exascale
Oak Ridge National Laboratory, Computing and Computational Sciences
Presented by: Pavel Shamis (Pasha)
HPC Advisory Council Stanford Conference, California, Feb 2, 2015

2 Outline
- CORAL overview
- Summit
- What is OpenSHMEM?
- Preparing OpenSHMEM for Exascale
- Recent advances

3 CORAL
- CORAL: collaboration of ORNL, ANL, and LLNL
- Objective: procure three leadership computers to be sited at Argonne, Oak Ridge, and Lawrence Livermore in 2017. Two of the contracts have been awarded, with the Argonne contract in process.
- Leadership computers: the RFP requests >100 PF, 2 GB/core main memory, local NVRAM, and science performance 5x-10x that of Titan or Sequoia

4 The Road to Exascale
Since clock-rate scaling ended in 2003, HPC performance has been achieved through increased parallelism. Jaguar scaled to 300,000 cores. Titan and beyond deliver hierarchical parallelism with very powerful nodes: MPI plus thread-level parallelism through OpenACC or OpenMP, plus vectors.
- Jaguar: 2.3 PF, multi-core CPU, 7 MW
- Titan: 27 PF, hybrid GPU/CPU, 9 MW
- Summit: 5-10x Titan, hybrid GPU/CPU, 10 MW
- CORAL system OLCF5: 5-10x Summit, ~20 MW

5 System Summary
- Compute node: POWER-architecture processor, NVIDIA Volta, NVMe-compatible PCIe 800 GB SSD, >512 GB coherent shared memory (HBM + DDR4)
- Compute rack: standard 19", warm-water cooling
- Compute system: Summit, 5x-10x Titan at 10 MW
- Node design: IBM POWER CPU connected via NVLink to NVIDIA Volta with HBM
- Interconnect: Mellanox dual-rail EDR InfiniBand

6 How does Summit compare to Titan?

Feature                                          Summit                       Titan
Application performance                          5-10x Titan                  Baseline
Number of nodes                                  ~3,400                       18,688
Node performance                                 >40 TF                       1.4 TF
Memory per node                                  >512 GB (HBM + DDR4)         38 GB (GDDR5 + DDR3)
NVRAM per node                                   800 GB                       0
Node interconnect                                NVLink (5-12x PCIe 3)        PCIe 2
System interconnect (node injection bandwidth)   Dual-rail EDR-IB (23 GB/s)   Gemini (6.4 GB/s)
Interconnect topology                            Non-blocking fat tree        3D torus
Processors                                       IBM POWER9 + NVIDIA Volta    AMD Opteron + NVIDIA Kepler
File system                                      120 PB, 1 TB/s, GPFS         32 PB, 1 TB/s, Lustre
Peak power consumption                           10 MW                        9 MW

Source: "Present and Future Leadership Computers at OLCF," Buddy Bland

7 Challenges for Programming Models
- Very powerful compute nodes
- Hybrid architecture: multiple CPUs/GPUs, different types of memory
- Must be fun to program ;-)
- MPI + X

8 What is OpenSHMEM?
- A communication library and interface specification that implements a Partitioned Global Address Space (PGAS) programming model
- Processing Element (PE): an OpenSHMEM process
- Symmetric objects have the same address (or offset) on all PEs
Diagram: on each of PE 0 .. PE N-1, global and static variables and the symmetric heap (e.g., X = shmalloc(sizeof(long))) are remotely accessible symmetric data objects; local variables are private.
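As a small illustration of the two kinds of symmetric objects described above (a sketch, assuming the OpenSHMEM 1.1-era C API, where initialization is start_pes()):

    #include <shmem.h>

    long counter;                                  /* global variable: symmetric automatically */

    int main(void) {
        start_pes(0);                              /* initialize the library (1.1-era API) */
        long *x = (long *)shmalloc(sizeof(long));  /* same offset in every PE's symmetric heap */
        /* Both &counter and x are remotely accessible from any PE;
           automatic (stack) variables are private to the PE. */
        shfree(x);
        return 0;
    }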

9 OpenSHMEM Operations
- Remote memory Put and Get:
  void shmem_getmem(void *target, const void *source, size_t len, int pe);
  void shmem_putmem(void *target, const void *source, size_t len, int pe);
- Remote memory atomic operations, e.g.:
  void shmem_int_add(int *target, int value, int pe);
- Collectives: broadcast, reductions, etc.
- Synchronization operations: point-to-point and global
- Ordering operations
- Distributed lock operations
(The diagram repeats the symmetric-memory layout from the previous slide.)

10 OpenSHMEM Code Example
(code listing on slide; not transcribed)

11 OpenSHMEM Code Example
You just learned to program OpenSHMEM! The example walks through library initialization, AMO/PUT/GET, synchronization, and done.
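The listing itself was an image and was not transcribed; below is a minimal sketch of the pattern the slide names (initialization, AMO/PUT/GET, synchronization, done), written against the OpenSHMEM 1.1-era C API:

    #include <stdio.h>
    #include <shmem.h>

    int target;                                      /* global => symmetric */

    int main(void) {
        start_pes(0);                                /* library initialization */
        int me = _my_pe();
        int npes = _num_pes();

        long *src = (long *)shmalloc(sizeof(long));  /* symmetric heap allocation */
        *src = me;
        shmem_barrier_all();                         /* everyone allocated and initialized */

        shmem_int_add(&target, 1, 0);                /* AMO: each PE increments PE 0's target */
        shmem_putmem(src, src, sizeof(long),
                     (me + 1) % npes);               /* PUT my id to my right neighbor */

        shmem_barrier_all();                         /* synchronization */

        long val;
        shmem_getmem(&val, src, sizeof(long), 0);    /* GET PE 0's copy of src */
        if (me == 0)
            printf("target = %d, src on PE 0 = %ld\n", target, val);

        shfree(src);
        return 0;                                    /* done */
    }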

12 OpenSHMEM
- OpenSHMEM is a one-sided communication library with C and Fortran APIs
- Uses symmetric data objects to efficiently communicate across processes
- Advantages: good for irregular applications with latency-driven communication and random memory access patterns; maps really well to hardware/interconnects

Feature        InfiniBand (Mellanox)   Gemini/Aries (Cray)
RMA PUT/GET    yes                     yes
Atomics        yes                     yes
Collectives    yes                     yes

13 OpenSHMEM Key Principles
- Keep it simple: the specification is only ~80 pages
- Keep it fast: as close as possible to the hardware

14 Evolution of OpenSHMEM
- SHMEM library introduced by Cray Research Inc. (T3D systems)
- Adapted by SGI for products based on the NUMAlink architecture and included in the Message Passing Toolkit (MPT)
- Vendor-specific SHMEM libraries emerge (Quadrics, HP, IBM, Mellanox, Intel, GPSHMEM, SiCortex, etc.)
- OpenSHMEM is born: ORNL and UH come together to address the differences between the various SHMEM implementations
- OSSS signs the SHMEM trademark licensing agreement
- OpenSHMEM 1.0 is finalized; reference implementation, V&V, and tools follow
- OpenSHMEM 1.1 released
- 2015 onwards, the next OpenSHMEM specifications: faster, more predictable, more agile. OpenSHMEM is a living specification!

15 OpenSHMEM Roadmap
- OpenSHMEM v1.1 (June 2014): errata and bug fixes; ratified (100+ tickets resolved)
- OpenSHMEM v1.2 (early 2015): API naming convention (finalize(), global_exit()), consistent data-type support, version information, clarifications (zero-length operations, wait, shmem_ptr())
- OpenSHMEM v1.5 (late 2015): non-blocking communication semantics (RMA, AMO), teams/groups, thread safety
- OpenSHMEM v1.6: non-blocking collectives
- OpenSHMEM v1.7: thread-safety update
- OpenSHMEM Next Generation (2.0): let's go wild!!! (Exascale!) Active sets + memory contexts, fault tolerance, exit codes, locality, I/O
- White paper: OpenSHMEM Tools API

16 OpenSHMEM Community Today
Academia, vendors, and government.

17 OpenSHMEM Implementations
- Proprietary: SGI SHMEM, Cray SHMEM, IBM SHMEM, HP SHMEM, Mellanox Scalable SHMEM
- Legacy: Quadrics SHMEM
- Open source: OpenSHMEM Reference Implementation (UH), Portals SHMEM, OSHMPI / OpenSHMEM over MPI (under development), OpenSHMEM with Open MPI, OpenSHMEM with MVAPICH (OSU), TSHMEM (UFL), GatorSHMEM (UFL)

18 OpenSHMEM Eco-system
Around the OpenSHMEM Reference Implementation: development tools, the OpenSHMEM Analyzer, performance analysis (Vampir), and debugging.

19 OpenSHMEM Eco-system
Around the OpenSHMEM specification: Vampir, TAU, DDT, OpenSHMEM Analyzer, UCCS.

20 Upcoming Challenges for OpenSHMEM
Based on what we know about the upcoming architecture:
- Hybrid architecture: communication across different components of the system, locality of resources, multiple CPUs/GPUs
- Thread safety (without performance sacrifices), thread locality, scalability
- Different types of memory, address spaces

21 Hybrid Architecture Challenges and Ideas
- OpenSHMEM for accelerators: "TOC-Centric Communication: a case study with NVSHMEM," OUG/PGAS 2014, Sreeram Potluri
- Preliminary study, prototype concept

22 NVSHMEM
The problem: communication across GPUs requires synchronization with the host, incurring software overheads and the hardware overhead of launching kernels. Research idea/concept proposed by NVIDIA: GPU-initiated communication. The NVSHMEM communication primitives nvshmem_put() and nvshmem_get() move data to/from remote GPU memory, emulated using CUDA IPC (CUDA 4.2).

Traditional loop structure:
    loop {
        Interior compute (kernel launch)
        Pack boundaries (kernel launch)
        Stream synchronize
        Exchange (MPI/OpenSHMEM)
        Unpack boundaries (kernel launch)
        Boundary compute (kernel launch)
        Stream/device synchronize
    }
Drawbacks: kernel-launch overheads and CPU-based blocking synchronization.

The slide is based on "TOC-Centric Communication: a case study with NVSHMEM," OUG/PGAS 2014, Sreeram Potluri.

23 NVSHMEM
Stencil update: u[i][j] = u[i][j] + (v[i+1][j] + v[i-1][j] + v[i][j+1] + v[i][j-1])/x
(Figure: preliminary results, time per step in usec vs. stencil size from 64 to 2K, comparing the traditional approach against a persistent kernel.)
Evaluation results from "TOC-Centric Communication: a case study with NVSHMEM," OUG/PGAS 2014, Sreeram Potluri.

24 Many-Core System Challenges
- It is challenging to provide high-performance THREAD_MULTIPLE support: locks and atomic operations sit in the communication path
- Even though the MPI IMB benchmark benefits from full process memory separation, multi-threaded UCCS obtains comparable performance
Aurelien Bouteiller, Thomas Herault, and George Bosilca, "A Multithreaded Communication Substrate for OpenSHMEM," OUG 2014.

25 Many-Core System Challenges: Old Ideas
SHMEM_PTR (or SHMEM_LOCAL_PTR on Cray): Y = shmem_ptr(&X, PE1)
(Diagram: PE 0 maps PE 1's copy of the symmetric variable X into its own address space; the local variable Y then points directly at PE 1's X.)

26 Many-Core System Challenges: Old Ideas (cont.)
- Provides direct access to a remote PE's data with memory load and store operations
- Supported on systems where SHMEM_PUT/GET are implemented with memory load and store operations
- Usually implemented using XPMEM
Gabriele Jost, Ulf R. Hanebutte, James Dinan, "OpenSHMEM with Threads: A Bad Idea?"
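A short sketch of the shmem_ptr() pattern described above: on platforms where the remote PE is directly addressable (e.g., via XPMEM), the returned pointer can be dereferenced with plain loads and stores; elsewhere it is NULL:

    #include <stdio.h>
    #include <shmem.h>

    long x;                                    /* symmetric */

    int main(void) {
        start_pes(0);
        x = _my_pe();
        shmem_barrier_all();                   /* everyone has written its x */

        long *y = (long *)shmem_ptr(&x, 1);    /* direct pointer to PE 1's x, if possible */
        if (y != NULL)
            printf("PE %d reads PE 1's x = %ld via a plain load\n", _my_pe(), *y);
        else
            printf("PE 1 is not load/store accessible from PE %d\n", _my_pe());
        return 0;
    }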

27 Many-Core System Challenges: New Ideas
OpenSHMEM contexts, proposed by Intel: James Dinan and Mario Flajslik, "Contexts: A Mechanism for High Throughput Communication in OpenSHMEM," PGAS 2014.
- Explicit API for allocation and management of communication contexts
(Diagram: threads 0-2 of an OpenSHMEM application each drive their own context in the OpenSHMEM library, issuing puts and gets independently.)
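For flavor, a sketch of how a per-thread context might be used under this proposal. The names below follow the shmem_ctx_* form the proposal eventually took in OpenSHMEM 1.4; at the time of the talk this was still a research proposal, so treat the signatures as illustrative:

    #include <stddef.h>
    #include <shmem.h>

    /* Each thread drives its own context, so its operations can be issued
       and completed without serializing against other threads. */
    void thread_put(void *dst, const void *src, size_t len, int pe) {
        shmem_ctx_t ctx;
        if (shmem_ctx_create(0, &ctx) != 0)
            return;                                /* no resources for a context */
        shmem_ctx_putmem(ctx, dst, src, len, pe);  /* issued on this context only */
        shmem_ctx_quiet(ctx);                      /* completes this context's ops,
                                                      not other threads' */
        shmem_ctx_destroy(ctx);
    }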

28 Many-Core System Challenges: New Ideas (cont.)
Cray's proposal of "hot threads": Monika ten Bruggencate (Cray Inc.), "Cray SHMEM Update," First OpenSHMEM Workshop: Experiences, Implementations and Tools.
- Idea: each thread is registered with the OpenSHMEM library; the library allocates and automatically manages communication resources (a context) for the application
- Compatible with the current API

29 Address Space and Locality Challenges
- The symmetric heap is not flexible enough: all PEs have to allocate the same amount of memory
- No concept of locality
- How do we manage different types of memory? What is the right abstraction?

30 Memory Spaces
Aaron Welch, Swaroop Pophale, Pavel Shamis, Oscar Hernandez, Stephen Poole, Barbara Chapman, "Extending the OpenSHMEM Memory Model to Support User-Defined Spaces," PGAS 2014.
- Concept of teams: the original OpenSHMEM active-set (group of PEs) concept is outdated, BUT very lightweight (a local operation)
- Memory spaces, with each memory space associated with a team
- Similar concepts can be found in MPI, Chapel, etc.

31 Teams
- Explicit method of grouping PEs
- Fully local objects and operations - fast
- New (sub)teams created from parent teams
- Re-indexing of PE ids with respect to the team
- Strided teams and axial splits: no need to maintain a translation array, since all translations can be done with simple arithmetic (see the sketch below)
- Ability to specify a team index for remote operations
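A sketch of the index arithmetic that makes strided teams cheap (the struct and helpers below are hypothetical illustrations, not part of any specification):

    /* Hypothetical representation of a strided team: PEs start, start+stride, ... */
    typedef struct { int start; int stride; int size; } strided_team_t;

    /* Team-relative PE id -> global PE id: one multiply-add, no table. */
    static inline int team_to_global(const strided_team_t *t, int team_pe) {
        return t->start + team_pe * t->stride;
    }

    /* Global PE id -> team-relative id, or -1 if the PE is not a member. */
    static inline int global_to_team(const strided_team_t *t, int pe) {
        int off = pe - t->start;
        if (off < 0 || off % t->stride != 0 || off / t->stride >= t->size)
            return -1;
        return off / t->stride;
    }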

32 Spaces
(diagram on slide; not transcribed)

33 Spaces
- Space and team creation are decoupled
- Faster memory allocation compared to shmalloc
- Future directions: different types of memory, locality, separate address spaces, asymmetric RMA access
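A hypothetical sketch of the decoupled team-then-space flow described above (the shmemx_* names below are invented for illustration in the style of OpenSHMEM extensions; the actual proposal is in the PGAS 2014 paper cited earlier):

    #include <stddef.h>
    #include <shmem.h>

    /* Hypothetical extension API; all shmemx_* names invented for illustration. */
    double *alloc_on_even_pes(int npes, size_t n) {
        shmemx_team_t even_pes;
        shmemx_space_t fast_mem;

        /* Team creation is a local, cheap operation: every other PE from 0. */
        shmemx_team_split_strided(SHMEMX_TEAM_WORLD, 0, 2, npes / 2, &even_pes);

        /* The space exists only on team members, so odd PEs allocate nothing;
           shmalloc, by contrast, allocates symmetrically across ALL PEs. */
        shmemx_space_create(even_pes, 1 << 20, &fast_mem);
        return (double *)shmemx_space_malloc(fast_mem, n * sizeof(double));
    }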

34 Fault Tolerance?
- How do we run in the presence of faults?
- What is the responsibility of the programming model and communication libraries?
Pengfei Hao, Pavel Shamis, Manjunath Gorentla Venkata, Swaroop Pophale, Aaron Welch, Stephen Poole, Barbara Chapman, "Fault Tolerance for OpenSHMEM," PGAS/OUG 2014.

35 Fault Tolerance
Basic idea: in-memory checkpointing of the symmetric memory regions, with either symmetric recovery or memory-only recovery.

36 Fault Tolerance Code Snippet
(code listing on slide; not transcribed)
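Since the snippet was not transcribed, the following hypothetical sketch conveys the in-memory checkpoint idea (the shmemx_* names are invented for illustration; see the PGAS/OUG 2014 paper for the proposed interface):

    #include <stddef.h>
    #include <shmem.h>

    /* Hypothetical checkpoint/restore calls, invented for illustration. */
    void compute_with_checkpoints(long *state /* symmetric */, size_t nbytes) {
        /* Replicate the symmetric region into remote memory. */
        shmemx_checkpoint(state, nbytes);

        /* ... compute; a PE may fail here ... */

        /* On failure, restore the region (symmetric or memory-only recovery). */
        if (shmemx_restore(state, nbytes) != 0) {
            /* recovery impossible: fall back to a full restart */
        }
    }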

37 Fault Tolerance
- Work in progress; OpenSHMEM is just one piece of the puzzle (run-time, I/O, drivers, etc.)
- The system has to provide fault-tolerance infrastructure: error notification, coordination, etc.
- Leveraging existing work/research in the HPC community: MPI, Hadoop, etc.

38 Summary
- This is just a snapshot of some of the ideas. Other active research & development topics: non-blocking operations, counting operations, signaled operations, asymmetric memory access, etc.
- These challenges are relevant for many other HPC programming models
- The key to success: co-design of hardware and software, and generic solutions that target the broader community
- The challenges are common across different fields: storage, analytics, big data, etc.

39 How to Get Involved?
- Join the mailing list
- Join the OpenSHMEM Redmine and GitHub: OpenSHMEM reference implementation, test suites, benchmarks, etc.
- Participate in our upcoming events: workshops, user group meetings, and conference calls

40 Upcoming Events
OpenSHMEM Workshop 2015, August 4th-6th.

41 Upcoming Events
Co-located with PGAS 2015, the International Conference on Partitioned Global Address Space Programming Models, Washington, DC.

42 Acknowledgements
This work was supported by the United States Department of Defense and used resources of the Extreme Scale Systems Center at Oak Ridge National Laboratory. "Empowering the Mission"

43 Questions?

44 Backup Slides

45 NVSHMEM Code Example
Using NVSHMEM, device code (reconstructed from the slide's two-column listing; the slide's elisions are kept as /* ... */ comments):

    /* peers[] array has left and right PE ids */
    __global__ void one_kernel(float *u, float *v, int *sync /* , ... */) {
        int i = threadIdx.x;
        for (/* ... */) {
            /* fetch halo elements from the neighboring GPUs */
            if (i + 1 > nx) {
                v[i + 1] = nvshmem_float_g(&v[1], right_pe);
            }
            if (i - 1 < 1) {
                v[i - 1] = nvshmem_float_g(&v[nx], left_pe);
            }
            /* ... */
            if (i < 2) {
                nvshmem_int_p(&sync[i], 1, peers[i]);  /* signal neighbor */
                nvshmem_quiet();
                nvshmem_wait_until(&sync[i], EQ, 1);   /* wait for neighbor's signal */
            }
            /* intra-process sync */
            /* compute v from u and sync */
            u[i] = (u[i] + (v[i + 1] + v[i - 1] /* ... */));
        }
    }

Evaluation results from "TOC-Centric Communication: a case study with NVSHMEM," OUG/PGAS 2014, Sreeram Potluri.
