MPI-IO Performance Optimization IOR Benchmark on IBM ESS GL4 Systems


1 MPI-IO Performance Optimization IOR Benchmark on IBM ESS GL4 Systems
  Xinghong He
  HPC Application Support, IBM Systems WW Client Centers
  May

2 Agenda
  System configurations
    Storage system, compute cluster
  IOR benchmark
    Build, run-time environment, test cases (command line)
  IOR POSIX performance
    Baseline, capability of the file system
  IOR MPIIO performance
    PE and MPIIO (ROMIO) parameters
    Collective IO, independent IO
    File transfer size

3 System configurations
  [Diagram labels: compute-1, compute-2, ... compute-40 (8GB pagepool); IB Switch1 and IB Switch2; FDR links on the compute side and 3xFDR links on the server side; server1 through server4 (147GB pagepool); two ESS GL4 units attached over 6Gbps SAS.]

4 System configurations
  [Diagram labels: same as slide 3, except that the compute-side links are EDR instead of FDR.]

5 40 compute nodes
  IBM Power System S824L ( L)
  2x10-core POWER GHz
  256GB (16x16GB CDIMM) memory
  GPFS pagepool size: 8GB
  2 FDR InfiniBand links (1 dual-port adapter)
  Ubuntu (LE)
  IBM Parallel Environment Run-time Edition

6 Compute nodes - updated
  IBM Power System S822LC (8335-GTA)
  2x10-core POWER GHz
  256GB (8x32GB 1333 MHz RDIMM) memory
  GPFS pagepool size: 8GB
  2 EDR InfiniBand links (1 dual-port adapter)
  RHEL-7.2 LE
  IBM Parallel Environment Run-time Edition

7 2 ESS GL4, containing
  4 IBM Power System S822L ( L)
    2x10-core POWER GHz
    256GB (16x16GB CDIMM) memory
    GPFS pagepool size: 147GB
    6 FDR InfiniBand links (3 dual-port adapters)
    RHEL-7.1 BE
    IBM Spectrum Scale
  IBM DCS3700 Expansion Unit ( E)
    464 (8x58) NL-SAS 2TB HDDs, 4 400GB SSDs
    RAID 8+2P

8 IOR Benchmark
  IOR downloaded from sourceforge
  Build: no change to any source or makefile
    export PATH=/opt/ibmhpc/prcurrent/ppe.poe/bin:${PATH}
      Adds mpicc, which is not in /usr/bin
    cd IOR/src/C; make mpiio
      Builds both POSIX and MPIIO
  Test cases
    posix-1: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a POSIX -F
    posix-2: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a POSIX
    mpiio-1: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a MPIIO -c -F
    mpiio-2: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a MPIIO -c
  bsize is chosen so that the total file size per compute node is 76800MB, about 10x the 8GB pagepool
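The bsize rule on slide 8 can be sketched as a small shell helper. This is an illustration, not the author's script: with -s 1 each task writes one block, so the per-task block size is taken here as 76800 MB divided by the number of tasks per node. The function names and the target path are hypothetical.

```shell
# Sketch only: derive the IOR -b value from the tasks-per-node count (ppn)
# so that each compute node writes 76800 MB in total.
NODE_TOTAL_MB=76800   # per-node total stated on slide 8

# bsize_for_ppn PPN -> block size per task, for example "19200M"
bsize_for_ppn() {
    echo "$(( $NODE_TOTAL_MB / $1 ))M"
}

# ior_args CASE PPN TARGET -> IOR option string for one of the four test cases
ior_args() {
    case "$1" in
        posix-1) api="-a POSIX -F" ;;      # -F: one file per process
        posix-2) api="-a POSIX" ;;         # single shared file
        mpiio-1) api="-a MPIIO -c -F" ;;   # -c: collective IO
        mpiio-2) api="-a MPIIO -c" ;;
        *) echo "unknown test case: $1" >&2; return 1 ;;
    esac
    echo "-b $(bsize_for_ppn "$2") -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $3 $api"
}

# Hypothetical target path; prints the options for mpiio-2 at 4 tasks per node.
ior_args mpiio-2 4 /gpfs/fs1/ior.dat
```

The helper only prints the option string, so it can be checked before it is passed to the job launcher.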

9 PE environment variables
  export MP_USE_BULK_XFER=yes
  export MP_EAGER_LIMIT=65536
  export MEMORY_AFFINITY=MCM
  export MP_RESD=poe
  export MP_PE_AFFINITY=yes
  export MP_BINDPROC=yes
  export MP_TASK_AFFINITY=cpu
  export MP_CPU_BIND_LIST="152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,0"
    Adapter on the 2nd socket
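The MP_CPU_BIND_LIST value on slide 9 counts down from 152 to 0 in steps of 8, so the highest-numbered CPUs come first. A minimal sketch that generates the same list is shown below. The helper name is hypothetical, and reading the step of 8 as one entry per core with 8 hardware threads per core is an assumption, not something the slide states.

```shell
# Sketch only: build a descending, comma-separated CPU list.
# cpu_bind_list HIGHEST STEP -> "HIGHEST,...,0"
cpu_bind_list() {
    cpu=$1
    list=""
    while [ "$cpu" -ge 0 ]; do
        list="${list:+$list,}$cpu"   # append with a comma unless the list is empty
        cpu=$(( cpu - $2 ))
    done
    echo "$list"
}

MP_CPU_BIND_LIST=$(cpu_bind_list 152 8)
export MP_CPU_BIND_LIST
echo "$MP_CPU_BIND_LIST"
```

Generating the list avoids hand-editing 20 values when the highest CPU number or the step differs on another node type.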

10 Other settings
  MPIIO related
    export GPFSMPIO_COMM=1
      Use MPI_Isend/MPI_Irecv, instead of MPI_Alltoallv, to exchange data between the aggregators and the other processes
    export GPFSMPIO_P2PCONTIG=1
    export MP_IOTASKLIST=${io_list}
    export ROMIO_HINTS=hints_file
      Equivalent to IOR option -U hints_file
    export MP_I_SHOW_AGGRS=1
    export ROMIO_PRINT_HINTS=1
      Equivalent to IOR option -H
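A hints file for ROMIO_HINTS might be prepared as in the sketch below. The file format (one "key value" pair per line) is an assumption based on common ROMIO usage and should be checked against the MPI library in use. The helper name and the file location are hypothetical. The two hint names and the value "disable" come from the parameter table on slide 15.

```shell
# Sketch only: write a ROMIO hints file and point ROMIO_HINTS at it.
# write_romio_hints FILE CB_WRITE CB_READ
# Assumed format: one "key value" pair per line.
write_romio_hints() {
    {
        echo "romio_cb_write $2"
        echo "romio_cb_read $3"
    } > "$1"
}

hints_file="${TMPDIR:-/tmp}/romio_hints.txt"   # hypothetical location
write_romio_hints "$hints_file" disable disable
export ROMIO_HINTS="$hints_file"
export ROMIO_PRINT_HINTS=1   # from slide 10: print the hints so they can be confirmed
cat "$hints_file"
```

Printing the hints at run time is a simple way to confirm that the file was actually picked up.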

11 IOR POSIX IO on 2 GL4-1ppn
  [Chart: IOR POSIX IO on 2 GL4, 1 MPI task per node. IO bandwidth in MiB/s versus number of compute nodes. Series: posix-1 write, posix-1 read, posix-2 write, posix-2 read.]

12 IOR POSIX IO on 2 GL4-4ppn
  [Chart: IOR POSIX IO on 2 GL4, 4 MPI tasks per node. IO bandwidth in MiB/s versus number of compute nodes. Series: posix-1 write, posix-1 read, posix-2 write, posix-2 read.]

13 IOR MPIIO on 2 GL4-1ppn
  [Chart: IOR MPIIO on 2 GL4, 1 MPI task per node. IO bandwidth in MiB/s versus number of compute nodes. Series: mpiio-1 write, mpiio-1 read, mpiio-2 write, mpiio-2 read.]

14 IOR MPIIO on 2 GL4-4ppn
  [Chart: IOR MPIIO on 2 GL4, 4 MPI tasks per node. IO bandwidth in MiB/s versus number of compute nodes. Series: mpiio-1 write, mpiio-1 read, mpiio-2 write, mpiio-2 read.]

15 Parameter table of the test cases

  Test case      GPFSMPIO_COMM  GPFSMPIO_P2PCONTIG  MP_IOTASKLIST            romio_cb_write  romio_cb_read
  def            default        default             default                  default         default
  comm           1                                  default                  default         default
  p2p                           1                   default                  default         default
  both           1              1                   default                  default         default
  dd_def                                            default                  disable         disable
  dd_comm        1                                  default                  disable         disable
  dd_p2p                        1                   default                  disable         disable
  dd_both        1              1                   default                  disable         disable
  tio_def                                           all                      default         default
  tio_comm       1                                  all                      default         default
  tio_p2p                       1                   all                      default         default
  tio_both       1              1                   all                      default         default
  dd_tio_def                                        all                      disable         disable
  dd_tio_comm    1                                  all                      disable         disable
  dd_tio_p2p                    1                   all                      disable         disable
  dd_tio_both    1              1                   all                      disable         disable

  Note: default is
                 0              0                   One aggregator per node  enable          enable
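The test case names in the parameter table follow a pattern: an optional dd_ prefix disables collective buffering, an optional tio_ prefix sets MP_IOTASKLIST to all tasks, and the suffix (def, comm, p2p, both) selects the two GPFSMPIO switches. The sketch below decodes a name into the five settings. The function name is hypothetical, and cells that carry no value in the table are treated as "default" here, which is an assumption.

```shell
# Sketch only: decode a test case name from the slide 15 table.
# case_settings NAME -> one line listing the five settings
case_settings() {
    name=$1
    comm=default; p2p=default; iotask=default; cb=default
    case "$name" in dd_*)  cb=disable; name=${name#dd_}  ;; esac   # dd_: romio_cb_* disabled
    case "$name" in tio_*) iotask=all; name=${name#tio_} ;; esac   # tio_: all tasks do IO
    case "$name" in
        def)  ;;
        comm) comm=1 ;;
        p2p)  p2p=1 ;;
        both) comm=1; p2p=1 ;;
        *) echo "unknown test case: $1" >&2; return 1 ;;
    esac
    echo "GPFSMPIO_COMM=$comm GPFSMPIO_P2PCONTIG=$p2p MP_IOTASKLIST=$iotask romio_cb_write=$cb romio_cb_read=$cb"
}

for c in def dd_comm tio_p2p dd_tio_both; do
    printf '%-12s %s\n' "$c" "$(case_settings "$c")"
done
```

A run script could loop over all 16 names this way instead of maintaining 16 separate environment files.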

16 ROMIO hints parameter default values
  PE
    cb_buffer_size =
    romio_cb_read = enable
    romio_cb_write = enable
    cb_nodes = 4
    romio_no_indep_rw = false
    romio_cb_pfr = disable
    romio_cb_fr_types = aar
    romio_cb_fr_alignment = 1
    romio_cb_ds_threshold = 0
    romio_cb_alltoall = automatic
    ind_rd_buffer_size =
    ind_wr_buffer_size =
    romio_ds_read = automatic
    romio_ds_write = automatic
    romio_filesystem_type = GPFS+PE: IBM GPFS for PE
  OpenMPI
    cb_buffer_size =
    romio_cb_read = automatic
    romio_cb_write = automatic
    cb_nodes = 2
    romio_no_indep_rw = false
    romio_cb_pfr = disable
    romio_cb_fr_types = aar
    romio_cb_fr_alignment = 1
    romio_cb_ds_threshold = 0
    romio_cb_alltoall = automatic
    ind_rd_buffer_size =
    ind_wr_buffer_size =
    romio_ds_read = automatic
    romio_ds_write = automatic
    cb_config_list = *:1

17 16 MB transfer size - mpiio on 1 and 2 nodes

18 IOR mpiio-1 write from one node, 16 MB tsize
  [Chart: IOR mpiio-1 write on 1 node, 16MB. Bandwidths in MiB/s versus node x ppn (1x1, 1x2, 1x4). One series per test case of the slide 15 parameter table: def, comm, p2p, both, and their dd_, tio_, and dd_tio_ variants.]

19 IOR mpiio-1 write from two nodes, 16 MB tsize
  [Chart: IOR mpiio-1 write on 2 nodes, 16MB. Bandwidths in MiB/s versus node x ppn (2x1, 2x2, 2x4). One series per test case of the slide 15 parameter table.]

20 IOR mpiio-2 write from one node, 16 MB tsize
  [Chart: IOR mpiio-2 write on 1 node, 16MB. Bandwidths in MiB/s versus node x ppn (1x1, 1x2, 1x4). One series per test case of the slide 15 parameter table.]

21 IOR mpiio-2 write from two nodes, 16 MB tsize
  [Chart: IOR mpiio-2 write on 2 nodes, 16MB. Bandwidths in MiB/s versus node x ppn (2x1, 2x2, 2x4). One series per test case of the slide 15 parameter table.]

22 1 MB transfer size - much larger difference

23 IOR mpiio-1 write from one node, 1 MB tsize
  [Chart: IOR mpiio-1 write on 1 node, 1MB. Bandwidths in MiB/s versus node x ppn (1x1, 1x2, 1x4). One series per test case of the slide 15 parameter table: def, comm, p2p, both, and their dd_, tio_, and dd_tio_ variants.]

24 IOR mpiio-1 write from two nodes, 1 MB tsize
  [Chart: IOR mpiio-1 write on 2 nodes, 1MB. Bandwidths in MiB/s versus node x ppn (2x1, 2x2, 2x4). One series per test case of the slide 15 parameter table.]

25 IOR mpiio-2 write from one node, 1 MB tsize
  [Chart: IOR mpiio-2 write on 1 node, 1MB. Bandwidths in MiB/s versus node x ppn (1x1, 1x2, 1x4). One series per test case of the slide 15 parameter table.]

26 IOR mpiio-2 write from two nodes, 1 MB tsize
  [Chart: IOR mpiio-2 write on 2 nodes, 1MB. Bandwidths in MiB/s versus node x ppn (2x1, 2x2, 2x4). One series per test case of the slide 15 parameter table.]

27 IOR mpiio-2 write BW comparison
  [Table: one row per node x ppn configuration; columns are default, best, and ratio under each of 16 MB and 1 MB. Numeric values missing.]
  mpiio-2 write bandwidths in MiB/s for the default and the best parameters. The 16 MB and 1 MB are file transfer sizes (tsize) set with IOR option -t.

28 Summary
  45GB/s read and 35GB/s write for 2 ESS GL4
    Both POSIX and MPIIO
    Both file_per_proc and single_shared_file
  MPI collective IO is very sensitive to ROMIO hints parameters and other run-time parameters
    More impact on single_shared_file than on file_per_proc
    More impact on multiple MPI tasks per node than on one MPI task per node
    More impact on smaller transfer sizes than on larger transfer sizes
    At 1 MB transfer size, the worst case can be 136x worse
    It can be expected to be even worse for sub-MB transfer sizes

29 Thank you!


More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

DELIVERABLE D5.5 Report on ICARUS visualization cluster installation. John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS)

DELIVERABLE D5.5 Report on ICARUS visualization cluster installation. John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS) DELIVERABLE D5.5 Report on ICARUS visualization cluster installation John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS) 02 May 2011 NextMuSE 2 Next generation Multi-mechanics Simulation Environment Cluster

More information

IBM Tivoli Storage Manager. Blueprint and Server Automated Configuration for Linux x86 Version 2 Release 3 IBM

IBM Tivoli Storage Manager. Blueprint and Server Automated Configuration for Linux x86 Version 2 Release 3 IBM IBM Tivoli Storage Manager Blueprint and Server Automated Configuration for Linux x86 Version 2 Release 3 IBM Note: Before you use this information and the product it supports, read the information in

More information

UAntwerpen, 24 June 2016

UAntwerpen, 24 June 2016 Tier-1b Info Session UAntwerpen, 24 June 2016 VSC HPC environment Tier - 0 47 PF Tier -1 623 TF Tier -2 510 Tf 16,240 CPU cores 128/256 GB memory/node IB EDR interconnect Tier -3 HOPPER/TURING STEVIN THINKING/CEREBRO

More information

Coordinating Parallel HSM in Object-based Cluster Filesystems

Coordinating Parallel HSM in Object-based Cluster Filesystems Coordinating Parallel HSM in Object-based Cluster Filesystems Dingshan He, Xianbo Zhang, David Du University of Minnesota Gary Grider Los Alamos National Lab Agenda Motivations Parallel archiving/retrieving

More information

The following documentation is an electronicallysubmitted vendor response to an advertised solicitation from the West Virginia Purchasing Bulletin

The following documentation is an electronicallysubmitted vendor response to an advertised solicitation from the West Virginia Purchasing Bulletin The following documentation is an electronicallysubmitted vendor response to an advertised solicitation from the West Virginia Purchasing Bulletin within the Vendor Self Service portal at wvoasis.gov.

More information

Opportunities of the rcuda remote GPU virtualization middleware. Federico Silla Universitat Politècnica de València Spain

Opportunities of the rcuda remote GPU virtualization middleware. Federico Silla Universitat Politècnica de València Spain Opportunities of the rcuda remote virtualization middleware Federico Silla Universitat Politècnica de València Spain st Outline What is rcuda? HPC Advisory Council China Conference 2017 2/45 s are the

More information

LAMMPS Performance Benchmark and Profiling. July 2012

LAMMPS Performance Benchmark and Profiling. July 2012 LAMMPS Performance Benchmark and Profiling July 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource - HPC

More information

The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations

The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations Ophir Maor HPC Advisory Council ophir@hpcadvisorycouncil.com The HPC-AI Advisory Council World-wide HPC non-profit

More information

19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr

19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr 19. prosince 2018 CIIRC Praha Milan Král, IBM Radek Špimr CORAL CORAL 2 CORAL Installation at ORNL CORAL Installation at LLNL Order of Magnitude Leap in Computational Power Real, Accelerated Science ACME

More information

Lustre on ZFS. At The University of Wisconsin Space Science and Engineering Center. Scott Nolin September 17, 2013

Lustre on ZFS. At The University of Wisconsin Space Science and Engineering Center. Scott Nolin September 17, 2013 Lustre on ZFS At The University of Wisconsin Space Science and Engineering Center Scott Nolin September 17, 2013 Why use ZFS for Lustre? The University of Wisconsin Space Science and Engineering Center

More information

Adaptive MPI Multirail Tuning for Non-Uniform Input/Output Access

Adaptive MPI Multirail Tuning for Non-Uniform Input/Output Access Adaptive MPI Multirail Tuning for Non-Uniform Input/Output Access S. Moreaud, B. Goglin and R. Namyst INRIA Runtime team-project University of Bordeaux, France Context Multicore architectures everywhere

More information

High Performance Computing

High Performance Computing 21 High Performance Computing High Performance Computing Systems 21-2 HPC-1420-ISSE Robust 1U Intel Quad Core Xeon Server with Innovative Cable-less Design 21-3 HPC-2820-ISSE 2U Intel Quad Core Xeon Server

More information

The Last Bottleneck: How Parallel I/O can attenuate Amdahl's Law

The Last Bottleneck: How Parallel I/O can attenuate Amdahl's Law The Last Bottleneck: How Parallel I/O can attenuate Amdahl's Law ERESEARCH AUSTRALASIA, NOVEMBER 2011 REX TANAKIT DIRECTOR OF INDUSTRY SOLUTIONS AGENDA Parallel System Parallel processing goes mainstream

More information

IBM Deep Learning Solutions

IBM Deep Learning Solutions IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle

More information

Deep Learning Performance and Cost Evaluation

Deep Learning Performance and Cost Evaluation Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Rene Meyer, Ph.D. AMAX Corporation Publish date: October 25, 2018 Abstract Introduction

More information

Power Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017

Power Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017 Power Systems AC922 Overview Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017 IBM POWER HPC Platform Strategy High-performance computer and high-performance

More information

Hardware withdrawal: Miscellaneous IBM Power Systems features

Hardware withdrawal: Miscellaneous IBM Power Systems features IBM United States Withdrawal Announcement 917-065, dated March 28, 2017 Hardware withdrawal: Miscellaneous IBM Power Systems features Table of contents 1 Overview 10 Replacement product information 1 Withdrawn

More information

SFA12KX and Lustre Update

SFA12KX and Lustre Update Sep 2014 SFA12KX and Lustre Update Maria Perez Gutierrez HPC Specialist HPC Advisory Council Agenda SFA12KX Features update Partial Rebuilds QoS on reads Lustre metadata performance update 2 SFA12KX Features

More information

Automated Verifica/on of I/O Performance. F. Delalondre, M. Baerstchi. EPFL/Blue Brain Project - confiden6al

Automated Verifica/on of I/O Performance. F. Delalondre, M. Baerstchi. EPFL/Blue Brain Project - confiden6al Automated Verifica/on of I/O Performance F. Delalondre, M. Baerstchi Requirements Support Scien6sts Crea6vity Minimize Development 6me Maximize applica6on performance Performance Analysis System Performance

More information

HYCOM Performance Benchmark and Profiling

HYCOM Performance Benchmark and Profiling HYCOM Performance Benchmark and Profiling Jan 2011 Acknowledgment: - The DoD High Performance Computing Modernization Program Note The following research was performed under the HPC Advisory Council activities

More information

IBM Power Systems. 14 February IBM Power Systems Facts and Features: Enterprise and Scale-out Systems with POWER8Ò Processor Technology

IBM Power Systems. 14 February IBM Power Systems Facts and Features: Enterprise and Scale-out Systems with POWER8Ò Processor Technology 14 February 2017 IBM Systems Facts and Features: Enterprise and Scale-out Systems with POWER8Ò Processor Technology 1 Table of contents Page no. Notes 3 Why Systems 4 IBM System S812LC, S822LC for Commercial

More information

Hardware withdrawal: Lenovo System x select options/features

Hardware withdrawal: Lenovo System x select options/features Announcement ZG15-0292, dated December 15, 2015 Hardware withdrawal: Lenovo System x select options/features Table of contents 1 Overview 4 Replacement product information 1 Withdrawn products 5 Announcement

More information

Cheyenne NCAR s Next-Generation Data-Centric Supercomputing Environment

Cheyenne NCAR s Next-Generation Data-Centric Supercomputing Environment Cheyenne NCAR s Next-Generation Data-Centric Supercomputing Environment David Hart, NCAR/CISL User Services Manager June 23, 2016 1 History of computing at NCAR 2 2 Cheyenne Planned production, January

More information

Analyzing Performance and Power of Applications on GPUs with Dell 12G Platforms. Dr. Jeffrey Layton Enterprise Technologist HPC

Analyzing Performance and Power of Applications on GPUs with Dell 12G Platforms. Dr. Jeffrey Layton Enterprise Technologist HPC Analyzing Performance and Power of Applications on GPUs with Dell 12G Platforms Dr. Jeffrey Layton Enterprise Technologist HPC Why GPUs? GPUs have very high peak compute capability! 6-9X CPU Challenges

More information

Deep Learning Performance and Cost Evaluation

Deep Learning Performance and Cost Evaluation Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Don Wang, Rene Meyer, Ph.D. info@ AMAX Corporation Publish date: October 25,

More information

Genius Quick Start Guide

Genius Quick Start Guide Genius Quick Start Guide Overview of the system Genius consists of a total of 116 nodes with 2 Skylake Xeon Gold 6140 processors. Each with 18 cores, at least 192GB of memory and 800 GB of local SSD disk.

More information