MPI-IO Performance Optimization: IOR Benchmark on IBM ESS GL4 Systems
1 MPI-IO Performance Optimization: IOR Benchmark on IBM ESS GL4 Systems
Xinghong He, HPC Application Support, IBM Systems WW Client Centers, May
2 Agenda
- System configurations: storage system, compute cluster
- IOR benchmark: build, run-time environment, test cases (command line)
- IOR POSIX performance: baseline, capability of the file system
- IOR MPIIO performance: PE and MPIIO (ROMIO) parameters; collective IO, independent IO; file transfer size
3 System configurations (diagram): compute-1 through compute-40, each with an 8GB pagepool, connect over FDR to IB Switch1 and IB Switch2; server1 through server4, each with a 147GB pagepool, connect to the switches over 3xFDR links and to the two ESS GL4 building blocks over 6Gbps SAS.
4 System configurations, updated (diagram): same layout, but the compute nodes connect to IB Switch1 and IB Switch2 over EDR; server1 through server4 (147GB pagepool each) connect over 3xFDR links and to the two ESS GL4 building blocks over 6Gbps SAS.
5 40 compute nodes
- IBM Power System S824L ( L)
- 2x10-core POWER GHz
- 256GB (16x16GB CDIMM) memory
- GPFS pagepool size: 8GB
- 2 FDR InfiniBand links (1 dual-port adapter)
- Ubuntu (LE)
- IBM Parallel Environment Run-time Edition
6 Compute nodes - updated
- IBM Power System S822LC (8335-GTA)
- 2x10-core POWER GHz
- 256GB (8x32GB 1333 MHz RDIMM) memory
- GPFS pagepool size: 8GB
- 2 EDR InfiniBand links (1 dual-port adapter)
- RHEL-7.2 LE
- IBM Parallel Environment Run-time Edition
7 Storage: 2 ESS GL4, containing
- 4 IBM Power System S822L ( L) servers
- 2x10-core POWER GHz
- 256GB (16x16GB CDIMM) memory
- GPFS pagepool size: 147GB
- 6 FDR InfiniBand links (3 dual-port adapters)
- RHEL-7.1 BE
- IBM Spectrum Scale
- IBM DCS3700 Expansion Unit ( E)
- 464 (8x58) NL-SAS 2TB HDDs, 4x 400GB SSDs
- RAID 8+2P
8 IOR Benchmark
IOR downloaded from SourceForge. Build: no change to any source or makefile.
- export PATH=/opt/ibmhpc/prcurrent/ppe.poe/bin:${PATH}  (adds mpicc, which is not in /usr/bin)
- cd IOR/src/C; make mpiio  (builds both the POSIX and MPIIO interfaces)
Test cases:
- posix-1: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a POSIX -F
- posix-2: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a POSIX
- mpiio-1: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a MPIIO -c -F
- mpiio-2: -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a MPIIO -c
The -F cases write one file per process, the cases without -F write a single shared file, and -c selects collective IO for the MPIIO cases. bsize is chosen so that the total file size per compute node is 76800MB, ~10x the pagepool.
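As a rough illustration, a minimal shell sketch of the build plus one posix-1 run follows; the PE path and the make target come from the slide, while the host file, GPFS target path, and the bsize value (sized for 4 tasks per node) are placeholders, not values from the study.

  # Build IOR with the PE compiler wrapper (path from the slide above).
  export PATH=/opt/ibmhpc/prcurrent/ppe.poe/bin:${PATH}   # adds mpicc
  cd IOR/src/C && make mpiio                              # builds POSIX and MPIIO support

  # Run the posix-1 case on 1 node x 4 ppn (placeholder host list and target path).
  export MP_PROCS=4                                       # total MPI tasks
  export MP_HOSTFILE=$PWD/hosts                           # placeholder PE host list
  TARGET=/gpfs/ess/iortest/testfile                       # placeholder GPFS path
  bsize=19200M                                            # 4 x 19200MB = 76800MB per node, ~10x the 8GB pagepool
  poe ./IOR -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a POSIX -F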
9 PE environment variables
export MP_USE_BULK_XFER=yes
export MP_EAGER_LIMIT=65536
export MEMORY_AFFINITY=MCM
export MP_RESD=poe
export MP_PE_AFFINITY=yes
export MP_BINDPROC=yes
export MP_TASK_AFFINITY=cpu
export MP_CPU_BIND_LIST="152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,0"
(The bind list favors the 2nd socket, where the adapter is attached.)
10 Other settings, MPIIO related
export GPFSMPIO_COMM=1
  Use MPI_Isend/MPI_Irecv instead of MPI_Alltoallv for data exchange between the aggregators and the other processes.
export GPFSMPIO_P2PCONTIG=1
export MP_IOTASKLIST=${io_list}
export ROMIO_HINTS=hints_file
  Equivalent to IOR option -U hints_file.
export MP_I_SHOW_AGGRS=1
export ROMIO_PRINT_HINTS=1
  Equivalent to IOR option -H.
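The hints file itself is plain text, one whitespace-separated "key value" pair per line. A minimal sketch of the file used by the collective-buffering-disabled (dd_*) cases in the parameter table below could be written as follows; the file name and location are arbitrary placeholders.

  # ROMIO hints file: one hint per line, key and value separated by whitespace.
  printf 'romio_cb_write disable\nromio_cb_read disable\n' > hints_file
  export ROMIO_HINTS=$PWD/hints_file     # or pass the same file to IOR with -U hints_file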
11 Chart: IOR POSIX IO on 2 GL4, 1 MPI task per node. IO bandwidth in MiB/s vs. number of compute nodes; series: posix-1 write, posix-1 read, posix-2 write, posix-2 read.
12 Chart: IOR POSIX IO on 2 GL4, 4 MPI tasks per node. IO bandwidth in MiB/s vs. number of compute nodes; series: posix-1 write, posix-1 read, posix-2 write, posix-2 read.
13 Chart: IOR MPIIO on 2 GL4, 1 MPI task per node. IO bandwidth in MiB/s vs. number of compute nodes; series: mpiio-1 write, mpiio-1 read, mpiio-2 write, mpiio-2 read.
14 Chart: IOR MPIIO on 2 GL4, 4 MPI tasks per node. IO bandwidth in MiB/s vs. number of compute nodes; series: mpiio-1 write, mpiio-1 read, mpiio-2 write, mpiio-2 read.
15 Parameter table of the test cases
Case          GPFSMPIO_COMM  GPFSMPIO_P2PCONTIG  MP_IOTASKLIST  romio_cb_write  romio_cb_read
def           default        default             default        default         default
comm          1              default             default        default         default
p2p           default        1                   default        default         default
both          1              1                   default        default         default
dd_def        default        default             default        disable         disable
dd_comm       1              default             default        disable         disable
dd_p2p        default        1                   default        disable         disable
dd_both       1              1                   default        disable         disable
tio_def       default        default             all            default         default
tio_comm      1              default             all            default         default
tio_p2p       default        1                   all            default         default
tio_both      1              1                   all            default         default
dd_tio_def    default        default             all            disable         disable
dd_tio_comm   1              default             all            disable         disable
dd_tio_p2p    default        1                   all            disable         disable
dd_tio_both   1              1                   all            disable         disable
Note: the defaults are GPFSMPIO_COMM=0, GPFSMPIO_P2PCONTIG=0, MP_IOTASKLIST=one aggregator per node, romio_cb_write=enable, romio_cb_read=enable.
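As a concrete example, the dd_both row would be selected by combining the settings above before launching the mpiio-2 command line. This sketch reuses the hints_file, bsize, and TARGET placeholders from the earlier sketches and leaves MP_IOTASKLIST at its default (one aggregator per node).

  export GPFSMPIO_COMM=1                    # comm: MPI_Isend/MPI_Irecv aggregator exchange
  export GPFSMPIO_P2PCONTIG=1               # p2p: point-to-point contiguous path
  export ROMIO_HINTS=$PWD/hints_file        # dd: romio_cb_write/romio_cb_read = disable
  export ROMIO_PRINT_HINTS=1                # print the hints actually in effect
  poe ./IOR -b $bsize -t 16M -s 1 -w -r -g -v -d 1 -i 4 -o $TARGET -a MPIIO -c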
16 ROMIO hints parameter default values
PE:
  cb_buffer_size =
  romio_cb_read = enable
  romio_cb_write = enable
  cb_nodes = 4
  romio_no_indep_rw = false
  romio_cb_pfr = disable
  romio_cb_fr_types = aar
  romio_cb_fr_alignment = 1
  romio_cb_ds_threshold = 0
  romio_cb_alltoall = automatic
  ind_rd_buffer_size =
  ind_wr_buffer_size =
  romio_ds_read = automatic
  romio_ds_write = automatic
  romio_filesystem_type = GPFS+PE: IBM GPFS for PE
OpenMPI:
  cb_buffer_size =
  romio_cb_read = automatic
  romio_cb_write = automatic
  cb_nodes = 2
  romio_no_indep_rw = false
  romio_cb_pfr = disable
  romio_cb_fr_types = aar
  romio_cb_fr_alignment = 1
  romio_cb_ds_threshold = 0
  romio_cb_alltoall = automatic
  ind_rd_buffer_size =
  ind_wr_buffer_size =
  romio_ds_read = automatic
  romio_ds_write = automatic
  cb_config_list = *:1
17 16 MB transfer size - mpiio on 1 and 2 nodes
18 Chart: IOR mpiio-1 write on 1 node, 16MB transfer size. Bandwidths in MiB/s for each parameter case (def, comm, p2p, both, dd_def, dd_comm, dd_p2p, dd_both, tio_def, tio_comm, tio_p2p, tio_both, dd_tio_def, dd_tio_comm, dd_tio_p2p, dd_tio_both) at 1x1, 1x2, and 1x4 node x ppn.
19 Chart: IOR mpiio-1 write on 2 nodes, 16MB transfer size. Bandwidths in MiB/s for the same parameter cases at 2x1, 2x2, and 2x4 node x ppn.
20 Chart: IOR mpiio-2 write on 1 node, 16MB transfer size. Bandwidths in MiB/s for the same parameter cases at 1x1, 1x2, and 1x4 node x ppn.
21 Chart: IOR mpiio-2 write on 2 nodes, 16MB transfer size. Bandwidths in MiB/s for the same parameter cases at 2x1, 2x2, and 2x4 node x ppn.
22 1 MB transfer size - much larger difference
23 Chart: IOR mpiio-1 write on 1 node, 1MB transfer size. Bandwidths in MiB/s for the same parameter cases at 1x1, 1x2, and 1x4 node x ppn.
24 Chart: IOR mpiio-1 write on 2 nodes, 1MB transfer size. Bandwidths in MiB/s for the same parameter cases at 2x1, 2x2, and 2x4 node x ppn.
25 Chart: IOR mpiio-2 write on 1 node, 1MB transfer size. Bandwidths in MiB/s for the same parameter cases at 1x1, 1x2, and 1x4 node x ppn.
26 Chart: IOR mpiio-2 write on 2 nodes, 1MB transfer size. Bandwidths in MiB/s for the same parameter cases at 2x1, 2x2, and 2x4 node x ppn.
27 Table: mpiio-2 write bandwidths in MiB/s for the default and the best parameter settings at 16 MB and 1 MB transfer sizes (IOR option -t), listed for the 1x1 through 2x4 node x ppn configurations; each transfer size shows the default value, the best value, and their ratio.
28 Summary
- 45GB/s read and 35GB/s write for 2 ESS GL4, for both POSIX and MPIIO, and for both file_per_proc and single_shared_file.
- MPI collective IO is very sensitive to ROMIO hints parameters and other run-time parameters:
  - Greater impact on single_shared_file than on file_per_proc
  - Greater impact with multiple MPI tasks per node than with one MPI task per node
  - Greater impact at smaller transfer sizes than at larger ones: at 1 MB transfer size the worst case can be 136x worse, and sub-MB transfer sizes can be expected to be worse still.
29 Thank you!