Carlos Reaño, Universitat Politècnica de València (Spain). HPC Advisory Council Switzerland Conference, April 3, Lugano (Switzerland)
2 What is rcuda? Installing and using rcuda. rcuda over HPC networks: InfiniBand. How to benefit from rcuda: sample scenarios. Questions & Answers.
3 What is rcuda? Installing and using rcuda. rcuda over HPC networks: InfiniBand. How to benefit from rcuda: sample scenarios. Questions & Answers.
4 Current HPC clusters are heterogeneous platforms (CPUs + GPUs). GPUs reduce the execution time of parallel applications, but they also increase costs (acquisition, maintenance, space...), raise energy consumption, and often show a low utilization rate. Solution: remote GPU virtualization, with physical GPUs installed only in some nodes of the cluster and transparently shared among all the nodes. The price to pay: virtualization framework overhead and network overhead.
5 GPGPU means OpenCL or CUDA. General architecture of remote GPU solutions. Existing solutions: GVirtuS, DS-CUDA, rcuda... rcuda (remote CUDA) is freely available and supports CUDA 5.5.
6 rcuda communication architecture.
7 rcuda communication architecture: the communication layer supports several networks (Eth, IB, ...).
8 CUDA vs. rcuda (remote CUDA): with CUDA, an application must run in the node that hosts the GPU (Node 1); with rcuda, an application in Node 2 can use the GPU of Node 1 across the network. With rcuda, Node 2 can use Node 1's GPU!
9 What is rcuda? Installing and using rcuda. rcuda over HPC networks: InfiniBand. How to benefit from rcuda: sample scenarios. Questions & Answers.
10 Where to obtain rcuda? Fill in the software request form. Package contents, with two important folders: doc, holding the rcuda user guide, and framework, holding the rcuda server and library (rCUDAd, the rcuda server daemon, and rCUDAl, the rcuda library). Installing rcuda: just untar the tarball on both the server and the client node(s).
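The install step above can be sketched as a shell snippet; the tarball filename below is an assumed placeholder (the actual name depends on the requested package), and the same unpacking is repeated on the server and on every client node:

```shell
# Hypothetical install sketch: unpack the rcuda tarball into $HOME.
# The tarball name is an assumption, not taken from the slides.
TARBALL=rCUDA.tgz
if [ -f "$TARBALL" ]; then
    tar xzf "$TARBALL" -C "$HOME"   # leaves $HOME/rCUDA/doc and $HOME/rCUDA/framework
fi
ls "$HOME/rCUDA" 2>/dev/null || echo "rCUDA package not unpacked on this node"
```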
11 Starting the rcuda server. Set the environment variables as if you were going to run a CUDA program (path to the CUDA binaries and path to the CUDA libraries):
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
Then start the rcuda server; rCUDAd runs as a daemon in the background:
cd $HOME/rCUDA/framework/rCUDAd
./rCUDAd
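The server-side steps above fold naturally into one launcher script. This is a sketch under the assumptions from the slides (CUDA in /usr/local/cuda, the rcuda tarball untarred in $HOME/rCUDA); it degrades gracefully on nodes where rCUDAd is not installed:

```shell
# start-rcudad sketch: environment setup plus daemon launch, as on the slides.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}       # assumed CUDA install location
RCUDA_HOME=${RCUDA_HOME:-$HOME/rCUDA}         # assumed rcuda unpack location
export PATH=$PATH:$CUDA_HOME/bin                           # CUDA binaries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64   # CUDA libraries
if [ -x "$RCUDA_HOME/framework/rCUDAd/rCUDAd" ]; then
    # rCUDAd daemonizes itself, so this returns immediately
    (cd "$RCUDA_HOME/framework/rCUDAd" && ./rCUDAd)
else
    echo "rCUDAd not found under $RCUDA_HOME/framework/rCUDAd" >&2
fi
```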
16 Running a CUDA program with rcuda. Set the environment variables as follows:
export PATH=$PATH:/usr/local/cuda/bin (path to the CUDA binaries)
export LD_LIBRARY_PATH=$HOME/rCUDA/framework/rCUDAl:$LD_LIBRARY_PATH (path to the rcuda library)
export RCUDA_DEVICE_COUNT=1 (number of remote GPUs: 1, 2, 3...)
export RCUDA_DEVICE_0=<server_name_or_ip_address>:0 (name/IP of the rcuda server, and which GPU of the remote server to use)
Compile the CUDA program using dynamic libraries; the --cudart=shared flag is very important!
cd $HOME/NVIDIA_CUDA_Samples/5.5/1_Utilities/deviceQuery
make EXTRA_NVCCFLAGS=--cudart=shared
Run the CUDA program as usual:
./deviceQuery
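As a sketch of how the RCUDA_DEVICE_* variables compose when more than one remote GPU is used (the server names gpunode1 and gpunode2 are made-up examples, not from the slides):

```shell
# Client-side sketch: expose two remote GPUs to the application.
# Each RCUDA_DEVICE_i follows the <server>:<gpu_index> form shown on the slides.
export RCUDA_DEVICE_COUNT=2          # the application will see two CUDA devices
export RCUDA_DEVICE_0=gpunode1:0     # device 0 -> GPU 0 of (assumed) host gpunode1
export RCUDA_DEVICE_1=gpunode2:0     # device 1 -> GPU 0 of (assumed) host gpunode2
env | grep '^RCUDA_' | sort          # show the resulting rcuda configuration
```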
23 Live demonstration: deviceQuery using all the GPUs of the cluster, and bandwidthTest. Problem: bandwidth with rcuda is too low! Why? We are using TCP. Solution: HPC networks, i.e., InfiniBand (IB).
24 What is rcuda? Installing and using rcuda. rcuda over HPC networks: InfiniBand. How to benefit from rcuda: sample scenarios. Questions & Answers.
25 Is my IB network ready? Check that links are active and up in all nodes:
$ ibstatus
Infiniband device 'mlx4_0' port 1 status:
    default gid:  fe80:0000:0000:0000:0001:c700:0000:e403
    base lid:     0x1
    sm lid:       0x1
    state:        4: ACTIVE
    phys state:   5: LinkUp
    rate:         56 Gb/sec (4X FDR)
    link_layer:   InfiniBand
26 Is my IB network ready? Test links with, for instance, an IB bandwidth test.
1. Start the bandwidth server in Node1:
Node1-$ ib_write_bw
*********************************
* Waiting for client to connect *
*********************************
2. Start the bandwidth client in Node2:
Node2-$ ib_write_bw Node1
RDMA_Write BW Test
  Dual-port: OFF          Device: mlx4_0
  Number of qps: 1        Transport type: IB
  Connection type: RC     Using SRQ: OFF
  TX depth: 128           CQ Moderation: 100
  Mtu: 2048[B]            Link type: IB
  Max inline data: 0[B]   rdma_cm QPs: OFF
  Data ex. method: Ethernet
  local address: ...      remote address: ...
  #bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
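The two checks above can be wrapped in a small guarded helper; this sketch assumes the OFED ibstatus tool shown on the slide and degrades gracefully on nodes without InfiniBand:

```shell
# ib_ready: succeed only if ibstatus exists and reports at least one ACTIVE port.
ib_ready() {
    command -v ibstatus >/dev/null 2>&1 || return 1
    ibstatus 2>/dev/null | grep -q 'state:.*ACTIVE'
}
if ib_ready; then
    echo "IB link active: OK to run rcuda over IB"
else
    echo "no active IB link found (or ibstatus missing)" >&2
fi
```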
27 Starting the rcuda server using IB (tell rcuda we want to use IB):
export RCUDAPROTO=IB
cd $HOME/rCUDA/framework/rCUDAd
./rCUDAd
Run a CUDA program using rcuda over IB (RCUDAPROTO must be set in the client too!):
export RCUDAPROTO=IB
cd $HOME/NVIDIA_CUDA_Samples/5.5/1_Utilities/bandwidthTest
./bandwidthTest
Live demonstration: bandwidthTest using IB. Bandwidth is no longer a problem!
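A small helper sketch for picking the transport on both sides. RCUDA_NET is a hypothetical convenience variable introduced here for illustration; the real switch, per the slides, is RCUDAPROTO=IB, and when it is unset rcuda uses TCP:

```shell
# Select the rcuda transport. RCUDA_NET is a made-up knob for this sketch;
# per the slides, RCUDAPROTO=IB must be exported on BOTH client and server.
case "${RCUDA_NET:-IB}" in
    IB) export RCUDAPROTO=IB ;;
    *)  unset RCUDAPROTO ;;          # unset -> default TCP transport
esac
echo "rcuda transport: ${RCUDAPROTO:-TCP (default)}"
```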
30 What is rcuda? Installing and using rcuda. rcuda over HPC networks: InfiniBand. How to benefit from rcuda: sample scenarios. Questions & Answers.
31 Sample scenarios. Typical behavior of CUDA applications: move data to the GPU and perform a lot of computation there to compensate for the overhead of having moved the data. This benefits rcuda: more computation means relatively less rcuda overhead. Live demonstration. Scalable applications: more GPUs, less execution time. rcuda can use all the GPUs of the cluster, while CUDA can only use the ones directly attached to one node: for some applications, rcuda can therefore obtain better results than CUDA. Live demonstration.
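The compute-versus-transfer argument above can be made concrete with a back-of-the-envelope calculation; all numbers below are illustrative assumptions, not measurements from the talk:

```shell
# Rough model: rcuda's extra cost is roughly the time spent moving data over
# the network, so a high compute/transfer ratio means low relative overhead.
data_mb=1024       # assumed data moved to the remote GPU, in MB
net_mb_s=6000      # assumed effective FDR InfiniBand bandwidth, MB/s
compute_s=30       # assumed GPU compute time, seconds
overhead_pct=$(awk -v d="$data_mb" -v b="$net_mb_s" -v c="$compute_s" \
    'BEGIN { t = d / b; printf "%.1f", 100 * t / (c + t) }')
echo "relative rcuda overhead: ${overhead_pct}%"   # small when compute dominates
```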
32 What is rcuda? Installing and using rcuda. rcuda over HPC networks: InfiniBand. How to benefit from rcuda: sample scenarios. Questions & Answers.
33
34 rcuda Team: Antonio Peña (1), Carlos Reaño, Federico Silla, José Duato, Adrián Castelló, Sergio Iserte, Rafael Mayo, Enrique S. Quintana-Ortí. (1) Now at Argonne National Lab. (USA)
More informationLab MIC Offload Experiments 7/22/13 MIC Advanced Experiments TACC
Lab MIC Offload Experiments 7/22/13 MIC Advanced Experiments TACC # pg. Subject Purpose directory 1 3 5 Offload, Begin (C) (F90) Compile and Run (CPU, MIC, Offload) offload_hello 2 7 Offload, Data Optimize
More informationShadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies
Shadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies Alexander Merritt, Vishakha Gupta, Abhishek Verma, Ada Gavrilovska, Karsten Schwan {merritt.alex,abhishek.verma}@gatech.edu {vishakha,ada,schwan}@cc.gtaech.edu
More informationLustre File System on ARM
1 Lustre File System on ARM Architecture Evaluation v1.1 September 2017 Carlos Thomaz 2 Motivations ARM momentum 64bit evolution Recent debuts on HPC Traction in new areas such as Machine Learning and
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationVoltaire. Fast I/O for XEN using RDMA Technologies. The Grid Interconnect Company. April 2005 Yaron Haviv, Voltaire, CTO
Voltaire The Grid Interconnect Company Fast I/O for XEN using RDMA Technologies April 2005 Yaron Haviv, Voltaire, CTO yaronh@voltaire.com The Enterprise Grid Model and ization VMs need to interact efficiently
More informationLS-DYNA Performance Benchmark and Profiling. April 2015
LS-DYNA Performance Benchmark and Profiling April 2015 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource
More informationOceanStor 9000 InfiniBand Technical White Paper. Issue V1.01 Date HUAWEI TECHNOLOGIES CO., LTD.
OceanStor 9000 Issue V1.01 Date 2014-03-29 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Ltd. 2014. All rights reserved. No part of this document may be reproduced or transmitted in
More informationInterconnect Your Future
Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators
More informationExploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters
2015 IEEE International Conference on Cluster Computing Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan,
More informationOncilla: A GAS Runtime for Efficient Resource Allocation and Data Movement in Accelerated Clusters
Oncilla: A GAS Runtime for Efficient Resource Allocation and Data Movement in Accelerated Clusters Jeff Young, Se Hoon Shon, Sudhakar Yalamanchili, Alex Merritt, Karsten Schwan School of Electrical and
More information2008 International ANSYS Conference
2008 International ANSYS Conference Maximizing Productivity With InfiniBand-Based Clusters Gilad Shainer Director of Technical Marketing Mellanox Technologies 2008 ANSYS, Inc. All rights reserved. 1 ANSYS,
More informationOPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA
OPEN MPI WITH RDMA SUPPORT AND CUDA Rolf vandevaart, NVIDIA OVERVIEW What is CUDA-aware History of CUDA-aware support in Open MPI GPU Direct RDMA support Tuning parameters Application example Future work
More informationMellanox OFED for FreeBSD for ConnectX-4/ConnectX-4 Lx/ ConnectX-5 Release Note. Rev 3.5.0
Mellanox OFED for FreeBSD for ConnectX-4/ConnectX-4 Lx/ ConnectX-5 Release Note Rev 3.5.0 www.mellanox.com Mellanox Technologies NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ( PRODUCT(S) ) AND ITS
More informationModern GPU Programming With CUDA and Thrust. Gilles Civario (ICHEC)
Modern GPU Programming With CUDA and Thrust Gilles Civario (ICHEC) gilles.civario@ichec.ie Let me introduce myself ID: Gilles Civario Gender: Male Age: 40 Affiliation: ICHEC Specialty: HPC Mission: CUDA
More informationSR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience
SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience Jithin Jose, Mingzhe Li, Xiaoyi Lu, Krishna Kandalla, Mark Arnold and Dhabaleswar K. (DK) Panda Network-Based Computing Laboratory
More information