HPC Network Stack on ARM
1 HPC Network Stack on ARM
- Pavel Shamis (Pasha), Principal Research Engineer, ARM Research
- ExaComm /22/2017
2 HPC network stack on ARM?
3 Serious ARM HPC deployments starting in 2017
- ARM: emerging CPU architecture in the HPC and server space
- Future deployments: Isambard (Cray CS-400), Japan Post-K (ARMv8)
4 An introduction to ARM
- ARM is the world's leading semiconductor intellectual property supplier.
- We license to over 350 partners, are present in 95% of smart phones, 80% of digital cameras, 35% of all electronic devices, and a total of 60 billion ARM cores have been shipped since
- Our CPU business model: license technology to partners, who use it to create their own system-on-chip (SoC) products.
- We may license an instruction set architecture (ISA), such as ARMv8-A, or a specific implementation, such as Cortex-A72.
- Partners who license an ISA can create their own implementation, as long as it passes the compliance tests.
- ...and our IP extends beyond the CPU
5 Range of SoCs addressing infrastructure
- Highly Accelerated
- Massively Multicore
- QorIQ Layerscape 2080A
- One size does not fit all
6 Integration with Network Interconnects
7 CCIX
- Accelerators and network (NIC/HCA/etc.) as first-class citizens in the system
- Seamless process and accelerator hardware cache coherence support
- Low latency and high bandwidth
- Allows in-line acceleration
  - Bump-in-the-wire processing (network packet processing, storage acceleration, etc.)
- Allows off-line acceleration (co-processor model)
- Driver-less / interrupt-less usage model
8 Scale-up server node
- [Diagram: CoreLink CMN-600 interconnects, each with DMC-620 memory controllers, linked to an Accelerator, Smart Network, and Persistent Memory]
- Shared virtual memory system
9 CCIX multichip connectivity and topologies
- New class of interconnect providing high performance and low latency for new accelerator use cases
  - CCIX defines 25GT/s (3x performance*)
  - Examining 56GT/s (7x performance*) and beyond
  - Enabling low latency via a light transaction layer
- [Diagram labels: Compute Node, Switch, Accelerator, Smart Network, Persistent Memory]
- Flexible, scalable interconnect topologies
  - Flexible point-to-point, daisy-chained, and switched topologies
- Simplified deployment by leveraging existing PCIe hardware and software infrastructure
  - Runs on the existing PCIe transport layer and management stack
  - Coexists with legacy PCIe designs
- * Note: Based on PCIe Gen3 performance
10 Building CCIX devices
- Cadence IP for CCIX
  - Built upon silicon-proven PCIe solutions
  - Cadence Controller & PHY IP
- IP products:
  - Controller IP: provides the CCIX transaction and data link layers.
  - PHY IP: provides the high-performance SERDES physical layer, supporting speeds up to 25Gbps.
  - Verification IP: provides the necessary test infrastructure to verify CCIX designs.
11 Cadence CCIX integration
- Example CMN-600 mesh design
- CML converts CHI to CCIX messages
- Low-latency CCIX transaction layer
- Supports up to 25Gbps vs 16Gbps for PCIe Gen4
- PCIe IP connects to a CMN IO interface via AXI
- [Diagram labels: DMC-620 memory controllers; CoreLink CMN-600 (XP, CML, RNI); CXS and AXI links; Cadence IP: CCIX Transaction Layer, PCIe Transaction Layer, Data Link Layer, PHY (up to 25Gbps), 16 lanes]
12 Gen-Z
- All data is accessed by some form of a Read or a Write
- Example of reads: DDR Row + Column Read, PCI DMA Read, SCSI Write, Socket Read, File Read, RDMA Read
- Example of writes: DDR Row + Column Write, PCI DMA Write, SCSI Read, Socket Write, File Write, RDMA Write
- The goal: simplify the world to memory-semantic reads and writes
13 Gen-Z Overview
- An open, standards-based, scalable system interconnect and protocol
- Optimized to support memory-semantic communications
- Breaks the processor-memory interlock
- Split controller model
  - Memory controller
    - Initiates high-level requests: Read, Write, Atomic, Put / Get, etc.
    - Enforces ordering, reliability, path selection, etc.
  - Media controller
    - Abstracts memory media
    - Supports volatile / non-volatile / mixed-media
    - Performs media-specific operations
    - Executes requests and returns responses
- Enables data-centric computing (accelerator, compute, etc.)
14 Software Stack Overview
15 Linux / FreeBSD w/ AARCH64 support
- Debian 8 adds AARCH64 April
- LTS & 14.04LTS released
- Also & releases
- Fedora 22 released May 2015
- Fedora 23 released Nov 2015
- Red Hat Enterprise Linux Server for ARM 7.2 BETA, Sept 2015
- CentOS Linux 7 for AArch64 GA, August 2015
- OpenSUSE 13.2, Nov 2014
- SUSE launches partner program to bring SUSE Linux Enterprise 12 to 64-bit ARM, July, ISC 15
- Engaged with the FreeBSD Foundation, Semihalf, and Cavium to get FreeBSD on ARMv8
- FreeBSD beta version demoed by Semihalf, Nov. 2015
16 Open source and commercial compilers
- GCC: C, C++, Fortran; OpenMP 4.0
- PathScale: C, C++, Fortran; OpenACC; OpenMP 4.0
- LLVM: C, C++; OpenMP 3.1 (4.0 coming soon); Fortran coming Q
- NAG: Fortran; OpenMP 3.1
- ARM C/C++ Compiler: LLVM based; includes SVE
17 ARM HPC ecosystem roadmap
- Legend: Released, Planned, Concept
- Hardware: AppliedMicro X-Gene 1 & 2, AMD Seattle, Cavium ThunderX, Qualcomm Centriq, Phytium Mars, Cavium ThunderX2, AppliedMicro X-Gene 3, Fujitsu Post K (SVE)
- Open-source software: OpenHPC 1.2, ARM Optimized Routines, ARM Optimized Routines vector versions, Altair PBS Pro, GCC (gcc/g++/gfortran), LLVM - clang, LLVM Flang
- ARM HPC tools: ARM C/C++ Compiler ahead of LLVM trunk, ARM Fortran Compiler, ARM Performance Libraries, ARM Code Advisor (Beta), ARM Code Advisor (Full release), ARM Instruction Emulator
- ISV software: Allinea DDT and MAP, NAG Library & Compiler, PathScale ENZO, Rogue Wave TotalView, ISV software (future)
18 RDMA Networks
- Remote Direct Memory Access (RDMA): popular hardware network technology
- InfiniBand: 37% of systems in TOP
19 RDMA Support
- Mellanox OFED 2.4 and above supports ARM
- Linux Kernel and above (maybe even earlier)
- Rdma-Core runs on ARM
- OFED: no support
- Linux distributions: ongoing process
20 OpenUCX v1.2
- The first official release from the OpenUCX community
- Features
  - Support for InfiniBand and RoCE
    - Transports: RC, UD, DC
    - Support for Accelerated Verbs: 40% speedup on ARM compared to vanilla Verbs
  - Support for Cray Aries and Gemini
  - Support for shared memory: KNEM, CMA, XPMEM, Posix, SysV
  - Support for x86, ARMv8, Power
  - Efficient memory polling: 36% increase in efficiency on ARM
- UCX interface is integrated with MPICH, OpenMPI, OSHMEM, ORNL-SHMEM, etc.
- Pavel Shamis, M. Graham Lopez, and Gilad Shainer. Enabling One-sided Communication Semantics on ARM, HIPS 2017
21 Programming models
- Open MPI compiles and runs on ARMv8
  - Continuous integration with HPCAC ARMv8 server
- MPICH compiles and runs on ARMv8
- MVAPICH compiles (with patches) and runs
- OSHMEM compiles and runs
  - Continuous integration with HPCAC ARMv8 server
22 Example: MPI+SHMEM+OpenUCX on InfiniBand
23 Lessons Learned
- Memory barriers
  - Multithread environment
  - Software-hardware interaction
- Examples
  - You can fish for these bugs in MPI implementations around Eager-RDMA and shared memory protocols
- [Diagram labels: RDMA Write Payload Busy-wait Read Notify RDMA Write Barrier Write Notify Read Barrier Read Payload]
- Maranget, Luc, Susmit Sarkar, and Peter Sewell. "A tutorial introduction to the ARM and POWER relaxed memory models." Draft available from cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf (2012).
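The payload/notify ordering on this slide can be sketched with C11 atomics. This is a minimal illustration, not code from the talk: the `mailbox` structure and both function names are hypothetical, and the writer here is another CPU thread. When the writer is a NIC performing an RDMA write, the C11 memory model does not formally cover the device, so treat the acquire load only as a stand-in for the read barrier.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mailbox: a payload followed by a notification flag. */
struct mailbox {
    uint64_t    payload;
    atomic_uint ready;
};

/* Writer side: write the payload first, then the flag with release
 * ordering, so the payload store cannot be reordered after the notify. */
static void mailbox_publish(struct mailbox *m, uint64_t value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, 1u, memory_order_release);
}

/* Reader side: busy-wait on the flag with acquire ordering, and only
 * then read the payload. Without the acquire, a weakly ordered CPU
 * such as ARMv8 may satisfy the payload load before the flag load. */
static uint64_t mailbox_consume(struct mailbox *m)
{
    while (atomic_load_explicit(&m->ready, memory_order_acquire) == 0u) {
        /* spin until the notification arrives */
    }
    return m->payload;
}
```

Code that works on x86 with plain loads and stores often hides this bug, because x86 does not reorder the two loads or the two stores; on ARMv8 both the release and the acquire are needed.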
24 More About Barriers
- There are multiple types of barriers
  - DSB
    - Completion semantics
    - Interaction with external devices (PCIe doorbells)
    - Device drivers
  - DMB
    - ISH* domain on Linux
    - Poll-flag, barrier, data
  - ISB
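As a rough sketch of how the barrier types above might be wrapped in C, the macros below emit the AArch64 instructions when built for ARMv8 and fall back to C11 fences elsewhere so the example still compiles. The macro names, `post_and_ring`, and `poll_then_read` are hypothetical, and the choice of `dsb st` before a doorbell write is an assumption about a typical driver, not a statement about any particular library.

```c
#include <stdatomic.h>
#include <stdint.h>

#if defined(__aarch64__)
/* DSB ST: earlier stores must complete before execution continues. */
#define BARRIER_DSB_ST()    __asm__ volatile("dsb st" ::: "memory")
/* DMB ISHLD / ISHST: order loads / stores in the inner shareable domain. */
#define BARRIER_DMB_ISHLD() __asm__ volatile("dmb ishld" ::: "memory")
#define BARRIER_DMB_ISHST() __asm__ volatile("dmb ishst" ::: "memory")
/* ISB: flush the pipeline so later instructions are refetched. */
#define BARRIER_ISB()       __asm__ volatile("isb" ::: "memory")
#else
/* Portable stand-ins so the sketch builds on other architectures. */
#define BARRIER_DSB_ST()    atomic_thread_fence(memory_order_seq_cst)
#define BARRIER_DMB_ISHLD() atomic_thread_fence(memory_order_acquire)
#define BARRIER_DMB_ISHST() atomic_thread_fence(memory_order_release)
#define BARRIER_ISB()       atomic_signal_fence(memory_order_seq_cst)
#endif

/* Hypothetical doorbell path: the descriptor must be complete in memory
 * before the device is told to fetch it. */
static void post_and_ring(volatile uint64_t *descriptor,
                          volatile uint32_t *doorbell,
                          uint64_t work, uint32_t index)
{
    *descriptor = work;
    BARRIER_DSB_ST();
    *doorbell = index;
}

/* Poll-flag, barrier, data: the load barrier keeps the data load from
 * being satisfied before the flag load. */
static uint64_t poll_then_read(const volatile uint32_t *flag,
                               const volatile uint64_t *data)
{
    while (*flag == 0u) {
        /* spin */
    }
    BARRIER_DMB_ISHLD();
    return *data;
}
```

In this sketch the doorbell and flag are ordinary memory; in a real driver the doorbell would be a mapped device register.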
25 Lessons Learned - continued
- Low-level timers
  - Typically found in benchmarks and MPI
- Code examples
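The slide's own code examples did not survive transcription. As a hedged illustration of the porting issue, the sketch below shows a tick reader that uses `rdtsc` on x86 and the ARMv8 generic timer's virtual counter (`CNTVCT_EL0`, which Linux typically exposes to user space) on AArch64. The function name `read_ticks` is hypothetical, and the final fallback uses C11 `timespec_get`, which is wall-clock time rather than a monotonic counter.

```c
#include <stdint.h>
#include <time.h>

/* Returns a raw tick count. The unit differs per architecture: TSC
 * ticks on x86, generic-timer ticks on AArch64 (its frequency is
 * reported by CNTFRQ_EL0, not the core clock), nanoseconds otherwise. */
static inline uint64_t read_ticks(void)
{
#if defined(__aarch64__)
    uint64_t v;
    /* ISB keeps the counter read from being executed early. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
#elif defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

Because the generic timer usually runs much slower than the CPU clock, benchmarks that assume one tick equals one cycle need their conversion factor revisited when ported to ARM.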
26 Lessons Learned - continued
- Not all cache lines are 64 bytes!
  - Implementation dependent
  - 128 bytes and 64 bytes
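One way to avoid hard-coding 64 is to query the line size at run time and fall back to a conservative value. The sketch below assumes glibc's `_SC_LEVEL1_DCACHE_LINESIZE` extension to `sysconf`, which can return 0 or -1 when the value is unknown; the function names and the 128-byte fallback are choices made for this illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Conservative fallback: padding to 128 bytes is safe on both 64-byte
 * and 128-byte implementations, at the cost of some memory. */
#define CACHE_LINE_FALLBACK 128

/* Returns the L1 data cache line size in bytes, or the fallback when
 * the C library cannot report it. */
static size_t cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return (size_t)n;
#endif
    return CACHE_LINE_FALLBACK;
}

/* Rounds a structure size up to a whole number of cache lines, e.g. to
 * keep per-thread slots from sharing a line. */
static size_t pad_to_cache_line(size_t bytes, size_t line)
{
    return (bytes + line - 1) / line * line;
}
```

Compile-time alignment attributes cannot use a run-time value, so code that needs static padding would have to pick the larger size up front.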
27 Optimizations
- AVX => Neon
  - Mostly found around communication request initialization code
  - ib/mlx5/ib_mlx5.inl#l160
- Busy-wait loop
  - See Wait-For-Event (WFE)
- Pavel Shamis, M. Graham Lopez, and Gilad Shainer. Enabling One-sided Communication Semantics on ARM, HIPS 2017
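A busy-wait loop built on WFE might look like the sketch below. It is an illustration under assumptions, not the implementation from the cited paper: the function name is hypothetical, the AArch64 path relies on the exclusive monitor armed by `ldaxr` generating an event when another observer writes the location, and whether a device DMA write triggers that event is platform dependent. Other architectures fall back to a plain polling loop.

```c
#include <stdint.h>

/* Spins until *addr equals expected. On AArch64 the core sleeps in WFE
 * between checks instead of polling at full speed. */
static inline void wait_until_equal_u32(volatile uint32_t *addr,
                                        uint32_t expected)
{
#if defined(__aarch64__)
    uint32_t v;
    /* Load-acquire exclusive: reads the value and arms the monitor. */
    __asm__ volatile("ldaxr %w0, [%1]" : "=&r"(v) : "r"(addr) : "memory");
    if (v != expected) {
        /* SEVL makes the first WFE fall through immediately. */
        __asm__ volatile("sevl" : : : "memory");
        do {
            __asm__ volatile("wfe" : : : "memory");
            __asm__ volatile("ldaxr %w0, [%1]"
                             : "=&r"(v) : "r"(addr) : "memory");
        } while (v != expected);
    }
#else
    while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected) {
#if defined(__x86_64__) || defined(__i386__)
        __asm__ volatile("pause" : : : "memory");
#endif
    }
#endif
}
```

The `__atomic_load_n` builtin in the fallback path is a GCC and Clang extension, chosen so the flag can stay a plain `uint32_t`.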
28 Preliminary Results
29 Testbed
- 2 x Softiron Overdrive 3000 servers with AMD Opteron A1100 / 2GHz
- ConnectX-4 IB/VPI EDR (PCIe gen2 x8)
- Ubuntu
- MOFED
- UCX [0558b41]
- XPMEM [bdfcc52]
- OSHMEM/OPEN-MPI [fed4849]
30 OpenUCX IB: MLX5 vs Verbs
- [Chart annotation: 40%]
31 OpenUCX: XPMEM
32 SHMEM_WAIT()
- [Chart annotations: 73%, 35%]
33 OpenSHMEM SSCA
- [Chart annotation: 7-30%]
34 OpenSHMEM GUPs
- [Chart annotation: 21%]
- Pavel Shamis, M. Graham Lopez, and Gilad Shainer. Enabling One-sided Communication Semantics on ARM
35 Summary
- The Linux RDMA community is doing a great job!
- A lot of progress has been made in the ARM HPC/server software ecosystem
36 The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright 2017 ARM Limited
More informationModeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces
Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation
More informationRDMA in Embedded Fabrics
RDMA in Embedded Fabrics Ken Cain, kcain@mc.com Mercury Computer Systems 06 April 2011 www.openfabrics.org 2011 Mercury Computer Systems, Inc. www.mc.com Uncontrolled for Export Purposes 1 Outline Embedded
More informationComparing Ethernet & Soft RoCE over 1 Gigabit Ethernet
Comparing Ethernet & Soft RoCE over 1 Gigabit Ethernet Gurkirat Kaur, Manoj Kumar 1, Manju Bala 2 1 Department of Computer Science & Engineering, CTIEMT Jalandhar, Punjab, India 2 Department of Electronics
More informationProgramming for the Intel Many Integrated Core Architecture By James Reinders. The Architecture for Discovery. PowerPoint Title
Programming for the Intel Many Integrated Core Architecture By James Reinders The Architecture for Discovery PowerPoint Title Intel Xeon Phi coprocessor 1. Designed for Highly Parallel workloads 2. and
More informationLS-DYNA Productivity and Power-aware Simulations in Cluster Environments
LS-DYNA Productivity and Power-aware Simulations in Cluster Environments Gilad Shainer 1, Tong Liu 1, Jacob Liberman 2, Jeff Layton 2 Onur Celebioglu 2, Scot A. Schultz 3, Joshua Mora 3, David Cownie 3,
More informationSolutions for Scalable HPC
Solutions for Scalable HPC Scot Schultz, Director HPC/Technical Computing HPC Advisory Council Stanford Conference Feb 2014 Leading Supplier of End-to-End Interconnect Solutions Comprehensive End-to-End
More informationEarly Software Development Through Emulation for a Complex SoC
Early Software Development Through Emulation for a Complex SoC FTF-NET-F0204 Raghav U. Nayak Senior Validation Engineer A P R. 2 0 1 4 TM External Use Session Objectives After completing this session you
More informationComparing Ethernet and Soft RoCE for MPI Communication
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 7-66, p- ISSN: 7-77Volume, Issue, Ver. I (Jul-Aug. ), PP 5-5 Gurkirat Kaur, Manoj Kumar, Manju Bala Department of Computer Science & Engineering,
More informationAcuSolve Performance Benchmark and Profiling. October 2011
AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, Altair Compute
More informationSoftware Development Using Full System Simulation with Freescale QorIQ Communications Processors
Patrick Keliher, Simics Field Application Engineer Software Development Using Full System Simulation with Freescale QorIQ Communications Processors 1 2013 Wind River. All Rights Reserved. Agenda Introduction
More information