IBM Research: Accelerator Technologies in HPC and Cognitive Computing
1 MaRS Workshop, EuroSys 2017, Belgrade, April 23, 2017. IBM Research: Accelerator Technologies in HPC and Cognitive Computing. Christoph Hagleitner, IBM Research - Zurich Lab
2 IBM Research - Zurich: established in 1956; 45+ different nationalities. Two Nobel Prizes: 1986 Nobel Prize in Physics for the invention of the scanning tunneling microscope by Heinrich Rohrer and Gerd K. Binnig; 1987 Nobel Prize in Physics for the discovery of high-temperature superconductivity by K. Alex Müller and J. Georg Bednorz. Binnig and Rohrer Nanotechnology Centre opened in 2011 (public-private partnership with ETH Zürich and EMPA). Open collaboration: Horizon 2020: 28 funded projects and 170+ partners; 9 European Research Council grants. IBM Research THINK Lab Zurich (Client Center).
3 Outline: Towards exascale computing; OpenPOWER (Foundation, IBM Systems, POWER8 accelerators, POWER9); Accelerators at IBM Research - Zurich (CAPI-attached accelerators, near-memory acceleration, DSS, hyperscale FPGAs)
4 Towards Exascale: FLOPS. Increasing gap between performance and power efficiency. The innovation metric measures the (relative) increase in performance x the increase in power efficiency. Heterogeneous systems (Cell processor, GPUs); dense systems (Blue Gene/L, TaihuLight). Diminishing performance / power-efficiency gains from technology scaling -> heterogeneous systems. [Figure axes: Performance (PFLOP/s), Power efficiency (GFLOPS/W), Innovation]
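The innovation metric multiplies the relative gain in performance by the relative gain in power efficiency. A minimal Python sketch, assuming both gains are measured against the same baseline system; the function name and example numbers are hypothetical, not data from the slide:

```python
def innovation(perf_base, perf_new, eff_base, eff_new):
    """Relative performance increase times relative power-efficiency increase.

    perf_* could be in PFLOP/s and eff_* in GFLOPS/W; only the ratios
    matter, so any consistent units work.
    """
    if min(perf_base, perf_new, eff_base, eff_new) <= 0:
        raise ValueError("all inputs must be positive")
    return (perf_new / perf_base) * (eff_new / eff_base)


# Hypothetical example: 2x the performance at 1.5x the efficiency.
print(innovation(10.0, 20.0, 2.0, 3.0))  # 3.0
```

Because the metric is a product, doubling performance at constant efficiency scores the same as doubling efficiency at constant performance.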
5 Towards Exascale: Applications. In the US, national labs with monolithic applications drive the HPC roadmap; Europe is different, with a diverse user and application space. Convergence of data science and HPC, e.g., cognitive computing: information extraction, build knowledge, query and act on the knowledge base, enhance knowledge.
6 Heterogeneous Exascale
Disaggregation: Hadoop-style workloads... scale-out via network. Main metrics: cost (capital, energy), compute density, scalability. Node level (CPU / FPGA / NVMe plus compute).
Fat Nodes: complex HPC-like workloads... scale-up via high-speed buses. Main metrics: memory / accelerator / inter-node BW, optimal mix of heterogeneous resources (CPU / GPU / FPGA / HBM / DRAM / NVMe), compute density, scalability. Heterogeneity within nodes; data-centric design.
7 Outline: OpenPOWER (Foundation, IBM Systems, POWER8 accelerators, POWER9)
8 OpenPOWER: Five Founding Members in
9 The OpenPOWER Foundation: 230+ Members & Growing
10 OpenPOWER: Endorsing the Strategy
11 OpenPOWER: Going Global
12 OpenPOWER Software Support. Standard compilers: GCC 4.8.5, MPICH 3.0.4, CUDA 8.0. AT 9.0.3 compilers: GCC 5.3.1, Python 3.4, and more, optimized for POWER. AT compilers: GCC 6.2.1, Python 3.5, and more, optimized for POWER. Optimized libraries: MASS (math functions), ESSL (BLAS), and MPI.
13 OpenPOWER Roadmap: IBM LC-line
Mellanox interconnect technology: Connect-IB, FDR InfiniBand, PCIe Gen3 -> ConnectX-4, EDR InfiniBand, CAPI over PCIe Gen3 -> ConnectX-5, Next-Gen InfiniBand, Enhanced CAPI over PCIe Gen4
NVIDIA GPUs: Kepler, PCIe Gen3 -> Pascal, NVLink -> Volta, NVLink Next Gen
IBM CPUs: POWER8, OpenPOWER CAPI interface -> POWER8 with NVLink (acceleration: NVLink 1.0, CAPI 1.0, PCIe Gen3) -> POWER9 (acceleration: CAPI 2.0, NVLink 2.0, OpenCAPI 3.0, PCIe Gen)
[Row of images: IBM nodes]
14 IBM: The LC-line
15 Minsky: The System Architecture
16 S822LC for High Performance Computing (aka Minsky)
17 POWER8+ Processor: up to 12 cores (SMT8); 8 dispatch, 10 issue, 16 execution pipes (2 FXU, 2 LSU, 2 LU, 4 FPU, 2 VMX, 1 Crypto, 1 DFU, 1 CR, 1 BR); 64 KB data cache, 32 KB instruction cache; new NVLink for Minsky
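A quick consistency check of the core figures in Python: the listed units add up to the 16 execution pipes, and 12 SMT8 cores give 96 hardware threads, the thread count used in configuration 2) of the FFT energy-efficiency test later in the deck. The dictionary simply restates the slide; the helper names are illustrative:

```python
# Execution units per POWER8+ core, as listed on the slide.
EXEC_UNITS = {
    "FXU": 2, "LSU": 2, "LU": 2, "FPU": 4, "VMX": 2,
    "Crypto": 1, "DFU": 1, "CR": 1, "BR": 1,
}


def total_pipes(units):
    """Sum the execution pipes across all unit types."""
    return sum(units.values())


def hardware_threads(cores, smt_ways):
    """Hardware threads per socket: cores times SMT ways."""
    return cores * smt_ways


print(total_pipes(EXEC_UNITS))   # 16, matching "16 execution pipes"
print(hardware_threads(12, 8))   # 96 hardware threads on a 12-core SMT8 chip
```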
18 POWER8 Caches: L2: 1 MB, 8-way, per core; L3: 96 MB (12 x 8 MB 8-way banks); L4: 128 MB (on Centaur); NUCA cache policy (Non-Uniform Cache Architecture); cache bandwidth: 4 TB/s L2 BW, 3 TB/s L3 BW
19 POWER8 Memory System: 8 high-speed channels per POWER8 processor, 230 GB/s sustained memory BW; 32 total DDR ports yielding 410 GB/s peak at the DRAM; 1 TB memory capacity per fully configured processor socket
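Dividing the aggregate figures gives per-link numbers. This Python sketch assumes bandwidth is spread evenly across channels and ports, which the slide does not state:

```python
def per_link(total_gbs, links):
    """Average bandwidth per link in GB/s, assuming an even split."""
    return total_gbs / links


sustained_per_channel = per_link(230, 8)   # sustained BW per high-speed channel
peak_per_ddr_port = per_link(410, 32)      # peak BW per DDR port at the DRAM
sustained_fraction = 230 / 410             # sustained BW as a fraction of DRAM peak

print(f"{sustained_per_channel:.2f} GB/s per channel")
print(f"{peak_per_ddr_port:.2f} GB/s per DDR port")
print(f"{sustained_fraction:.0%} of DRAM peak is sustained")
```

The last line shows that roughly 56% of the DRAM-side peak is available as sustained bandwidth at the processor.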
20 Accelerator Interfaces: POWER8
21 CAPI: Coherent Accelerator Processor Interface
Standard I/O model flow: device driver (DD) call -> copy/pin -> MMIO notify accelerator -> accelerate -> poll/interrupt -> copy/unpin -> return from DD
Flow with a coherent model: shared-memory notify accelerator -> accelerate -> shared-memory completion
[Figure: POWER8 processor with CAPP, connected over PCIe to a CAPI FPGA containing the POWER Service Layer and AFU 0, AFU 1, AFU 2, ..., AFU n]
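The two flows can be contrasted in a small Python model. This is a conceptual sketch only: the step names mirror the slide, a Python thread stands in for the AFU, and nothing here uses a real CAPI library or the libcxl API:

```python
import threading


def standard_io_offload(host_buf, kernel):
    """Standard I/O model: data is copied to a device buffer and back."""
    steps = ["dd_call", "copy_pin"]
    device_buf = list(host_buf)            # copy into device-visible memory
    steps.append("mmio_notify")
    device_buf = [kernel(x) for x in device_buf]
    steps.append("accelerate")
    steps.append("poll_or_interrupt")
    host_buf[:] = device_buf               # copy results back to the host buffer
    steps += ["copy_unpin", "return_dd"]
    return steps


def coherent_offload(host_buf, kernel):
    """Coherent model: the accelerator works on the host buffer in place."""
    steps = []
    notify, done = threading.Event(), threading.Event()

    def afu():                             # stands in for an AFU behind the PSL
        notify.wait()
        for i, x in enumerate(host_buf):
            host_buf[i] = kernel(x)        # same memory the host sees: no copy, no pin
        steps.append("accelerate")
        done.set()                         # completion signalled through shared state

    worker = threading.Thread(target=afu)
    worker.start()
    steps.append("shared_mem_notify")
    notify.set()
    done.wait()
    steps.append("shared_mem_completion")
    worker.join()
    return steps


buf = [1, 2, 3]
print(coherent_offload(buf, lambda x: 2 * x), buf)
```

What the model makes visible is what disappears in the coherent flow: the copy/pin, MMIO, and copy/unpin steps, because the accelerator addresses the same memory as the host process.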
22 Accelerator cards announced at OpenPOWER Summit in April. [Photo: Nallatech team explaining the CAPI Flash card] 4/23/2017 IBM Research - Zurich Lab, hle@zurich.ibm.com
23 Alpha Data FPGA CAPI cards: ADM-PCIE-7V3, ADM-PCIE-KU3, ADM-PCIE-8K5
24 Integrated CAPI Flash
Form factor & attributes: standard PCI card, single wide; four M.2 NVMe connectors for flash sticks
Systems supported: up to 4 cards per Tuleta L; up to 2 cards per Firestone LC
Memory: up to 4 NVMe sticks of 1 TB (2 supported for 1st GA; 2 TB NVMe sticks in the future); sticks are features, MES adds & upgrades; 4 GB of on-card DRAM
Firmware / hypervisor / OS environments: same as SureLock; Linux options, no AIX support: 1) Ubuntu (GA 8/26/16), 2) Red Hat 7.3 (GA 12/06/16)
Performance (per NVMe card), M.2 NVMe specifications (Samsung PM963): 1600 MB/s sequential read, 1200 MB/s sequential write, 380K random-read IOPS, 35K random-write IOPS
Card aggregation: application controlled; multiple cards can be used as one database or as multiple
Integrated Flash configuration: Power S822L / S812L / S822LC
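Card aggregation is left to the application. One common way for an application to present several cards as one address space is round-robin striping; the Python sketch below assumes that layout, and the function name and stripe size are hypothetical, not part of the CAPI Flash software:

```python
def stripe(logical_block, n_cards, stripe_blocks):
    """Map a logical block to (card index, block offset on that card).

    Round-robin (RAID-0 style) layout in chunks of stripe_blocks blocks.
    """
    if n_cards < 1 or stripe_blocks < 1 or logical_block < 0:
        raise ValueError("invalid striping parameters")
    stripe_no, within = divmod(logical_block, stripe_blocks)
    card = stripe_no % n_cards
    offset = (stripe_no // n_cards) * stripe_blocks + within
    return card, offset


# Two cards, 4-block stripes: blocks 0-3 -> card 0, 4-7 -> card 1, 8-11 -> card 0, ...
for lb in (0, 3, 4, 8, 13):
    print(lb, stripe(lb, n_cards=2, stripe_blocks=4))
```

Striping lets sequential streams draw on several cards at once, at the cost of large requests touching more than one device.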
25 Outline: Accelerators at IBM Research - Zurich (CAPI-attached accelerators, near-memory acceleration, DSS, hyperscale FPGAs)
26 Heterogeneous Exascale: Fat Nodes. Complex HPC-like workloads... scale-up via high-speed buses. Main metrics: memory / accelerator / inter-node BW, optimal mix of heterogeneous resources (CPU / GPU / FPGA / HBM / DRAM / NVMe), compute density, scalability. Heterogeneity within nodes; data-centric design.
27 Accelerated Fast Fourier Transform Library
FFTs are widely used in cognitive computing... Data preparation: spectral analysis, filter banks. Data compression: MP3, JPEG. ML: convolutional neural networks [1]. HPC: partial differential equations, mathematical finance.
Common FFT libraries: FFTW, ESSL, MKL, ...
[1] Mathieu, Henaff, LeCun. Fast training of convolutional networks through FFTs. ICLR 2014.
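Since the FFT is the common kernel behind all of these uses, here is a compact radix-2 Cooley-Tukey FFT. It is a textbook recursive formulation in Python for illustration, not the accelerated library's implementation; production code would call FFTW, ESSL, or MKL. The `circular_convolve` helper shows the convolution theorem that FFT-based CNN training as in [1] relies on:

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference to check fft()."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def circular_convolve(a, b):
    """Convolution theorem: a (*) b = IFFT(FFT(a) . FFT(b)) for equal power-of-two lengths."""
    n = len(a)
    prod = [p * q for p, q in zip(fft(a), fft(b))]
    # inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n
    return [z.conjugate() / n for z in fft([p.conjugate() for p in prod])]


print([round(z.real, 6) for z in circular_convolve([1, 2, 0, 0], [1, 1, 0, 0])])
```

The recursion halves the problem at each level, giving O(n log n) work instead of the O(n^2) of `dft`.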
28 FFTW on Heterogeneous Compute Nodes
29 Latency: a single CAPI FFT call is 10% slower than on the CPU (this can be improved, as the AFU is optimized for bandwidth) and 4x better than a PCIe version using OpenCL.
[Figure: runtime in microseconds for one 4k-input complex FFT from cache, split into compute and copy time. Labeled values: CPU 80; FPGA using CAPI 89; FPGA using PCIe (OpenCL) 124; NVIDIA K80 using cuFFT]
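A quick check that the chart's CPU and CAPI values match the "10% slower" statement; the helper name is illustrative:

```python
def relative_overhead(t_accel_us, t_ref_us):
    """Extra latency of an accelerated call relative to a reference, as a fraction."""
    return (t_accel_us - t_ref_us) / t_ref_us


cpu_us, capi_us = 80, 89   # chart values: CPU vs. FPGA using CAPI, in microseconds
print(f"CAPI call latency vs. CPU: +{relative_overhead(capi_us, cpu_us):.0%}")
```

89 µs against 80 µs is an overhead of about 11%, consistent with the rounded figure in the text.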
30 Performance & Energy Efficiency
Test case: compute 100 rounds of successive 4k-point FFTs in complex single-precision floating point (1 GB of input samples per round).
a) 1 core: 0.21 GFLOPS/W
b) 12 cores (1): 0.31 GFLOPS/W
c) 12 cores (2): 0.12 GFLOPS/W
d) 1 AFU: 3.37 GFLOPS/W
e) 1 GPU (3): 0.29 GFLOPS/W
(1) 12 threads, SMT1, DVFS off; (2) 96 threads, SMT8, DVFS on; (3) NVIDIA K40, CUDA 7.5
Result: one AFU is 2.2x faster and 16x more energy efficient than one core.
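Relating the GFLOPS/W figures to energy requires counting the total work. A Python sketch under stated assumptions: the nominal 5 N log2 N flop count per complex N-point FFT (the convention used by FFTW's benchmarks), 8 bytes per complex single-precision sample, and "1 GB" taken as 2^30 bytes. The energies it prints are derived estimates, not measurements from the slide:

```python
import math


def fft_flops(n):
    """Nominal flops of one complex n-point FFT (the 5 n log2 n convention)."""
    return 5 * n * math.log2(n)


def ffts_per_round(round_bytes, n, bytes_per_sample=8):
    """FFTs per round; a complex single-precision sample occupies 8 bytes."""
    return round_bytes // (n * bytes_per_sample)


def energy_joules(total_flops, gflops_per_watt):
    """Energy at a given efficiency; GFLOPS/W is the same as GFLOP per joule."""
    return total_flops / (gflops_per_watt * 1e9)


N = 4096
ROUND_BYTES = 2**30   # assumption: "1 GB" read as 2^30 bytes
ROUNDS = 100
total_flops = ROUNDS * ffts_per_round(ROUND_BYTES, N) * fft_flops(N)

for name, eff in [("1 core", 0.21), ("1 AFU", 3.37)]:
    print(f"{name}: about {energy_joules(total_flops, eff):.0f} J for the whole test")
print(f"efficiency ratio AFU / core: {3.37 / 0.21:.1f}x")
```

The efficiency ratio comes out at about 16, matching the slide's "16x more energy efficient".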
31 Outline: Near-memory acceleration; DSS; Hyperscale FPGAs
32 Integrating Near-Data Processing in a (POWER) Server
Enable near-data processing (NDP) capabilities in an existing CPU architecture while being minimally invasive.
Ability to implement a wide range of NDP functionality, from optimized fixed-function hardware to a multiprocessor SoC.
Dereference all virtual pointers of the host process on the NDP, coherently with the CPU's view of memory.
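What "dereferencing all virtual pointers of the host process on the NDP" requires can be shown with a toy model: the NDP translates host virtual addresses through the host's page table before every load, so it can follow pointer-based structures such as linked lists without the CPU marshalling the data first. A Python sketch; the single-level page table, 4 KiB page size, and dict-based memory are simplifying assumptions, and coherence with the CPU caches is not modelled:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, single-level page table


class NearDataProcessor:
    """Toy NDP that dereferences host virtual addresses via the host page table."""

    def __init__(self, page_table, memory):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.memory = memory          # physical address -> stored word (a dict, for brevity)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise LookupError(f"page fault at virtual address {vaddr:#x}")
        return self.page_table[vpn] * PAGE_SIZE + offset

    def load(self, vaddr):
        return self.memory[self.translate(vaddr)]

    def sum_list(self, head):
        """Sum a host linked list whose nodes are (value, next_vaddr); 0 acts as NULL."""
        total, node = 0, head
        while node != 0:
            value, node = self.load(node)
            total += value
        return total


# Host process: two list nodes on different virtual pages, mapped to arbitrary frames.
page_table = {0x1: 0x7, 0x2: 0x3}
memory = {
    0x7 * PAGE_SIZE + 0x000: (5, 0x2010),  # node at vaddr 0x1000
    0x3 * PAGE_SIZE + 0x010: (10, 0),      # node at vaddr 0x2010, end of list
}
ndp = NearDataProcessor(page_table, memory)
print(ndp.sum_list(0x1000))
```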
33 Heterogeneous Nodes: POWER8 Accelerator Interfaces
34 Near-Memory Acceleration on ConTutto
35 Near-Memory Acceleration on ConTutto
36 Near-memory Acceleration
Big-data analytics, neural networks, cognitive computing, graph algorithms, ... benefit from low latency, small access granularity, and large memories.
Memory performance and power depend on a complex interaction between workload and memory system: locality of reference, access patterns/strides, ...; cache size, associativity, replacement policy, ...; bank interleaving, refresh, row-buffer hits, ...
Current systems use bare-metal programming to adapt the workload to the memory system.
The memory system should be programmable/adaptive: it must integrate programmable compute capabilities to achieve substantial performance and power gains for a wide range of workloads.
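How strongly performance depends on the interaction between access pattern and cache parameters can be seen with a small set-associative LRU cache simulator. A Python sketch; the 32 KiB capacity, 128-byte lines, and 8 ways are illustrative defaults, not a model of a specific POWER8 cache level:

```python
from collections import OrderedDict


def hit_rate(addresses, cache_bytes=32 * 1024, line_bytes=128, ways=8):
    """Simulate a set-associative cache with LRU replacement; return the hit fraction."""
    n_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(n_sets)]
    hits = total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        lru = sets[line % n_sets]          # set index from the low bits of the line number
        if line in lru:
            hits += 1
            lru.move_to_end(line)          # mark as most recently used
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)    # evict the least recently used line
            lru[line] = True
    return hits / total if total else 0.0


seq = range(0, 64 * 1024, 8)          # sequential 8-byte words over 64 KiB
strided = range(0, 64 * 1024, 128)    # one access per 128-byte line
print(hit_rate(seq), hit_rate(strided))  # 0.9375 0.0
```

The sequential scan hits in 15 of every 16 accesses (one miss per 128-byte line), while the 128-byte stride misses on every access, even though both touch the same 64 KiB.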
37 Boosting Irregular Applications: Graph500 Benchmark
Results were obtained on a system simulator, developed for this work, that is capable of both functional verification and performance estimation.
The Graph500 benchmark benefits from low latency and small access granularity: NDP cores four times slower than the CPU cores outperform them for large problems, and the NDPs show much better bandwidth utilization due to the small access granularity.
[Figure: speedup vs. Graph500 scale for 4-core CPU, 8-core CPU, 4-core NDP, and a further NDP configuration; bytes used per bytes fetched from DRAM (secondary axis, 0% to 50%) for 4-core CPU and 4-core NDP]
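The "bytes used per bytes fetched" metric explains why small access granularity matters for irregular workloads like the Graph500 BFS. A back-of-the-envelope Python sketch, assuming aligned accesses; the 128-byte case corresponds to a CPU cache-line fetch (POWER8 uses 128-byte lines), while the 8- and 32-byte granularities are hypothetical NDP settings, not values from the slide:

```python
import math


def bytes_used_ratio(access_bytes, granularity_bytes):
    """Fraction of fetched bytes actually used by one aligned access.

    Assumes each access starts on a granularity boundary, so whole
    granules must be fetched to cover access_bytes.
    """
    granules = math.ceil(access_bytes / granularity_bytes)
    return access_bytes / (granules * granularity_bytes)


# Random 8-byte reads (e.g. one parent-array entry in a BFS) at different fetch sizes.
for g in (8, 32, 128):
    print(f"{g:>3}-byte granularity: {bytes_used_ratio(8, g):.2%} of fetched bytes used")
```

With 128-byte fetches, only 6.25% of the transferred bytes are useful for scattered 8-byte reads, which is the bandwidth an NDP with fine-grained access can recover.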
38 Outline: DSS
39 Dense Memory (remote access) Prototype
Dense Memory (DM) integration software stack available: a byte-addressable, distributed, globally accessible DM resource that exports an industry-standard asynchronous RDMA API for DM read and write access.
Implements efficient local and remote DM access: zero-copy local access via direct DMA between device and application buffer; zero-copy remote access via InfiniBand RDMA between remote host and application buffer.
Performance measurements: local DM access at NVMe device performance limits (3.5 GB/s read, 1.8 GB/s write of 4k buffers); remote DM access at network (100 Gb/s InfiniBand) and device limits: 12.5 GB/s distributed DM random read with 4 storage nodes, each equipped with one NVMe SSD; close to 900k IOPS for a single device with short sequential read/write operations.
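The 12.5 GB/s remote figure equals the raw 100 Gb/s InfiniBand line rate, which points to the network rather than the SSDs as the limit. A simple bottleneck model in Python, assuming each remote node's NVMe device delivers roughly the 3.5 GB/s read rate measured locally (the slide gives that number only for local access) and ignoring protocol overhead:

```python
def achievable_gbs(network_gbit_s, device_gbs, n_devices):
    """Throughput is capped by the slower of the network link and the aggregate devices."""
    network_gbs = network_gbit_s / 8          # Gb/s -> GB/s
    return min(network_gbs, device_gbs * n_devices)


# Distributed random read: 100 Gb/s InfiniBand, 4 storage nodes with one NVMe SSD each.
print(achievable_gbs(100, 3.5, 4))   # network-bound: 12.5 < 4 x 3.5
```

With only two storage nodes the same model becomes device-bound (7 GB/s), so adding nodes helps only until the network link saturates.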
40 Flexible DSS Configuration: mix of local and shared resources; multiple shared DM partitions possible
41 Dense Storage: Software Components: 3 kernel modules (dsa.ko, sal.ko, sal_blkdev.ko; the DSS GSL is part of SAL), 1 user library (libdsa), 1 user-level daemon (dssd)
42 Outline: Hyperscale FPGAs
43 Heterogeneous Exascale: Disaggregation. Hadoop-style workloads... scale-out via network. Main metrics: cost (capital, energy), compute density, scalability. Node level (CPU / FPGA / NVMe plus compute).
44 ZRL Dome µServer for Hyperscale DCs
Cloud economics: density (>1000 nodes / rack); integrated NICs; switch card (backplane, no cables); medium- to low-cost compute chips.
Passive liquid cooling: ultimate density (cooling >70 W / node); energy re-use.
Built to integrate heterogeneous resources: CPUs, accelerators.
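Combining the two density figures shows why passive liquid cooling is part of the design. Taking both lower bounds at face value, and assuming every node draws that power at the same time, the heat to remove per rack is:

```python
def rack_heat_kw(nodes_per_rack, watts_per_node):
    """Heat to remove per rack in kW, if every node runs at the given power."""
    return nodes_per_rack * watts_per_node / 1000


print(rack_heat_kw(1000, 70))   # lower bound from ">1000 nodes / rack" and ">70 W / node"
```

That is on the order of 70 kW per rack.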
45 HyperscaleFPGA: Network-attached
Hyperscale disaggregation of compute resources: FPGAs can be deployed independently of the number of CPUs (i.e., servers) and of the server form factor (which keeps shrinking). FPGAs can be provisioned / rented like other cloud compute, storage, and network resources.
Scalability: users can build SDN fabrics of FPGAs in the cloud. FPGAs are promoted to the rank of peer processor (end of slavery). Hardware-based FPGA-to-FPGA communication provides low latency and high throughput (RDMA NICs).
46 Reference Prototype: FPGA Compute Node
KU060 FPGA with 16 GB memory, 10GbE, PCIe extension, and a board management controller.
The iNIC enables the FPGA to hook itself onto the network and to communicate with other DC resources, such as servers, disks, I/O, and other FPGA appliances.
[Figure: FPGA card with memory; the FPGA holds the Management Layer (ML), the user logic (vFPGA), and the Network Service Layer (NSL) with the iNIC, attached to the data center network]
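Treating the FPGA as a network peer rather than a bus-attached slave means software reaches it the way it reaches any other service: by sending it messages. A loopback emulation in Python, purely for illustration: a thread plays the role of the user logic (vFPGA) behind the iNIC, and the UDP transport, port choice, and wire format (big-endian uint32 vector in, uint64 sum out) are assumptions of this sketch, not the prototype's protocol:

```python
import socket
import struct
import threading


def vfpga_role(sock):
    """Hypothetical user logic: sum a vector of big-endian uint32 values from one datagram."""
    data, peer = sock.recvfrom(65535)
    n = len(data) // 4
    values = struct.unpack(f"!{n}I", data[: 4 * n])
    sock.sendto(struct.pack("!Q", sum(values)), peer)


def offload_sum(values, timeout=2.0):
    """Send a vector to the emulated network-attached FPGA and wait for the result."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS choose a free port
    server.settimeout(timeout)
    worker = threading.Thread(target=vfpga_role, args=(server,), daemon=True)
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)
    try:
        client.sendto(struct.pack(f"!{len(values)}I", *values), server.getsockname())
        reply, _ = client.recvfrom(8)
        return struct.unpack("!Q", reply)[0]
    finally:
        worker.join(timeout)
        client.close()
        server.close()


print(offload_sum([1, 2, 3, 4]))
```

In a real deployment the endpoint would be the FPGA's own network address rather than a local thread, which is what lets FPGAs be provisioned independently of any host server.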
47 But be willing to take incremental steps when you can!
NVIDIA GPU TECHNOLOGY UPDATE May 2015 Axel Koehler Senior Solutions Architect, NVIDIA NVIDIA: The VISUAL Computing Company GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS
More informationSmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center
SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center Jeff Defilippi Senior Product Manager Arm #Arm Tech Symposia The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent
More informationIN11E: Architecture and Integration Testbed for Earth/Space Science Cyberinfrastructures
IN11E: Architecture and Integration Testbed for Earth/Space Science Cyberinfrastructures A Future Accelerated Cognitive Distributed Hybrid Testbed for Big Data Science Analytics Milton Halem 1, John Edward
More informationIBM Spectrum Scale IO performance
IBM Spectrum Scale 5.0.0 IO performance Silverton Consulting, Inc. StorInt Briefing 2 Introduction High-performance computing (HPC) and scientific computing are in a constant state of transition. Artificial
More informationOpenFOAM Performance Testing and Profiling. October 2017
OpenFOAM Performance Testing and Profiling October 2017 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Huawei, Mellanox Compute resource - HPC
More informationOptimizing Efficiency of Deep Learning Workloads through GPU Virtualization
Optimizing Efficiency of Deep Learning Workloads through GPU Virtualization Presenters: Tim Kaldewey Performance Architect, Watson Group Michael Gschwind Chief Engineer ML & DL, Systems Group David K.
More informationNew Interconnnects. Moderator: Andy Rudoff, SNIA NVM Programming Technical Work Group and Persistent Memory SW Architect, Intel
New Interconnnects Moderator: Andy Rudoff, SNIA NVM Programming Technical Work Group and Persistent Memory SW Architect, Intel CCIX: Seamless Data Movement for Accelerated Applications TM Millind Mittal
More informationIBM POWER SYSTEMS: YOUR UNFAIR ADVANTAGE
IBM POWER SYSTEMS: YOUR UNFAIR ADVANTAGE Choosing IT infrastructure is a crucial decision, and the right choice will position your organization for success. IBM Power Systems provides an innovative platform
More informationIndustry Collaboration and Innovation
Industry Collaboration and Innovation OpenCAPI Topics Industry Background Technology Overview Design Enablement OpenCAPI Consortium Industry Landscape Key changes occurring in our industry Historical microprocessor
More informationCPMD Performance Benchmark and Profiling. February 2014
CPMD Performance Benchmark and Profiling February 2014 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the supporting
More informationAcceleration of HPC applications on hybrid CPU-GPU systems: When can Multi-Process Service (MPS) help?
Acceleration of HPC applications on hybrid CPU- systems: When can Multi-Process Service (MPS) help? GTC 2018 March 28, 2018 Olga Pearce (Lawrence Livermore National Laboratory) http://people.llnl.gov/olga
More informationFacilitating IP Development for the OpenCAPI Memory Interface Kevin McIlvain, Memory Development Engineer IBM. Join the Conversation #OpenPOWERSummit
Facilitating IP Development for the OpenCAPI Memory Interface Kevin McIlvain, Memory Development Engineer IBM Join the Conversation #OpenPOWERSummit Moral of the Story OpenPOWER is the best platform to
More informationAltair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015
Altair OptiStruct 13.0 Performance Benchmark and Profiling May 2015 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute
More informationgenzconsortium.org Gen-Z Technology: Enabling Memory Centric Architecture
Gen-Z Technology: Enabling Memory Centric Architecture Why Gen-Z? Gen-Z Consortium 2017 2 Why Gen-Z? Gen-Z Consortium 2017 3 Why Gen-Z? Businesses Need to Monetize Data Big Data AI Machine Learning Deep
More informationAccelerating Data Centers Using NVMe and CUDA
Accelerating Data Centers Using NVMe and CUDA Stephen Bates, PhD Technical Director, CSTO, PMC-Sierra Santa Clara, CA 1 Project Donard @ PMC-Sierra Donard is a PMC CTO project that leverages NVM Express
More informationAnnual Update on Flash Memory for Non-Technologists
Annual Update on Flash Memory for Non-Technologists Jay Kramer, Network Storage Advisors & George Crump, Storage Switzerland August 2017 1 Memory / Storage Hierarchy Flash Memory Summit 2017 2 NAND Flash
More informationCarlo Cavazzoni, HPC department, CINECA
Introduction to Shared memory architectures Carlo Cavazzoni, HPC department, CINECA Modern Parallel Architectures Two basic architectural scheme: Distributed Memory Shared Memory Now most computers have
More informationPower Systems with POWER8 Scale-out Technical Sales Skills V1
Power Systems with POWER8 Scale-out Technical Sales Skills V1 1. An ISV develops Linux based applications in their heterogeneous environment consisting of both IBM Power Systems and x86 servers. They are
More information2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.
Ethernet Storage Fabrics Using RDMA with Fast NVMe-oF Storage to Reduce Latency and Improve Efficiency Kevin Deierling & Idan Burstein Mellanox Technologies 1 Storage Media Technology Storage Media Access
More informationOPERA. Low Power Heterogeneous Architecture for the Next Generation of Smart Infrastructure and Platforms in Industrial and Societal Applications
OPERA Low Power Heterogeneous Architecture for the Next Generation of Smart Infrastructure and Platforms in Industrial and Societal Applications Co-funded by the Horizon 2020 Framework Programme of the
More informationBirds of a Feather Presentation
Mellanox InfiniBand QDR 4Gb/s The Fabric of Choice for High Performance Computing Gilad Shainer, shainer@mellanox.com June 28 Birds of a Feather Presentation InfiniBand Technology Leadership Industry Standard
More information