Building the Most Efficient Machine Learning System
- Ambrose Marsh
- 5 years ago
Transcription
1 Building the Most Efficient Machine Learning System. Mellanox, The Artificial Intelligence Interconnect Company. June 2017
2 Mellanox Overview. Company headquarters: Yokneam, Israel and Sunnyvale, California. Worldwide offices; ~2,900 employees worldwide. Ticker: MLNX
3 Exponential Data Growth Everywhere. Higher Data Speeds, Faster Data Processing, Better Data Security. SmartNIC, System on a Chip, Adapters, Switches, Cables & Transceivers
4 Enabling the Future of Machine Learning Applications. IoT, Storage, Self-Driving Vehicles, Database, Embedded Appliances, High Performance Computing, Machine Learning, Healthcare, Financial, Hyperscale, Retail, Manufacturing. HPC and Machine Learning Share the Same Interconnect Needs
5 Highest Performance 100 and 200Gb/s Interconnect Solutions. Adapters: 200Gb/s, 0.6us latency, 200 million messages per second (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s). Switch: 40 HDR (200Gb/s) ports, 80 HDR100 (100Gb/s) ports, 16Tb/s throughput, 15.6 billion msg/sec. Switch: GbE Ports, 64 25/50GbE Ports (10 / 25 / 40 / 50 / 100GbE), throughput of 6.4Tb/s. Interconnect: transceivers, active optical and copper cables (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s). Today's Datacenters Need the Most Intelligent Interconnect
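As a quick check on the switch figures in slide 5, the sketch below multiplies port count by per-port speed. It assumes the quoted 16Tb/s counts both directions of every port; the slide does not say so, and the helper name `aggregate_tbps` is made up for this illustration.

```python
def aggregate_tbps(ports: int, gbps_per_port: int, bidirectional: bool = True) -> float:
    """Aggregate switch throughput in Tb/s for a given port configuration."""
    one_way_gbps = ports * gbps_per_port
    # Assumption: vendor throughput figures count traffic in both directions.
    total_gbps = one_way_gbps * (2 if bidirectional else 1)
    return total_gbps / 1000.0


# 40 HDR ports at 200Gb/s and 80 HDR100 ports at 100Gb/s from the slide
print(aggregate_tbps(40, 200))   # 16.0
print(aggregate_tbps(80, 100))   # 16.0
```

Under that assumption both port configurations give the same 16Tb/s total, which matches the single throughput number the slide quotes for the HDR switch.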
6 Mellanox Delivers Best Return on Investment. 60% Higher Return on Investment. Up to 50% Savings on Capital and Operation Expenses. World's Highest Performance, Scalability and Productivity for Deep Learning. Cognitive Toolkit, Chainer. Mellanox Unlocks the Power of AI
7 Mellanox is Leading Artificial Intelligence (AI). Advancing Technology to Affect Science, Business, and Society by Enabling Critical and Timely Decision Making. Health Care, Business Integrity, Business Intelligence, Knowledge Discovery, Security, Customer Support and more. More Data, Better Models, Faster Interconnect. GPUs, CPUs, FPGAs, Storage. More Data, Faster Interconnect, Better Insight, Competitive Advantage
8 Enabling Most Efficient Machine Learning Platforms (Examples). Highest Performance, Scalability and Productivity for Deep Learning
9 Mellanox Accelerates Machine Learning and Big Data. World's First PCIe Gen 4 Public Cloud Server for Cognitive Computing. Sets TeraSort 2016 Benchmark Record: 5x Faster, 3x Energy Efficient than 2015 Record. Smart Network for Azure Cloud Server. Designed for Big Data Analytics & AI. Enabling Analytics in Cloud
10 Mellanox Accelerates Machine Learning and Big Data. Big Sur & Big Basin: Facebook Open Source AI Hardware Platform. Only ONE Network of Choice - Mellanox. Caffe2. Powering Self Driving Car. 2X Faster Training with PaddlePaddle. "We rely on fast interconnect technologies and RDMA." Andrew Ng, Chief Scientist, Baidu. Caffe. Real Time Fraud Detection: 14 Million Transactions per Day, 4 Billion Database Inserts. Image Recognition: ~90% Prediction Accuracy. RDMA in TensorFlow and Caffe
11 AI is Changing the Way We Interact with Computers. Automotive and Transportation: autonomous driving, pedestrian detection, accident avoidance. Security and Public Safety: surveillance, image analysis, facial recognition and detection. Consumer Web, Mobile, Retail: image tagging, speech recognition, natural language processing, recommendation and sentiment analysis. Medicine and Biology: drug discovery, diagnostic assistance, cancer cell detection. Broadcast, Media and Entertainment: captioning, search, recommendations, real time translation. Finance, Fraud and Insurance: real time trade, credit / risk analysis, fraud detection and prevention. Efficient Deep Learning Depends on Mellanox
12 Deep Learning Demands Highest Performance. TRAINING: scalability requires ultra-fast networking; same hardware needs as HPC; billions of TFLOPS; faster access to storage; RDMA, SHARP, PeerDirect, GPUDirect, ROCm, others. TRAINING DATASET: images, video, text, speech, tabular, time series. INFERENCING: highly transactional / supports many users; Mellanox ultra-low latency; instant network response; NEW DATA; billions of FLOPS; RDMA, PeerDirect, GPUDirect, ROCm, others
13 Exponential Data Growth: The Need for Intelligent and Faster Interconnect. CPU-Centric (Onload): Must Wait for the Data, Creates Performance Bottlenecks. Data-Centric (Offload): Analyze Data as it Moves! Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale
14 Data Centric Architecture to Overcome Latency Bottlenecks. CPU-Centric (Onload): HPC / Machine Learning communications latencies of 30-40us. Data-Centric (Offload) with In-Network Computing: HPC / Machine Learning communications latencies of 3-4us. Intelligent Interconnect Paves the Road to Exascale Performance
15 Mellanox Technology Accelerations for Machine Learning. RDMA, GPUDirect, SHARP, NVMe over Fabrics, Security. [Diagram: CPUs and GPUs connected through the interconnect] In-Network Computing Key for Highest Return on Investment
16 In-Network Computing Enables Deep Learning Frameworks. Middleware (MPI, gRPC) - Optional. CUDA, SHARP, rCUDA, GPUDirect RDMA, NVMe over Fabrics. Mellanox Interconnect Solutions. Mellanox Accelerations for Machine Learning and Big Data
17 Mellanox SHARP for Gradient Computation. The CPU in a parameter server quickly becomes the bottleneck (at roughly 4 nodes). TCP adds a lot of overhead and the traffic pattern is bursty. SHARP performs the gradient averaging: it removes the need for a physical parameter server and removes all parameter server overhead. SHARP Provides Better Scalability and Reduced Network Traffic
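To make the parameter-server bottleneck in slide 17 concrete, here is a minimal sketch of the arithmetic involved. It only models the averaging step and a simple byte count at one endpoint; it does not model SHARP itself, which performs the reduction in the network. The function names and the traffic model are assumptions made for this illustration.

```python
from typing import List

Vector = List[float]


def average_gradients(worker_grads: List[Vector]) -> Vector:
    """Element-wise mean of the gradients produced by all workers."""
    n = len(worker_grads)
    return [sum(column) / n for column in zip(*worker_grads)]


def parameter_server_bytes(num_workers: int, grad_bytes: int) -> int:
    # Assumed model: the server receives one gradient from every worker
    # and sends the averaged gradient back to every worker.
    return 2 * num_workers * grad_bytes


def reduced_endpoint_bytes(grad_bytes: int) -> int:
    # Assumed model: when the reduction happens in the network, each host
    # sends its gradient once and receives the result once.
    return 2 * grad_bytes


grads = [[1.0, 2.0], [3.0, 4.0]]
print(average_gradients(grads))            # [2.0, 3.0]
print(parameter_server_bytes(4, 100))      # 800
print(reduced_endpoint_bytes(100))         # 200
```

In this model the load on a single parameter server grows linearly with the number of workers, while the per-host load stays constant once the reduction is moved off the server.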
18 PeerDirect, GPUDirect RDMA and ASYNC. Purpose-built for Acceleration of Deep Learning
19 What is GPUDirect? Provides a significant decrease in communication latency for acceleration devices. Natively supported by Mellanox OFED. Supports peer-to-peer communications between Mellanox adapters and third-party devices. No unnecessary system memory copies or CPU overhead. Enables GPUDirect RDMA, GPUDirect ASYNC, ROCm and others. InfiniBand and Ethernet. [Diagram: CPU, chipset and vendor device on each of two hosts] Designed for Deep Learning Acceleration
20 GPUDirect RDMA and GPUDirect ASYNC. Direct Connectivity: GPU - Interconnect
21 GPUDirect RDMA Performance. GPU-GPU internode latency (lower is better): 2.18 usec, 9.3X better latency. GPU-GPU internode bandwidth (higher is better): 10X better throughput. Source: Prof. DK Panda
22 NVIDIA NCCL 2.0: Near-Linear Scalability. Optimized collective communication library: Allreduce, Reduce, Broadcast, Reduce-scatter, Allgather. Inter-node communication using InfiniBand verbs and GPUDirect RDMA. Multi-rail support, topology detection. 50% performance improvement with NVIDIA DGX-1 across 32 NVIDIA Tesla V100 GPUs. NVIDIA Accelerates Scalable Deep Learning with Mellanox
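For readers unfamiliar with the five collectives named in slide 22, the sketch below shows what each one computes, using plain Python lists in place of GPU buffers (one list per rank). This is not the NCCL API; the function names and the in-memory model are assumptions chosen to illustrate the semantics with a sum reduction.

```python
from typing import List, Optional

Buffer = List[int]


def _sum(bufs: List[Buffer]) -> Buffer:
    return [sum(column) for column in zip(*bufs)]


def allreduce(bufs: List[Buffer]) -> List[Buffer]:
    # Every rank ends up with the element-wise sum of all ranks' buffers.
    total = _sum(bufs)
    return [list(total) for _ in bufs]


def reduce(bufs: List[Buffer], root: int = 0) -> List[Optional[Buffer]]:
    # Only the root rank receives the element-wise sum.
    total = _sum(bufs)
    return [list(total) if rank == root else None for rank in range(len(bufs))]


def broadcast(bufs: List[Buffer], root: int = 0) -> List[Buffer]:
    # Every rank receives a copy of the root's buffer.
    return [list(bufs[root]) for _ in bufs]


def reduce_scatter(bufs: List[Buffer]) -> List[Buffer]:
    # The sum is split into equal chunks; rank i keeps chunk i.
    total, n = _sum(bufs), len(bufs)
    assert len(total) % n == 0, "buffer length must divide evenly across ranks"
    chunk = len(total) // n
    return [total[i * chunk:(i + 1) * chunk] for i in range(n)]


def allgather(bufs: List[Buffer]) -> List[Buffer]:
    # Every rank receives the concatenation of all ranks' buffers.
    gathered = [x for buf in bufs for x in buf]
    return [list(gathered) for _ in bufs]


ranks = [[1, 2], [3, 4]]
print(allreduce(ranks))        # [[4, 6], [4, 6]]
print(reduce_scatter(ranks))   # [[4], [6]]
print(allgather(ranks))        # [[1, 2, 3, 4], [1, 2, 3, 4]]
```

Allreduce is the collective that data-parallel training typically uses to sum or average gradients, which is why its scaling behavior matters for the results quoted on the slide.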
23 Performance and Scalability Examples
24 TensorFlow with Mellanox RDMA. Reference Deployment Guide. Unmatched Linear Scalability, No Additional Cost. Up to 76% Efficiency and 50% Better Performance versus TCP
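Slide 24 quotes an efficiency figure without defining it. A common definition of scaling efficiency is measured speedup divided by ideal (linear) speedup, sketched below; whether the deployment guide uses exactly this definition is an assumption, and the throughput numbers in the example are invented purely to show the calculation.

```python
def scaling_efficiency(single_node_rate: float, cluster_rate: float, num_nodes: int) -> float:
    """Measured speedup divided by the ideal linear speedup."""
    speedup = cluster_rate / single_node_rate
    return speedup / num_nodes


# Hypothetical numbers: 100 images/s on one node, 608 images/s on eight nodes.
efficiency = scaling_efficiency(100.0, 608.0, 8)
print(f"{efficiency:.0%}")   # 76%
```

With this definition, 100% means perfectly linear scaling, and anything lost to communication overhead shows up as a lower percentage.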
25 Accelerating TensorFlow with gRPC over RDMA. Open source machine learning from Google. Distributed training with the gRPC framework, Google's optimized RPC for distributed networks. RDMA acceleration over UCX (Unified Communication X). Integration with upstream TensorFlow. >2x faster (lower is better); 2X higher performance with RDMA. ~2X Acceleration for TensorFlow with RDMA
26 TensorFlow over RDMA in Apache Spark Environment. Yahoo enhanced the TensorFlow C++ layer to enable RDMA over InfiniBand. InfiniBand provides faster connectivity and supports accelerated offload capability. InfiniBand Provides Near Linear Scalability for Inception Model Training. Source:
27 2X Acceleration for Baidu. Machine learning software from Baidu. Usage: word prediction, translation, image processing. RDMA (GPUDirect) speeds training: lowers latency, increases throughput, more cores for training. Even better results with optimized RDMA. ~2X Acceleration for Paddle Training with RDMA
28 ChainerMN Depends on InfiniBand. ChainerMN depends on MPI for inter-node communication. The NVIDIA NCCL library is then used for intra-node communication between GPUs. Leveraging InfiniBand results in near linear performance. Mellanox InfiniBand allows ChainerMN to achieve ~72% accuracy. Source:
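The split described in slide 28, with one library inside a node and another between nodes, corresponds to a two-level reduction. The sketch below illustrates that general pattern with nested Python lists; it is not ChainerMN's actual implementation, and the function names and data layout are assumptions for illustration only.

```python
from typing import List

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    return [sum(column) for column in zip(*vectors)]


def hierarchical_allreduce(nodes: List[List[Vector]]) -> List[List[Vector]]:
    """nodes[i][j] is the gradient held by GPU j on node i."""
    # Step 1: reduce within each node (the intra-node role the slide gives to NCCL).
    node_sums = [elementwise_sum(gpus) for gpus in nodes]
    # Step 2: allreduce across nodes (the inter-node role the slide gives to MPI).
    global_sum = elementwise_sum(node_sums)
    # Step 3: broadcast the result back to every GPU on every node.
    return [[list(global_sum) for _ in gpus] for gpus in nodes]


cluster = [[[1.0], [2.0]], [[3.0], [4.0]]]   # 2 nodes x 2 GPUs
print(hierarchical_allreduce(cluster))
# [[[10.0], [10.0]], [[10.0], [10.0]]]
```

Only step 2 crosses the network, so the inter-node fabric carries one reduced vector per node rather than one per GPU.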
29 Machine Learning Performance Comparison. DeepBench measures the performance of basic operations involved in training deep neural networks. [Chart: 8, 16 and 32 accelerators; lower is better; 60.3%] InfiniBand Delivers 60% Better Performance with 2X Less Infrastructure
30 A Few Solution Examples. Scalable Deep Learning Depends on Mellanox
31 NVIDIA DGX-1: World's first purpose-built system for deep learning. SaturnV is #28 on the Top500, 3.3Pf with 124 nodes. SaturnV is also #1 on the Green500. Fully integrated hardware: 8x Tesla P100 (Pascal) w/16GB per GPU; CUDA Cores; 4x ConnectX-4 EDR 100Gb/s HCAs. Fully integrated software stack: major deep learning frameworks; drivers, NVIDIA CUDA, NVIDIA Deep Learning SDK; GPUDirect RDMA
32 NVIDIA DGX-1 Deep Learning Server: Deep Learning Supercomputer in a Box. NVIDIA SaturnV, NVIDIA Machine Learning Supercomputer: #28 on the Top500, 3.3Pf with 124 DGX-1 nodes; #1 on the Green500. 8 x NVIDIA Tesla P100 GPUs. 4 x ConnectX-4 EDR 100G InfiniBand Adapters. 5.3TFlops, 16nm FinFET, NVLINK
33 End-to-End Interconnect Solutions for All Platforms. Highest Performance and Scalability for X86, Power, GPU, ARM and FPGA-based Compute and Storage Platforms. X86, OpenPOWER, GPU, ARM, FPGA. Smart Interconnect to Unleash The Power of All Compute Architectures
34 Proven Advantages. RDMA delivers 2X performance advantage over traditional TCP. Machine Learning and HPC platforms share the same interconnect needs. Scalable, flexible, high performance, high bandwidth, end-to-end connectivity. Standards-based and supported by the largest eco-system. Supports all compute architectures: x86, Power, ARM, GPU, FPGA etc. Native offloading architecture. RDMA, GPUDirect, SHARP and other core accelerations. Backward and future compatible. Scalable Machine Learning Depends on Mellanox
35 Thank You
Building the Most Efficient Machine Learning System
Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide
More informationThe Future of High Performance Interconnects
The Future of High Performance Interconnects Ashrut Ambastha HPC Advisory Council Perth, Australia :: August 2017 When Algorithms Go Rogue 2017 Mellanox Technologies 2 When Algorithms Go Rogue 2017 Mellanox
More informationMELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구
MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio
More informationInterconnect Your Future
Interconnect Your Future Paving the Road to Exascale August 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric (Offload) Must Wait for the Data
More informationInterconnect Your Future
Interconnect Your Future Paving the Path to Exascale November 2017 Mellanox Accelerates Leading HPC and AI Systems Summit CORAL System Sierra CORAL System Fastest Supercomputer in Japan Fastest Supercomputer
More informationInterconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017
Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, November 2017 InfiniBand Accelerates Majority of New Systems on TOP500 InfiniBand connects 77% of new HPC
More informationIn-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017
In-Network Computing Paving the Road to Exascale 5th Annual MVAPICH User Group (MUG) Meeting, August 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric
More informationCorporate Update. Enabling The Use of Data January Mellanox Technologies
Corporate Update Enabling The Use of Data January 2018 Safe Harbor Statement These slides and the accompanying oral presentation contain forward-looking statements and information. The use of words such
More informationDeep Learning mit PowerAI - Ein Überblick
Stephen Lutz Deep Learning mit PowerAI - Open Group Master Certified IT Specialist Technical Sales IBM Cognitive Infrastructure IBM Germany Ein Überblick Stephen.Lutz@de.ibm.com What s that? and what s
More informationIn-Network Computing. Paving the Road to Exascale. June 2017
In-Network Computing Paving the Road to Exascale June 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect -Centric (Onload) Data-Centric (Offload) Must Wait for the Data Creates
More informationInterconnect Your Future
Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators
More informationSolutions for Scalable HPC
Solutions for Scalable HPC Scot Schultz, Director HPC/Technical Computing HPC Advisory Council Stanford Conference Feb 2014 Leading Supplier of End-to-End Interconnect Solutions Comprehensive End-to-End
More informationHigh Performance Computing
High Performance Computing Dror Goldenberg, HPCAC Switzerland Conference March 2015 End-to-End Interconnect Solutions for All Platforms Highest Performance and Scalability for X86, Power, GPU, ARM and
More informationInterconnect Your Future
Interconnect Your Future Smart Interconnect for Next Generation HPC Platforms Gilad Shainer, August 2016, 4th Annual MVAPICH User Group (MUG) Meeting Mellanox Connects the World s Fastest Supercomputer
More informationIn-Network Computing. Sebastian Kalcher, Senior System Engineer HPC. May 2017
In-Network Computing Sebastian Kalcher, Senior System Engineer HPC May 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric (Offload) Must Wait
More informationThe Future of Interconnect Technology
The Future of Interconnect Technology Michael Kagan, CTO HPC Advisory Council Stanford, 2014 Exponential Data Growth Best Interconnect Required 44X 0.8 Zetabyte 2009 35 Zetabyte 2020 2014 Mellanox Technologies
More informationPaving the Road to Exascale
Paving the Road to Exascale Gilad Shainer August 2015, MVAPICH User Group (MUG) Meeting The Ever Growing Demand for Performance Performance Terascale Petascale Exascale 1 st Roadrunner 2000 2005 2010 2015
More informationInterconnect Your Future Paving the Road to Exascale
Interconnect Your Future Paving the Road to Exascale CHPC, December 2017 90% of the World Data was Created in the Last 2 Years 2017 Mellanox Technologies 2 The Future Depends on Fastest Interconnects 1Gb/s
More informationSUPERCHARGE DEEP LEARNING WITH DGX-1. Markus Weber SC16 - November 2016
SUPERCHARGE DEEP LEARNING WITH DGX-1 Markus Weber SC16 - November 2016 NVIDIA Pioneered GPU Computing Founded 1993 $7B 9,500 Employees 100M NVIDIA GeForce Gamers The world s largest gaming platform Pioneering
More informationPerformance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability
Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox InfiniBand Host Channel Adapters (HCA) enable the highest data center
More informationVPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability
VPI / InfiniBand Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox enables the highest data center performance with its
More informationVPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability
VPI / InfiniBand Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox enables the highest data center performance with its
More informationInterconnect Your Future
#OpenPOWERSummit Interconnect Your Future Scot Schultz, Director HPC / Technical Computing Mellanox Technologies OpenPOWER Summit, San Jose CA March 2015 One-Generation Lead over the Competition Mellanox
More informationACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start
More informationEthernet. High-Performance Ethernet Adapter Cards
High-Performance Ethernet Adapter Cards Supporting Virtualization, Overlay Networks, CPU Offloads and RDMA over Converged Ethernet (RoCE), and Enabling Data Center Efficiency and Scalability Ethernet Mellanox
More informationSYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS
SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA S Axel Koehler, Principal Solution Architect HPCN%Workshop%Goettingen,%14.%Mai%2018 NVIDIA - AI COMPUTING COMPANY Computer Graphics Computing Artificial Intelligence
More informationThe Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011
The Road to ExaScale Advances in High-Performance Interconnect Infrastructure September 2011 diego@mellanox.com ExaScale Computing Ambitious Challenges Foster Progress Demand Research Institutes, Universities
More informationIBM Power AC922 Server
IBM Power AC922 Server The Best Server for Enterprise AI Highlights More accuracy - GPUs access system RAM for larger models Faster insights - significant deep learning speedups Rapid deployment - integrated
More informationHigh-Performance Training for Deep Learning and Computer Vision HPC
High-Performance Training for Deep Learning and Computer Vision HPC Panel at CVPR-ECV 18 by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda
More informationIBM CORAL HPC System Solution
IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy
More informationIBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads
IBM SpectrumAI with NVIDIA Converged Infrastructure Solutions for AI workloads The engine to power your AI data pipeline Introduction: Artificial intelligence (AI) including deep learning (DL) and machine
More informationEfficient Communication Library for Large-Scale Deep Learning
IBM Research AI Efficient Communication Library for Large-Scale Deep Learning Mar 26, 2018 Minsik Cho (minsikcho@us.ibm.com) Deep Learning changing Our Life Automotive/transportation Security/public safety
More informationThe Tesla Accelerated Computing Platform
The Tesla Accelerated Computing Platform Axel Koehler, Principal Solution Architect HPC Advisory Council Meeting Lugano 22 March 2016 Introduction TESLA Platform for HPC Agenda TESLA Platform for HYPERSCALE
More informationIBM Deep Learning Solutions
IBM Deep Learning Solutions Reference Architecture for Deep Learning on POWER8, P100, and NVLink October, 2016 How do you teach a computer to Perceive? 2 Deep Learning: teaching Siri to recognize a bicycle
More informationMICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE
MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE LEVERAGE OUR EXPERTISE sales@microway.com http://microway.com/tesla NUMBERSMASHER TESLA 4-GPU SERVER/WORKSTATION Flexible form factor 4 PCI-E GPUs + 3 additional
More informationIn partnership with. VelocityAI REFERENCE ARCHITECTURE WHITE PAPER
In partnership with VelocityAI REFERENCE JULY // 2018 Contents Introduction 01 Challenges with Existing AI/ML/DL Solutions 01 Accelerate AI/ML/DL Workloads with Vexata VelocityAI 02 VelocityAI Reference
More informationAccelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet
WHITE PAPER Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet Contents Background... 2 The MapR Distribution... 2 Mellanox Ethernet Solution... 3 Test
More informationWhy AI Frameworks Need (not only) RDMA?
Why AI Frameworks Need (not only) RDMA? With Design and Implementation Experience of Networking Support on TensorFlow GDR, Apache MXNet, WeChat Amber, and Tencent Angel Bairen Yi (byi@connect.ust.hk) Jingrong
More informationN V M e o v e r F a b r i c s -
N V M e o v e r F a b r i c s - H i g h p e r f o r m a n c e S S D s n e t w o r k e d f o r c o m p o s a b l e i n f r a s t r u c t u r e Rob Davis, VP Storage Technology, Mellanox OCP Evolution Server
More informationEfficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning Ammar Ahmad Awan, Khaled Hamidouche, Akshay Venkatesh, and Dhabaleswar K. Panda Network-Based Computing Laboratory Department
More informationNVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI
NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI Overview Unparalleled Value Product Portfolio Software Platform From Desk to Data Center to Cloud Summary AI researchers depend on computing performance to gain
More informationChelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING
Meeting Today s Datacenter Challenges Produced by Tabor Custom Publishing in conjunction with: 1 Introduction In this era of Big Data, today s HPC systems are faced with unprecedented growth in the complexity
More informationPaving the Road to Exascale Computing. Yossi Avni
Paving the Road to Exascale Computing Yossi Avni HPC@mellanox.com Connectivity Solutions for Efficient Computing Enterprise HPC High-end HPC HPC Clouds ICs Mellanox Interconnect Networking Solutions Adapter
More informationCharacterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures
Characterizing and Benchmarking Deep Learning Systems on Modern Data Center Architectures Talk at Bench 2018 by Xiaoyi Lu The Ohio State University E-mail: luxi@cse.ohio-state.edu http://www.cse.ohio-state.edu/~luxi
More informationDeep Learning Performance and Cost Evaluation
Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Rene Meyer, Ph.D. AMAX Corporation Publish date: October 25, 2018 Abstract Introduction
More informationMellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007
Mellanox Technologies Maximize Cluster Performance and Productivity Gilad Shainer, shainer@mellanox.com October, 27 Mellanox Technologies Hardware OEMs Servers And Blades Applications End-Users Enterprise
More informationIBM Power Advanced Compute (AC) AC922 Server
IBM Power Advanced Compute (AC) AC922 Server The Best Server for Enterprise AI Highlights IBM Power Systems Accelerated Compute (AC922) server is an acceleration superhighway to enterprise- class AI. A
More informationSigns of Intelligent Life: AI Simplifies IoT
Signs of Intelligent Life: AI Simplifies IoT JEDEC Mobile & IOT Forum Stephen Lum Samsung Semiconductor, Inc. Copyright 2018 APPLICATIONS DRIVE CHANGES IN ARCHITECTURES x86 Processors Apps Processors FPGA
More informationDDN About Us Solving Large Enterprise and Web Scale Challenges
1 DDN About Us Solving Large Enterprise and Web Scale Challenges History Founded in 98 World s Largest Private Storage Company Growing, Profitable, Self Funded Headquarters: Santa Clara and Chatsworth,
More informationS THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA,
S7750 - THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE Presenter: Louis Capps, Solution Architect, NVIDIA, lcapps@nvidia.com A TALE OF ENLIGHTENMENT Basic OK List 10 for x = 1 to 3 20 print
More informationS8765 Performance Optimization for Deep- Learning on the Latest POWER Systems
S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems Khoa Huynh Senior Technical Staff Member (STSM), IBM Jonathan Samn Software Engineer, IBM Evolving from compute systems to
More informationDeep Learning Performance and Cost Evaluation
Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Don Wang, Rene Meyer, Ph.D. info@ AMAX Corporation Publish date: October 25,
More informationDGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER. Markus Weber and Haiduong Vo
DGX SYSTEMS: DEEP LEARNING FROM DESK TO DATA CENTER Markus Weber and Haiduong Vo NVIDIA DGX SYSTEMS Agenda NVIDIA DGX-1 NVIDIA DGX STATION 2 ONE YEAR LATER NVIDIA DGX-1 Barriers Toppled, the Unsolvable
More informationARISTA: Improving Application Performance While Reducing Complexity
ARISTA: Improving Application Performance While Reducing Complexity October 2008 1.0 Problem Statement #1... 1 1.1 Problem Statement #2... 1 1.2 Previous Options: More Servers and I/O Adapters... 1 1.3
More informationOCP Engineering Workshop - Telco
OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationSharing High-Performance Devices Across Multiple Virtual Machines
Sharing High-Performance Devices Across Multiple Virtual Machines Preamble What does sharing devices across multiple virtual machines in our title mean? How is it different from virtual networking / NSX,
More informationFast Hardware For AI
Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights
More informationINTRODUCING THE DGX FAMILY. Marc Domenech May 8, 2017
INTRODUCING THE DGX FAMILY Marc Domenech May 8, 2017 NVIDIA Pioneered GPU Computing Founded 1993 $7B 9,500 Employees 100M NVIDIA GeForce Gamers The world s largest gaming platform Pioneering AI computing
More informationBroadberry. Artificial Intelligence Server for Fraud. Date: Q Application: Artificial Intelligence
TM Artificial Intelligence Server for Fraud Date: Q2 2017 Application: Artificial Intelligence Tags: Artificial intelligence, GPU, GTX 1080 TI HM Revenue & Customs The UK s tax, payments and customs authority
More informationIBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX
IBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX -2 EN with RoCE Adapter Delivers Reliable Multicast Messaging With Ultra Low Latency
More informationExploiting InfiniBand and GPUDirect Technology for High Performance Collectives on GPU Clusters
Exploiting InfiniBand and Direct Technology for High Performance Collectives on Clusters Ching-Hsiang Chu chu.368@osu.edu Department of Computer Science and Engineering The Ohio State University OSU Booth
More informationOCP3. 0. ConnectX Ethernet Adapter Cards for OCP Spec 3.0
OCP3. 0 ConnectX Ethernet Adapter Cards for OCP Spec 3.0 High Performance 10/25/40/50/100/200 GbE Ethernet Adapter Cards in the Open Compute Project Spec 3.0 Form Factor For illustration only. Actual products
More informationBirds of a Feather Presentation
Mellanox InfiniBand QDR 4Gb/s The Fabric of Choice for High Performance Computing Gilad Shainer, shainer@mellanox.com June 28 Birds of a Feather Presentation InfiniBand Technology Leadership Industry Standard
More informationPOWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017
POWERING THE AI REVOLUTION JENSEN HUANG, FOUNDER & CEO GTC 2017 LIFE AFTER MOORE S LAW 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 Transistors (thousands) 1.1X per year 10 4 10 3 1.5X per year
More informationGPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation
GPU ACCELERATED COMPUTING 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation GAMING PRO ENTERPRISE VISUALIZATION DATA CENTER AUTO
More informationMapping MPI+X Applications to Multi-GPU Architectures
Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under
More informationIs your IT Infrastructure Ready for Machine Learning & Artificial Intelligence?
BRKPAR-2955 Is your IT Infrastructure Ready for Machine Learning & Artificial Intelligence? Hoseb Dermanilian, EMEA BDM, NetApp Arnaud BASSALER, CSE, Cisco Systems Agenda Introduction AI, Machine Learning
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training
More informationOpenPOWER Innovations for HPC. IBM Research. IWOPH workshop, ISC, Germany June 21, Christoph Hagleitner,
IWOPH workshop, ISC, Germany June 21, 2017 OpenPOWER Innovations for HPC IBM Research Christoph Hagleitner, hle@zurich.ibm.com IBM Research - Zurich Lab IBM Research - Zurich Established in 1956 45+ different
More informationMellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions
Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions Providing Superior Server and Storage Performance, Efficiency and Return on Investment As Announced and Demonstrated at
More informationBy John Kim, Chair SNIA Ethernet Storage Forum. Several technology changes are collectively driving the need for faster networking speeds.
A quiet revolutation is taking place in networking speeds for servers and storage, one that is converting 1Gb and 10 Gb connections to 25Gb, 50Gb and 100 Gb connections in order to support faster servers,
More informationAccelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K.
Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K. Panda Department of Computer Science and Engineering The Ohio
More informationUnlock business value with HPC & Artificial Intelligence. FORUM TERATEC June 19, 2018 José RODRIGUES HPC Sales Manager
Unlock business value with HPC & Artificial Intelligence FORUM TERATEC June 19, 2018 José RODRIGUES HPC Sales Manager Investment in HPC delivers compelling financial returns Financial Services Oil and
More informationSR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience
SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience Jithin Jose, Mingzhe Li, Xiaoyi Lu, Krishna Kandalla, Mark Arnold and Dhabaleswar K. (DK) Panda Network-Based Computing Laboratory
More informationAutonomous Driving Solutions
Autonomous Driving Solutions Oct, 2017 DrivePX2 & DriveWorks Marcus Oh (moh@nvidia.com) Sr. Solution Architect, NVIDIA This work is licensed under a Creative Commons Attribution-Share Alike 4.0 (CC BY-SA
More informationCisco UCS C480 ML M5 Rack Server Performance Characterization
White Paper Cisco UCS C480 ML M5 Rack Server Performance Characterization The Cisco UCS C480 ML M5 Rack Server platform is designed for artificial intelligence and machine-learning workloads. 2018 Cisco
Predicting Service Outage Using Machine Learning Techniques. HPE Innovation Center
Predicting Service Outage Using Machine Learning Techniques HPE Innovation Center HPE Innovation Center - Our AI Expertise Sense Learn Comprehend Act Computer Vision Machine Learning Natural Language Processing
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning Ching-Hsiang Chu 1, Xiaoyi Lu 1, Ammar A. Awan 1, Hari Subramoni 1, Jahanzeb Hashmi 1, Bracy Elton 2 and Dhabaleswar
TACKLING THE CHALLENGES OF NEXT GENERATION HEALTHCARE
TACKLING THE CHALLENGES OF NEXT GENERATION HEALTHCARE Nicola Rieke, Senior Deep Learning Solution Architect Healthcare EMEA Fausto Milletari, Senior Deep Learning Solution Architect Healthcare NALA INTRODUCTION
GPU FOR DEEP LEARNING. 周国峰 Wuhan University 2017/10/13
GPU FOR DEEP LEARNING chandlerz@nvidia.com 周国峰 Wuhan University 2017/10/13 Why Deep Learning Boost Today? Nvidia SDK for Deep Learning? Agenda CUDA 8.0 cudnn TensorRT (GIE) NCCL DIGITS 2 Why Deep Learning
DGX UPDATE. Customer Presentation Deck May 8, 2017
DGX UPDATE Customer Presentation Deck May 8, 2017 NVIDIA DGX-1: The World's Fastest AI Supercomputer FASTEST PATH TO DEEP LEARNING EFFORTLESS PRODUCTIVITY REVOLUTIONARY AI PERFORMANCE Fully-integrated
GPU-Accelerated Deep Learning
GPU-Accelerated Deep Learning July 6th, 2016. Greg Heinrich. Credits: Alison B. Lowndes, Julie Bernauer, Leo K. Tam. PRACTICAL DEEP LEARNING EXAMPLES Image Classification, Object Detection, Localization,
NVIDIA PLATFORM FOR AI
NVIDIA PLATFORM FOR AI João Paulo Navarro, Solutions Architect - Linkedin i am ai HTTPS://WWW.YOUTUBE.COM/WATCH?V=GIZ7KYRWZGQ 2 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 3 GPU COMPUTING
Choosing the Best Network Interface Card for Cloud Mellanox ConnectX-3 Pro EN vs. Intel XL710
COMPETITIVE BRIEF April 5 Choosing the Best Network Interface Card for Cloud Mellanox ConnectX-3 Pro EN vs. Intel XL710 Introduction: How to Choose a Network Interface Card... Comparison: Mellanox ConnectX
Network Technologies for Data Storage Systems
Network Technologies for Data Storage Systems Nov, 2018 Boris Neiman Sr. System Engineer 1 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide Offices ~2,900 Employees
NVMe over Universal RDMA Fabrics
NVMe over Universal RDMA Fabrics Build a Flexible Scale-Out NVMe Fabric with Concurrent RoCE and iwarp Acceleration Broad spectrum Ethernet connectivity Universal RDMA NVMe Direct End-to-end solutions
Voltaire Making Applications Run Faster
Voltaire Making Applications Run Faster Asaf Somekh Director, Marketing Voltaire, Inc. Agenda HPC Trends InfiniBand Voltaire Grid Backbone Deployment examples About Voltaire HPC Trends Clusters are the
The rcuda middleware and applications
The rcuda middleware and applications Will my application work with rcuda? rcuda currently provides binary compatibility with CUDA 5.0, virtualizing the entire Runtime API except for the graphics functions,
HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov
HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE's portfolio for deep learning Government,
Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
LAMMPSCUDA GPU Performance. April 2011
LAMMPSCUDA GPU Performance April 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Dell, Intel, Mellanox Compute resource - HPC Advisory Council
InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment. TOP500 Supercomputers, June 2014
InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment TOP500 Supercomputers, June 2014 TOP500 Performance Trends 38% CAGR 78% CAGR Explosive high-performance
Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR
Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR Presentation at Mellanox Theater () Dhabaleswar K. (DK) Panda - The Ohio State University panda@cse.ohio-state.edu Outline Communication
Deep Learning Frameworks with Spark and GPUs
Deep Learning Frameworks with Spark and GPUs Abstract Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel,
Architectures for Scalable Media Object Search
Architectures for Scalable Media Object Search Dennis Sng Deputy Director & Principal Scientist NVIDIA GPU Technology Workshop 10 July 2014 ROSE LAB OVERVIEW 2 Large Database of Media Objects Next-Generation
TESLA V100 PERFORMANCE GUIDE. Life Sciences Applications
TESLA V100 PERFORMANCE GUIDE Life Sciences Applications NOVEMBER 2017 TESLA V100 PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world's most important
Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth
Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth by D.K. Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda Outline Overview of MVAPICH2-GPU
Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP
Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP Devices / edge network Cloud/data center Removing data Bottlenecks with Fpga acceleration
AIRI SCALE-OUT AI-READY INFRASTRUCTURE ARCHITECTED BY PURE STORAGE AND NVIDIA WITH ARISTA 7060X SWITCH REFERENCE ARCHITECTURE
REFERENCE ARCHITECTURE AIRI SCALE-OUT AI-READY INFRASTRUCTURE ARCHITECTED BY PURE STORAGE AND NVIDIA WITH ARISTA 7060X SWITCH TABLE OF CONTENTS INTRODUCTION... 3 Accelerating Computation: NVIDIA DGX-1...