S8765 Performance Optimization for Deep Learning on the Latest POWER Systems
- Deirdre Parks
- 5 years ago
1 S8765 Performance Optimization for Deep Learning on the Latest POWER Systems
Khoa Huynh, Senior Technical Staff Member (STSM), IBM
Jonathan Samn, Software Engineer, IBM
2 Evolving from compute systems to Cognitive Systems
Not just about hardware design: it's about co-optimized software + hardware.
- Dev ecosystem
- Industry alignment
- Partnerships
- Open frameworks
- IBM software
- P8, P9, P10
- Open accelerator interfaces
- Accelerator roadmaps which just work for ML, DL, and AI
NDA until product announce
3 AI Infrastructure Stack
- Applications: segment specific (Finance, Retail, Healthcare)
- Cognitive APIs (e.g. Watson) and in-house APIs: Speech, Vision, NLP, Sentiment
- Machine & deep learning libraries & frameworks: TensorFlow, Caffe, SparkML
- Distributed computing: Spark, MPI
- PowerAI
- Transform & prep data (ETL)
- Data lake & data stores: Hadoop HDFS, NoSQL DBs
- Accelerated infrastructure: accelerated servers, storage
Think 2018 / DOC ID / Month XX, 2018 / 2018 IBM Corporation
4 PowerAI: Integrated & Supported AI Platform
- Higher productivity for data scientists
- Enable non-data scientists to use AI
- Developer ease-of-use tools
- Open source frameworks: supported distribution
- Faster training times via HW & SW performance optimizations
5 In IBM Cloud
- Available early April 2018
- Delivered via IBM Cloud Catalog
- Billed through IBM Cloud
- Supported by IBM and Nimbix
- PowerAI Version 5.0
- Native Distributed Deep Learning
- On-demand cloud provisioning
- Delivered on CentOS Linux
6 PowerAI in IBM Cloud: Ease of Use & Performance
- Developer ease-of-use tools
- Open source frameworks: supported distribution
- Faster training times via HW & SW performance optimizations
- Leadership price performance
- Highly scalable Distributed Deep Learning
- Large Model Support
- Containerized and extensible
- Turn-key solution
- Powered by trusted partner Nimbix
- PowerAI Version 5.0
7 IBM's Latest Processor: POWER9
- POWER7 (45 nm, 1H10), Enterprise: 8 cores, SMT4, eDRAM L3 cache
- POWER7+ (32 nm, 2H12), Enterprise: 2.5x larger L3 cache, on-die acceleration, zero-power core idle state
- POWER8 family (22 nm, 1H14 and 2H16), Enterprise & Big Data Optimized: up to 12 cores, SMT8, CAPI acceleration, high-bandwidth GPU attach
- POWER9 family (14 nm, 2H17 and 2H18+), Built for the Cognitive Era: only processor with NVLink, PCIe Gen 4 advanced I/O interfaces and coherence; premier platform for accelerated computing; processor family with scale-up and scale-out optimized silicon
8 Systems Designed for AI
POWER9: high-speed system memory, OpenCAPI, NVLink 2.0, PCIe Gen4
- Fast & large memory system: 4-5X memory bandwidth, 2x more memory vs. Intel
- High performance cores: 4X threads per core vs. Intel
- Fastest accelerator interconnects: OpenCAPI / NVLink x vs. Intel
9 5x Faster Data Communication with Unique CPU-GPU NVLink High-Speed Connection
- Store large models in system memory
- Fast transfer via NVLink
- Operate on one layer at a time
[Diagram: IBM AC922 Power System deep learning server (4-GPU config); two POWER9 CPUs, each with 1 TB memory at 170 GB/s, each connected to two V100 GPUs via NVLink at 150 GB/s]
10 GPU Accelerator Comparison
- POWER8 with NVLink 1.0 (Pascal technology): DDR4, P8 CPU, 80 GB/s NVLink to GPU graphics memory; 2 bricks per NVLink; duplex bandwidth
- POWER9 with NVLink 2.0 (Volta technology): DDR4, P9 CPU, 150 GB/s NVLink to GPU graphics memory; 3 bricks per NVLink; duplex bandwidth
POWER9 with NVLink 2.0 delivers 87.5% more bandwidth than POWER8.
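The 87.5% figure follows directly from the two link bandwidths quoted on this slide. A minimal sketch of the arithmetic (the function and constant names are ours, chosen for illustration):

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# CPU-GPU NVLink bandwidths quoted on the slide, in GB/s.
P8_NVLINK1_GBPS = 80.0   # POWER8, NVLink 1.0, 2 bricks
P9_NVLINK2_GBPS = 150.0  # POWER9, NVLink 2.0, 3 bricks

increase = percent_increase(P8_NVLINK1_GBPS, P9_NVLINK2_GBPS)

# Per-brick bandwidth implied by the brick counts on the slide.
p8_per_brick = P8_NVLINK1_GBPS / 2
p9_per_brick = P9_NVLINK2_GBPS / 3

print(f"Increase: {increase}%")
print(f"Per brick: {p8_per_brick} GB/s vs. {p9_per_brick} GB/s")
```

The gain therefore comes from two factors: one more brick per link, and more bandwidth per brick.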
11 IBM POWER System Roadmap
- 2015, POWER S822LC: Power Systems introduction of design- and cost-optimized Linux-only servers; 2X x86 memory bandwidth; dedicated technical compute versions for GPU acceleration
- 2016, POWER S822LC for HPC: introduced the first processor with NVLink from CPU to GPU; 2x memory bandwidth; air and water cooled versions
- 2017, AC922: remains the only processor with NVLink (now 2nd gen) and introduces PCIe Gen 4; coherence for near direct GPU access to system memory (2TB); co-optimization with deep learning frameworks
- Future: higher NVLink throughput; significant advancements on memory bandwidths; new memory architectures; more dense accelerated compute options
Unmatched track record of innovation delivery. A portfolio to invest in.
12 IBM Power System AC922
Realize unprecedented performance and application gains with POWER9 and NVLink.
- POWER9 CPUs and up to 4 Volta NVLink 2.0 GPUs in a versatile 2U Linux server
- PCIe Gen4 bus has double the I/O bandwidth of PCIe Gen3
- CPU (Turbo) / GPU (Boost) enabled for improved data center efficiency and performance to be maintained at high levels
High-level system overview:
- 2-socket, 2U packaging
- 40 P9 processor cores
- 4 NVIDIA Volta 2.0 GPUs
- 1 TB memory (16x 64GB DIMMs)
- 4 PCIe Gen4 slots
- 2x SFF (HDD/SSD), SATA, up to 7.7 TB storage
- Supports 1.6TB and 3.2TB NVMe adapters
- Redundant hot-swap power supplies and fans
- Default 3 year 9x5 warranty, 100% CRU
13 AC922 Configurations
4 GPUs, air (4Q17) / water cooled (2Q18):
- Up to 4 GPUs, air/water cooled options
- 150 GB/s of bandwidth from CPU-GPU
- Coherent access to system memory
- PCIe Gen 4 and CAPI 2.0 to InfiniBand
- Water cooled options available in 2Q18
6 GPUs, water cooled (2Q18):
- Up to 6 GPUs, water cooled only
- 100 GB/s of bandwidth from CPU-GPU
14 Large Model Support (LMS)
- Traditional model support: limited memory on GPU forces trade-off in model size / data resolution (DDR4, CPU, PCIe, GPU graphics memory)
- Large Model Support: use system memory and GPU to support more complex and higher resolution data (DDR4, POWER CPU, NVLink, GPU graphics memory)
Leveraging NVLink and coherence enables larger and more complex models. Improves model accuracy with more images and higher resolution images.
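To give a feel for why link bandwidth matters when a model lives in system memory and is moved to the GPU one layer at a time, here is a rough back-of-the-envelope sketch. It uses the 150 GB/s NVLink figure and the "5x faster" claim from the earlier slides; the layer sizes and the assumption that each layer is copied in and out once per iteration are hypothetical and are not taken from the talk.

```python
import math

NVLINK_GBPS = 150.0                     # CPU-GPU NVLink figure from the slides
SLOWER_LINK_GBPS = NVLINK_GBPS / 5.0    # implied by the "5x faster" claim

def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Ideal time to move `size_gb` gigabytes over a link (no latency, no overlap)."""
    return size_gb / bandwidth_gbps

def swap_seconds_per_iteration(layer_sizes_gb, bandwidth_gbps: float) -> float:
    """Assumes every layer's data crosses the link twice per iteration (in and out)."""
    return sum(2 * transfer_seconds(size, bandwidth_gbps) for size in layer_sizes_gb)

# Hypothetical per-layer data sizes in GB, for illustration only.
layers_gb = [1.5, 3.0, 4.5]

nvlink_time = swap_seconds_per_iteration(layers_gb, NVLINK_GBPS)
slower_time = swap_seconds_per_iteration(layers_gb, SLOWER_LINK_GBPS)

print(f"NVLink:      {nvlink_time:.3f} s of transfer per iteration")
print(f"Slower link: {slower_time:.3f} s of transfer per iteration")
```

In this idealized model the transfer cost scales inversely with bandwidth, so real speedups depend on how much of each iteration is spent moving data rather than computing.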
15 Caffe with LMS (Large Model Support): Runtime of 1000 Iterations
Large AI models train ~4 times faster (3.8x faster): POWER9 servers with NVLink to GPUs vs. x86 servers with PCIe to GPUs.
[Chart: time (secs) for Xeon x v4 w/ 4x V100 GPUs vs. Power AC922 w/ 4x V100 GPUs]
GoogleNet model on enlarged ImageNet dataset (2240x2240)
16 Distributed Deep Learning (DDL)
Deep learning training takes days to weeks, with limited scaling to multiple x86 servers. PowerAI with DDL enables scaling to 100s of servers.
- 16 days down to 7 hours, 58x faster (1 system vs. 64 systems; ResNet-101, ImageNet-22K)
- Near-ideal scaling to 256 GPUs: 95% scaling with 256 GPUs (ResNet-50, ImageNet-1K)
[Charts: training time for 1 system vs. 64 systems; speedup vs. number of GPUs, ideal scaling against DDL actual scaling]
Caffe with PowerAI DDL, running on Minsky (S822LC) Power System
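The scaling numbers on this slide can be related with simple arithmetic. The sketch below takes scaling efficiency to mean achieved speedup divided by ideal (linear) speedup, which is the usual reading but is our assumption; the helper names are ours as well.

```python
def achieved_speedup(efficiency: float, num_gpus: int) -> float:
    """Speedup over one GPU, given scaling efficiency relative to linear scaling."""
    return efficiency * num_gpus

def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / num_gpus

# 95% scaling on 256 GPUs, as quoted on the slide.
speedup_256 = achieved_speedup(0.95, 256)

# Wall-clock ratio from the rounded figures on the slide.
hours_before = 16 * 24   # 16 days
hours_after = 7          # 7 hours
wall_clock_ratio = hours_before / hours_after

print(f"Speedup at 95% efficiency on 256 GPUs: {speedup_256:.1f}x")
print(f"16 days / 7 hours = {wall_clock_ratio:.1f}x")
```

With the rounded figures the wall-clock ratio comes out near 55x rather than 58x, so the slide's number presumably reflects unrounded run times.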
17 InfiniBand EDR 100 Gb/s: PCIe Gen 4 versus PCIe Gen 3
The PCIe Gen 4 difference (Gen3 to Gen4 EDR InfiniBand bandwidth test)
[Chart: average Gbit/s vs. message size (bytes); series: Gen4 dual port bidirectional, Gen3 dual port bidirectional]
~2x faster IB network connectivity enabled by PCIe Gen 4
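A rough calculation suggests why a dual-port EDR adapter benefits from PCIe Gen 4. The sketch assumes a x16 slot (the slide does not state the link width) and counts only the 128b/130b line encoding, ignoring packet and protocol overhead, so the results are upper bounds.

```python
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line code used by PCIe Gen3 and Gen4

def pcie_gbits_per_direction(gt_per_s: float, lanes: int = 16) -> float:
    """Upper-bound PCIe payload rate in Gbit/s, one direction, x16 assumed."""
    return gt_per_s * lanes * ENCODING_EFFICIENCY

gen3 = pcie_gbits_per_direction(8.0)    # Gen3 signals at 8 GT/s per lane
gen4 = pcie_gbits_per_direction(16.0)   # Gen4 signals at 16 GT/s per lane

# Two EDR ports at 100 Gb/s each need this much per direction.
dual_port_edr = 2 * 100.0

print(f"Gen3 x16: {gen3:.0f} Gb/s per direction")
print(f"Gen4 x16: {gen4:.0f} Gb/s per direction")
print(f"Dual-port EDR needs {dual_port_edr:.0f} Gb/s per direction")
```

Under these assumptions a Gen3 x16 slot cannot feed both ports at line rate while a Gen4 x16 slot can, which is consistent with the roughly 2x difference reported on the slide.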
18 Deep-Learning (DL) Neural Network Models
- Most ML approaches (linear regression, decision trees, association rules, etc.) do not need GPUs
- Many ML frameworks (Python scikit-learn, Spark MLlib, etc.) do not support GPU options
- Lots of multi-dimensional matrix multiplication operations
Reference: A. Canziani et al., "An Analysis of Deep Neural Network Models for Practical Applications," April 2017.
19 Deep-Learning Model Training - TensorFlow
Single node (no cluster). Aggregate number of images processed per second; higher is better.
- With the same number of V100 GPUs, Power9 servers deliver better performance than Amazon P3 instances, while SoftLayer bare-metal servers with V100 GPUs deliver similar performance to Amazon P3 instances. This could be attributed to Power9 CPU optimizations and CPU-GPU NVLink support.
- With 4 x V100 GPUs, the Power9 server has higher performance than an Amazon P3 instance with 8 x V100 GPUs in single precision mode.
- The AWS P3 instance does not scale well beyond 4 x V100 GPUs in single precision mode, although it does scale well when leveraging Tensor Cores in half precision (FP16) mode.
- TensorFlow 1.4 and 1.5 do not leverage the Tensor Cores in V100 GPUs very well, so the latest TensorFlow 1.6-dev build was used for optimal half precision (FP16) performance.
[Charts: InceptionV3 and ResNet50. Configurations: SL bare-metal server and SL VSI (16 vCPUs) with 1 and 2 P100 GPUs (PCIe); SL bare-metal server with 1 and 2 V100 GPUs; Power8 Minsky with 1, 2 and 4 P100 GPUs; Power9 with 1, 2 and 4 V100 GPUs; AWS P3 instance with 1, 2, 4 and 8 V100 GPUs. V100 results shown in single precision and FP16.]
Notes:
- Input dataset: ImageNet (crop size = 224x224); batch size = 64 per GPU (for both InceptionV3 and ResNet50 neural net models).
- With V100 GPUs, independent distribution mode for model variables and gradients was used for optimal performance.
- Mixed precision (FP16/32) leverages Tensor Cores in V100 GPUs.
- SoftLayer bare-metal server has 48 logical CPU cores while Power9 server and AWS P3 instance have 64 logical CPU cores.
20 Deep-Learning Model Training - InceptionV3 on TensorFlow
SoftLayer bare metal vs. Power8 Minsky vs. Power9 vs. Amazon P3 instance. Single precision unless noted otherwise; single node (no cluster). Aggregate number of images processed per second; higher is better.
- For InceptionV3 on TensorFlow, half precision (FP16) on the V100 GPUs uses Tensor Cores to achieve ~1.8X better performance than single precision. The larger performance gain (up to 4.4X) of FP16 on AWS P3 is due to the relatively low performance of the Deep Learning AMI with TensorFlow v1.4 used for single precision, compared to TensorFlow 1.6-dev used for half precision mode.
- Given the same number of V100 GPUs, SoftLayer servers generally deliver similar performance to Amazon P3 instances.
- Power9 delivers better performance than the Amazon P3 instance at the same number of V100 GPUs across the board, up to 1.58X in single precision mode.
- Only the Amazon P3 instance supports 8 x V100 GPUs at the present time.
[Chart: images per second vs. number of GPUs; series: SL BM w/ P100 GPUs, SL VSI w/ P100 GPUs, SL BM w/ V100 GPUs, SL BM w/ V100 GPUs (FP16), Power8 Minsky w/ P100 GPUs, Power9 w/ V100 GPUs, Power9 w/ V100 GPUs (FP16), AWS P3 w/ V100 GPUs, AWS P3 w/ V100 GPUs (FP16)]
Notes:
- Input dataset: ImageNet (crop size = 224x224); batch size = 64 per GPU (for both InceptionV3 and ResNet50 neural net models).
- With V100 GPUs, independent distribution mode for model variables and gradients was used for optimal performance.
- Mixed precision (FP16/32) leverages Tensor Cores in V100 GPUs.
- SoftLayer bare-metal server has 48 logical CPU cores while Power9 server and AWS P3 instance have 64 logical CPU cores.
21 Deep-Learning Model Training - Power9 Server w/ V100 GPUs
Impact of model variable distribution & gradient aggregation modes. Single precision, single node (no cluster). Aggregate number of images processed per second; higher is better.
For TensorFlow, independent distribution mode (replicated_distributed) for model variables and gradient aggregation delivers much better performance for 4 GPUs (and higher) than the default parameter_server mode.
[Chart: images per second vs. number of GPUs; series: parameter_server, replicated and independent modes, each for InceptionV3 and ResNet]
Notes:
- Input dataset: ImageNet (crop size = 224x224).
- For Caffe, highest batch sizes were used to fully exploit GPU memory. For TensorFlow, batch size = 64 per GPU.
- Mixed precision (16-bit input matrices, 32-bit accumulator) leverages Tensor Cores in V100 GPUs.
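The note about 16-bit input matrices with a 32-bit accumulator can be illustrated in software. The sketch below is not how Tensor Cores are programmed; it only emulates the rounding behaviour using the standard library's IEEE 754 half-precision and single-precision packing, with helper names of our own, to show why sums are kept in a wider format.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_fp16_accumulator(a, b) -> float:
    """Dot product with FP16 inputs and the running sum rounded to FP16."""
    total = 0.0
    for x, y in zip(a, b):
        total = to_fp16(total + to_fp16(x) * to_fp16(y))
    return total

def dot_fp32_accumulator(a, b) -> float:
    """Dot product with FP16 inputs and the running sum rounded to FP32."""
    total = 0.0
    for x, y in zip(a, b):
        total = to_fp32(total + to_fp16(x) * to_fp16(y))
    return total

# 4096 products of 1.0 * 1.0: the exact answer is 4096.
ones = [1.0] * 4096
narrow = dot_fp16_accumulator(ones, ones)
wide = dot_fp32_accumulator(ones, ones)

print(f"FP16 accumulator: {narrow}")
print(f"FP32 accumulator: {wide}")
```

The FP16 running sum stops growing once adding 1.0 no longer changes the nearest representable value, while the FP32 sum stays exact, which is the motivation for pairing half-precision inputs with a single-precision accumulator.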
22 Deep-Learning Model Training - InceptionV3 on TensorFlow
Impact of number of POWER9 CPU threads on TensorFlow (number of vCPUs = number of logical CPU cores seen by the OS)
[Chart 1: images processed per second vs. number of POWER9 CPU threads; single precision, no cluster, batch size = 64/GPU; series: 1, 2, 3 and 4 GPUs in independent and parameter_server modes]
[Chart 2: same axes and series; half precision (FP16), no cluster, batch size = 64/GPU]
23 THANK YOU!!!
Sharing High-Performance Devices Across Multiple Virtual Machines Preamble What does sharing devices across multiple virtual machines in our title mean? How is it different from virtual networking / NSX,
More informationINCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE
www.iceotope.com DATA SHEET INCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE BLADE SERVER TM PLATFORM 80% Our liquid cooling platform is proven to reduce cooling energy consumption by
More informationDensity Optimized System Enabling Next-Gen Performance
Product brief High Performance Computing (HPC) and Hyper-Converged Infrastructure (HCI) Intel Server Board S2600BP Product Family Featuring the Intel Xeon Processor Scalable Family Density Optimized System
More informationFast forward. To your <next>
Fast forward To your Navin Shenoy EXECUTIVE VICE PRESIDENT GENERAL MANAGER, DATA CENTER GROUP CLOUD ECONOMICS INTELLIGENT DATA PRACTICES NETWORK TRANSFORMATION Intel Xeon Scalable Platform The
More informationData Sheet Fujitsu Server PRIMERGY CX250 S2 Dual Socket Server Node
Data Sheet Fujitsu Server PRIMERGY CX250 S2 Dual Socket Server Node Data Sheet Fujitsu Server PRIMERGY CX250 S2 Dual Socket Server Node Datasheet for Red Hat certification Standard server node for PRIMERGY
More informationS8901 Quadro for AI, VR and Simulation
S8901 Quadro for AI, VR and Simulation Carl Flygare, PNY Quadro Product Marketing Manager Allen Bourgoyne, NVIDIA Senior Product Marketing Manager The question of whether a computer can think is no more
More informationTECHNOLOGIES CO., LTD.
A Fresh Look at HPC HUAWEI TECHNOLOGIES Francis Lam Director, Product Management www.huawei.com WORLD CLASS HPC SOLUTIONS TODAY 170+ Countries $74.8B 2016 Revenue 14.2% of Revenue in R&D 79,000 R&D Engineers
More information2014 LENOVO INTERNAL. ALL RIGHTS RESERVED.
2014 LENOVO INTERNAL. ALL RIGHTS RESERVED. Who is Lenovo? A $39 billion, Fortune 500 technology company - Publicly listed/traded on the Hong Kong Stock Exchange - 54,000 employees serving clients in 160+
More informationLS-DYNA Performance Benchmark and Profiling. April 2015
LS-DYNA Performance Benchmark and Profiling April 2015 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource
More informationData Sheet FUJITSU Server PRIMERGY CX400 M4 Scale out Server
Data Sheet FUJITSU Server PRIMERGY CX400 M4 Scale out Server Data Sheet FUJITSU Server PRIMERGY CX400 M4 Scale out Server Workload-specific power in a modular form factor FUJITSU Server PRIMERGY will give
More informationSupports up to four 3.5-inch SAS/SATA drives. Drive bays 1 and 2 support NVMe SSDs. A size-converter
, on page External Features, on page Serviceable Component Locations, on page Summary of Server Features, on page The server is orderable in different versions, each with a different front panel/drive-backplane
More informationIBM i テクニカル ワークショップ IBM i and the Future. Tim Rowe. IBM i Architect Application Development Systems Management.
IBM i and the Future Tim Rowe IBM i Architect Application Development Systems Management timmr@us.ibm.com IBM i Business 2015 International Business Machines Corporation 2 2015 International Business Machines
More informationGPU Accelerated Data Processing Speed of Thought Analytics at Scale
GPU Accelerated Data Processing Speed of Thought Analytics at Scale The benefits of Brytlyt s GPU Accelerated Database Brytlyt is an ultra-high performance database that combines patent pending intellectual
More informationLS-DYNA Performance Benchmark and Profiling. October 2017
LS-DYNA Performance Benchmark and Profiling October 2017 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: LSTC, Huawei, Mellanox Compute resource
More informationHPE ProLiant ML110 Gen P 8GB-R S100i 4LFF NHP SATA 350W PS DVD Entry Server/TV (P )
Digital data sheet HPE ProLiant ML110 Gen10 3104 1P 8GB-R S100i 4LFF NHP SATA 350W PS DVD Entry Server/TV (P03684-425) ProLiant ML Servers What's new New SMB focused offers regionally released as Smart
More informationCisco UCS C24 M3 Server
Data Sheet Cisco UCS C24 M3 Rack Server Product Overview The form-factor-agnostic Cisco Unified Computing System (Cisco UCS ) combines Cisco UCS C-Series Rack Servers and B-Series Blade Servers with networking
More informationHPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov
HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads Natalia Vassilieva, Sergey Serebryakov Deep learning ecosystem today Software Hardware 2 HPE s portfolio for deep learning Government,
More informationMACHINE LEARNING WITH NVIDIA AND IBM POWER AI
MACHINE LEARNING WITH NVIDIA AND IBM POWER AI July 2017 Joerg Krall Sr. Business Ddevelopment Manager MFG EMEA jkrall@nvidia.com A NEW ERA OF COMPUTING AI & IOT Deep Learning, GPU 100s of billions of devices
More informationFull Featured with Maximum Flexibility for Expansion
PRODUCT brief Data Center, Cloud, High Performance Computing Intel Server Board S2600WF Product Family Featuring the 2 nd Generation Intel Xeon Processor Scalable Family Full Featured with Maximum Flexibility
More informationCisco UCS B460 M4 Blade Server
Data Sheet Cisco UCS B460 M4 Blade Server Product Overview The new Cisco UCS B460 M4 Blade Server uses the power of the latest Intel Xeon processor E7 v3 product family to add new levels of performance
More informationData Sheet Fujitsu Server PRIMERGY CX400 M1 Compact and Easy
Data Sheet Fujitsu Server Compact and Easy Data Sheet Fujitsu Server Compact and Easy Scale-Out Smart for HPC, Cloud and Hyper-Converged Computing The Fujitsu Server helps to meet the immense challenges
More informationData Sheet FUJITSU Server PRIMERGY CX400 S2 Multi-Node Server Enclosure
Data Sheet FUJITSU Server PRIMERGY CX400 S2 Multi-Node Server Enclosure Data Sheet FUJITSU Server PRIMERGY CX400 S2 Multi-Node Server Enclosure Scale-Out Smart for HPC and Cloud Computing with enhanced
More informationIBM Power Systems: Open innovation to put data to work Dexter Henderson Vice President IBM Power Systems
IBM Power Systems: Open innovation to put data to work Dexter Henderson Vice President IBM Power Systems 2014 IBM Corporation Powerful Forces are Changing the Way Business Gets Done Data growing exponentially
More informationPower Technology For a Smarter Future
2011 IBM Power Systems Technical University October 10-14 Fontainebleau Miami Beach Miami, FL IBM Power Technology For a Smarter Future Jeffrey Stuecheli Power Processor Development Copyright IBM Corporation
More informationLAMMPS-KOKKOS Performance Benchmark and Profiling. September 2015
LAMMPS-KOKKOS Performance Benchmark and Profiling September 2015 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, NVIDIA
More information