Zhang, HPC Application R&D Manager, Inspur
- Clifford Chapman
Transcription
1 Zhang, HPC Application R&D Manager, Inspur
2 Inspur-Nvidia GPU Joint Lab Introduction; Caffe-MPI: Parallel Caffe framework based on GPU cluster
3 Inspur-Nvidia GPU Joint Lab Introduction
Inspur-Nvidia GPU Joint Lab app research directions: traditional HPC and deep learning
Field | Application | Client | Speed-up ratio | Platform
Life Science | BLASTN | Beijing Institute of Genomics | 35X (kernel) | 1 GPU / 1 CPU core
Life Science | ET | Institute of Biophysics, CAS | 48X | 1 GPU / 1 CPU core
CFD | LBM_LES | - | 100X | 1 GPU / 1 CPU core
- | RNA | - | 8X | 24 GPU nodes / 24 CPU nodes
Oil & gas | PSTM | BGP | 5X | 6 GPU nodes / 6 CPU nodes
Oil & gas | Scandip | - | 9X | 4 GPU + 2 CPU / 2 CPU
CSP | Caffe | Qihoo | 12.5X | 16 GPU / 1 GPU
CSP | DNN | IFLYTEK | 13X | 16 GPU / 1 GPU
CSP | K-means | Qihoo | 35X | 1 GPU / 1 CPU core
CSP | Neural Network | Qihoo | 270X | 4 GPU / 1 CPU core
4 Application: DNN. Client: IFLYTEK. Performance: 16 GPU / 1 GPU = 13X.
Deep learning for speech recognition: mobile phone, car, intelligent customer service, business travel query
5 Application: neural network. Client: Qihoo. Performance: 4 GPU / 1 CPU core = 270X. [Chart: Time (s)]
6 ForwardBackward computing: 80% (data parallel)
- Weight computing: 16% (some parts can be parallelized)
- Net update: 4% (some parts can be parallelized)
Caffe has many users and is very popular in China. Training on big data takes a long time on a single GPU node. Caffe's ForwardBackward computing, weight computing, and net update can all be parallelized on a GPU cluster.
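To see why the smaller parts matter, here is a rough Amdahl's-law estimate. It is only a sketch, assuming the percentages above are shares of single-GPU training time and that a "parallel" part scales perfectly; the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, num_gpus):
    """Upper bound on speed-up when only parallel_fraction of the time scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_gpus)

# Only ForwardBackward (80%) spread over 16 GPUs: the bound is about 4X.
print(f"{amdahl_speedup(0.80, 16):.1f}X")

# ForwardBackward plus weight computing (96%): the bound is about 10X.
print(f"{amdahl_speedup(0.96, 16):.1f}X")
```

Under these assumptions, parallelizing ForwardBackward alone cannot approach the 16-GPU results reported later, which is consistent with the slide's point that weight computing and net update also need to be parallelized.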
7 What is Caffe-MPI?
- Developed by Inspur
- Open source
- Based on the Berkeley Vision and Learning Center (BVLC) single-GPU Caffe version
- A GPU cluster version of Caffe
- Supports training on 16+ GPUs
8 Based on HPC technology
- Hardware architecture: IB + GPU cluster + Lustre
- Software architecture: MPI + Pthread + CUDA
- Data parallel on GPU cluster
GPU cluster configuration:
- GPU master node: multiple GPUs
- GPU slave node: multiple GPUs
- Storage: Lustre
- Network: 56 Gb/s IB
- Software: Linux / CUDA 7.5 / MVAPICH2
9 MPI master-slave model
- Master process: multiple Pthread threads + CUDA threads
- Slave process: CUDA threads
Reference: Q. Ho, J. Cipar, H. Cui, J. K. Kim, S. Lee, ... More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server.
10 Master process (process 0): three Pthread groups
- Read data and send data in parallel
- Weight computing and parameter update
- Parameter communication
11 Slave process
CPU:
- Receive training data from the master process
- Send weight data (GPU-to-GPU)
- Receive new net data (GPU-to-GPU)
GPU:
- ForwardBackward computing
Slave node: the number of slave processes = the number of GPUs
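The master and slave roles on slides 9 to 11 can be illustrated with a small simulation. This is not Caffe-MPI code: Python threads and queues stand in for MPI processes and GPU-to-GPU transfers, the update is fully synchronous, and the "net" is a single weight w fitted to y = w * x. All names are hypothetical.

```python
import queue
import threading

def slave(rank, shard, to_master, from_master, steps):
    """Stand-in for a slave process: receive the net, compute, send back."""
    for _ in range(steps):
        w = from_master.get()  # receive new net data from the master
        # ForwardBackward on this shard: gradient of the mean squared error
        grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
        to_master.put((rank, grad))  # send weight (gradient) data

def train(data, num_slaves=4, steps=50, lr=0.01):
    """Stand-in for the master process: distribute data, update the net."""
    shards = [data[i::num_slaves] for i in range(num_slaves)]  # data parallel
    to_master = queue.Queue()
    from_master = [queue.Queue() for _ in range(num_slaves)]
    workers = [
        threading.Thread(target=slave,
                         args=(r, shards[r], to_master, from_master[r], steps))
        for r in range(num_slaves)
    ]
    for t in workers:
        t.start()
    w = 0.0
    for _ in range(steps):
        for q in from_master:
            q.put(w)  # parameter communication: broadcast the net
        grads = [to_master.get()[1] for _ in range(num_slaves)]
        w -= lr * sum(grads) / num_slaves  # weight computing and update
    for t in workers:
        t.join()
    return w

data = [(float(x), 3.0 * x) for x in range(1, 9)]
print(train(data))  # approaches 3.0
```

A real implementation would overlap these steps and could tolerate stale parameters, as in the parameter-server reference on slide 9; the sketch keeps every step in lockstep for readability.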
12 GPU parallel computing
- Computing and communication: asynchronous parallel
- Communication optimization: GPU RDMA for weight data and net data between GPUs
- Total Time = max(T_{Read Data + Send Data}, T_{ForwardBackward Computing + Weight Computing and Net Update + Net Send})
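The total-time formula on this slide says that, once data reading and sending overlap with computing, an iteration costs only as much as the slower of the two paths. A minimal sketch, with hypothetical timings in milliseconds that are not measurements from the talk:

```python
def total_time(t_read_send, t_forward_backward, t_weight_update, t_net_send):
    """Iteration time when the I/O path overlaps with the compute path."""
    io_path = t_read_send
    compute_path = t_forward_backward + t_weight_update + t_net_send
    return max(io_path, compute_path)

def serial_time(t_read_send, t_forward_backward, t_weight_update, t_net_send):
    """Iteration time if every step ran one after another."""
    return t_read_send + t_forward_backward + t_weight_update + t_net_send

timings = (80, 100, 20, 10)  # hypothetical values, in ms
print(total_time(*timings))   # 130
print(serial_time(*timings))  # 210
```

With these made-up numbers the read-and-send path is fully hidden behind the compute path, so only the compute path determines the iteration time.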
13 Speed-up ratio: 16 GPU / 1 GPU = 10.45X. Scalability efficiency: 65%.
14 Speed-up ratio: 16 GPU / 1 GPU = 10.74X. Scalability efficiency: 67%.
15 Performance speed-up by cuDNN = 21%. Speed-up ratio: 16 GPU / 1 GPU = 12.66X. Scalability efficiency: 79%. [Chart: GoogLeNet (batch size = 64); training time (s) versus the number of GPUs, for Caffe-MPI and Caffe-MPI + cuDNN]
16 Parallel reading of training data from Lustre storage and sending it to different GPUs
- The GPU cluster is divided into many groups
- Every group has a master node
- Every master node reads and sends data in parallel with multiple processes + multiple threads
- Can support large-scale GPU computing for a big training platform
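The grouping idea on this slide can be sketched as a simple partition of GPU ranks. The group size and the choice of the first rank in each group as its master are assumptions made for illustration; the slide does not say how groups are formed.

```python
def make_groups(num_gpus, group_size):
    """Split GPU ranks into groups; the first rank of each group acts as master."""
    groups = []
    for start in range(0, num_gpus, group_size):
        members = list(range(start, min(start + group_size, num_gpus)))
        groups.append({"master": members[0], "members": members})
    return groups

for group in make_groups(16, 4):
    print(group["master"], group["members"])
```

Each master would then read its group's share of the training data from storage and send it to the members, so that no single node has to feed the whole cluster.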
17 Speed-up ratio: 16 GPU / 1 GPU = 13X. Scalability efficiency: 81%.
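The scalability efficiency figures on slides 13 to 17 are consistent with dividing the speed-up ratio by the number of GPUs. A short check, assuming that definition (the slides do not state it explicitly):

```python
def scalability_efficiency(speedup, num_gpus):
    """Fraction of ideal linear scaling achieved."""
    return speedup / num_gpus

reported_speedups = {
    "slide 13": 10.45,
    "slide 14": 10.74,
    "slide 15 (with cuDNN)": 12.66,
    "slide 17": 13.0,
}
for label, speedup in reported_speedups.items():
    efficiency = scalability_efficiency(speedup, 16)
    print(f"{label}: {speedup}X on 16 GPUs -> {efficiency:.0%}")
```

The results round to 65%, 67%, 79%, and 81%, matching the values on the slides.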
18 Next work:
- Support cuDNN 4.0
- MPI framework tuning
- Symmetric model
Caffe-MPI open-source roadmap:
- Q2: computing-intensive model: support 32+ GPUs in parallel
- Q3: IO-intensive model: support 16+ GPUs in parallel
- Q4: support half precision for Pascal GPUs
19 Conclusions
- Caffe-MPI is based on HPC technology architecture
- Performance: 16 GPU / 1 GPU = 13X
- Caffe-MPI can support 16+ GPUs to train on big data
- Inspur will continue to open source new versions:
- 32-GPU parallel version for computing-intensive models
- 16+ GPU parallel version for IO-intensive models
- Support for half precision on Pascal GPUs