CUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker
2 CUDA on ARM
A forward-looking development platform for high-performance, energy-efficient hybrid computing. It's a platform for the next generation of HPC, leveraging commodity-driven improvements from the most rapidly evolving compute markets.
3 The next revolution: Power Efficiency
- Look at the market for the next generation of HPC components
- Power-efficient computing driven by phones and tablets
  - ARM, with architectural and experience advantages
  - System-level software complexity is high
- HPC driven by accelerated computing
  - All major vendors have switched to accelerators
  - GPUs have an architectural efficiency advantage
  - Titan gets 90% of its performance from the accelerator
4 A Possible (Obvious) Power-Efficient Future
- Power-efficient general-purpose cores combined with compute accelerators
- Power control shared with mobile products
  - Ultra-focused on power efficiency
  - Competition forces rapid improvement
  - Technology evolution driven by the commodity market
- Bulk of compute power provided by inherently efficient GPUs
  - Raise the share of chip power spent on flops to over 50%
5 NVIDIA has these elements
- GPU and computing
- ARM SoCs
6 Why CUDA on ARM?
- Development platforms for future HPC systems
- Explore the efficiency and performance trade-offs
- Use existing hardware: build systems that combine ARM CPUs with a discrete GPU
7 Current Generation: MXM Devkit
- SECO carrier board: SECO MXM Devkit
- NVIDIA Tegra 3 CPU on a Qseven (Q7) module
  - 4 ARM Cortex-A9 cores, NEON and VFPv3
  - 2 GB DRAM and 4-8 GB embedded flash
- NVIDIA MXM GPU module
  - Quadro 1000M (GF108) on 4 lanes of PCIe
  - 96 CUDA cores with 269 GFLOPS peak
- Carrier provides I/O connectors and power supplies
  - PCIe-connected 1 Gbps Ethernet (Intel 82574), USB, SATA
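The 269 GFLOPS peak quoted above is consistent with the usual peak-throughput formula, peak = CUDA cores × FLOPs per core per cycle × clock. The slide does not give the clock, so this sketch assumes each core retires one single-precision fused multiply-add (2 FLOPs) per cycle and solves for the clock that the quoted figure implies (about 1.40 GHz). The helper names are illustrative only, not part of any NVIDIA tool.

```python
# Peak-throughput arithmetic for the Quadro 1000M figures on the slide.
# Assumption (not stated on the slide): each CUDA core retires one
# single-precision fused multiply-add, i.e. 2 FLOPs, per clock cycle.

FLOPS_PER_CORE_PER_CYCLE = 2  # one FMA = one multiply + one add


def peak_gflops(cores: int, clock_ghz: float,
                flops_per_cycle: int = FLOPS_PER_CORE_PER_CYCLE) -> float:
    """Theoretical peak in GFLOPS: cores x FLOPs/cycle x clock (GHz)."""
    return cores * flops_per_cycle * clock_ghz


def implied_clock_ghz(peak: float, cores: int,
                      flops_per_cycle: int = FLOPS_PER_CORE_PER_CYCLE) -> float:
    """Invert the formula: the clock (GHz) needed to reach a quoted peak."""
    return peak / (cores * flops_per_cycle)


if __name__ == "__main__":
    clock = implied_clock_ghz(269.0, cores=96)
    print(f"implied clock: {clock:.3f} GHz")  # 269 / 192 -> 1.401 GHz
    print(f"peak at 1.4 GHz: {peak_gflops(96, 1.4):.1f} GFLOPS")  # -> 268.8
```

The same formula, with the core count and clock of another part, gives a quick sanity check on any peak figure in the deck; it says nothing about sustained throughput, which depends on memory bandwidth and the PCIe link.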
9 Current Generation Software
- ARM Linux distribution
  - L4T (Linux for Tegra) R15.2, softfp, Ubuntu
  - Linux kernel
- CUDA 4.2 toolkit and samples, driver
- x86 system support for cross-development
  - nvcc cross-compiler support
10 Introducing KAYLA
- Supports a Kepler-class GPU
  - SM 3.5 adds dynamic parallelism and other features
  - 2 SMX, 384 CUDA cores
  - Comes in MXM and PCIe form factors
- Capability approaching the Logan SoC
  - The integrated solution will be more power-efficient
11 Next Generation: Mini-ITX Devkit
- SECO carrier board: SECO Mini-ITX GPU devkit
- NVIDIA Tegra 3 CPU on a Qseven (Q7) module
  - 4 ARM Cortex-A9 cores, NEON and VFPv3
  - 2 GB DRAM and 4-8 GB embedded flash
- NVIDIA PCIe GPU
  - ATX power supply supports higher-power GPUs
  - Qualified for GF108, GK107, GK104, and the Kayla GPU
- Carrier provides I/O connectors
12 Next Generation Hardware
13 Next Generation Software
- ARM Linux distribution
  - Based on L4T R16.2, hardfp, Ubuntu
  - Linux kernel
- CUDA 5.0 toolkit and samples, driver
  - Increased parity with x86 Linux (nvcuvid, nvprof, Thrust)
- x86 system support for cross-development
  - nvcc cross-compiler support
  - nfs-kernel-server support to ease cross-compilation
- Backported to the SECO MXM Devkit
14 CUDA on ARM Roadmap
- Software
  - CUDA releases starting with CUDA 5.5 and driver 319.xy include ARM support
  - Native ARM compiler
  - cuda-gdb: native ARM and client-server
- Long-term plans for CUDA on the ARM platform
  - Logan: Tegra with an integrated Kepler-class GPU
  - ARMv8 64-bit platform support, starting with Parker
  - Enable other partners and industry support
15 Notes on Comparing Compute Efficiency
- Measuring power isn't always easy
  - Multiple points at which to measure input power
  - Multiple power rails and components
  - Different peripherals and activity
  - Active cooling and over-cooling are significant power draws
- Measuring application power draw adds to the challenge
  - I/O and DRAM activity can be power-hungry
  - Different phases have different power profiles
- A power-efficient system has widely varying power draw
  - "Turn off the lights when you leave the room"
  - Recent activity has a big influence on present power draw
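Because a power-efficient system's draw swings from phase to phase, a single spot reading in watts says little about efficiency. A common approach is to integrate sampled power over the whole run to get energy in joules, then divide the useful work by that energy. This is a minimal sketch, assuming you already have (time, watts) samples from a meter at whichever input point you chose; the function names and the sample trace in the usage lines are hypothetical.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], watts: Sequence[float]) -> float:
    """Integrate sampled power over time with the trapezoidal rule.

    times_s must be strictly increasing; samples may be unevenly spaced.
    """
    if len(times_s) != len(watts):
        raise ValueError("times and power samples must have the same length")
    if len(times_s) < 2:
        raise ValueError("need at least two samples to integrate")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (watts[i] + watts[i - 1]) * dt
    return total


def work_per_joule(work_units: float, times_s: Sequence[float],
                   watts: Sequence[float]) -> float:
    """Efficiency metric: useful work (frames, iterations, ...) per joule."""
    return work_units / energy_joules(times_s, watts)


if __name__ == "__main__":
    # Hypothetical trace: idle, a burst of work, then idle again.
    t = [0.0, 1.0, 2.0, 3.0]
    p = [10.0, 20.0, 20.0, 10.0]
    print(energy_joules(t, p))        # 15 + 20 + 15 -> 50.0
    print(work_per_joule(100, t, p))  # 100 / 50.0 -> 2.0
```

Integrating over the full run, rather than sampling only the busy phase, captures the idle tails and the "recent activity" effects listed above, so two systems are compared on total energy for the same job.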
16 Power, Performance, and Benchmarks

Current   Power                    Condition
0.46 A    -                        No GPU installed, SATA disk
0.50 A    9.12 W                   Idle power, fan off
0.60 A    10.9 W                   Idle power, slow fan
0.66 A    +1.1 W                   Idle with SATA disk
0.86 A    +3.65 W                  GPU power state set to maximum performance
1.06 A    19.3 W (23 W peak)       Running smoke at 27 FPS, average
2.05 A    37.4 W (41.1 W peak)     Running real-time raytracing
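The table lists current and power side by side but does not state the supply voltage at the measurement point. Dividing each absolute power figure by its current gives about 18.2 to 18.3 V in every row, and the "+" figures match the current step from the previous row times the same voltage, so the power column appears to be current multiplied by a roughly constant input voltage. This sketch checks that inference and derives one efficiency number from the smoke run (27 FPS at 19.3 W average, about 1.4 frames per joule). The ~18.2 V value is inferred from the table, not measured, and the reading of the "+" rows as increments over the previous row is likewise an assumption.

```python
# Rows from the slide: (current in A, power in W, label).
ABSOLUTE_ROWS = [
    (0.50, 9.12, "Idle power, fan off"),
    (0.60, 10.9, "Idle power, slow fan"),
    (1.06, 19.3, "Running smoke at 27 FPS, average"),
    (2.05, 37.4, "Running real-time raytracing"),
]

# "+" rows, assumed to be increments over the previous row:
# (previous current A, current A, power increment W, label).
INCREMENT_ROWS = [
    (0.60, 0.66, 1.1, "Idle with SATA disk"),
    (0.66, 0.86, 3.65, "GPU power state set to maximum performance"),
]


def implied_voltage(current_a: float, power_w: float) -> float:
    """V = P / I; an inference from the table, not a measured value."""
    return power_w / current_a


def frames_per_joule(fps: float, avg_power_w: float) -> float:
    """Frames per second divided by joules per second = frames per joule."""
    return fps / avg_power_w


if __name__ == "__main__":
    for amps, watts, label in ABSOLUTE_ROWS:
        print(f"{label}: {implied_voltage(amps, watts):.2f} V")
    for prev_a, amps, delta_w, label in INCREMENT_ROWS:
        print(f"{label}: {implied_voltage(amps - prev_a, delta_w):.2f} V")
    print(f"smoke: {frames_per_joule(27, 19.3):.2f} frames/J")
```

Frames per joule (or, for compute kernels, FLOPs per joule) is a more direct efficiency comparison than raw watts, since it normalizes for how much work each configuration actually completes.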
17 Demos
- Glass, Galaxy, and Ocean live demos
18 Developer Information
- Information:
- Forums: