CUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker


2 CUDA on ARM: a forward-looking development platform for high performance, energy efficient hybrid computing
It's a platform for the next generation of HPC, leveraging commodity-driven improvements from the most rapidly evolving compute markets.

3 The next revolution: Power Efficiency
Look at the market for the next generation of HPC components
- Power-effective computing driven by phones and tablets
- ARM, with architectural and experience advantages
- System-level software complexity is high
- HPC driven by accelerated computing
- All major vendors have switched to accelerators
- GPUs have an architectural efficiency advantage
- Titan gets 90% of its performance from the accelerator

4 Possible Obvious Power-efficient Future
- Power-efficient general purpose cores combined with Compute Accelerators
- Power control shared with mobile products
- Ultra-focused on power efficiency
- Competition forces rapid improvement
- Technology evolution driven by commodity market
- Bulk of compute power provided by inherently efficient GPUs
- Increase to over 50% of chip power for flops

5 NVIDIA has these elements
- GPU and Computing
- ARM SoCs

6 Why CUDA on ARM?
- Development platforms for future HPC systems
- Explore the efficiency and performance trade-offs
- Utilize existing hardware: construct systems with ARM CPUs combined with a discrete GPU

7 Current Generation: MXM Devkit
- SECO carrier board: SECO MXM Devkit
- NVIDIA Tegra 3 CPU on Q7 module
- 4 ARM A9 cores, NEON and VFPv3
- 2GB DRAM, and 4-8GB embedded flash
- NVIDIA MXM GPU module
- Quadro 1000M (GF108) on 4 lanes of PCIe
- 96 CUDA cores with 269 GFlops peak
- Carrier provides I/O connectors, power supplies
- PCIe connected 1Gbps Ethernet (i82574), USB, SATA
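The peak figure on this slide can be sanity-checked with simple arithmetic. The sketch below assumes each CUDA core retires one fused multiply-add (counted as 2 floating-point operations) per shader clock cycle, which is the usual convention for quoting peak single-precision throughput. The shader clock is not stated on the slide, so it is derived here rather than taken from the source, and the function names are illustrative only.

```python
# Values quoted on the slide.
CUDA_CORES = 96
PEAK_GFLOPS = 269.0

# Assumption (not from the slide): one fused multiply-add, i.e. 2 FLOPs,
# per CUDA core per shader clock cycle.
FLOPS_PER_CORE_PER_CYCLE = 2


def implied_clock_ghz(peak_gflops, cores, flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Shader clock (GHz) that would yield the quoted peak throughput."""
    return peak_gflops / (cores * flops_per_cycle)


def peak_gflops(cores, clock_ghz, flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Peak throughput (GFLOPS) for a given core count and shader clock."""
    return cores * clock_ghz * flops_per_cycle


clock = implied_clock_ghz(PEAK_GFLOPS, CUDA_CORES)
print(f"Implied shader clock: {clock:.2f} GHz")
print(f"Peak at 1.4 GHz: {peak_gflops(CUDA_CORES, 1.4):.1f} GFLOPS")
```

Under that assumption, the quoted 269 GFlops corresponds to a shader clock of roughly 1.4 GHz.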

9 Current Generation Software
- ARM Linux distribution
- L4T r15.2 softfp, Ubuntu
- Linux kernel
- CUDA 4.2 toolkit and samples, driver
- x86 system support for cross development
- nvcc cross-compiler support

10 Introducing KAYLA
- Support of Kepler-class GPU
- SM35 adds dynamic parallelism and other features
- 2 SMX, 384 CUDA cores
- Comes in MXM and PCIe form factor
- Capability approaching Logan SoC
- Integrated solution will be more power-efficient

11 Next Generation: Mini-ITX Devkit
- SECO carrier board: SECO Mini-ITX GPU devkit
- NVIDIA Tegra 3 CPU on Q7 module
- 4 ARM A9 cores, NEON and VFPv3
- 2GB DRAM, and 4-8GB embedded flash
- NVIDIA PCIe GPU
- ATX power supply supports higher power GPUs
- Qualified for GF108, GK107, GK104, and Kayla GPU
- Carrier provides I/O connectors

12 Next Generation Hardware

13 Next Generation Software
- ARM Linux distribution
- Based on L4T R16.2 hardfp, Ubuntu
- Linux kernel
- CUDA 5.0 toolkit and samples, driver
- Increased parity with x86 Linux (nvcuvid, nvprof, thrust)
- x86 system support for cross development
- nvcc cross-compiler support
- nfs-kernel-server support to ease cross compilation
- Back ported to SECO MXM Devkit

14 CUDA on ARM Roadmap
Software
- CUDA releases starting with CUDA 5.5 and 319.xy include ARM support
- Native ARM compiler
- cuda-gdb: native ARM and client-server
Long term plans for CUDA on the ARM platform
- Logan, Tegra with integrated Kepler class GPU
- ARMv8 64-bit platform support, starting with Parker
- Enable other partners and industry support

15 Notes on Comparing Compute Efficiency
Measuring power isn't always easy
- Multiple points to measure input power
- Multiple power rails and components
- Different peripherals and activity
- Active cooling and over-cooling are significant power draws
Measuring application power draw adds to the challenge
- I/O and DRAM activity can be power-hungry
- Different phases have different power profiles
A power-efficient system has widely varying power draw
- Turn off the lights when you leave the room
- Recent activity has a big influence on present power draw
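Because power draw varies across application phases, a single instantaneous reading can misrepresent a run. One common way to handle this is to log timestamped power samples and integrate them into energy. The sketch below uses the trapezoidal rule for that. The sample values and function names are made up for illustration and are not measurements from the talk.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_watts(samples):
    """Time-weighted average power over the sampled interval."""
    duration = samples[-1][0] - samples[0][0]
    return energy_joules(samples) / duration


# Hypothetical trace: idle, a compute-heavy phase, then a lighter phase.
SAMPLES = [(0.0, 10.9), (1.0, 10.9), (2.0, 37.4), (4.0, 37.4), (5.0, 19.3)]

print(f"Energy: {energy_joules(SAMPLES):.1f} J")
print(f"Average power: {average_watts(SAMPLES):.2f} W")
```

Comparing systems by energy per run, rather than by a single power reading, accounts for both the varying draw and the run time.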

16 Power, Performance, and Benchmarks
Current   Power     Condition
0.46A               No GPU installed, SATA disk
0.50A     9.12W     Idle power, fan off
0.60A     10.9W     Idle power, slow fan
0.66A     +1.1W     Idle with SATA disk
0.86A     +3.65W    GPU power state set to maximum performance
1.06A     19.3W     Running smoke at 27FPS, average (23W peak)
2.05A     37.4W     Running real-time raytracing (41.1W peak)
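The figures in this table are internally consistent, which a few lines of arithmetic can confirm. The sketch below infers a supply voltage from the rows that list both current and absolute power, then checks the two incremental rows against it. The voltage is an inference from the numbers, not a value stated on the slide. Reading the "+" entries as increases over the preceding row is also an assumption, and the energy-per-frame figure is a derived metric added for illustration.

```python
# Rows from the slide that list both current (A) and absolute power (W).
ABSOLUTE_ROWS = [
    (0.50, 9.12),   # idle, fan off
    (0.60, 10.9),   # idle, slow fan
    (1.06, 19.3),   # smoke demo at 27 FPS, average
    (2.05, 37.4),   # real-time raytracing
]


def implied_voltage(rows):
    """Mean of P / I over the rows; inferred, not stated on the slide."""
    return sum(power / current for current, power in rows) / len(rows)


def delta_watts(current_before, current_after, volts):
    """Extra power implied by a rise in supply current at a fixed voltage."""
    return (current_after - current_before) * volts


volts = implied_voltage(ABSOLUTE_ROWS)

# Assumption: each "+" row is an increase over the row directly above it.
sata_delta = delta_watts(0.60, 0.66, volts)      # slide lists +1.1W
gpu_max_delta = delta_watts(0.66, 0.86, volts)   # slide lists +3.65W

# Derived metric: average energy spent per rendered frame in the smoke demo.
joules_per_frame = 19.3 / 27

print(f"Implied supply voltage: {volts:.1f} V")
print(f"SATA disk delta: {sata_delta:.2f} W")
print(f"GPU max-performance delta: {gpu_max_delta:.2f} W")
print(f"Smoke demo: {joules_per_frame:.2f} J/frame")
```

The inferred voltage of about 18.2 V reproduces both incremental entries to within rounding, which supports reading them as deltas over the preceding row.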

17 Demos
- Glass, Galaxy, and Ocean live demos

18 Developer Information
Information:
Forums:


More information

High Performance Computing

High Performance Computing High Performance Computing Dror Goldenberg, HPCAC Switzerland Conference March 2015 End-to-End Interconnect Solutions for All Platforms Highest Performance and Scalability for X86, Power, GPU, ARM and

More information

GPU Architecture. Alan Gray EPCC The University of Edinburgh

GPU Architecture. Alan Gray EPCC The University of Edinburgh GPU Architecture Alan Gray EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? Architectural reasons for accelerator performance advantages Latest GPU Products From

More information

Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API

Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API EuroPAR 2016 ROME Workshop Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API Suyang Zhu 1, Sunita Chandrasekaran 2, Peng Sun 1, Barbara Chapman 1, Marcus Winter 3,

More information

GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester

GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester NVIDIA GPU Computing A Revolution in High Performance Computing GPUs and the Future of Accelerated Computing Emerging Technology Conference 2014 University of Manchester John Ashley Senior Solutions Architect

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

Using Graphics Chips for General Purpose Computation

Using Graphics Chips for General Purpose Computation White Paper Using Graphics Chips for General Purpose Computation Document Version 0.1 May 12, 2010 442 Northlake Blvd. Altamonte Springs, FL 32701 (407) 262-7100 TABLE OF CONTENTS 1. INTRODUCTION....1

More information

NSIGHT ECLIPSE EDITION

NSIGHT ECLIPSE EDITION NSIGHT ECLIPSE EDITION DG-06450-001 _v7.0 March 2015 Getting Started Guide TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. About...1 Chapter 2. New and Noteworthy... 2 2.1. New in 7.0... 2 2.2. New

More information

IBM Power AC922 Server

IBM Power AC922 Server IBM Power AC922 Server The Best Server for Enterprise AI Highlights More accuracy - GPUs access system RAM for larger models Faster insights - significant deep learning speedups Rapid deployment - integrated

More information

Accelerating High Performance Computing.

Accelerating High Performance Computing. Accelerating High Performance Computing http://www.nvidia.com/tesla Computing The 3 rd Pillar of Science Drug Design Molecular Dynamics Seismic Imaging Reverse Time Migration Automotive Design Computational

More information

An Introduction to OpenACC

An Introduction to OpenACC An Introduction to OpenACC Alistair Hart Cray Exascale Research Initiative Europe 3 Timetable Day 1: Wednesday 29th August 2012 13:00 Welcome and overview 13:15 Session 1: An Introduction to OpenACC 13:15

More information

TEGRA LINUX DRIVER PACKAGE R24.1

TEGRA LINUX DRIVER PACKAGE R24.1 TEGRA LINUX DRIVER PACKAGE R24.1 RN_05071-R24 June 15, 2016 Advance Information Subject to Change Release Notes RN_05071-R24 TABLE OF CONTENTS 1.0 ABOUT THIS RELEASE... 3 1.1 What s New... 3 1.2 Login

More information

Experiences Using Tegra K1 and X1 for Highly Energy Efficient Computing

Experiences Using Tegra K1 and X1 for Highly Energy Efficient Computing Experiences Using Tegra K1 and X1 for Highly Energy Efficient Computing Gaurav Mitra Andrew Haigh Luke Angove Anish Varghese Eric McCreath Alistair P. Rendell Research School of Computer Science Australian

More information

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech

More information

Industry Collaboration and Innovation

Industry Collaboration and Innovation Industry Collaboration and Innovation Industry Landscape Key changes occurring in our industry Historical microprocessor technology continues to deliver far less than the historical rate of cost/performance

More information