CUDA on ARM Update. Developing Accelerated Applications on ARM. Bas Aarts and Donald Becker

1 CUDA on ARM Update
Developing Accelerated Applications on ARM
Bas Aarts and Donald Becker

2 CUDA on ARM: a forward-looking development platform for high-performance, energy-efficient hybrid computing
It's a platform for the next generation of HPC, leveraging commodity-driven improvements from the most rapidly evolving compute markets.

3 The next revolution: Power Efficiency
Look at the market for the next generation of HPC components
- Power-effective computing driven by phones and tablets
  - ARM, with architectural and experience advantages
  - System-level software complexity is high
- HPC driven by accelerated computing
  - All major vendors have switched to accelerators
  - GPUs have an architectural efficiency advantage
  - Titan gets 90% of its performance from the accelerator

4 Possible Obvious Power-efficient Future
- Power-efficient general-purpose cores combined with compute accelerators
- Power control shared with mobile products
  - Ultra-focused on power efficiency
  - Competition forces rapid improvement
  - Technology evolution driven by commodity market
- Bulk of compute power provided by inherently efficient GPUs
  - Increase to over 50% of chip power for flops

5 NVIDIA has these elements
- GPU and Computing
- ARM SoCs

6 Why CUDA on ARM?
- Development platforms for future HPC systems
- Explore the efficiency and performance trade-offs
- Utilize existing hardware: construct systems with ARM CPUs combined with a discrete GPU

7 Current Generation: MXM Devkit
- SECO carrier board: SECO MXM Devkit
- NVIDIA Tegra 3 CPU on Q7 module
  - 4 ARM A9 cores, NEON and VFPv3
  - 2GB DRAM, and 4-8GB embedded flash
- NVIDIA MXM GPU module
  - Quadro 1000M (GF108) on 4 lanes of PCIe
  - 96 CUDA cores with 269 GFlops peak
- Carrier provides I/O connectors, power supplies
  - PCIe-connected 1Gbps Ethernet (i82574), USB, SATA
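The 269 GFlops peak quoted for the 96-core GPU module is consistent with the usual peak-throughput arithmetic: cores times FLOPs per core per cycle times clock. The following is a minimal sketch of that calculation, assuming a 1.4 GHz shader clock and one fused multiply-add (counted as 2 FLOPs) per core per cycle. Neither of those two numbers appears on the slide, and the function name is only illustrative.

```python
def peak_gflops(cuda_cores, shader_clock_ghz, flops_per_core_per_cycle=2):
    """Theoretical single-precision peak in GFLOPS.

    flops_per_core_per_cycle=2 assumes one fused multiply-add per cycle.
    """
    return cuda_cores * flops_per_core_per_cycle * shader_clock_ghz


# 96 CUDA cores is from the slide; the 1.4 GHz shader clock is an assumption.
quadro_1000m = peak_gflops(cuda_cores=96, shader_clock_ghz=1.4)
print(f"{quadro_1000m:.1f} GFLOPS")
```

Under those assumptions the result is 268.8 GFLOPS, which rounds to the 269 on the slide. Peak figures like this are an upper bound; sustained application throughput is normally lower.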


9 Current Generation Software
- ARM Linux distribution
  - L4T r15.2 softfp, Ubuntu Linux kernel
- CUDA 4.2 toolkit and samples, driver
- x86 system support for cross development
  - nvcc cross-compiler support

10 Introducing KAYLA
- Support of Kepler-class GPU
  - SM35 adds dynamic parallelism and other features
  - 2 SMX, 384 CUDA cores
  - Comes in MXM and PCIe form factor
- Capability approaching Logan SoC
  - Integrated solution will be more power-efficient

11 Next Generation: mini-ITX Devkit
- SECO carrier board: SECO mini-ITX GPU devkit
- NVIDIA Tegra 3 CPU on Q7 module
  - 4 ARM A9 cores, NEON and VFPv3
  - 2GB DRAM, and 4-8GB embedded flash
- NVIDIA PCIe GPU
  - ATX power supply supports higher-power GPUs
  - Qualified for GF108, GK107, GK104, and Kayla GPU
- Carrier provides I/O connectors

12 Next Generation Hardware

13 Next Generation Software
- ARM Linux distribution
  - Based on L4T R16.2 hardfp, Ubuntu Linux kernel
- CUDA 5.0 toolkit and samples, driver
  - Increased parity with x86 Linux (nvcuvid, nvprof, thrust)
- x86 system support for cross development
  - nvcc cross-compiler support
  - nfs-kernel-server support to ease cross compilation
- Back-ported to SECO MXM Devkit

14 CUDA on ARM Roadmap
- Software
  - CUDA releases starting with CUDA 5.5 and 319.xy include ARM support
  - Native ARM compiler
  - cuda-gdb: native ARM and client-server
- Long-term plans for CUDA on the ARM platform
  - Logan, Tegra with integrated Kepler-class GPU
  - ARMv8 64-bit platform support, starting with Parker
  - Enable other partners and industry support

15 Notes on Comparing Compute Efficiency
- Measuring power isn't always easy
  - Multiple points to measure input power
  - Multiple power rails and components
  - Different peripherals and activity
  - Active cooling and over-cooling are significant power draws
- Measuring application power draw adds to the challenge
  - I/O and DRAM activity can be power-hungry
  - Different phases have different power profiles
- A power-efficient system has widely varying power draw
  - Turn off the lights when you leave the room
  - Recent activity has a big influence on present power draw
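Because power draw varies by application phase, a single instantaneous reading says little about what a run costs. One common approach is to log timestamped power samples and integrate them into energy. The sketch below does that with the trapezoidal rule; the trace values and function names are hypothetical and are not measurements from the talk.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_power_watts(samples):
    """Energy divided by the elapsed time covered by the samples."""
    duration = samples[-1][0] - samples[0][0]
    return energy_joules(samples) / duration


# Hypothetical trace: idle, an I/O phase, a compute phase, back to idle.
trace = [
    (0.0, 10.0), (1.0, 10.0),   # idle
    (2.0, 14.0), (3.0, 14.0),   # I/O and DRAM activity
    (4.0, 36.0), (6.0, 36.0),   # GPU compute phase
    (7.0, 10.0),                # back to idle
]

print(f"energy:  {energy_joules(trace):.1f} J")
print(f"average: {average_power_watts(trace):.2f} W")
```

Comparing energy per run (or per unit of work) rather than peak or idle power makes it easier to account for phases with very different power profiles.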

16 Power, Performance, and Benchmarks

Current   Power     Condition
0.46A               No GPU installed, SATA disk
0.50A     9.12W     Idle power, fan off
0.60A     10.9W     Idle power, slow fan
0.66A     +1.1W     Idle with SATA disk
0.86A     +3.65W    GPU power state set to maximum performance
1.06A     19.3W     Running smoke at 27FPS, average (23W peak)
2.05A     37.4W     Running real-time raytracing (41.1W peak)
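The current and power columns can be cross-checked against each other. The sketch below assumes the current was measured at a fixed DC input voltage, which the slide does not state, and derives that voltage from the rows that list both values. It then uses the result to estimate the rows that give only a current or a power delta. Variable and function names are illustrative.

```python
# (current_a, power_w) for the rows where the slide lists both values.
measured = [(0.50, 9.12), (0.60, 10.9), (1.06, 19.3), (2.05, 37.4)]


def implied_voltage(rows):
    """Mean of P / I across rows; assumes a fixed DC supply voltage."""
    return sum(power / current for current, power in rows) / len(rows)


def power_from_current(current_a, volts):
    """P = V * I for a DC supply."""
    return current_a * volts


volts = implied_voltage(measured)

# Row with no power figure on the slide: 0.46 A, no GPU installed.
no_gpu_w = power_from_current(0.46, volts)

# Rows given as deltas on the slide: +1.1 W and +3.65 W.
sata_delta_w = power_from_current(0.66 - 0.60, volts)
gpu_max_delta_w = power_from_current(0.86 - 0.66, volts)

# Simple efficiency metric for the smoke demo: frames per joule.
frames_per_joule = 27 / 19.3

print(f"implied supply voltage: {volts:.2f} V")
print(f"no-GPU estimate:        {no_gpu_w:.2f} W")
print(f"SATA delta:             {sata_delta_w:.2f} W")
print(f"GPU max-perf delta:     {gpu_max_delta_w:.2f} W")
print(f"smoke demo:             {frames_per_joule:.2f} frames/J")
```

Under this assumption the implied voltage is a little over 18 V, and the computed deltas land close to the +1.1 W and +3.65 W on the slide, which suggests the two columns are consistent with a single fixed-voltage input measurement.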

17 Demos
- Glass, Galaxy, and Ocean live demos

18 Developer Information
- Information:
- Forums:
