Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

Similar documents
Next Generation Visual Computing

3D Graphics in Future Mobile Devices. Steve Steele, ARM

ARM big.little Technology Unleashed An Improved User Experience Delivered

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager

Take GPU Processing Power Beyond Graphics with Mali GPU Computing

The Benefits of GPU Compute on ARM Mali GPUs

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM

Building blocks for 64-bit Systems Development of System IP in ARM

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Each Milliwatt Matters

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7

MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency

ARM instruction sets and CPUs for wide-ranging applications

Bifrost - The GPU architecture for next five billion

Many-core back to the future. Matt Horsnell ARM Research and Development

Building Ultra-Low Power Wearable SoCs

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

The Bifrost GPU architecture and the ARM Mali-G71 GPU

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Welcome. Altera Technology Roadshow 2013

A Study on C-group controlled big.little Architecture

Mapping applications into MPSoC

Profiling and Debugging Games on Mobile Platforms

Growth outside Cell Phone Applications

Altera SDK for OpenCL

Enable AI on Mobile Devices

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

UEFI ARM Update. UEFI PlugFest March 18-22, 2013 Andrew N. Sloss (ARM, Inc.) presented by

Next Generation Enterprise Solutions from ARM

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc.

Beyond Hardware IP An overview of Arm development solutions

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Developing the Bifrost GPU architecture for mainstream graphics

«Real Time Embedded systems» Multi Masters Systems

ARM Mobile GPU Compute Accelerates UX Differentiation

Effective System Design with ARM System IP

Multimedia in Mobile Phones. Architectures and Trends Lund

The ARM Cortex-A9 Processors

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013

F28HS Hardware-Software Interface: Systems Programming

Expanding Opportunities in Clamshell Devices. Laurence Bryant VP Strategic Marketing

Graphics, Mobile Computing, APIs and Life

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits

ECE 571 Advanced Microprocessor-Based Design Lecture 22

European energy efficient supercomputer project

How GPUs can find your next hit: Accelerating virtual screening with OpenCL. Simon Krige

CCIX: a new coherent multichip interconnect for accelerated use cases

The Mont-Blanc approach towards Exascale

Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components

A backward glance and a forward view

Utilization-based Power Modeling of Modern Mobile Application Processor

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs

Cut Power Consumption by 5x Without Losing Performance

ARMv8-A CPU Architecture Overview

ARM Mali -400 MP. The Scalable Multicore Graphics Processing Unit. Under embargo until June 2 nd, 2008

Samsung System LSI Business

ARM the Company ARM the Research Collaborator

Hardware Software Bring-Up Solutions for ARM v7/v8-based Designs. August 2015

KeyStone II. CorePac Overview

A Secure and Connected Intelligent Future. Ian Smythe Senior Director Marketing, Client Business Arm Tech Symposia 2017

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

ARMv8-A Software Development

Advanced IP solutions enabling the autonomous driving revolution

Heterogeneous Architecture. Luca Benini

Arm s Latest CPU for Laptop-Class Performance

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

Building supercomputers from embedded technologies

ARM processors driving automotive innovation

Software Driven Verification at SoC Level. Perspec System Verifier Overview

Formal for Everyone Challenges in Achievable Multicore Design and Verification. FMCAD 25 Oct 2012 Daryl Stewart

Embedded Systems: Projects

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

Mali GPU acceleration of HEVC and VP9 Decoder

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2

Position Paper: OpenMP scheduling on ARM big.little architecture

Artificial Intelligence Enriched User Experience with ARM Technologies

CMP Conference 20 th January Director of Business Development EMEA

CMP Conference 25 th January 2012 Research - Education. Director of Business Development EMEAI

Dongjun Shin Samsung Electronics

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications

Heterogeneous SoCs. May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1

Technologies Toward Intelligent Mobile and Connecting Devices. Ivan H.P. Lin, Sr Segment Marketing Manager

Roadmap Directions for the RISC-V Architecture

WAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Arm Limited

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems

POWERVR MBX & SGX OpenVG Support and Resources

ARM Vision for Thermal Management and Energy Aware Scheduling on Linux

Mobile & IoT Market Trends and Memory Requirements

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Leveraging the Benefits of Symmetric Multiprocessing (SMP) in Mobile Devices

New ARMv8-R technology for real-time control in safetyrelated

Transcription:

Integrating CPU and GPU, The ARM Methodology Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

The ARM Business Model Global leader in the development of semiconductor IP R&D outsourcing for semiconductor companies Innovative business model yields high margins Upfront license fee flexible licensing models Ongoing royalties typically based on a percentage of chip price Technology reused across multiple applications Long-term, secular growth markets Approximately 850 licenses Grows by 60-90 every year More than 250 potential royalty payers >8bn ARM-based chips in 11 >25% CAGR over last 5 years

ARM Technology Advanced consumer products are incorporating more and more ARM technology from processor and multimedia IP to software Processor IP Design of the brain of the chip Physical IP Design of the building blocks of the chip Software development tools

ARM Mali GPU Momentum #1 graphics in Android tablets (>50% market share) #1 in Smart TVs (>70% market share) >20% Android 4 smartphones Mali GPU shipments outpacing industry growth rate & gaining market share

Comprehensive GPU Compute Support ARM s best-in-class CPU know-how combined with expertise in graphics technology enabling complex use-cases Computational photography: Panorama stitching Image recognition: Face, smile, landmark, context Image improvement, stabilization, editing, filtering By moving GPU Compute tasks onto the GPU will enable lower power consumption and faster response over being solely run on the CPU

Mali GPU Compute: No FUD... Facts Passed Khronos Conformance OpenCL 1.1 Full Profile on Linux and Android Proven in Silicon Samsung Exynos 5 Dual, implements Full Profile OpenCL and Renderscript DDK available now Mali-T604 shipping in real products Google Chromebook Google Nexus 10 InSignal Arndale Community Board API exposed for developers OpenCL on Linux for Arndale platform Renderscript computation on Android for Nexus 10

Compute Use Case Example ARM Seemore demo OpenCL 1.1 FP accelerated world Interactive items and lights Bullet physics broad-phase fully OpenCL accelerated on GPU Performance boost GPU Kernel speedup >10x But system speedup is less ARM integration goal Take the system cost out!

Integration: Coherency SoCs are heterogenous systems But sharing data can still be costly CoreLink Cortex AMBA Mali Cache flushes, locks, syncs reduces the heterogeneous benefit HW coherency makes sharing data cheap and automatic ARM is in leading position with full technology coverage Cortex CPUs Mali GPUs CoreLink system IP AMBA bus protocols

Integration: Address Space Alignment The 32-bit address space is running out, even in mobile Midgard architecture built for full 64-bit addresses Embedded distributed Mali MMU for VA to PA/IPA translation Mali-T604: 48-bit VA and 40-bit PA/IPA Uses ARMv7 LPAE page table format, just like Cortex-A15 & Cortex-A7 Multiple simultaneous address spaces supported Mali GPUs run many threads in parallel Independent processes may execute on GPU simultaneously Seamless process transitions ensures maximum utilization/efficiency

big LITTLE Fine-Tuned to Different Performance Points Most energy-efficient applications processor from ARM Simple, in-order, 8 stage pipelines Performance better than mainstream, high-volume smartphones (Cortex-A8 and Cortex-A9) Cortex-A7 Cortex-A53 Highest performance in mobile power envelope Complex, out-of-order, multi-issue pipelines Up to 2x the performance of today s high-end smartphones Q u e u e I s s u e I n t e g e r Cortex-A15 Cortex-A57

ARM System Scalability Introducing CCI-400 Cache Coherent Interconnect Processor to Processor Coherency and I/O coherency Memory and synchronization barriers Virtualization support with distributed virtual memory signaling Quad Cortex-A15 MPCore A15 A15 A15 A15 Quad Cortex-A7 MPCore A7 A7 A7 A7 Mali-T624 GPU Core GPU Core GPU Core GPU Core Processor Coherency (SCU) Up to 4MB L2 cache Processor Coherency (SCU) Up to 4MB L2 cache Mali L2 Cache 128-bit AMBA 4 128-bit AMBA 4 MMU-400 CoreLink CCI-400 Cache Coherent Interconnect

Low to Medium Intensity Use Cases 250MHz aggregate 1200000 1000000 800000 500000 200000 STANDBYWFI CPU OFF Cluster OFF 180MHz aggregate 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% CPU0 CPU1 CPU0 Audio (MP3) Angry Birds Camcorder 720p Video GT Racer DVFS profiles from leading Dual Cortex-A9 Smartphone Demonstrates that many common applications require only low to moderate processing power All of these use cases will run predominantly on the LITTLE cores The small % peaks to high MHz don t necessarily require migration to big cores CPU1 The short term DVFS system response to any increase in average load is to go to the highest MHz CPU0 CPU1 CPU0 90MHz aggregate CPU1 500MHz aggregate CPU0 CPU1

big and LITTLE CPU Performance Cortex-A9 powers high end mobile devices today Cortex-A7 delivers comparable performance at lower power and area Cortex-A15 delivers significantly higher performance 3 2.5 Cortex-A9 1.2GHz 2 Cortex-A7 1GHz actual 1.5 1 0.5 0 Quadrant - CPU Caffeinemark 3.0 BrowserMark V8 Benchmark Kraken Geomean Note: Cortex-A15 and Cortex-A7 results are from a test platform, with lower memory performance than production systems will deliver Cortex-A7 1.2GHz est. Cortex-A15 1.2GHz Cortex-A15 1.6GHz est.

Summary Getting the maximum efficiency out of modern SoCs is highly complex Interactions between many sub-system to optimize Requires new innovations and technology focus ARM Cortex-A15 / coherent Mali / big.little enable highest performance and scalability from mobile through to console class gaming. ARM continues to drive the development for better system integrations Cortex CPUs, Mali GPUs and CoreLink fabric leading the way Future v8 AArch64 with multi-cluster for next-generation gaming Thank you