3D Graphics in Future Mobile Devices. Steve Steele, ARM

Similar documents
Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

Developing the Bifrost GPU architecture for mainstream graphics

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

The Bifrost GPU architecture and the ARM Mali-G71 GPU

Bifrost - The GPU architecture for next five billion

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager

Multimedia in Mobile Phones. Architectures and Trends Lund

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

ARM Mali -400 MP. The Scalable Multicore Graphics Processing Unit. Under embargo until June 2 nd, 2008

Each Milliwatt Matters

Building blocks for 64-bit Systems Development of System IP in ARM

Building Ultra-Low Power Wearable SoCs

ARM Multimedia IP: working together to drive down system power and bandwidth

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM

POWERVR MBX & SGX OpenVG Support and Resources

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Driving 4K High-Resolution Embedded Displays in New Applications with MIPI DSI and VESA DSC (Synopsys & Arm)

Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components

Next Generation Visual Computing

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

ARM Technology Forum. Multimedia Technology From ARM September Dennis Laudick VP Partner Marketing

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Next Generation Enterprise Solutions from ARM

Mobile & IoT Market Trends and Memory Requirements

Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013

SIGGRAPH Briefing August 2014

Samsung System LSI Business

Mobile & IoT Market Trends and Memory Requirements

Effective System Design with ARM System IP

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Ultra Low Power GPUs for Wearables

Seahawk Power-optimized implementation of High Performance Quad-core Cortex-A15 Processor

Bringing it all together: The challenge in delivering a complete graphics system architecture. Chris Porthouse

ARM big.little Technology Unleashed An Improved User Experience Delivered

PowerVR GPU IP from Wearables to Servers. Kristof Beets Director of Business Development May 2015

Mobile & IoT Market Trends and Memory Requirements

Profiling and Debugging Games on Mobile Platforms

Kevin Donnelly, General Manager, Memory and Interface Division

Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson

The Benefits of GPU Compute on ARM Mali GPUs

Expanding Opportunities in Clamshell Devices. Laurence Bryant VP Strategic Marketing

Graphics, Mobile Computing, APIs and Life

The Handheld Graphics. Market. Size, needs, and opportunities. Jon Peddie Research

Growth outside Cell Phone Applications

Falanx Microsystems. Company Overview

A Secure and Connected Intelligent Future. Ian Smythe Senior Director Marketing, Client Business Arm Tech Symposia 2017

THE LEADER IN VISUAL COMPUTING

CMP Conference 20 th January Director of Business Development EMEA

AR Standards Update Austin, March 2012

Introduction to OpenGL ES 3.0

IoT, Wearable, Networking and Automotive Markets Driving External Memory Innovation Jim Cooke, Sr. Ecosystem Enabling Manager, Embedded Business Unit

1. Data plane blocks can be optimized for different applications. 2. The IP blocks can be reused and the design complexity decreases.

The ARM Cortex-A9 Processors

Dave Shreiner, ARM March 2009

Overview. Technology Details. D/AVE NX Preliminary Product Brief

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 3.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555C (ID102813)

ARM the Company ARM the Research Collaborator

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits

AMD HD5450 PCIe ADD-IN BOARD. Datasheet AEGX-A3T5-01FST1

System-on-Chip. Outline. Example: iphone 3GS disassembled. System-on-Chip is Everywhere! SoC Challenges. SoC Challenges and Current Solutions

Dr. Ajoy Bose. SoC Realization Building a Bridge to New Markets and Renewed Growth. Chairman, President & CEO Atrenta Inc.

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Developed Hybrid Memory System for New SoC. -Why choose Wide I/O?

The Challenges of System Design. Raising Performance and Reducing Power Consumption

Mobile AR Hardware Futures

All Programmable: from Silicon to System

ARM instruction sets and CPUs for wide-ranging applications

Take GPU Processing Power Beyond Graphics with Mali GPU Computing

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Toward a Memory-centric Architecture

So you think developing an SoC needs to be complex or expensive? Think again

Achieving Console Quality Games on Mobile

Mali-G72: Enabling tomorrow s technology today

Applications and Implementations

A backward glance and a forward view

STM32MP1 Microprocessor Continuing the STM32 Success Story. Press Presentation

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

The Evolution of the ARM Architecture Towards Big Data and the Data-Centre

Renderscript Accelerated Advanced Image and Video Processing on ARM Mali T-600 GPUs. Lihua Zhang, Ph.D. MulticoreWare Inc.

Bringing the benefits of Cortex-M processors to FPGA

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

European energy efficient supercomputer project

2 x Maximum Display Monitor(s) support MHz Core Clock 28 nm Chip 384 x Stream Processors. 145(L)X95(W)X26(H) mm Size. 1.

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

PowerVR Hardware. Architecture Overview for Developers

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013

Reducing Time-to-Market with i.mx6-based Qseven Modules

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

SAPPHIRE DUAL-X R9 270X 2GB GDDR5 OC WITH BOOST

Copyright Khronos Group Page 1

Arm Processor Technology Update and Roadmap

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Achieves excellent performance of 1,920 MIPS and a single-chip solution for nextgeneration car information systems

Designing with NXP i.mx8m SoC

Our Technology Expertise for Software Engineering Services. AceThought Services Your Partner in Innovation

Higher compression efficiency, exceptional image quality, faster encoding time and lower costs

Transcription:

3D Graphics in Future Mobile Devices Steve Steele, ARM

Market Trends Mobile Computing Market Growth

Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and Tablets) Entry-level in 2015 >750 million devices 1400 1200 1000 800 >$400 $200- $350 Latest features Greatest specs Tailored to fit a given budget 250% growth since 2012; >10x since 2010 Nearly 4 times the 2013 Notebook PC forecasts 600 <$150 400 200 0 2010 2015 Entry Level Mid Range Premium Source: ARM and Gartner Estimates ARM provides targeted solutions to deliver the best features and specs at every price point

2010 2011 2012 2013 2014 2015 2016 2017 Volume in millions Driving Market Change 2000 Smart Mobile Device Shipments (Smartphones and Tablets) 80-100mm 2 Typical GPU 1800 1600 1400 1200 1000 >$400 $200- $350 50-80mm 2 ARM Mali -T760 GPU - Increased GPU & SoC energy efficiency - Increased performance scalability - Advanced memory system scalability - Reduced bandwidth consumption 800 600 25-40mm 2 400 <$150 200 0 Entry Level Mid Range Premium ARM Mali-T720 GPU - Increased graphics area efficiency - Optimized for Android - OpenGL ES 3.0 support* - Reduced cost & Time-to-Market Source: ARM and Gartner Estimates *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance

ARM Mali-T600 Series GPU Overview Midgard Architecture - the foundation of ARM s GPU roadmap providing increased performance, flexibility and software compatibility Innovation driving 64bit GPU Compute, bandwidth saving technologies Scalability - an ARM Mali-T600 Series GPU to suit every application GPU Compute - 64-bit double precision, IEEE-754 compliant floating point Feature-rich - all popular OSs, multiple APIs including DirectX 11, next generation OpenGL ES, OpenCL Full Profile and RenderScript compute

Leading on Lowering System Power GPUs have a major impact on SoC architecture Area, memory bandwidth, energy and implementation ARM focuses on system wide power efficiency not just IP components Energy Saving Features in the Mali-T62x system: GPU efficiency 50% performance increase or less energy/frame in same area ASTC* 90% TE** TE 50% ARM POP 19% SAVING SAVING SAVING ARM POP IP Up to 27% higher frequency 27% lower area ARM Mali GPUs are leaders in balancing power, area and functionality *Adaptive Scalable Texture Compression (ASTC) **Transaction Elimination (TE)

Mali Momentum through 2013 The most widely licensed GPU 84 Mali licenses to date More than 10x Growth in volume in 2 years 152M units shipped in 2012, in more than 230 devices >300M units shipped YTD 2013 2013 has already seen shipments double from 2012 Strength in key market segments #1 Android GPU IP supplier >20% Android Smartphone #1 in Android tablets (>50%) #1 in Digital Smart TVs (>70%)

Premium Mobile Devices Energy efficient 3D Graphics

Mali High-end GPU Solutions HIGH-END GPU ROADMAP Mali-T604 First Midgard architecture product OpenGL ES 3.0 support Scalable to 4 cores Mali-T760 400% energy efficiency of Mali-T604 Scalable to 16 cores Major bandwidth reduction Mali-T678 Highest performance Mali-T600 GPU OpenGL ES 3.0 support Scalable to 8 cores Mali-T624 50% performance uplift OpenGL ES 3.0 support Scalable to 4 cores Mali-T628 50% performance uplift OpenGL ES 3.0 support Scalable to 8 cores Mali-T622 Small Full Profile GPU Compute solution 50% more energy efficient than Mali-T604 Designed for Graphics and GPU Compute Full Profile, 64-bit Compute Closer CPU-GPU links Efficient use of system resources Coherent memory links Cortex -A15 / Cortex-A53 / Cortex-A57 Protecting partner investments Common software platform reduces costs and TTM Multicore scales performance to address multiple form factors Advanced products get to market early ARM Mali-T628 GPU silicon shipping now in consumer products Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance

GB/Sec Premium Mobile Device Requirements Increasing screen size and content complexity demand increased performance Increased performance demands advanced memory technologies to achieve greater bandwidth Mobile computing devices thermal budget doesn t increase with performance Rigid thermal constraints force the adoption of advanced silicon process and memory technologies to achieve greater performance 4.5 4 INCREASED THERMAL IMPACT 3.5 3 2.5 2 INCREASED RESOLUTION 1.5 1 0.5 0 HD FHD QXGA WQXGA QSXGA 4K UHD Source: ARM data

ARM Mali-T760 GPU Overview Increased performance and energy efficiency ~400% increase in energy efficiency over ARM Mali-T604 GPU Multi-processor scalable graphics performance Coherently scales up to 16 shader core configurations 3D graphics acceleration and Compute APIs Khronos* compliant OpenGL ES 3.0/2.0, 1.1 Microsoft Windows compliant Direct3D 11.1 Full Profile OpenCL 1.1 RenderScript/FilterScript Major reduction in bandwidth and SoC power Memory system optimizations ARM Frame Buffer Compression (AFBC) Smart Composition Proven software DDK quality and performance # s L2 Cache Size Clock Freq (MHz) Pixel Fillrate Triangle Rate Floating Point Performance Up to MP16 2 x 512kB 600MHz 9.6 Gpix/Sec 1066.6 MTri/s 326.4 GFLOPS *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance

ARM Mali-T760 GPU Scalability Coherent shader core scalability using an improved L2 Cache interconnect Scalable up to 16 coherent shader cores Significant reduction in wire count Evenly distributed cache utilization Single or Dual L2 Cache slice each with individual master port Consistent ACE -Lite interface Coherent L2 Cache Interconnect L2 Cache L2 Cache

AFBC Application in SoC Employing AFBC throughout the SoC saves significant system bandwidth and power CPU GPU DISPLAY CONTROLLER SW AFBC ENCODE (TEXTURES) AFBC DECODE AFBC ENCODE AFBC DECODE Uncompressed textures Frame Buffer Objects (render to texture) AFBC Raw Compression Rates Source: ARM data

Smart Composition Better than 50% reduction of texture read bandwidth on simple Android UI use cases Significantly reducing read bandwidth and composition work by ignoring repetitive tile data

Additional Features Reduce system bandwidth when sending video output YCrCb framebuffer output Hardware assisted global illumination Improved Multiple Render Target Increased capabilities to support DirectX 11.1 HW super-sampling, TIR Continuing support for next generation APIs Direct3D 11.1 Feature Level 11 and next generation OpenGL ES ARM POP IP for Mali is available from the Physical IP Division ARM s EDA tool partners are working with our IP now to prepare for supporting our licensees 28HPM and 16FF process libraries

Entry-level Smart Mobile Devices Rapid Implementation of 3D Graphics

Mali Mid-range GPU Solutions MID-RANGE GPU ROADMAP Mali-300 Entry-level OpenGL ES 2.0 GPU Mali-450 MP Leading OpenGL ES 2.0 performance 2x Mali-400 MP performance Scalable to 8 cores Mali-T720 First OpenGL ES 3.0 GPU in Mid-Range Market Optimized for Android Reduced Time to Market Mali-400 MP First OpenGL ES 2.0 multi-core GPU Scalable to 4 cores Leading area-efficiency UTGARD ARCHITECTURE MIDGARD ARCHITECTURE Designed for cost-effective Graphics solutions OpenGL ES, OpenVG Closer CPU-GPU links Efficient use of system resources Cortex-A7 / Cortex-A12 / Cortex-A53 Protecting partner investments Common software platform reduces costs and TTM Multicore scales performance to address multiple form factors Proven solutions ARM Mali-400 MP and Mali-450 MP GPU silicon shipping in hundreds of millions of consumer products Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance

Entry Level Smartphone Tomorrow ARM Mali-T720 GPU Extends lead in area and energy efficiency Supports OpenGL ES 3.0* Area-efficient performance GIC-400 Quad or Dual-core Cortex-A53/Cortex-A7 Most energy-efficient 32-bit CPU 64-bit architecture for new software Mali-V500 Mali-Display Non Coherent Devices Quad Cortex -A7/ Cortex-A53 ACE I/O Coherent Devices Mali-T720 GPU ACE-Lite ADB-400 MMU-500 MMU-500 ADB-400 MMU-500 AXI4 NIC-400 AXI4 ACE ACE ACE-Lite + DVM Link CCI-400r1 ACE-Lite + DVM ACE-Lite + DVM ACE-Lite ACE-Lite ACE-Lite ACE-Lite ACE-Lite ACE-Lite ACE-Lite PHY DMC-400/3 rd Party DMC PHY AXI4 NIC-400 Configurable: AXI4/AXI3/AHB/APB Single Channel DDR2 @533 MHz DDR3/2 LPDDR2/3 DDR3/2 LPDDR2/3 Other Slaves Other Slaves *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance

Design Costs ($M) Break even Unit Volume (M Units) Time to Market Defines Profit Margin Time-to-market becomes the key differentiator which protects profit margins Engineering costs grow as process nodes advance ASPs sharply reduced as competition appears in low-cost markets $140 Basic SoC Silicon and Software Design Costs 25.0 Unit volume (M) to break even for Basic SoC Designs @28nm $120 20.0 $100 $80 SW Design Costs 15.0 $60 SoC Design Costs 10.0 $40 $20 5.0 $0 90nm 65nm 45nm 32nm 28nm 22nm 14nm - Source: Semico Research, 2011 0.0 $5.00 $10.00 $15.00 $20.00 $25.00 $30.00 Average Selling Price ($)

ARM Mali-T720 GPU Overview Increased energy efficiency of previous cost optimized ARM Mali GPU ~ 150% energy efficiency increase over previous cost optimized GPUs Android optimized version of ARM Mali-T62x ~30% area reduction from previous Midgard generation Up to 15% less dynamic power Natural companion to quad core implementations of: Cortex-A7 / Cortex-A12 / Cortex-A53 Targeted optimizations for Android OS OpenGL ES 3.0, Renderscript/FilterScript support Maintain the area efficiency leadership of ARM Mali-4xx GPU Area efficient OpenGL ES 3.0* GPU Significantly decreased time-to-market Ease of implementation & integration in SoC designs # s L2 Cache Size Clock Freq (MHz) Pixel Fillrate Triangle Rate Floating Point Performance Up to MP8 2x128kB 600MHz 4.8 Gpix/Sec 533.2 Mtri/s 81.6 GFLOPS *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance

Decreased Time-to-Market Increased routing density significantly decreases die area Constrained routing layers to minimize cost Opportunities for optimizing layout utilization Crafted reference methodologies to match tool chain ARM POP IP available from the ARM Physical IP Division ARM s EDA tool partners are working with our IP now to prepare for supporting our licensees on 28nm process libraries

ARM POP IP Exploration Worksheet Extensive exploration analysis to map the wide range of Vt/L options on a process to evaluate multiple PPA trade-offs. Determine most effective Vt/L choice for optimizing different design goals. The dominant effects of Vt/L choice can be easily seen in this post-synthesis dataset. Use datasheets to estimate GPU PPA

Extensive Mali Ecosystem

Summary ARM Mali-T760 GPU addresses the needs of high-end mobile and consumer devices by providing: Scalable graphics and compute performance with up to 16 core configurations capable of twice the performance of previous generations ~400% increase in energy efficiency over ARM Mali-T604 GPU Included technology capable of greatly reducing SoC level memory bandwidth utilization and power consumption ARM Mali-T720 GPU solve the needs of semiconductor partners focusing on low-cost mobile devices by providing: Significant reduction in shader core area and increased area efficiency Greater facilities to assist licensees to drastically reduce time-to-market Targeted optimizations for Android OS - OpenGL ES 3.0, RenderScript/FilterScript support

Steve Steele Senior Product Manager steve.steele@arm.com