3D Graphics in Future Mobile Devices Steve Steele, ARM
Market Trends Mobile Computing Market Growth
Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and Tablets) Entry-level in 2015 >750 million devices 1400 1200 1000 800 >$400 $200- $350 Latest features Greatest specs Tailored to fit a given budget 250% growth since 2012; >10x since 2010 Nearly 4 times the 2013 Notebook PC forecasts 600 <$150 400 200 0 2010 2015 Entry Level Mid Range Premium Source: ARM and Gartner Estimates ARM provides targeted solutions to deliver the best features and specs at every price point
2010 2011 2012 2013 2014 2015 2016 2017 Volume in millions Driving Market Change 2000 Smart Mobile Device Shipments (Smartphones and Tablets) 80-100mm 2 Typical GPU 1800 1600 1400 1200 1000 >$400 $200- $350 50-80mm 2 ARM Mali -T760 GPU - Increased GPU & SoC energy efficiency - Increased performance scalability - Advanced memory system scalability - Reduced bandwidth consumption 800 600 25-40mm 2 400 <$150 200 0 Entry Level Mid Range Premium ARM Mali-T720 GPU - Increased graphics area efficiency - Optimized for Android - OpenGL ES 3.0 support* - Reduced cost & Time-to-Market Source: ARM and Gartner Estimates *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance
ARM Mali-T600 Series GPU Overview Midgard Architecture - the foundation of ARM s GPU roadmap providing increased performance, flexibility and software compatibility Innovation driving 64bit GPU Compute, bandwidth saving technologies Scalability - an ARM Mali-T600 Series GPU to suit every application GPU Compute - 64-bit double precision, IEEE-754 compliant floating point Feature-rich - all popular OSs, multiple APIs including DirectX 11, next generation OpenGL ES, OpenCL Full Profile and RenderScript compute
Leading on Lowering System Power GPUs have a major impact on SoC architecture Area, memory bandwidth, energy and implementation ARM focuses on system wide power efficiency not just IP components Energy Saving Features in the Mali-T62x system: GPU efficiency 50% performance increase or less energy/frame in same area ASTC* 90% TE** TE 50% ARM POP 19% SAVING SAVING SAVING ARM POP IP Up to 27% higher frequency 27% lower area ARM Mali GPUs are leaders in balancing power, area and functionality *Adaptive Scalable Texture Compression (ASTC) **Transaction Elimination (TE)
Mali Momentum through 2013 The most widely licensed GPU 84 Mali licenses to date More than 10x Growth in volume in 2 years 152M units shipped in 2012, in more than 230 devices >300M units shipped YTD 2013 2013 has already seen shipments double from 2012 Strength in key market segments #1 Android GPU IP supplier >20% Android Smartphone #1 in Android tablets (>50%) #1 in Digital Smart TVs (>70%)
Premium Mobile Devices Energy efficient 3D Graphics
Mali High-end GPU Solutions HIGH-END GPU ROADMAP Mali-T604 First Midgard architecture product OpenGL ES 3.0 support Scalable to 4 cores Mali-T760 400% energy efficiency of Mali-T604 Scalable to 16 cores Major bandwidth reduction Mali-T678 Highest performance Mali-T600 GPU OpenGL ES 3.0 support Scalable to 8 cores Mali-T624 50% performance uplift OpenGL ES 3.0 support Scalable to 4 cores Mali-T628 50% performance uplift OpenGL ES 3.0 support Scalable to 8 cores Mali-T622 Small Full Profile GPU Compute solution 50% more energy efficient than Mali-T604 Designed for Graphics and GPU Compute Full Profile, 64-bit Compute Closer CPU-GPU links Efficient use of system resources Coherent memory links Cortex -A15 / Cortex-A53 / Cortex-A57 Protecting partner investments Common software platform reduces costs and TTM Multicore scales performance to address multiple form factors Advanced products get to market early ARM Mali-T628 GPU silicon shipping now in consumer products Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance
GB/Sec Premium Mobile Device Requirements Increasing screen size and content complexity demand increased performance Increased performance demands advanced memory technologies to achieve greater bandwidth Mobile computing devices thermal budget doesn t increase with performance Rigid thermal constraints force the adoption of advanced silicon process and memory technologies to achieve greater performance 4.5 4 INCREASED THERMAL IMPACT 3.5 3 2.5 2 INCREASED RESOLUTION 1.5 1 0.5 0 HD FHD QXGA WQXGA QSXGA 4K UHD Source: ARM data
ARM Mali-T760 GPU Overview Increased performance and energy efficiency ~400% increase in energy efficiency over ARM Mali-T604 GPU Multi-processor scalable graphics performance Coherently scales up to 16 shader core configurations 3D graphics acceleration and Compute APIs Khronos* compliant OpenGL ES 3.0/2.0, 1.1 Microsoft Windows compliant Direct3D 11.1 Full Profile OpenCL 1.1 RenderScript/FilterScript Major reduction in bandwidth and SoC power Memory system optimizations ARM Frame Buffer Compression (AFBC) Smart Composition Proven software DDK quality and performance # s L2 Cache Size Clock Freq (MHz) Pixel Fillrate Triangle Rate Floating Point Performance Up to MP16 2 x 512kB 600MHz 9.6 Gpix/Sec 1066.6 MTri/s 326.4 GFLOPS *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance
ARM Mali-T760 GPU Scalability Coherent shader core scalability using an improved L2 Cache interconnect Scalable up to 16 coherent shader cores Significant reduction in wire count Evenly distributed cache utilization Single or Dual L2 Cache slice each with individual master port Consistent ACE -Lite interface Coherent L2 Cache Interconnect L2 Cache L2 Cache
AFBC Application in SoC Employing AFBC throughout the SoC saves significant system bandwidth and power CPU GPU DISPLAY CONTROLLER SW AFBC ENCODE (TEXTURES) AFBC DECODE AFBC ENCODE AFBC DECODE Uncompressed textures Frame Buffer Objects (render to texture) AFBC Raw Compression Rates Source: ARM data
Smart Composition Better than 50% reduction of texture read bandwidth on simple Android UI use cases Significantly reducing read bandwidth and composition work by ignoring repetitive tile data
Additional Features Reduce system bandwidth when sending video output YCrCb framebuffer output Hardware assisted global illumination Improved Multiple Render Target Increased capabilities to support DirectX 11.1 HW super-sampling, TIR Continuing support for next generation APIs Direct3D 11.1 Feature Level 11 and next generation OpenGL ES ARM POP IP for Mali is available from the Physical IP Division ARM s EDA tool partners are working with our IP now to prepare for supporting our licensees 28HPM and 16FF process libraries
Entry-level Smart Mobile Devices Rapid Implementation of 3D Graphics
Mali Mid-range GPU Solutions MID-RANGE GPU ROADMAP Mali-300 Entry-level OpenGL ES 2.0 GPU Mali-450 MP Leading OpenGL ES 2.0 performance 2x Mali-400 MP performance Scalable to 8 cores Mali-T720 First OpenGL ES 3.0 GPU in Mid-Range Market Optimized for Android Reduced Time to Market Mali-400 MP First OpenGL ES 2.0 multi-core GPU Scalable to 4 cores Leading area-efficiency UTGARD ARCHITECTURE MIDGARD ARCHITECTURE Designed for cost-effective Graphics solutions OpenGL ES, OpenVG Closer CPU-GPU links Efficient use of system resources Cortex-A7 / Cortex-A12 / Cortex-A53 Protecting partner investments Common software platform reduces costs and TTM Multicore scales performance to address multiple form factors Proven solutions ARM Mali-400 MP and Mali-450 MP GPU silicon shipping in hundreds of millions of consumer products Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance
Entry Level Smartphone Tomorrow ARM Mali-T720 GPU Extends lead in area and energy efficiency Supports OpenGL ES 3.0* Area-efficient performance GIC-400 Quad or Dual-core Cortex-A53/Cortex-A7 Most energy-efficient 32-bit CPU 64-bit architecture for new software Mali-V500 Mali-Display Non Coherent Devices Quad Cortex -A7/ Cortex-A53 ACE I/O Coherent Devices Mali-T720 GPU ACE-Lite ADB-400 MMU-500 MMU-500 ADB-400 MMU-500 AXI4 NIC-400 AXI4 ACE ACE ACE-Lite + DVM Link CCI-400r1 ACE-Lite + DVM ACE-Lite + DVM ACE-Lite ACE-Lite ACE-Lite ACE-Lite ACE-Lite ACE-Lite ACE-Lite PHY DMC-400/3 rd Party DMC PHY AXI4 NIC-400 Configurable: AXI4/AXI3/AHB/APB Single Channel DDR2 @533 MHz DDR3/2 LPDDR2/3 DDR3/2 LPDDR2/3 Other Slaves Other Slaves *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance
Design Costs ($M) Break even Unit Volume (M Units) Time to Market Defines Profit Margin Time-to-market becomes the key differentiator which protects profit margins Engineering costs grow as process nodes advance ASPs sharply reduced as competition appears in low-cost markets $140 Basic SoC Silicon and Software Design Costs 25.0 Unit volume (M) to break even for Basic SoC Designs @28nm $120 20.0 $100 $80 SW Design Costs 15.0 $60 SoC Design Costs 10.0 $40 $20 5.0 $0 90nm 65nm 45nm 32nm 28nm 22nm 14nm - Source: Semico Research, 2011 0.0 $5.00 $10.00 $15.00 $20.00 $25.00 $30.00 Average Selling Price ($)
ARM Mali-T720 GPU Overview Increased energy efficiency of previous cost optimized ARM Mali GPU ~ 150% energy efficiency increase over previous cost optimized GPUs Android optimized version of ARM Mali-T62x ~30% area reduction from previous Midgard generation Up to 15% less dynamic power Natural companion to quad core implementations of: Cortex-A7 / Cortex-A12 / Cortex-A53 Targeted optimizations for Android OS OpenGL ES 3.0, Renderscript/FilterScript support Maintain the area efficiency leadership of ARM Mali-4xx GPU Area efficient OpenGL ES 3.0* GPU Significantly decreased time-to-market Ease of implementation & integration in SoC designs # s L2 Cache Size Clock Freq (MHz) Pixel Fillrate Triangle Rate Floating Point Performance Up to MP8 2x128kB 600MHz 4.8 Gpix/Sec 533.2 Mtri/s 81.6 GFLOPS *Product is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance
Decreased Time-to-Market Increased routing density significantly decreases die area Constrained routing layers to minimize cost Opportunities for optimizing layout utilization Crafted reference methodologies to match tool chain ARM POP IP available from the ARM Physical IP Division ARM s EDA tool partners are working with our IP now to prepare for supporting our licensees on 28nm process libraries
ARM POP IP Exploration Worksheet Extensive exploration analysis to map the wide range of Vt/L options on a process to evaluate multiple PPA trade-offs. Determine most effective Vt/L choice for optimizing different design goals. The dominant effects of Vt/L choice can be easily seen in this post-synthesis dataset. Use datasheets to estimate GPU PPA
Extensive Mali Ecosystem
Summary ARM Mali-T760 GPU addresses the needs of high-end mobile and consumer devices by providing: Scalable graphics and compute performance with up to 16 core configurations capable of twice the performance of previous generations ~400% increase in energy efficiency over ARM Mali-T604 GPU Included technology capable of greatly reducing SoC level memory bandwidth utilization and power consumption ARM Mali-T720 GPU solve the needs of semiconductor partners focusing on low-cost mobile devices by providing: Significant reduction in shader core area and increased area efficiency Greater facilities to assist licensees to drastically reduce time-to-market Targeted optimizations for Android OS - OpenGL ES 3.0, RenderScript/FilterScript support
Steve Steele Senior Product Manager steve.steele@arm.com