The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

Similar documents
HyperTransport Technology

AMD 780G. Niles Burbank AMD. an x86 chipset with advanced integrated GPU. Hot Chips 2008

ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session Olivier Zegdoun AMD Sr. Software Engineer

Maximizing Six-Core AMD Opteron Processor Performance with RHEL


AMD Graphics Team Last Updated February 11, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview February 2013 Approved for public distribution

AMD IOMMU VERSION 2 How KVM will use it. Jörg Rödel August 16th, 2011

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

AMD APU and Processor Comparisons. AMD Client Desktop Feb 2013 AMD

INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS

Quad-core Press Briefing First Quarter Update

POWER DELIVERY CHALLENGES FOR NEXT GENERATION HIGH PERFORMANCE SOCS

Panel Discussion: The Future of I/O From a CPU Architecture Perspective

AMD Graphics Team Last Updated April 29, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview April 2013 Approved for public distribution

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT

MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

FLASH MEMORY SUMMIT Adoption of Caching & Hybrid Solutions

The Road to the AMD. Fiji GPU. Featuring Die Stacking and HBM Technology 1 THE ROAD TO THE AMD FIJI GPU ECTC 2016 MAY 2015

Run Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008

Anatomy of AMD s TeraScale Graphics Engine

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

Understanding GPGPU Vector Register File Usage

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

Multi-core processors are here, but how do you resolve data bottlenecks in native code?

Crusoe Processor Model TM5800

HPCA 18. Reliability-aware Data Placement for Heterogeneous memory Architecture

viewdle! - machine vision experts

SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

SIMULATOR AMD RESEARCH JUNE 14, 2015

Family 15h Models 00h-0Fh AMD Opteron Processor Product Data Sheet

THE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD

AMD RYZEN PROCESSOR WITH RADEON VEGA GRAPHICS CORPORATE BRAND GUIDELINES

AMD Fusion APU: Llano. Marcello Dionisio, Roman Fedorov Advanced Computer Architectures

Six-Core AMD Opteron Processor

Family 15h Models 00h-0Fh AMD FX -Series Processor Product Data Sheet

Accelerating Applications. the art of maximum performance computing James Spooner Maxeler VP of Acceleration

Family 15h Models 10h-1Fh AMD Athlon Processor Product Data Sheet

AMD HD3D Technology. Setup Guide. 1 AMD HD3D TECHNOLOGY: Setup Guide

OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER

HPG 2011 HIGH PERFORMANCE GRAPHICS HOT 3D

EPYC VIDEO CUG 2018 MAY 2018

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015

CAUTIONARY STATEMENT 1 AMD NEXT HORIZON NOVEMBER 6, 2018

FUSION PROCESSORS AND HPC

Embedded Systems Architecture

Automatic Intra-Application Load Balancing for Heterogeneous Systems

Generic System Calls for GPUs

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016

AMD Opteron 4200 Series Processor

AMD EPYC CORPORATE BRAND GUIDELINES

KVM CPU MODEL IN SYSCALL EMULATION MODE ALEXANDRU DUTU, JOHN SLICE JUNE 14, 2015

DR. LISA SU

Introducing NVDIMM-X: Designed to be the World s Fastest NAND-Based SSD Architecture and a Platform for the Next Generation of New Media SSDs

RegMutex: Inter-Warp GPU Register Time-Sharing

ROCm: An open platform for GPU computing exploration

Microsoft Windows 2016 Mellanox 100GbE NIC Tuning Guide

STREAMING VIDEO DATA INTO 3D APPLICATIONS Session Christopher Mayer AMD Sr. Software Engineer

Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones

Desktop Telepresence Arrived! Sudha Valluru ViVu CEO

October Quick Reference Guide

3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components

AMD HD5450 PCIe ADD-IN BOARD. Datasheet AEGX-A3T5-01FST1

D3D12 & Vulkan: Lessons learned. Dr. Matthäus G. Chajdas Developer Technology Engineer, AMD

Pattern-based analytics to estimate and track yield risk of designs down to 7nm

AMD HyperTransport Technology-Based System Architecture

HIGHLY PARALLEL COMPUTING IN PHYSICS-BASED RENDERING OpenCL Raytracing Based. Thibaut PRADOS OPTIS Real-Time & Virtual Reality Manager

Designing Natural Interfaces

Resource Saving: Latest Innovation in Optimized Cloud Infrastructure

SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015

PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017

ACCELERATING MATRIX PROCESSING WITH GPUs. Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research

AMD Ryzen Master Overclocking User s Guide. for AMD Ryzen and Ryzen Threadripper Processors

Embedded System Architecture

NUMA Topology for AMD EPYC Naples Family Processors

LIQUIDVR TODAY AND TOMORROW GUENNADI RIGUER, SOFTWARE ARCHITECT

High-Speed NAND Flash

clarmor: A DYNAMIC BUFFER OVERFLOW DETECTOR FOR OPENCL KERNELS CHRIS ERB, JOE GREATHOUSE, MAY 16, 2018

AMD s Next Generation Microprocessor Architecture

User Manual. Nvidia Jetson Series Carrier board Aetina ACE-N622

Sequential Consistency for Heterogeneous-Race-Free

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

AMD HD7750 PCIe ADD-IN BOARD. Datasheet (GFX-A3T2-01FST1)

The Future of Computing: AMD Vision

NEXT-GENERATION MATRIX 3D IMMERSIVE USER INTERFACE [ M3D-IUI ] H Raghavendra Swamy AMD Senior Software Engineer

BIOMEDICAL DATA ANALYSIS ON HETEROGENEOUS PLATFORM. Dong Ping Zhang Heterogeneous System Architecture AMD

AMD HD5450 PCI ADD-IN BOARD. Datasheet. Advantech model number:gfx-a3t5-61fst1

TECHNOLOGY BRIEF. Compaq 8-Way Multiprocessing Architecture EXECUTIVE OVERVIEW CONTENTS

INTRODUCING RYZEN MARCH

KBC1122/KBC1122P. Mobile KBC with Super I/O, SFI, ADC and DAC with SMSC SentinelAlert! TM PRODUCT FEATURES. Data Brief

Gestural and Cinematic Interfaces - DX11. David Brebner Unlimited Realities CTO

AMD E M PCIEx16 GFX-AE6460F16-5C

Vulkan (including Vulkan Fast Paths)

Graphics Hardware 2008

Maintaining Cache Coherency with AMD Opteron Processors using FPGA s. Parag Beeraka February 11, 2009

EXPLOITING ACCELERATOR-BASED HPC FOR ARMY APPLICATIONS

ECE 571 Advanced Microprocessor-Based Design Lecture 24

TECHNOLOGY BRIEF. Double Data Rate SDRAM: Fast Performance at an Economical Price EXECUTIVE SUMMARY C ONTENTS

Transcription:

Next-Generation Mobile Computing: Balancing Performance and Power Efficiency HOT CHIPS 19 Jonathan Owen, AMD Agenda The mobile computing evolution The Griffin architecture Memory enhancements Power management Thermal management 2 Next-Generation Mobile Computing August 2007

The Evolution of the Notebook PC Customers increasingly demand more for less Reduced cost and complexity Smaller form factors Increased battery life Durability and reliability Enhanced connectivity Simplified user interface and interaction 3 Next-Generation Mobile Computing 3 August 2007 Architectural Challenges for s in Mobile Platforms UMA configurations present increasing challenges: Performance: Bandwidth for DirectX 10/Vista Power efficiency and battery life (avoid the local frame buffer) Power efficiency limitations: Frequency and voltage cross dependencies limit the granularity of power savings Frequency agility To be addressed without changing the CPU core! 4 Next-Generation Mobile Computing August 2007

Addressing these Challenges Increase system bandwidth for UMA solutions Performance: HyperTransport 3, dedicated display refresh virtual channel, maximize DRAM efficiency Power: HT3 power management extensions, Memory controller on its own power plane Power efficiency Split power planes, independent frequency selection (core0, core1, NB are independent) Fast frequency changes without PLL relock using digital frequency synthesizers Improved efficiency allows lower power for fixed workloads, or higher performance for fixed power 5 Next-Generation Mobile Computing August 2007 Introducing Griffin Enhanced performance and battery life New infrastructure features optimized for the mobile segment System on chip (SOC) methodology to accelerate design and integration New mobile optimized Power optimized DDR2 Enhanced HyperTransport 3 connectivity Integrating 65nm cores and larger L2 caches 6 6 Next-Generation Mobile Computing August 2007

A Few Details Dual AMD64 CPU cores Dedicated 1 MB L2 caches Dual channel DDR2 interface 64 bits per channel All speeds supported up to DDR2-800 12.8 GB/s peak memory bandwidth 16 GB max memory configuration HyperTransport 3 I/O link Speeds supported up to 2.6 GHz 16x16 bit 10.4 GB/s simultaneous peak bandwidth in each direction Dynamic link power management 160 mm 2, 225.6 million transistors HT3 L2 Cache 0 L2 Cache 1 CPU CPU DDR2 7 7 Next-Generation Mobile Computing August 2007 Mobile Optimized Memory Controller New integrated memory controller features: Improvements in DRAM efficiency Improved DRAM prefetcher Dedicated Display Refresh channel Extended battery life with new power saving features Operates on separate power plane, and at a lower voltage than the cores Enables C4 Deeper Sleep on systems with UMA graphics without the need for local frame buffer LCD Display L2 Cache On-die Memory Controller DDR Interface Dual Channel DDR2 L2 Cache Crossbar/Host Bridge Hyper- Transport TM 3.0 Controller HyperTransport 3.0 Interface HT3 up to 16x16 link Chipset with GPU Cost and power efficient solution for UMA 8 Next-Generation Mobile Computing August 2007

DRAM Efficiency Maximization Aggressive use of bypass paths to minimize idle latency Bank state tracking Chip selects are interleaved to increase number of distinct banks Unganging of DRAM channels further increases number of banks 16 banks per channel are tracked, using LRU algorithm Pages can be closed dynamically, based on bank access pattern Out of Order (OOO) scheduling of requests, based on: Request priority (programmable by type: low/med/high) Page status (miss/hit/conflict) Write bursting Writes are accumulated and done at once to minimize bus turnaround 9 Next-Generation Mobile Computing August 2007 Advanced DRAM Prefetcher Memory Prefetch Table (MPT) has entries for 8 outstanding threads Prefetches 2 requests ahead Capable of tracking single (+n, +n, +n) or double (+n, +m, +n) strided patterns, with strides up to ±4 lines When request is received: stride2 = change in address + stride1 stride1 = change in address MPT 0 stride1 stride2 addr 1 stride1 stride2 addr 2 stride1 stride2 addr 3 stride1 stride2 addr 4 stride1 stride2 addr 5 stride1 stride2 addr 6 stride1 stride2 addr 7 stride1 stride2 addr stride2 + last is predicted address; prefetches are issued when confidence threshold achieved A A+n A+n+m A+2n+m A+2n+2m stride1 stride2 +n +m +n+m +n +n+m +m +n+m 10 Next-Generation Mobile Computing August 2007

Display Refresh Optimization High bandwidth, high latency, but latency guarantee required to avoid display buffer underrun Doesn't fit well in existing HyperTransport TM VC sets Base channels provide no latency guarantees Isoc channels are low bandwidth; can starve other traffic Requests arrive via HyperTransport TM Isoc channel Detected by decoding coherence and ordering requirements Has dedicated buffer and routing resources internally Chipset must manage interaction with Isoc traffic Memory controller priority is variable based on age 11 Next-Generation Mobile Computing August 2007 Dynamic Performance Scaling Capabilities Reduced power consumption to increase battery life Increased performance with instantaneous frequency transitioning Separate voltage planes for each core Each core can operate at independent frequency and voltage No PLL relock required Minimize power consumption by always running at optimal P-state Lower operating minimum p-state Reduced processor utilization with simplified p-state transitions Max P P 1 P 2 P 3 P 4 P 5 P 6 Min P C1 - Halt V 0 V 1 V 2 V 3 V 4 Deep Sleep Max P P 1 P 2 P 3 P 4 P 5 P 6 Min P C1 - Halt Deeper Sleep (AltVID) V 0 V 1 V 2 V 3 V 4 Increased battery life with advanced power management features 12 Next-Generation Mobile Computing August 2007

Power-optimized HyperTransport 3 Increased performance with over 3x increase in peak I/O bandwidth Delivers increased bandwidth for DX10, a requirement for 2008 Windows Vista Premium Logo Extended battery life with power reduction features Dynamic scaling of link widths Disconnecting HyperTransport TM when idle, even while cores are executing HW-autonomous, no SW support needed X x2 x16 x8 x4 X x16 x8 x4 x2 HyperTransport 3 increases performance while helping to extend battery life with power management features Note: Animated scaling of link widths shown here are for illustration purposes. HT disconnect is required between each link width change. Actual link width scaling may vary based on implementation. 13 Next-Generation Mobile Computing August 2007 Voltage Planes and Control Fixed Analog and I/O Voltage Planes VDDIO & VTT for DDR PHY VDDA for on-die PLL VLDT for HyperTransport PHY Independently-Variable Voltage Planes VDD0 for CPU core 0 VDD1 for CPU core 1 VDDNB for on-die NorthBridge SVI DC-DC VDD1 VDD0 SVI VDDNB CPU + 1MB L2 CPU + 1MB L2 On-Die NorthBridge Serial VID Interface (SVI) Protocol provides pin-efficient means to control multiple independent voltage planes 14 14 Next-Generation Mobile Computing August 2007

Fine Grain Power Management CPU core power-state transitions Simple software interface Cores continue execution while frequency changes are in progress Autonomous hardware power management CPU and chipset work together to establish the most power efficient settings Eliminates reliance on software for maximum power efficiency The CPU informs the chipset when P-states change or HALT condition reached The chipset monitors I/O traffic and CPU state and establishes the optimal power management profile for a given set of system conditions Current-generation power management schemes remain supported Power efficiency via dynamic hardware power management 15 Next-Generation Mobile Computing August 2007 Autonomous Hardware Power Management Gfx Driver HT: 4 bits up, 4 bits down DRAM Active Cores executing driver software GPU Render HT: 16 bits up, 16 bits down DRAM Active Cores in deep sleep Display Active Deep Sleep HT: 2 bits up, 8 bits down DRAM Active Cores in deep sleep HT: disconnected DRAM Self Refresh & Tristated Cores in deep sleep 16 Next-Generation Mobile Computing August 2007

Power and Performance Tradeoffs Feature Independent CPU core voltage planes and frequency selection Separate voltage plane for ondie NorthBridge Dynamic HT link power management Autonomous hardware control of CPU core deep sleep state Autonomous hardware control of DRAM self-refresh state CPU core deep sleep wakeup to service probes at lowest P-State Attributes Power consumption matches CPU performance delivered Enables CPU deep sleep with integrated graphics. Power consumption matches interconnect bandwidth delivered Increased residency in CPU deep sleep state Increased residency in DRAM sleep state Increased residency in CPU deep sleep state 17 17 Next-Generation Mobile Computing August 2007 Griffin Power Management Visualization System Memory On-die HT Chipset/ GPU State P0, C0 C1 High P0, C0 Active utilization 16x16, 2.6 GHz, connected High Perf state Power Vmax,Fmax Vmax,Fmax Pmax, C0 Vmax,Fmax 18 Next-Generation Mobile Computing August 2007

Griffin Power Management Visualization System Memory On-die HT Chipset/ GPU State P0, C0 C1 High P3, C0 Active utilization 16x16, 2.6 GHz, connected High Perf state Vmax,Fmax Power Vint, Fmax /2 Vmax,Fmax 19 Next-Generation Mobile Computing August 2007 Griffin Power Management Visualization System Memory On-die HT Chipset/ GPU State P0, C0 C1 Pmin, C1 High P3, C0 Active utilization 16x16, 2.6 GHz, connected High Perf state Power Vmin Vint, Fmax /2 Vmax,Fmax 20 Next-Generation Mobile Computing August 2007

Griffin Power Management Visualization System Memory On-die HT Chipset/ GPU State Pmin, C1 Pmin, C1 Self refresh Clocks gated Disconnected Low Perf state C1E with AltVID applied (Deeper Sleep) Power 21 Next-Generation Mobile Computing August 2007 Griffin Power Management Visualization System Memory On-die HT Chipset/ GPU State Pmin, C1 Pmin, C1 Active Active 2x8, 2.6 GHz, connected Low Perf state (refresh) C1E with AltVID applied (Deeper Sleep) Power 22 Next-Generation Mobile Computing August 2007

AMD Multi-Point Thermal Control Multiple on-die thermal sensors Simplified thermal management Designed to automatically reduce p-state when temperature exceeds pre-defined limit Rev. G Thermal Monitor Circuit PROCHOT SMBUS Embedded Controller New memory power management External thermal monitor can force reduction in memory temperature with MEMHOT signal Griffin = Thermal sensor SB-TSI PROCHOT MEMHOT Embedded Controller Simple and accurate thermal management 23 Next-Generation Mobile Computing August 2007 Summary Griffin is an important evolutionary step in AMD s notebook product portfolio Greater memory efficiency for improved processor and graphics bandwidth Enhanced power management Improved thermal management Optimized for UMA power/performance while maintaining direct CPU-Memory connection Simplified, reliable platform architecture Low platform cost Maximal CPU performance Tighter integration of the chipset and CPU an important milestone to Fusion 24 24 Next-Generation Mobile Computing August 2007

Disclaimer The information presented in this document is for information purposes only. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including, but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN OR FOR THE PERFORMANCE OR OPERATION OF ANY PERSON, INCLUDING, WITHOUT LIMITATION, ANY LOST PROFITS, BUSINESS INTERRUPTION, DAMAGE TO OR DESTRUCTION OF PROPERTY, OR LOSS OF PROGRAMS OR OTHER DATA, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Trademark Attribution AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. PCIe is a registered trademark of PCI-SIG. HyperTransport is a licensed trademark of the HyperTransport Consortium. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. 2007 Advanced Micro Devices, Inc. All rights reserved. 25 Next-Generation Mobile Computing August 2007