RADEON X1000 Memory Controller

Similar documents
Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett

AMD E M PCIEx16 GFX-AE6460F16-5C

Anatomy of AMD s TeraScale Graphics Engine

POWERVR MBX. Technology Overview

AMD Embedded PCIe ADD-IN BOARD E6760/E6460 Datasheet. (ER93FLA/ER91FLA-xx)

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

PowerVR Hardware. Architecture Overview for Developers

CS427 Multicore Architecture and Parallel Computing

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Graphics Processing Unit Architecture (GPU Arch)

AMD HD5450 PCIe ADD-IN BOARD. Datasheet AEGX-A3T5-01FST1

AMD E8860 2GB PCIEx16 Mini DP X6 GFX-AE8860F16-5J

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

From Shader Code to a Teraflop: How Shader Cores Work

Mali-G72 Enabling tomorrow s technology today

Scaling of 3D Game Engine Workloads on Modern Multi-GPU Systems. Jordi Roca Monfort (Universitat Politècnica Catalunya) Mark Grossman (Microsoft)

Spotlight: ATI Radeon HD 4890 Graphics

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

SAPPHIRE R7 260X 2GB GDDR5 OC BATLELFIELD 4 EDITION

Xbox 360 high-level architecture

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008

Monday Morning. Graphics Hardware

PowerVR Series5. Architecture Guide for Developers

The NVIDIA GeForce 8800 GPU

Windowing System on a 3D Pipeline. February 2005

Lecture 6: Texturing Part II: Texture Compression and GPU Latency Hiding Mechanisms. Visual Computing Systems CMU , Fall 2014

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

From Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian)

AMD E8870 4GB PCIEX16 Mini DP X4 Low profile ER24FL-SK4 GFX-AE8870L16-5J

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university

Mali-G72: Enabling tomorrow s technology today

Administrivia. HW0 scores, HW1 peer-review assignments out. If you re having Cython trouble with HW2, let us know.

Evolution of GPUs Chris Seitz

Falanx Microsystems. Company Overview

SAPPHIRE DUAL-X R9 270X 2GB GDDR5 OC WITH BOOST

Technical Brief. AGP 8X Evolving the Graphics Interface

Efficient and Scalable Shading for Many Lights

Filtering theory: Battling Aliasing with Antialiasing. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Speed up a Machine-Learning-based Image Super-Resolution Algorithm on GPGPU

Overview. Technology Details. D/AVE NX Preliminary Product Brief

AMD Embedded MXM Module. Datasheet. (EM93F/EM91F xx) Manufacturer P/N: EM91F PE(MXM Fanless) Manufacturer P/N: EM93F PI(MXM Fanless)

AMD Embedded MXM Module E6460 Datasheet. (EM91F -xx)

ECE 571 Advanced Microprocessor-Based Design Lecture 18

Scanline Rendering 2 1/42

MEMORY HIERARCHY BASICS. B649 Parallel Architectures and Programming

GeForce4. John Montrym Henry Moreton

AMD Radeon HD 2900 Highlights

Selecting the right Tesla/GTX GPU from a Drunken Baker's Dozen

Real-Time Rendering Architectures

Antonio R. Miele Marco D. Santambrogio

User Guide. NVIDIA Quadro FX 4700 X2 BY PNY Technologies Part No. VCQFX4700X2-PCIE-PB

Filtering theory: Battling Aliasing with Antialiasing. Department of Computer Engineering Chalmers University of Technology

AMD E8860 PCIe ADD-IN BOARD. Datasheet (AEGX-A5T8-20FMT1)

Programming Graphics Hardware

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Technical Brief. Lumenex Engine The New Standard In GPU Image Quality

Spring 2009 Prof. Hyesoon Kim

The Need for Programmability

CME 213 S PRING Eric Darve

NVIDIA nforce IGP TwinBank Memory Architecture

AMD HD5450 PCI ADD-IN BOARD. Datasheet. Advantech model number:gfx-a3t5-61fst1

Hardware-driven visibility culling

ASYNCHRONOUS SHADERS WHITE PAPER 0

Table of Contents 2-4

2 x Maximum Display Monitor(s) support MHz Core Clock 28 nm Chip 384 x Stream Processors. 145(L)X95(W)X26(H) mm Size. 1.

Graphics Hardware. Ulf Assarsson. Graphics hardware why? Recall the following. Perspective-correct texturing

Convolution Soup: A case study in CUDA optimization. The Fairmont San Jose 10:30 AM Friday October 2, 2009 Joe Stam

AMD HD5450 PCIe X1 ADD-IN BOARD. Datasheet. Advantech model number: GFX-A3T5-71FST1

ECE 571 Advanced Microprocessor-Based Design Lecture 20

GPU for HPC. October 2010

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

TUNING CUDA APPLICATIONS FOR MAXWELL

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

SAPPHIRE TOXIC R9 280X 3GB GDDR5

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

High-Quality Surface Splatting on Today s GPUs

A Case Study in Optimizing GNU Radio s ATSC Flowgraph

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Multimedia in Mobile Phones. Architectures and Trends Lund

The Ultimate Developers Toolkit. Jonathan Zarge Dan Ginsburg

GPUs and GPGPUs. Greg Blanton John T. Lubia

COSC 6385 Computer Architecture. - Memory Hierarchies (II)

Xbox 360 Architecture. Lennard Streat Samuel Echefu

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

arxiv: v1 [physics.comp-ph] 4 Nov 2013

TUNING CUDA APPLICATIONS FOR MAXWELL

A Reconfigurable Architecture for Load-Balanced Rendering

GPU Architecture and Function. Michael Foster and Ian Frasch

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

Course Recap + 3D Graphics on Mobile GPUs

Mobile HW and Bandwidth

Spring 2011 Prof. Hyesoon Kim

AN501: Latency Settings and their Impact on Memory Performance. John Beekley, VP Applications Engineering, Corsair Memory, Inc.

Addressing the Memory Wall

Drawing Fast The Graphics Pipeline

Transcription:

RADEON X1000 Memory Controller

Ring Bus Memory Controller Supports today s fastest graphics memory devices GDDR3, 48+ GB/sec 512-bit Ring Bus Simplifies layout and enables extreme memory clock scaling New Cache Design Fully Associative for more optimal performance Improved Hyper Z Better compression and hidden surface removal Programmable Arbitration Logic Maximizes memory efficiency Can be upgraded via software

Radeon X1800 Ring Bus Two internal 256-bit rings Run in opposite directions to minimize latency Return requested data to clients Memory writes use crossbar switch Circle around the periphery of the chip Reduces routing complexity Permits higher clock speeds One ring stop per pair of memory s Linked directly to memory interface Ring Stop Ring Stop Memory Controller Ring Stop Read Path Write Path Ring Stop

Programmable Arbitration Logic Prioritizes memory access requests Predicts impact of each request on overall performance Uses feedback system to maximize memory and GPU efficiency Programmable parameters Can be tuned via driver updates

Memory Channels Memory Devices Radeon X1800 8x s Memory Controller 256 bit interface Radeon X850 64-bit 64-bit Memory Devices 64-bit 64-bit 4x64-bit s Memory Controller

Cache Design Fully Associative Caches Cache lines can map to any location in external memory Earlier designs used Direct Mapped & N-Way Associative Caches Could only access limited blocks of external memory Direct Mapped Cache Graphics Memory Cache Texture, Color, Z & Stencil caches are all now fully associative Reduces memory bandwidth requirements Minimizes cache contention stalls Optimized game performance Fully Associative Cache Graphics Memory Cache Gains up to 25% clock for clock in fill/bandwidth bound cases

Cache Performance X1800 Z Cache Misses Relative to X850 120% 110% 100% 90% Cache Misses 80% 70% Battlefield 2 Far Cry 3DMark05 GT1 X850 Baseline 60% 50% 40% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Demo Progress(Frames)

Cache Performance X1800 Texture Cache Misses Relative to X850 120% 110% 100% 90% Cache Misses 80% 70% Battlefield 2 Far Cry HL2: Lost Coast X850 Baseline 60% 50% 40% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Demo Progress(Frames)

Hyper Z Improved Hierarchical Z Buffer Detects and discards hidden pixels before shading Important in scenes with heavy overdraw (i.e. overlapping objects) New technique uses floating point for improved precision Catches up to 60% more hidden pixels than X850 1 2 3 4 5 6 Improved Z Compression Z Buffer data is typically the largest user of memory bandwidth Bandwidth can be reduced by up to 8:1 using lossless compression New method achieves higher compression ratios more often

Memory Controller Performance New technology benefits most apparent in most bandwidth-demanding situations High resolutions (1600x1200 and up) Anti-Aliasing (4x and 6x modes, Adaptive AA) Anisotropic Filtering (8x and 16x Quality AF modes) Frame rates over 2x faster than previous generation in these cases

Anti-Aliasing Performance 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% Far Cry - Regulator 0% 1600x1200 1600x1200 (4XAA) 1600x1200 (6XAA) Radeon X1800 XT Radeon X850 XT Geforce 7800GTX GeForce 6800 Ultra

Anti-Aliasing Performance 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% Battlefield 2 0% 1600x1200 1600x1200 (4XAA) 1600x1200 (6XAA) Radeon X1800 XT Radeon X850 XT GeForce 7800 GTX GeForce 6800 Ultra