RADEON X1000 Memory Controller

1 RADEON X1000 Memory Controller

2 Ring Bus Memory Controller
- Supports today's fastest graphics memory devices
  - GDDR3, 48+ GB/sec
- 512-bit Ring Bus
  - Simplifies layout and enables extreme memory clock scaling
- New Cache Design
  - Fully associative for more optimal performance
- Improved Hyper Z
  - Better compression and hidden surface removal
- Programmable Arbitration Logic
  - Maximizes memory efficiency
  - Can be upgraded via software
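The 48+ GB/sec figure can be checked with simple arithmetic. The sketch below assumes a 256-bit external memory interface (the width quoted on the Memory Channels slide) and a 750 MHz double-data-rate GDDR3 clock; the clock value is an assumption and does not appear in these slides.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8                  # 256 bits -> 32 bytes
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed values: 256-bit interface, 750 MHz clock, two transfers per clock (DDR).
bandwidth = peak_bandwidth_gb_per_s(256, 750)
print(bandwidth)  # 48.0
```

Under these assumptions the result lands exactly on 48 GB/sec, so the "+" on the slide would correspond to memory clocked above 750 MHz.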

3 Radeon X1800 Ring Bus
- Two internal 256-bit rings
  - Run in opposite directions to minimize latency
  - Return requested data to clients
  - Memory writes use crossbar switch
- Circle around the periphery of the chip
  - Reduces routing complexity
  - Permits higher clock speeds
- One ring stop per pair of memory channels
  - Linked directly to memory interface
[Diagram labels: four ring stops, memory controller, read path, write path]
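Why two counter-rotating rings reduce latency can be illustrated with a hop count. This is a minimal sketch, assuming four ring stops numbered 0 to 3 and that returned data travels on whichever ring reaches the destination in fewer hops; the function name and the routing rule are illustrative, not the hardware's documented behavior.

```python
def ring_hops(src, dst, num_stops=4):
    """Return (hops, direction) for the shorter of the two ring directions."""
    clockwise = (dst - src) % num_stops
    counter = (src - dst) % num_stops
    if clockwise <= counter:
        return clockwise, "clockwise"
    return counter, "counter-clockwise"


# Worst case with both rings versus a single one-way ring (assumed 4 stops).
pairs = [(s, d) for s in range(4) for d in range(4) if s != d]
worst_two_rings = max(ring_hops(s, d)[0] for s, d in pairs)
worst_one_ring = max((d - s) % 4 for s, d in pairs)
print(worst_two_rings, worst_one_ring)  # 2 3
```

With four stops, the second ring cuts the worst-case trip from three hops to two.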

4 Programmable Arbitration Logic
- Prioritizes memory access requests
  - Predicts impact of each request on overall performance
  - Uses feedback system to maximize memory and GPU efficiency
- Programmable parameters
  - Can be tuned via driver updates
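The slides do not describe the arbitration algorithm itself, so the following is only a hypothetical sketch of what "programmable parameters" could mean: each pending request gets a score from a table of weights, and changing the table (as a driver update might) changes which request wins. The `Request` fields, the weight names, and the scoring formula are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    client: str           # hypothetical client name, e.g. "texture" or "z"
    wait_cycles: int      # how long the request has been pending
    hits_open_page: bool  # whether it targets an already-open DRAM page


def pick_next(requests, weights):
    """Pick the request with the highest weighted score (illustrative only)."""
    def score(r):
        return (weights["client"].get(r.client, 1.0)
                + weights["age"] * r.wait_cycles
                + (weights["open_page"] if r.hits_open_page else 0.0))
    return max(requests, key=score)


pending = [Request("texture", 5, True), Request("z", 2, False)]

# Two hypothetical parameter sets; only the "age" weight differs.
favor_z = {"client": {"z": 4.0, "texture": 2.0}, "age": 0.1, "open_page": 1.0}
favor_old = {"client": {"z": 4.0, "texture": 2.0}, "age": 1.0, "open_page": 1.0}

print(pick_next(pending, favor_z).client)    # z
print(pick_next(pending, favor_old).client)  # texture
```

The point of the sketch is the separation between fixed selection logic and tunable weights, which is one plausible reading of "can be tuned via driver updates".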

5 Memory Channels
- Radeon X1800: 8x32-bit channels, 256-bit interface
- Radeon X850: 4x64-bit channels
[Diagram: memory devices connected to the memory controller, one link per channel]
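One way to see the benefit of more, narrower channels is to compare the smallest amount of data a single access must move. The sketch below assumes eight 32-bit channels for the X1800 (8 x 32 = 256 bits, matching the 256-bit interface on the slide), the older 4 x 64-bit layout for comparison, a burst length of 4, and a 256-byte address interleave; the burst length, the interleave size, and the address mapping are assumptions for illustration.

```python
def min_access_bytes(channel_width_bits, burst_length):
    """Smallest transfer one channel can make: width x burst length, in bytes."""
    return channel_width_bits * burst_length // 8


def channel_for_address(addr, num_channels, interleave_bytes=256):
    """Hypothetical mapping: consecutive blocks rotate across channels."""
    return (addr // interleave_bytes) % num_channels


narrow = min_access_bytes(32, 4)  # eight 32-bit channels
wide = min_access_bytes(64, 4)    # four 64-bit channels
print(narrow, wide)               # 16 32

# Eight consecutive 256-byte blocks land on eight different channels.
print([channel_for_address(a, 8) for a in range(0, 2048, 256)])
```

Both layouts total 256 bits, but under these assumptions the narrower channels halve the minimum access size and double the number of requests that can be serviced independently.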

6 Cache Design
- Fully Associative Caches
  - Cache lines can map to any location in external memory
- Earlier designs used Direct Mapped & N-Way Associative Caches
  - Could only access limited blocks of external memory
- Texture, Color, Z & Stencil caches are all now fully associative
  - Reduces memory bandwidth requirements
  - Minimizes cache contention stalls
  - Optimized game performance
- Gains up to 25% clock for clock in fill/bandwidth bound cases
[Diagrams: Direct Mapped Cache (Graphics Memory, Cache); Fully Associative Cache (Graphics Memory, Cache)]
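The contention stalls mentioned above can be reproduced with a toy simulation. This sketch counts misses for a direct-mapped cache and a fully associative cache of the same size on an access trace that alternates between two memory blocks mapping to the same direct-mapped line. The LRU replacement policy, the 8-line cache size, and the trace are assumptions for illustration; the slides do not state the hardware's replacement policy.

```python
from collections import OrderedDict


def direct_mapped_misses(blocks, num_lines):
    """Each block can live in exactly one line: block % num_lines."""
    lines = [None] * num_lines
    misses = 0
    for block in blocks:
        index = block % num_lines
        if lines[index] != block:
            misses += 1
            lines[index] = block  # evict whatever was in this line
    return misses


def fully_associative_misses(blocks, num_lines):
    """Any block can live in any line; evict the least recently used."""
    cache = OrderedDict()
    misses = 0
    for block in blocks:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # drop the least recently used
            cache[block] = True
    return misses


trace = [0, 8] * 4  # blocks 0 and 8 collide in an 8-line direct-mapped cache
print(direct_mapped_misses(trace, 8))      # 8
print(fully_associative_misses(trace, 8))  # 2
```

The direct-mapped cache misses on every access because the two blocks keep evicting each other, while the fully associative cache misses only on the first touch of each block.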

7 Cache Performance
[Chart: X1800 Z Cache Misses Relative to X850. Y axis: cache misses (percent of X850 baseline). X axis: demo progress in frames (0% to 100%). Series: Battlefield 2, Far Cry, 3DMark05 GT1, X850 Baseline]

8 Cache Performance
[Chart: X1800 Texture Cache Misses Relative to X850. Y axis: cache misses (percent of X850 baseline). X axis: demo progress in frames (0% to 100%). Series: Battlefield 2, Far Cry, HL2: Lost Coast, X850 Baseline]

9 Hyper Z
- Improved Hierarchical Z Buffer
  - Detects and discards hidden pixels before shading
  - Important in scenes with heavy overdraw (i.e. overlapping objects)
  - New technique uses floating point for improved precision
  - Catches up to 60% more hidden pixels than X850
- Improved Z Compression
  - Z Buffer data is typically the largest user of memory bandwidth
  - Bandwidth can be reduced by up to 8:1 using lossless compression
  - New method achieves higher compression ratios more often
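The idea behind a hierarchical Z buffer can be sketched in a few lines: keep one conservative depth value per tile of the depth buffer, and reject incoming geometry for a whole tile when it cannot be nearer than anything already drawn there. The sketch assumes smaller depth means nearer (a "less than" depth test), 2x2 tiles, and made-up depth values; it illustrates the general technique, not ATI's specific implementation or its floating-point format.

```python
def build_tile_max(depth, tile):
    """Farthest stored depth per tile; depth is a 2D list of rows."""
    rows, cols = len(depth), len(depth[0])
    return [
        [max(depth[y][x]
             for y in range(ty, ty + tile)
             for x in range(tx, tx + tile))
         for tx in range(0, cols, tile)]
        for ty in range(0, rows, tile)
    ]


def tile_is_hidden(tile_max, tx, ty, incoming_min_depth):
    """True if nothing in the incoming geometry can pass a 'less than' test."""
    return incoming_min_depth >= tile_max[ty][tx]


# Assumed 2x4 depth buffer: left tile holds near geometry, right tile is mostly empty.
depth = [
    [0.20, 0.30, 0.90, 1.00],
    [0.25, 0.30, 1.00, 1.00],
]
tile_max = build_tile_max(depth, 2)
print(tile_max)                              # [[0.3, 1.0]]
print(tile_is_hidden(tile_max, 0, 0, 0.5))   # True: skipped before shading
print(tile_is_hidden(tile_max, 1, 0, 0.5))   # False: needs per-pixel test
```

A tile that fails the coarse test is discarded without reading per-pixel depth or running any shading, which is where the savings under heavy overdraw come from.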

10 Memory Controller Performance
- Benefits of the new technology are most apparent in the most bandwidth-demanding situations
  - High resolutions (1600x1200 and up)
  - Anti-Aliasing (4x and 6x modes, Adaptive AA)
  - Anisotropic Filtering (8x and 16x Quality AF modes)
- Frame rates over 2x faster than previous generation in these cases

11 Anti-Aliasing Performance
[Chart: Far Cry - Regulator. Y axis: 0% to 100%. Settings: 1600x1200, 1600x1200 (4XAA), 1600x1200 (6XAA). Series: Radeon X1800 XT, Radeon X850 XT, GeForce 7800 GTX, GeForce 6800 Ultra]

12 Anti-Aliasing Performance
[Chart: Battlefield 2. Y axis: 0% to 100%. Settings: 1600x1200, 1600x1200 (4XAA), 1600x1200 (6XAA). Series: Radeon X1800 XT, Radeon X850 XT, GeForce 7800 GTX, GeForce 6800 Ultra]


More information

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal Graphics Hardware, Graphics APIs, and Computation on GPUs Mark Segal Overview Graphics Pipeline Graphics Hardware Graphics APIs ATI s low-level interface for computation on GPUs 2 Graphics Hardware High

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

arxiv: v1 [physics.comp-ph] 4 Nov 2013

arxiv: v1 [physics.comp-ph] 4 Nov 2013 arxiv:1311.0590v1 [physics.comp-ph] 4 Nov 2013 Performance of Kepler GTX Titan GPUs and Xeon Phi System, Weonjong Lee, and Jeonghwan Pak Lattice Gauge Theory Research Center, CTP, and FPRD, Department

More information

TUNING CUDA APPLICATIONS FOR MAXWELL

TUNING CUDA APPLICATIONS FOR MAXWELL TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v6.5 August 2014 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2

More information

A Reconfigurable Architecture for Load-Balanced Rendering

A Reconfigurable Architecture for Load-Balanced Rendering A Reconfigurable Architecture for Load-Balanced Rendering Jiawen Chen Michael I. Gordon William Thies Matthias Zwicker Kari Pulli Frédo Durand Graphics Hardware July 31, 2005, Los Angeles, CA The Load

More information

GPU Architecture and Function. Michael Foster and Ian Frasch

GPU Architecture and Function. Michael Foster and Ian Frasch GPU Architecture and Function Michael Foster and Ian Frasch Overview What is a GPU? How is a GPU different from a CPU? The graphics pipeline History of the GPU GPU architecture Optimizations GPU performance

More information

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015 Enhancing Traditional Rasterization Graphics with Ray Tracing March 2015 Introductions James Rumble Developer Technology Engineer Ray Tracing Support Justin DeCell Software Design Engineer Ray Tracing

More information

Course Recap + 3D Graphics on Mobile GPUs

Course Recap + 3D Graphics on Mobile GPUs Lecture 18: Course Recap + 3D Graphics on Mobile GPUs Interactive Computer Graphics Q. What is a big concern in mobile computing? A. Power Two reasons to save power Run at higher performance for a fixed

More information

Mobile HW and Bandwidth

Mobile HW and Bandwidth Your logo on white Mobile HW and Bandwidth Andrew Gruber Qualcomm Technologies, Inc. Agenda and Goals Describe the Power and Bandwidth challenges facing Mobile Graphics Describe some of the Power Saving

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

AN501: Latency Settings and their Impact on Memory Performance. John Beekley, VP Applications Engineering, Corsair Memory, Inc.

AN501: Latency Settings and their Impact on Memory Performance. John Beekley, VP Applications Engineering, Corsair Memory, Inc. AN501: Latency Settings and their Impact on Memory Performance John Beekley, VP Applications Engineering, Corsair Memory, Inc. Introduction Memory modules are currently available which support a wide variety

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Fall 2015 Lecture 9 October 1, 2015 What I was going to say last time How are the ideas we ve learned about implemented in hardware so they are fast. Important:

More information