The Source for GPU Programming

1 The Source for GPU Programming
developer.nvidia.com
- Latest News
- Developer Events Calendar
- Technical Documentation
- Conference Presentations
- GPU Programming Guide
- Powerful Tools, SDKs, and more...
Join our FREE registered developer program for early access to NVIDIA drivers, cutting edge tools, online support forums, and more!

2 GeForce 6 Series Performance
Matthias Wloka, Developer Technology

3 GeForce 6 Series Specific Performance
- Instancing
- Vertex- and Pixel-Shaders 3.0
- Branching and Looping
- Vertex Texture Fetch
- Hardware Shadow Maps
- Z- and Stencil-Cull
- FP16 Filter and Blend, MRTs

4 Marketing Speak Translation
- "SM3", i.e., Shader Model 3 hardware
  - Sometimes shorthand for "every GeForce 6 feature not in GeForce FX"
  - Not just VS/PS 3.0
  - See previous slide!
- GeForce 6200 does not support fp16 filter/blend
  - Okay, because value cards lack the memory bandwidth to use fp16 render targets

5 Simplified Graphics Pipeline
[Diagram: CPU, Geometry Storage, Geometry Processor, Rasterizer, Fragment Processor, Frame Buffer, and Texture Storage + Filtering. New-feature labels on the stages: Instancing, Vertex Shader 3.0, Z/Stencil Cull, Pixel Shader 3.0, Fp16 Filter, Shadow Maps, Fp16 Blend, MRT]
- Common bottlenecks: CPU, fragment processor
- New features help address these bottlenecks

6 CPU Bottleneck Getting Worse
Courtesy Ian Buck, Stanford University

7 Explicitly Address CPU Bottleneck
- Reduce draw calls
  - Budget/design for your draw calls!
  - Use instancing to reduce batches
  - Use über-shaders to eliminate batches/passes
  - Use fp16 blending to eliminate passes
- Move more computations to the GPU
  - GPGPU: General-Purpose Computations Using GPUs
  - See

8 Detail of a Single Vertex Shader Pipeline
[Diagram: Input Vertex Data; Vertex Texture Fetch with Texture Cache; FP32 Scalar Unit; FP32 Vector Unit; Branch Unit; Primitive Assembly; Viewport Processing; To Setup]

9 Instancing: What Is It?
- Lets the GPU loop over vertex buffers:
  - Tree model VB
  - Transform matrices VB
- A single draw call generates many instances of an object

10 Instancing Demo
- Complex lighting, post-processing
- Simple CPU collision

11 Instancing Advantages
- Alternatives:
  - One draw call per instance, change state in between
  - Static batching (static pre-transformed VB)
  - Dynamic batching (dynamic 2 stream instancing)
  - Vertex constant instancing
  - See Instancing code sample and whitepaper: Individual_Samples/samples.html
- Instancing is the most flexible and has the least:
  - Draw calls
  - Memory overhead
  - CPU/bus overhead

12 But
- Multiple vertex streams
- GPU does extra work
  - Vertex sizes are larger
  - Transform matrix is a per-vertex attribute

13 Attribute Bound
- Extra data fetched per instance
  - Explains the slowdown
- Optimize for the vertex cache
  - A cache hit saves all vertex work, including attribute access
- Pack input attributes as tightly as possible
  - Even if vertex shader work is required to unpack them
- Move constants or derivables out of attributes

14 Instancing Performance
[Chart: Instancing Method Comparison, 28-poly mesh. Y axis: % FPS relative to HW instancing (note: % is relative to HW instancing in each group). X axis: # polys. Series: Single Draw Calls, Dynamic 2 Stream Instancing, Static 2 Stream Instancing, VS Constant Instancing, Hardware Instancing, Static Pretransformed VB]

15 Another View
[Chart: FPS per polys, 28-poly mesh. Y axis: FPS. X axis: # polys. Series: Single Draw Calls, Static 2 Stream Instancing, Hardware Instancing, Dynamic 2 Stream Instancing, VS Constant Instancing, Static Pretransformed VB]

16 Vertex Shader 3.0: Flow Control
- Vertex flow control is near optimal:
  - Branch instructions have a fixed ~1 cycle overhead
  - Divergence is full speed (MIMD)
- Vertex branching is a win
  - Except for short branches
  - Compiler/driver decides
- Example: a single unified vertex shader for 1-, 2-, 3-, and 4-bone skinning
- Use branches and loops to
  - Consolidate batches
  - Skip over unnecessary work

17 Vertex Texture Fetch (VTF)
- Mipmapped texture fetches from the vertex shader:
  - Only R32f and R32G32B32A32f formats
  - Only point-sampling
  - Up to 4 different texture stages
  - Sample as often as you like
- Large latency
  - Equivalent to instructions

18 Cover the Latency
- Latency means you can hide other ops in it
  - For free
  - Compiler/driver does this for you if possible

    texldl r0, v0, sampler0
    mul r1, v1, c0    // stuff not depending on vtf result
    add r1, r1, r0    // use vtf result for the first time

- Branch over VTF if possible
- Dependent VTFs are slow
  - Less chance to hide latency

19 Vertex Texture Fetch Performance
- GeForce 6800 is capable of a peak of 600 MVerts/s
  - With minimal (err, read: no) work per vertex
- Max with a single VTF: 33 MVerts/s
- Not all vertices in a frame need to be displaced
  - 1 million displaced vertices per frame still allows 33 fps!
- Do not use VTF as a general constant-memory replacement
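The frame-rate figure on this slide is a simple division, which the following sketch reproduces. It assumes the 33 MVerts/s single-VTF peak is the only limiting factor; the function name `max_fps` and the second example budget are illustrative, not from the slides.

```python
def max_fps(displaced_verts_per_frame: int,
            vtf_rate_verts_per_s: float = 33e6) -> float:
    """Upper bound on frame rate if each displaced vertex costs one VTF.

    Assumes the single-VTF peak rate from the slide is the only bottleneck.
    """
    return vtf_rate_verts_per_s / displaced_verts_per_frame


print(max_fps(1_000_000))  # the slide's case: 1 million displaced vertices
print(max_fps(250_000))    # a smaller, hypothetical displacement budget
```

Real throughput will likely be lower once the shader does other per-vertex work, so treat the result as a ceiling.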

20 Early Z and Stencil Cull
- Cull pixels that (will) fail depth/stencil tests before they enter the pixel shader
- For maximum z-cull:
  - Render roughly front to back
  - Or even better: render a z-only pass before normal rendering
- Do stencil-only passes for other cull tricks

21 Things That Disable Z Culling
- Changing depth-test direction
  - For example, less-equal to greater-equal
  - Only resets on clear

22 Z-Cull Uses Highly Compressed Z-Rep
- Triangles with holes (alpha test/texkill/clip planes) are not occluding
- Small triangles are bad occluders
  - Small ~= less than 4x4 pixels
  - Z-cull may not recognize the triangle as an occluder
[Figure: "Good" and "Bad" occluder examples]

23 Things That Disable Stencil Culling
- Changing stencil function, reference, or mask
  - Only resets on clear
- Writing stencil while rejecting based on stencil
  - Write stencil in a separate pass from rejecting color/z

24 Stencil Cull Example
1. Render light volume with color write disabled
   - Depth func = LESS, Stencil func = ALWAYS
   - Stencil Z-FAIL = REPLACE (with value X)
   - Rest of stencil ops set to KEEP
2. Render with lighting shader
   - Depth func = ALWAYS, Stencil func = EQUAL, all ops = KEEP, Stencil ref = X
   - Unlit pixels will be culled because stencil does not match the reference value
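The two passes can be traced on a single row of pixels with a small CPU-side simulation. This is only a sketch of the state logic, not GPU code: the per-pixel depth lists, the `None` marker for pixels the light volume does not cover, and the function names are all assumptions made for illustration.

```python
X = 1  # stencil reference value written in pass 1 (the slide's "X")


def mark_light_volume(scene_depth, volume_depth, stencil):
    """Pass 1: depth func LESS, stencil func ALWAYS, z-fail op REPLACE(X)."""
    for i, (scene_z, volume_z) in enumerate(zip(scene_depth, volume_depth)):
        if volume_z is None:
            continue  # light volume does not cover this pixel
        depth_test_passes = volume_z < scene_z
        if not depth_test_passes:
            stencil[i] = X  # z-fail: REPLACE with X; all other ops are KEEP
    return stencil


def pixels_to_light(stencil):
    """Pass 2: stencil func EQUAL with ref X; everything else is culled."""
    return [i for i, value in enumerate(stencil) if value == X]


scene = [0.9, 0.5, 0.4, 0.9]       # depth already in the z-buffer
volume = [None, 0.6, 0.7, 0.8]     # depth of the rendered light-volume faces
stencil = mark_light_volume(scene, volume, [0, 0, 0, 0])
print(pixels_to_light(stencil))    # only pixels 1 and 2 run the lighting shader
```

Only pixels where the volume fragment failed the depth test carry the value X, so the expensive lighting shader in pass 2 is skipped everywhere else.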

25 Fast Z-Only Rendering
- GeForce FX and 6 Series render z/stencil at double speed!
  - Important for dynamic shadow maps!
  - Makes a z-first/only pass (for z-cull benefits) attractive
- Only enabled if:
  - Color writes are disabled
  - Pixel shaders are disabled (no depth replace, no texkill)
  - Alpha test/color key is disabled
  - An 8-bit/component color buffer is bound (not float)
  - No user clip planes
  - No AA

26 Pixel Shader 3.0 Performance
- What is Pixel Shader 3.0?
- 3.0 shaders help both CPU and GPU bottlenecks
  - Consolidate draw calls / passes (über-shaders)
  - Early-outs with dynamic branching
- Gory performance details of particular pixel shader 3.0 features

27 Detail of a Single Pixel Shader Pipeline
[Diagram: Input Fragment Data; FP32 Shader Unit 1 (4 FP ops/pixel, co-issue, texture address calc, free fp16 normalize, mini ALU); FP Texture Processor with Texture Data and Texture Cache (FP16 texture filtering; bilinear/trilinear/aniso texture filter, 4-tap at full speed, 16:1 aniso with trilinear); FP32 Shader Unit 2 (4 FP ops/pixel, co-issue, mini ALU); Shader Model 3.0 Branch Processor; Fog ALU; Output Shaded Fragments. Labels: SIMD Architecture, Co-Issue FP32 Computation]

28 Half (fp16) Performance
- Half (fp16) still matters!
  - Critical for GeForce FX performance
  - Reduces register pressure
  - Better able to hide texture latency
  - Fast fp16 normalize
- Compiler/driver can NOT help you with this

29 GeForce 6 Single Cycle Normalize()
- Pixel shader unit has a single-cycle normalize
- Caveat: only for 3-component 16-bit float values

    float3 f3; half3 h3; half4 h4;
    f3 = normalize(f3);          // slow: dp3/rsq/mul
    h3 = normalize(h3);          // fast: nrmh
    h4 = normalize(h4);          // slow: dp4/rsq/mul
    h4.xyz = normalize(h4.xyz);  // fast: nrmh

30 GeForce 6 Superscalar Execution
- Executes multiple instructions simultaneously
- For example, in a single cycle you can execute
  - Two 2-vector instructions, or
  - One 3-vector and one scalar instruction
- Plus, there are 2 math units per shader pipe
- Use swizzle / write masks to help the compiler

    half4 A, B;
    A.w = sin(A.w);          // writing A = sin(A.w) is not enough
    A.xyz = A.xyz * B.xyz;

31 GeForce 6 Series Co-Issue
- 2 different instructions executing in the same cycle in the same shader unit
- 2 separate shader units
- 4 instructions/pixel/cycle
[Diagram: Shader Unit 1 splits R G B A between Operation 1 and Operation 2; Shader Unit 2 splits R G B A between Operation 3 and Operation 4]

32 Flow Control Performance Overview
- Flow control instruction costs: not free, but useful
[Table: Instruction vs. Cost (Cycles) for if / endif, if / else / endif, call, ret, loop / endloop; cycle values not preserved]
- Additional costs when pixels diverge (more later)

33 Looping Costs
- DirectX ps.3.0 supports only static loops
  - Unrolling is faster
  - Compiler/driver can do that for you
- Nonetheless useful because it
  - Reduces high-level code complexity
  - Reduces passes
    - Multiple lights in a single pass can be a big win
    - Number of lights unknown at compile time
  - Reduces proliferation of pre-compiled shaders
    - Thousands of shaders from just a few templates
  - Overcomes DirectX's 512 static instruction limit

34 Branching Costs
- Branching can provide a substantial boost
  - If able to skip > 6 instruction cycles, and
  - If the branch condition is coherent
[Figure: coherent vs. incoherent branch conditions]
- Noisy branch conditions cause performance loss
  - Potentially worse than taking both branches all the time
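A toy cost model makes the coherent versus noisy comparison concrete. Everything in it is an assumption for illustration: pixels are processed in lockstep groups, each group pays a fixed branch overhead (6 cycles here, borrowed from the break-even figure above), and a group pays for the branch body if any of its pixels needs it. Real hardware group sizes and costs differ.

```python
def shaded_cost(mask, group_size, body_cycles, overhead_cycles=6):
    """Total cycles for a row of pixels under the toy lockstep model.

    mask[i] is True when pixel i needs the expensive branch body.
    """
    total = 0
    for start in range(0, len(mask), group_size):
        group = mask[start:start + group_size]
        total += overhead_cycles      # branch instructions are not free
        if any(group):
            total += body_cycles      # one pixel drags in the whole group
    return total


coherent = [True] * 8 + [False] * 8   # half the pixels, contiguous
noisy = [True, False] * 8             # same half, interleaved
no_branch = (len(coherent) // 4) * 40 # always execute the body, no branch

print(shaded_cost(coherent, 4, 40))   # 104
print(shaded_cost(noisy, 4, 40))      # 184
print(no_branch)                      # 160
```

With the same fraction of pixels needing the body, the coherent mask beats the branch-free shader while the noisy mask is slower than it, matching the slide's warning.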

35 How Coherent Do I Have To Be?
- GPU has hundreds of pixels in flight
- Best if coherent over regions of > ~1000 pixels
  - That's only ~30x30!
- You need to experiment in your own application
- Soft shadow demo shows:
  - Incoherent branches on a small portion of the screen are still a big win

36 Combine Branching With Others
- Back face register (vFace)
  - Shade front faces differently from back faces
- Position register (vPos)
  - Shade based on position
  - For example, skip or simplify distant pixels
- Early out:
  - If in shadow, don't do lighting computations
  - If out of range (attenuation zero), don't light
  - Applies to vs.3.0 as well

37 Soft Shadow Demo

38 How Soft Shadow Demo Works
- Takes 8 test samples from the shadow map
  - If all 8 are in shadow or all 8 are in the light, then done
  - If on the edge (some in shadow, some in light), do 56 more samples for additional quality
- 64 samples at much lower cost!
  - Quick-and-dirty importance sampling
- Dynamic sampling is > 2x faster
  - Vs. 64 samples everywhere
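The early-out logic described on this slide can be sketched on the CPU as follows. The callback `sample_in_shadow`, the offset lists, and the return convention (shadow fraction plus number of samples taken) are assumptions for illustration; the demo itself runs this per pixel in a ps.3.0 shader.

```python
def soft_shadow(sample_in_shadow, test_offsets, extra_offsets):
    """Return (shadow_fraction, samples_taken) using an 8 + 56 style early-out.

    sample_in_shadow(offset) -> bool is a stand-in for one shadow-map lookup.
    """
    first = [sample_in_shadow(o) for o in test_offsets]
    if all(first) or not any(first):
        # Fully shadowed or fully lit: the test samples are enough.
        return (1.0 if first[0] else 0.0), len(first)
    # On a shadow edge: spend the remaining samples for a smooth result.
    rest = [sample_in_shadow(o) for o in extra_offsets]
    samples = first + rest
    return sum(samples) / len(samples), len(samples)


test_offsets = list(range(8))
extra_offsets = list(range(8, 64))

print(soft_shadow(lambda o: True, test_offsets, extra_offsets))        # fully in shadow
print(soft_shadow(lambda o: o % 2 == 0, test_offsets, extra_offsets))  # on an edge
```

If only a fraction p of pixels lie on shadow edges, the average cost is roughly 8 + 56p lookups per pixel, which stays under half of 64 as long as p is below about 0.43 (ignoring branch overhead).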

39 Hardware Shadow Maps
- In DirectX:
  - Render to a depth-format texture (D3DFMT_D24X8, D3DFMT_D16)
  - Use tex2Dproj to sample
  - Shadow map comparison happens automatically
- In OpenGL:
  - Render to a DEPTH_COMPONENT texture
  - Use TEXTURE_COMPARE_MODE_ARB with COMPARE_R_TO_TEXTURE

40 Hardware Shadow Map Performance
- Shadow map comparison is free (full speed)
  - No need to compare and filter in the shader
- If bilinear state is on, then you get percentage-closer filtering of the 4 nearest texels
- Use a single tap for performance
  - Quality roughly equivalent to 4-tap PCF on R32F
- Use multiple taps for higher quality
  - 4-tap HW shadow map is roughly as fast as 4-tap manual PCF on R32F

41 Hardware Shadow Map Fallback
- Possible to use R32F or R16F shadow maps
  - Render depth to a single-channel float texture in the shader
  - Multiple jittered samples for high quality / soft edges
- Easy to maintain hardware shadow map and R32F/R16F code paths:
  - Same setup and pipeline as any shadow map technique
  - HW shadow map shader code is simpler and faster
- HW shadow maps buy speed or quality (or both)

42 Texture Instruction Performance
- texldb (scalar LOD bias): full speed
- texldl (explicit scalar LOD selection): full speed
  - Hardware need not calculate derivatives for LOD
  - Possible to dynamically branch over these instructions
- texldd (gradient-based LOD selection): factor 10 slower!
  - But when you need to use this, you need to use this

43 Floating Point Texture Performance
- Prefer 64bpp float textures and render targets
  - Half the bandwidth of 128bpp (fp32) textures
  - More importantly: double the cache coherence
  - Poor cache coherence destroys performance
  - Fp16 textures are 2x faster than fp32 if texture bound
- Also important: efficient channel allocation
  - Use R32F buffers for scalar data, and R16G16F for 2-vectors
  - Double the cache coherence again!
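The bandwidth argument follows directly from bits per texel, as this sketch shows. The dictionary keys are descriptive labels chosen here rather than API format names, and the 1024x768 surface size is an arbitrary example.

```python
# Bits per texel for the formats discussed on this slide.
BITS_PER_TEXEL = {
    "fp16 RGBA": 64,    # 4 channels x 16 bits
    "fp32 RGBA": 128,   # 4 channels x 32 bits
    "R32F": 32,         # 1 channel  x 32 bits
    "R16G16F": 32,      # 2 channels x 16 bits
}


def surface_bytes(width: int, height: int, fmt: str) -> int:
    """Memory touched when reading or writing every texel once."""
    return width * height * BITS_PER_TEXEL[fmt] // 8


for fmt in BITS_PER_TEXEL:
    print(fmt, surface_bytes(1024, 768, fmt))
```

Storing scalar data in R32F instead of an fp32 RGBA surface cuts the footprint by 4x, which is where the repeated "double cache coherence" gains come from.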

44 Common Sense Texture Performance
- Use mipmaps
  - GPU fetches a local neighborhood for each texel
- For sharper/crisper textures:
  - Use anisotropic filtering
  - Use better mipmap generation (use texture tools)
  - Do NOT use LOD bias
    - LOD bias is slower and lower quality

45 Normal Maps
- Use D3DFMT_V8U8 or DXT5
  - To store x and y
  - Derive z in the shader
- Simon Green's normal map compression paper
  - Compares the quality of a variety of formats
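Deriving z from the stored x and y relies on the normal having unit length. A minimal sketch, assuming tangent-space normals with non-negative z and components already decoded to the range [-1, 1]; the function name is illustrative:

```python
import math


def reconstruct_normal(x: float, y: float) -> tuple:
    """Rebuild a unit normal from its x and y components.

    Assumes z >= 0; the clamp guards against small negative values caused by
    quantization or filtering of the stored components.
    """
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


print(reconstruct_normal(0.6, 0.0))
print(reconstruct_normal(0.0, 0.0))
```

In a shader the same expression costs a dot product, a subtract, and a square root, which is usually cheaper than the bandwidth of a third stored channel.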

46 Multiple Render Targets
- MRTs are useful for reducing rendering passes
  - When you need to output more than a single 4-vector
  - Deferred shading, particle physics, GPGPU algorithms
  - Replaces up to four passes with one
- But MRT is not free
  - High bandwidth cost, especially with float formats
  - Small overhead per target rendered
- GeForce 6 has a sweet spot of 3 render targets (RTs)
  - Split 6 passes into two 3-RT passes
  - Not one 4-RT pass and one 2-RT pass
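The splitting rule can be written as a small helper. This is a sketch that assumes the only goal is to keep each pass at or below the 3-RT sweet spot; the function name and the 7-output example are hypothetical.

```python
def split_into_passes(num_outputs: int, targets_per_pass: int = 3) -> list:
    """Group outputs into MRT passes of at most `targets_per_pass` targets."""
    full_passes, remainder = divmod(num_outputs, targets_per_pass)
    passes = [targets_per_pass] * full_passes
    if remainder:
        passes.append(remainder)
    return passes


print(split_into_passes(6))  # [3, 3] rather than [4, 2]
print(split_into_passes(7))  # [3, 3, 1]
```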

47 Other Render Target Advice
- Do not render the entire scene to a texture
  - You do not get AA
  - If the user turns on control panel AA, it is hard to detect
- Instead, render to the back buffer, then StretchRect
  - Drivers give performance priority to the back buffer, ahead of texture surfaces
  - AA works with the back buffer

48 Full Screen Effects
- Use scissor rects to restrict rendering
  - Light bounds, etc.
- Do not use full-screen quads
  - Use full-screen triangles with a scissor rect instead
  - Completely avoids inefficient diagonals
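One common way to build such a triangle is to oversize it in clip space so that it contains the whole [-1, 1] square. The slide does not give coordinates, so the vertices below are an assumed, widely used choice; the sketch simply verifies with edge functions that all four screen corners fall inside.

```python
# Oversized clip-space triangle (assumed coordinates, counter-clockwise).
TRIANGLE = [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)]


def edge(a, b, p):
    """Signed area test: >= 0 when p is on or to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(triangle, p):
    a, b, c = triangle
    return edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0


screen_corners = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
print(all(covers(TRIANGLE, corner) for corner in screen_corners))
```

Because the triangle is convex and contains all four corners, it covers every pixel on screen; the parts outside the viewport are clipped, and a scissor rect restricts work to the region that matters.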

49 Floating Point Blending
- GeForce FX needs to emulate float blending
  - Using a ping-pong buffer
  - Lots of context switches and additional passes
  - Blending, e.g., lots of particles becomes infeasible
- But fp16 is 2x the bandwidth of A8R8G8B8

50 Increased Read Back Performance
- Pre-GeForce 6
  - Best case < 200 MB/s, all chipsets
  - Only PCI cycles used to write back to host memory
- GeForce 6800 (AGP)
  - 600 MB/s GB/s, depending on AGP chipset
- PCI-E workstation boards
  - 1.0 GB/s on Quadro FX 4400
  - Up to 2.4 GB/s on Quadro FX 1400
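To put these rates in perspective, the sketch below estimates the pure transfer time for one frame. The 1024x768 32-bit frame is an assumed example, and 1 MB is taken as 10^6 bytes; synchronization stalls are not modeled.

```python
def readback_ms(width: int, height: int, bytes_per_pixel: int,
                bytes_per_second: float) -> float:
    """Transfer time in milliseconds, ignoring any CPU/GPU synchronization."""
    return 1000.0 * width * height * bytes_per_pixel / bytes_per_second


print(readback_ms(1024, 768, 4, 200e6))  # pre-GeForce 6 best case, about 15.7 ms
print(readback_ms(1024, 768, 4, 1.0e9))  # 1.0 GB/s, about 3.1 ms
```

At 200 MB/s the transfer alone consumes most of a 60 Hz frame budget, which is why the higher rates matter.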

51 Read Back Still a BAD Idea
- Read back still synchronizes CPU and GPU
  - CPU stalls until the GPU finishes all rendering
    - Can you afford wasting precious CPU cycles?
  - GPU pipeline drains completely and becomes idle

52 Memory Allocation
- Order of resource allocation affects performance
- Allocate render targets first
  - Sort order by pitch (bpp * width)
  - Sort pitch groups by frequency of use (most used first)
- Then create vertex and pixel shaders
- Load / create remaining textures
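One way to read the render-target ordering rule is as a two-key sort. The sketch below is an interpretation, not the driver's documented behavior: it assumes larger pitches go first (the slide does not state a direction) and that targets sharing a pitch are ordered by how often they are used. The `RenderTarget` type and the example targets are hypothetical.

```python
from typing import NamedTuple


class RenderTarget(NamedTuple):
    name: str
    width: int
    bits_per_pixel: int
    uses_per_frame: int

    @property
    def pitch(self) -> int:
        # Pitch as defined on the slide: bpp * width.
        return self.bits_per_pixel * self.width


def allocation_order(targets):
    """Sort by pitch (descending, an assumption), then most used first."""
    return sorted(targets, key=lambda t: (-t.pitch, -t.uses_per_frame))


targets = [
    RenderTarget("bloom", 512, 64, 2),
    RenderTarget("shadow", 1024, 32, 4),
    RenderTarget("scene_hdr", 1024, 64, 1),
    RenderTarget("depth_copy", 1024, 32, 1),
]
print([t.name for t in allocation_order(targets)])
```

The application would then create the render targets in this order before creating shaders and loading the remaining textures.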

53 Conclusion
- Lots of new/fast features
  - Instancing, vs.3.0 flow control, vertex texture fetch
  - Z-/stencil-cull, fast z-only
  - Fast normalize, ps.3.0 flow control
  - Hardware shadow maps, fp16 blending
- With some sneaky gotchas
- Use these features to attack bottlenecks
  - CPU
  - Pixel shaders...

54 Questions?
- NVIDIA GPU Programming Guide: gpu_programming_guide.html
- Matthias Wloka



More information

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing) ME 290-R: General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing) Sara McMains Spring 2009 Performance: Bottlenecks Sources of bottlenecks CPU Transfer Processing Rasterizer

More information

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp Next-Generation Graphics on Larrabee Tim Foley Intel Corp Motivation The killer app for GPGPU is graphics We ve seen Abstract models for parallel programming How those models map efficiently to Larrabee

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

3D buzzwords. Adding programmability to the pipeline 6/7/16. Bandwidth Gravity of modern computer systems

3D buzzwords. Adding programmability to the pipeline 6/7/16. Bandwidth Gravity of modern computer systems Bandwidth Gravity of modern computer systems GPUs Under the Hood Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology The bandwidth between key components

More information

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware Automatic Tuning Matrix Multiplication Performance on Graphics Hardware Changhao Jiang (cjiang@cs.uiuc.edu) Marc Snir (snir@cs.uiuc.edu) University of Illinois Urbana Champaign GPU becomes more powerful

More information

Save the Nanosecond! PC Graphics Performance for the next 3 years. Richard Huddy European Developer Relations Manager ATI Technologies, Inc.

Save the Nanosecond! PC Graphics Performance for the next 3 years. Richard Huddy European Developer Relations Manager ATI Technologies, Inc. Save the Nanosecond! PC Graphics Performance for the next 3 years Richard Huddy European Developer Relations Manager ATI Technologies, Inc. A funny thing happened to me ATI is now broadly recognised and

More information

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 6: Texture Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today: texturing! Texture filtering - Texture access is not just a 2D array lookup ;-) Memory-system implications

More information

Could you make the XNA functions yourself?

Could you make the XNA functions yourself? 1 Could you make the XNA functions yourself? For the second and especially the third assignment, you need to globally understand what s going on inside the graphics hardware. You will write shaders, which

More information

Programmable Graphics Hardware

Programmable Graphics Hardware Programmable Graphics Hardware Outline 2/ 49 A brief Introduction into Programmable Graphics Hardware Hardware Graphics Pipeline Shading Languages Tools GPGPU Resources Hardware Graphics Pipeline 3/ 49

More information

Real-Time Hair Simulation and Rendering on the GPU. Louis Bavoil

Real-Time Hair Simulation and Rendering on the GPU. Louis Bavoil Real-Time Hair Simulation and Rendering on the GPU Sarah Tariq Louis Bavoil Results 166 simulated strands 0.99 Million triangles Stationary: 64 fps Moving: 41 fps 8800GTX, 1920x1200, 8XMSAA Results 166

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Graphics Hardware. Computer Graphics COMP 770 (236) Spring Instructor: Brandon Lloyd 2/26/07 1

Graphics Hardware. Computer Graphics COMP 770 (236) Spring Instructor: Brandon Lloyd 2/26/07 1 Graphics Hardware Computer Graphics COMP 770 (236) Spring 2007 Instructor: Brandon Lloyd 2/26/07 1 From last time Texture coordinates Uses of texture maps reflectance and other surface parameters lighting

More information

Real-World Applications of Computer Arithmetic

Real-World Applications of Computer Arithmetic 1 Commercial Applications Real-World Applications of Computer Arithmetic Stuart Oberman General purpose microprocessors with high performance FPUs AMD Athlon Intel P4 Intel Itanium Application specific

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Optimisation. CS7GV3 Real-time Rendering

Optimisation. CS7GV3 Real-time Rendering Optimisation CS7GV3 Real-time Rendering Introduction Talk about lower-level optimization Higher-level optimization is better algorithms Example: not using a spatial data structure vs. using one After that

More information

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Pavel Petroshenko, Sun Microsystems, Inc. Ashmi Bhanushali, NVIDIA Corporation Jerry Evans, Sun Microsystems, Inc. Nandini

More information

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Spring 2016 Lecture 10 February 25, 2016 1. Put a 3D primitive in the World Modeling Get triangles 2. Figure out what color it should be Do ligh/ng 3. Position

More information

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research Optimizing Games for ATI s IMAGEON 2300 Aaftab Munshi 3D Architect ATI Research A A 3D hardware solution enables publishers to extend brands to mobile devices while remaining close to original vision of

More information

GPU Architecture. Samuli Laine NVIDIA Research

GPU Architecture. Samuli Laine NVIDIA Research GPU Architecture Samuli Laine NVIDIA Research Today The graphics pipeline: Evolution of the GPU Throughput-optimized parallel processor design I.e., the GPU Contrast with latency-optimized (CPU-like) design

More information

Scanline Rendering 2 1/42

Scanline Rendering 2 1/42 Scanline Rendering 2 1/42 Review 1. Set up a Camera the viewing frustum has near and far clipping planes 2. Create some Geometry made out of triangles 3. Place the geometry in the scene using Transforms

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V: Principles in Computer Architecture Parallelism and Locality Fall 2008 Lecture 10 The Graphics Processing Unit Mattan Erez The University of Texas at Austin Outline What is a GPU? Why should we

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

In-Game Special Effects and Lighting

In-Game Special Effects and Lighting In-Game Special Effects and Lighting Introduction! Tomas Arce! Special Thanks! Matthias Wloka! Craig Galley! Stephen Broumley! Cryrus Lum! Sumie Arce! Inevitable! nvidia! Bungy What Is Per-Pixel Pixel

More information

GoForce 3D: Coming to a Pixel Near You

GoForce 3D: Coming to a Pixel Near You GoForce 3D: Coming to a Pixel Near You CEDEC 2004 NVIDIA Actively Developing Handheld Solutions Exciting and Growing Market Fully Committed to developing World Class graphics products for the mobile Already

More information

GRAPHICS PROCESSING UNITS

GRAPHICS PROCESSING UNITS GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

More information

Render-To-Texture Caching. D. Sim Dietrich Jr.

Render-To-Texture Caching. D. Sim Dietrich Jr. Render-To-Texture Caching D. Sim Dietrich Jr. What is Render-To-Texture Caching? Pixel shaders are becoming more complex and expensive Per-pixel shadows Dynamic Normal Maps Bullet holes Water simulation

More information

Using Virtual Texturing to Handle Massive Texture Data

Using Virtual Texturing to Handle Massive Texture Data Using Virtual Texturing to Handle Massive Texture Data San Jose Convention Center - Room A1 Tuesday, September, 21st, 14:00-14:50 J.M.P. Van Waveren id Software Evan Hart NVIDIA How we describe our environment?

More information

2.11 Particle Systems

2.11 Particle Systems 2.11 Particle Systems 320491: Advanced Graphics - Chapter 2 152 Particle Systems Lagrangian method not mesh-based set of particles to model time-dependent phenomena such as snow fire smoke 320491: Advanced

More information

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. 1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,

More information

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008

More information

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games Bringing AAA graphics to mobile platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Platform team at Epic Games Unreal Engine 15 years in the industry 30 years of programming

More information

DirectX 10 Performance. Per Vognsen

DirectX 10 Performance. Per Vognsen DirectX 10 Performance Per Vognsen Outline General DX10 API usage Designed for performance Batching and Instancing State Management Constant Buffer Management Resource Updates and Management Reading the

More information

Efficient and Scalable Shading for Many Lights

Efficient and Scalable Shading for Many Lights Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL

More information

From Brook to CUDA. GPU Technology Conference

From Brook to CUDA. GPU Technology Conference From Brook to CUDA GPU Technology Conference A 50 Second Tutorial on GPU Programming by Ian Buck Adding two vectors in C is pretty easy for (i=0; i

More information

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment Dominic Filion, Senior Engineer Blizzard Entertainment Rob McNaughton, Lead Technical Artist Blizzard Entertainment Screen-space techniques Deferred rendering Screen-space ambient occlusion Depth of Field

More information

Rationale for Non-Programmable Additions to OpenGL 2.0

Rationale for Non-Programmable Additions to OpenGL 2.0 Rationale for Non-Programmable Additions to OpenGL 2.0 NVIDIA Corporation March 23, 2004 This white paper provides a rationale for a set of functional additions to the 2.0 revision of the OpenGL graphics

More information

Direct Rendering of Trimmed NURBS Surfaces

Direct Rendering of Trimmed NURBS Surfaces Direct Rendering of Trimmed NURBS Surfaces Hardware Graphics Pipeline 2/ 81 Hardware Graphics Pipeline GPU Video Memory CPU Vertex Processor Raster Unit Fragment Processor Render Target Screen Extended

More information

Rendering Objects. Need to transform all geometry then

Rendering Objects. Need to transform all geometry then Intro to OpenGL Rendering Objects Object has internal geometry (Model) Object relative to other objects (World) Object relative to camera (View) Object relative to screen (Projection) Need to transform

More information

What s New with GPGPU?

What s New with GPGPU? What s New with GPGPU? John Owens Assistant Professor, Electrical and Computer Engineering Institute for Data Analysis and Visualization University of California, Davis Microprocessor Scaling is Slowing

More information

The GPGPU Programming Model

The GPGPU Programming Model The Programming Model Institute for Data Analysis and Visualization University of California, Davis Overview Data-parallel programming basics The GPU as a data-parallel computer Hello World Example Programming

More information

Interactive Cloth Simulation. Matthias Wloka NVIDIA Corporation

Interactive Cloth Simulation. Matthias Wloka NVIDIA Corporation Interactive Cloth Simulation Matthias Wloka NVIDIA Corporation MWloka@nvidia.com Overview Higher-order surfaces Vertex-shader deformations Lighting modes Per-vertex diffuse Per-pixel diffuse with bump-map

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information