Efficient and Scalable Shading for Many Lights

1. GPU Overview
2. Shading recap
3. Forward Shading
4. Deferred Shading
5. Tiled Deferred Shading
6. And more!

Timeline: First GPU, Shaders, Unified Shaders, CUDA, OpenCL
  First GPU: NVIDIA GeForce 256, 1999. Fixed function.
  Programmable Shaders, 2001-2003. Increasing programmability.
  Unified Shaders: AMD Xbox 360, 2005. General programmable cores.
  CUDA, 2007. C/C++, integers, memory access.
  OpenCL, 2009.

Timeline, continued
  NVIDIA GeForce 400 series (Fermi), 2010. Virtual functions, unified memory addressing.
  AMD Radeon 7900 series (Graphics Core Next), 2012. Much the same, actually.
  CPU/GPU integration: AMD Fusion APU, Intel Sandy Bridge.

[Charts: theoretical TFlops and theoretical GB/s by year, 2003 to 2011, GPU vs CPU]
Around 8x flops: GTX580 ~1.6 TFlops vs Intel CPU ~0.2 TFlops. Growing fast!
Around 6x bandwidth: GTX580 ~192 GB/s vs CPU ~32 GB/s. Not growing fast enough!

[Diagram: CPU and GPU layouts, with blocks labelled Core, ALU, Cache]
CPU
  Few complex tasks
  1-2 threads/core
  Clever instruction execution: branch prediction etc.
  Lots of cache
  Narrow memory bus
GPU
  Many simple tasks
  ~48 threads/core
  Wide SIMD (many ALUs)
  Task switching
  Latency hiding
  Wide memory bus

Color and intensity at sample point
  Here: sample point = fragment
Diffuse contribution needs:
  Light direction
  Normal
  Material diffuse color
Specular shading needs:
  Light direction
  Normal
  Eye direction
  Material shininess
  Material specular color

Only one light!

  // Light: position and color.
  uniform vec3 lightpos;
  uniform vec3 lightcolor;
  // Material reflectances (untextured).
  uniform vec3 materialdiffuse;
  uniform vec3 materialspecular;
  uniform float materialshininess;
  // Vertex data, interpolated.
  in vec3 samplepos;
  in vec3 samplenormal;
  // Result color, output to frame buffer.
  out vec4 resultcolor;

  void main()
  {
    // see next slide...
  }

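  out vec4 resultcolor;

  void main()
  {
    // Interpolated normals are generally not unit length.
    vec3 n = normalize(samplenormal);
    vec3 lightdir = normalize(lightpos - samplepos);
    // Clamp so surfaces facing away from the light get no diffuse light.
    vec3 diffuse = max(dot(n, lightdir), 0.0) * materialdiffuse * lightcolor;
    // The eye sits at the origin in view space.
    vec3 viewdir = normalize(-samplepos);
    // Half vector between light and view directions.
    vec3 h = normalize(lightdir + viewdir);
    vec3 specular = pow(max(dot(h, n), 0.0), materialshininess) * materialspecular * lightcolor;
    resultcolor = vec4(diffuse + specular, 1.0);
  }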
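
The single-light shader above can be checked on the CPU. The following is a minimal Python sketch of the same diffuse plus half-vector specular computation, assuming view-space inputs with the eye at the origin. The function name `shade_one_light` and the clamping of both dot products are illustrative choices, not taken from the slides.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade_one_light(sample_pos, sample_normal, light_pos, light_color,
                    material_diffuse, material_specular, material_shininess):
    """Diffuse plus half-vector specular for one point light (view space)."""
    n = normalize(sample_normal)
    light_dir = normalize(tuple(l - p for l, p in zip(light_pos, sample_pos)))
    view_dir = normalize(tuple(-p for p in sample_pos))  # eye at the origin
    h = normalize(tuple(l + v for l, v in zip(light_dir, view_dir)))
    n_dot_l = max(dot(n, light_dir), 0.0)
    n_dot_h = max(dot(n, h), 0.0)
    # Assumption: no highlight on surfaces that face away from the light.
    spec = n_dot_h ** material_shininess if n_dot_l > 0.0 else 0.0
    return tuple(n_dot_l * kd * lc + spec * ks * lc
                 for kd, ks, lc in zip(material_diffuse, material_specular, light_color))

# A point straight ahead of the eye, facing it, lit from the eye position.
color = shade_one_light((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0),
                        (1.0, 1.0, 1.0), (0.5, 0.25, 0.0), (0.1, 0.1, 0.1), 32.0)
```

In this configuration the light, view and normal directions coincide, so both dot products are 1 and the result is simply the diffuse color plus the specular color.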

The traditional approach
  Still most common for real time? Shifting now...
Single pass
Shading in fragment shaders
  Geometry generates fragments
  Each fragment is shaded
  Result is merged into frame buffer, e.g. overwrite if nearest

Overdraw
  New fragments replace old
  Shading re-computed
  And again...
  Lots of waste, especially for many lights
Light assignment
  What lights to use?
  Usually per object, e.g. a few thousand triangles
Shader management
  Must support varying numbers of lights

Overdraw
[Figure: overdraw visualization, brighter = more overdraw; panel label: Visible]
Lots of waste

Which lights to use?
Lights vs geometry: a complex problem
  1000s of objects/batches
  1000s of lights
Want minimal set of lights: (geometrically) small batches
Want quick drawing: large batches!
  Good batching for GPU/API
  Few material changes

The good
  Single pass
  Low storage overhead: single frame buffer
  Simple if only a single light or a few lights
  MSAA works
The bad
  Overdraw
  Light management: difficult for many lights
  Shader management
  Batching coupled with lighting

Key idea
  Defer (i.e. wait with) light computations
First do a geometry pass
  Sample geometry as per normal
  But, no lighting
  Store geometry attributes per pixel
Then do a shading pass
  Use attributes from first pass
  Process lights one at a time

Render Geometry to G-Buffers
  G-Buffers store attributes, e.g. color, normal, position
For each light
  Draw Light Bounds
  For each fragment
    Read G-Buffers
    Compute Shading
    Add to frame buffer
No overdraw
Trivial light management

All attributes needed for shading, e.g.
  Color
  Normal
  Position (Z)
  Specular

No lights! Just the geometry.
More outputs, to the G-buffer: 4 screen-sized textures.
Position is not stored: use the depth buffer and reconstruct using inverse projection.

  uniform vec3 materialdiffuse;
  uniform vec3 materialspecular;
  uniform float materialshininess;

  in vec3 samplepos;
  in vec3 samplenormal;

  out vec3 gbuffernormal;
  out vec3 gbufferdiffuse;
  out vec3 gbufferspecular;
  out float gbuffershininess;

  void main()
  {
    // Store the interpolated normal, renormalized.
    gbuffernormal = normalize(samplenormal);
    gbufferdiffuse = materialdiffuse;
    gbufferspecular = materialspecular;
    gbuffershininess = materialshininess;
  }
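
Reconstructing position from the depth buffer can be sketched as follows. This Python example assumes an OpenGL-style symmetric perspective projection and the default [0, 1] depth range. The helper names (`make_projection`, `project_to_window`, `unproject`) are hypothetical and only illustrate the inverse-projection step mentioned on the slide.

```python
import math

def make_projection(fov_y_deg, aspect, near, far):
    """Coefficients of an OpenGL-style symmetric perspective matrix."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    a = (far + near) / (near - far)
    b = 2.0 * far * near / (near - far)
    return f, aspect, a, b

def project_to_window(view_pos, proj, width, height):
    """View space -> (window x, window y, depth in [0, 1])."""
    f, aspect, a, b = proj
    x, y, z = view_pos
    w = -z  # clip-space w for a perspective projection
    ndc = (f / aspect * x / w, f * y / w, (a * z + b) / w)
    return ((ndc[0] * 0.5 + 0.5) * width,
            (ndc[1] * 0.5 + 0.5) * height,
            ndc[2] * 0.5 + 0.5)

def unproject(win_x, win_y, depth, proj, width, height):
    """Invert project_to_window: recover the view-space position."""
    f, aspect, a, b = proj
    ndc_x = win_x / width * 2.0 - 1.0
    ndc_y = win_y / height * 2.0 - 1.0
    ndc_z = depth * 2.0 - 1.0
    z = -b / (ndc_z + a)  # solve ndc_z = (a*z + b) / (-z) for z
    w = -z
    return (ndc_x * w * aspect / f, ndc_y * w / f, z)

proj = make_projection(60.0, 16.0 / 9.0, 0.1, 100.0)
original = (1.5, -0.75, -5.0)
window = project_to_window(original, proj, 1280, 720)
recovered = unproject(window[0], window[1], window[2], proj, 1280, 720)
```

Only the depth value and the pixel position are needed, which is why the G-buffer can omit a full position texture.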

For each light
  Draw Light Bounds
  For each fragment
    Read G-Buffers
    Compute Shading
    Add to frame buffer

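[G-buffer images: Color, Normal, Position (Z), Specular]

  // colortex, speculartex, normaltex (samplers) and resultcolor (out vec4)
  // are declared elsewhere; fetchposition and dolight are helpers not shown.
  uniform vec3 lightposition;
  uniform vec3 lightcolor;
  uniform float lightrange;

  void main()
  {
    // texelFetch takes integer texel coordinates and a mip level.
    ivec2 fragpos = ivec2(gl_FragCoord.xy);
    vec3 color = texelFetch(colortex, fragpos, 0).xyz;
    vec3 specular = texelFetch(speculartex, fragpos, 0).xyz;
    vec3 normal = texelFetch(normaltex, fragpos, 0).xyz;
    vec3 position = fetchposition(gl_FragCoord.xy);
    vec3 shading = dolight(position, normal, color, specular, lightposition, lightcolor, lightrange);
    resultcolor = vec4(shading, 1.0);
  }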

The good
  Enables many lights
  Trivial light management
  Simple shader management
  No overdraw
The bad
  Difficult to support transparency
  Large frame buffer, especially with MSAA
  High memory bandwidth usage

Key goal
  Solve bandwidth issue
  Read G-Buffers once

New type of overdraw: light overdraw
  N lights cover a certain pixel -> N reads from the same G-Buffer location
Deferred Shading
  for each light
    for each covered pixel
      read G-Buffer
      compute shading
      write frame buffer

Deferred Shading, Light Loop
  for each light
    for each covered pixel
      read G-Buffer
      compute shading
      write frame buffer
We would like to re-arrange
  for each pixel
    read G-Buffer
    for each affecting light
      compute shading
    write frame buffer
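
To make the bandwidth argument concrete, here is a small Python sketch that counts G-buffer reads and shading evaluations for both loop orders. Lights are simplified to axis-aligned pixel rectangles, and the function names are illustrative assumptions rather than anything from the slides.

```python
def light_major(width, height, lights):
    """Classic deferred order: one G-buffer read per light per covered pixel.

    lights: (x0, y0, x1, y1) pixel rectangles, end-exclusive.
    """
    gbuffer_reads = shading_evals = 0
    for x0, y0, x1, y1 in lights:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                gbuffer_reads += 1
                shading_evals += 1
    return gbuffer_reads, shading_evals

def pixel_major(width, height, lights):
    """Re-arranged order: one G-buffer read per pixel, then loop over lights."""
    gbuffer_reads = shading_evals = 0
    for y in range(height):
        for x in range(width):
            gbuffer_reads += 1
            for x0, y0, x1, y1 in lights:
                if x0 <= x < x1 and y0 <= y < y1:
                    shading_evals += 1
    return gbuffer_reads, shading_evals

# Three full-screen lights and one small light on an 8 x 8 screen.
lights = [(0, 0, 8, 8)] * 3 + [(2, 2, 6, 6)]
reads_by_light = light_major(8, 8, lights)
reads_by_pixel = pixel_major(8, 8, lights)
```

Both orders perform the same shading work; only the number of G-buffer reads changes. The pixel-major version here still tests every light for every pixel, which is the cost that per-tile light lists are meant to remove.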

Requires sequential access to lights for each pixel
List per pixel is prohibitive
  Lots of data
  Slow construction
Share list in screen-space tiles, e.g. 32 x 32 pixels
  Simple construction
  Little storage
  Coherent access
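
One way to build such per-tile lists is sketched below in Python. It assumes each light is already reduced to a screen-space circle (center and radius in pixels) and assigns it conservatively to every tile its bounding square touches. The (count, offset) layout mirrors the grid lookup in the shader slides, but `build_light_grid` and its inputs are hypothetical.

```python
TILE_DIM = 32

def build_light_grid(width, height, lights, tile_dim=TILE_DIM):
    """lights: list of (center_x, center_y, radius) in pixels.

    Returns (grid, tile_data) where grid[ty][tx] = (count, offset) and
    tile_data[offset:offset + count] holds the light ids for that tile.
    """
    tiles_x = (width + tile_dim - 1) // tile_dim
    tiles_y = (height + tile_dim - 1) // tile_dim
    per_tile = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for light_id, (cx, cy, radius) in enumerate(lights):
        # Conservative bounds: the tiles touched by the bounding square.
        tx0 = max(int((cx - radius) // tile_dim), 0)
        tx1 = min(int((cx + radius) // tile_dim), tiles_x - 1)
        ty0 = max(int((cy - radius) // tile_dim), 0)
        ty1 = min(int((cy + radius) // tile_dim), tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                per_tile[ty][tx].append(light_id)
    # Flatten the lists into one array so a tile needs only two integers.
    grid, tile_data = [], []
    for row in per_tile:
        grid_row = []
        for ids in row:
            grid_row.append((len(ids), len(tile_data)))
            tile_data.extend(ids)
        grid.append(grid_row)
    return grid, tile_data

# 128 x 64 screen = 4 x 2 tiles; the third light is entirely off screen.
grid, tile_data = build_light_grid(
    128, 64, [(16, 16, 10), (64, 32, 20), (200, 10, 5)])
```

Storage is two integers per tile plus one entry per light-tile overlap, which is far smaller than a list per pixel.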

The Light Grid
[Figure: the light grid drawn over the screen, with one number per tile]

Let's look at those tiles...

1. Render Scene to G-Buffers
2. Build Light Grid
3. Full Screen Quad (or CUDA, or Compute Shaders)
   For each pixel
     Fetch G-Buffer Data
     Find Tile
     Loop over lights and accumulate shading
     Write shading

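  // Parameter list matches the calls in the main() functions on the following slides.
  // Assumes tiledatatex is an integer buffer texture; dolight is a helper not shown.
  vec3 computelight(vec3 position, vec3 normal, vec3 albedo, vec3 specular,
                    vec3 viewdir, float shininess, ivec2 fragpos)
  {
    // Tile containing this fragment.
    ivec2 l = fragpos / CELL_DIM;
    // Number of lights in the tile and start of its list in tiledatatex.
    int count = lightgrid[l.x][l.y].x;
    int offset = lightgrid[l.x][l.y].y;
    vec3 shading = vec3(0.0, 0.0, 0.0);
    for (int i = 0; i < count; ++i)
    {
      int lightid = texelFetch(tiledatatex, offset + i).x;
      shading += dolight(position, normal, albedo, specular, shininess, viewdir, lightid);
    }
    return shading;
  }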

  void main()
  {
    ivec2 fragpos = ivec2(gl_FragCoord.xy);
    vec3 albedo = texelFetch(albedotex, fragpos, 0).xyz;
    vec4 specshine = texelFetch(specularshininesstex, fragpos, 0);
    // unproject is a helper not shown; assumed to take the window position and the depth value.
    vec3 position = unproject(gl_FragCoord.xy, texelFetch(depthtex, fragpos, 0).x);
    vec3 normal = texelFetch(normaltex, fragpos, 0).xyz;
    vec3 viewdir = -normalize(position);
    gl_FragColor = vec4(computelight(position, normal, albedo, specshine.xyz, viewdir, specshine.w, fragpos), 1.0);
  }
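
A CPU reference can help validate a tiled shader: shading through the light grid should match a brute-force loop over all lights, provided every light is assigned to each tile it can affect. The Python sketch below illustrates that check with a deliberately simple 2D falloff standing in for the real light function. All names and the falloff model are assumptions made for this example.

```python
import math

TILE_DIM = 8

def light_contribution(px, py, light):
    """Stand-in for the light function: linear falloff, zero at the radius."""
    cx, cy, radius, intensity = light
    dist = math.hypot(px + 0.5 - cx, py + 0.5 - cy)
    return intensity * max(0.0, 1.0 - dist / radius)

def build_grid(width, height, lights):
    tiles_x = (width + TILE_DIM - 1) // TILE_DIM
    tiles_y = (height + TILE_DIM - 1) // TILE_DIM
    lists = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for light_id, (cx, cy, radius, _) in enumerate(lights):
        ty0 = max(int((cy - radius) // TILE_DIM), 0)
        ty1 = min(int((cy + radius) // TILE_DIM), tiles_y - 1)
        tx0 = max(int((cx - radius) // TILE_DIM), 0)
        tx1 = min(int((cx + radius) // TILE_DIM), tiles_x - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                lists[ty][tx].append(light_id)
    grid, tile_data = [], []
    for row in lists:
        grid.append([])
        for ids in row:
            grid[-1].append((len(ids), len(tile_data)))
            tile_data.extend(ids)
    return grid, tile_data

def shade_tiled(px, py, lights, grid, tile_data):
    """Find the tile, then loop only over that tile's lights."""
    count, offset = grid[py // TILE_DIM][px // TILE_DIM]
    shading = 0.0
    for i in range(count):
        shading += light_contribution(px, py, lights[tile_data[offset + i]])
    return shading

def shade_brute_force(px, py, lights):
    return sum(light_contribution(px, py, light) for light in lights)

WIDTH, HEIGHT = 32, 16
lights = [(4.0, 4.0, 3.0, 1.0), (20.0, 8.0, 6.0, 0.5), (30.0, 2.0, 10.0, 2.0)]
grid, tile_data = build_grid(WIDTH, HEIGHT, lights)
max_error = max(
    abs(shade_tiled(x, y, lights, grid, tile_data) - shade_brute_force(x, y, lights))
    for y in range(HEIGHT) for x in range(WIDTH))
```

The tiled path evaluates 11 light-tile pairs here instead of the 24 that testing every light in every tile would require, while producing the same image.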

The good
  Low bandwidth usage
  Enables many lights
  Trivial light management
  Simple shader management
  No overdraw
The bad
  Difficult to support transparency
  Large frame buffer, especially with MSAA

Key idea
  Use tiles to solve light assignment for forward shading
Forward as usual
  Shading in fragment shader
  Access grid for list of lights
Benefits
  No G-Buffer
  Transparency
Can easily combine!
  Tiled Deferred for opaque geometry
  Tiled Forward for transparent

1. Build Light Grid
2. Render Scene
   For each fragment
     Use geometry attributes
     Find Tile
     Loop over lights and accumulate shading
     Write shading
Notes:
  + FSAA
  + Transparency
  + No G-Buffers
  - Overdraw

  in vec3 normal;
  in vec3 position;
  in vec2 texcoord;

  uniform vec3 specular;
  uniform float shininess;
  uniform float diffuse;

  void main()
  {
    ivec2 fragpos = ivec2(gl_FragCoord.xy);
    // The texture lookup returns a vec4; keep only the color channels.
    vec3 albedo = texture2D(albedotex, texcoord).xyz * diffuse;
    vec3 viewdir = -normalize(position);
    gl_FragColor = vec4(computelight(position, normalize(normal), albedo, specular, viewdir, shininess, fragpos), 1.0);
  }

The good
  Low bandwidth usage
  Enables many lights
  Trivial light management
  Simpler shader management
  MSAA
  Transparency
  No G-Buffers
The bad
  Overdraw is back
  Coherency worse
  Looong shaders

Frame times, GTX 280
[Chart: time (ms) against frame number for Tiled Forward, Traditional Deferred and Tiled Deferred]

Deferred Lighting
  Factor out specular and diffuse color
  G-Buffer only stores normal and shininess
  Output diffuse and specular shading
  Second geometry pass which multiplies colors
Light Pre-Pass
  Much the same
  But with monochromatic specular highlight
Similar performance to deferred
  Only improves constant factors
  Limits shading model even further
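
The factoring works because the material colors do not depend on the light, so they can be pulled out of the sum over lights. The Python sketch below compares both orderings for one pixel. The per-light scalar diffuse and specular terms are assumed to be precomputed, and the function names are illustrative.

```python
def full_deferred(lights, kd, ks):
    """Multiply by the material colors inside the light loop.

    lights: list of (diffuse_term, specular_term, light_rgb) per light.
    """
    out = [0.0, 0.0, 0.0]
    for diff, spec, rgb in lights:
        for c in range(3):
            out[c] += diff * kd[c] * rgb[c] + spec * ks[c] * rgb[c]
    return out

def deferred_lighting(lights, kd, ks):
    """Accumulate light only, then multiply by material colors once."""
    diff_acc = [0.0, 0.0, 0.0]
    spec_acc = [0.0, 0.0, 0.0]
    for diff, spec, rgb in lights:  # lighting pass: no material colors needed
        for c in range(3):
            diff_acc[c] += diff * rgb[c]
            spec_acc[c] += spec * rgb[c]
    # Second geometry pass: apply the material colors.
    return [kd[c] * diff_acc[c] + ks[c] * spec_acc[c] for c in range(3)]

lights = [(0.8, 0.1, (1.0, 0.9, 0.7)), (0.3, 0.6, (0.2, 0.4, 1.0))]
kd, ks = (0.5, 0.25, 0.1), (0.04, 0.04, 0.04)
direct = full_deferred(lights, kd, ks)
factored = deferred_lighting(lights, kd, ks)
```

The two results agree up to floating-point rounding, which is why the approach changes only constant factors: the per-light work is still done for every covered pixel.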

Exclude small lights (on screen)
  E.g. small = radius < a tile
  Use standard deferred
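
A minimal Python sketch of that split, assuming each light carries a screen-space radius in pixels and that "small" means the radius is below the tile size, as in the slide's example. The function name and tuple layout are hypothetical.

```python
def split_lights(lights, tile_dim=32):
    """Partition lights by screen-space radius.

    lights: list of (center_x, center_y, radius_px).
    Returns (small, large): small lights go to standard deferred,
    large lights go into the light grid.
    """
    small = [light for light in lights if light[2] < tile_dim]
    large = [light for light in lights if light[2] >= tile_dim]
    return small, large

small, large = split_lights([(10, 10, 4), (100, 50, 32), (300, 200, 120)])
```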

Tiled Shading. Olsson and Assarsson, JGT 2011.
Deferred Rendering for Current and Future Rendering Pipelines. Lauritzen, SIGGRAPH 2010 Course.
Bending the Graphics Pipeline. Andersson, SIGGRAPH 2010 Course.
Improved Tiled Shading. Olsson and Billeter, Somewhere 2012 ;)

You may now ask them questions.