The F-Buffer: A Rasterization-Order FIFO Buffer for Multi-Pass Rendering. Bill Mark and Kekoa Proudfoot. Stanford University

Similar documents
Shading Languages for Graphics Hardware

A Real-Time Procedural Shading System for Programmable Graphics Hardware

The F-Buffer: A Rasterization-Order FIFO Buffer for Multi-Pass Rendering

A Hardware F-Buffer Implementation

Comparing Reyes and OpenGL on a Stream Architecture

Complex Multipass Shading On Programmable Graphics Hardware

CS GPU and GPGPU Programming Lecture 7: Shading and Compute APIs 1. Markus Hadwiger, KAUST

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

Hardware Shading: State-of-the-Art and Future Challenges

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

GeForce4. John Montrym Henry Moreton

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Scanline Rendering 2 1/42

GPU Architecture and Function. Michael Foster and Ian Frasch

CS4620/5620: Lecture 14 Pipeline

A Reconfigurable Architecture for Load-Balanced Rendering

Lecture 4: Geometry Processing. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Course Recap + 3D Graphics on Mobile GPUs

Cg: A system for programming graphics hardware in a C-like language

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

2.11 Particle Systems

GPU Memory Model Overview

Mali Demos: Behind the Pixels. Stacy Smith

Getting fancy with texture mapping (Part 2) CS559 Spring Apr 2017

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

The Rasterization Pipeline

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

GPU Memory Model. Adapted from:

PowerVR Performance Recommendations The Golden Rules. October 2015

Drawing Fast The Graphics Pipeline

Spring 2009 Prof. Hyesoon Kim

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Graphics and Imaging Architectures

1.2.3 The Graphics Hardware Pipeline

A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision. Seok-Hoon Kim MVLSI Lab., KAIST

Mattan Erez. The University of Texas at Austin

Hardware-driven visibility culling

Programmable GPUS. Last Time? Reading for Today. Homework 4. Planar Shadows Projective Texture Shadows Shadow Maps Shadow Volumes

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Graphics Hardware. Instructor Stephen J. Guy

Rendering Objects. Need to transform all geometry then

PowerVR Hardware. Architecture Overview for Developers

OpenGl Pipeline. triangles, lines, points, images. Per-vertex ops. Primitive assembly. Texturing. Rasterization. Per-fragment ops.

Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

E.Order of Operations

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Scheduling the Graphics Pipeline on a GPU

CS 354R: Computer Game Technology

Mobile HW and Bandwidth

Methodology for Lecture. Importance of Lighting. Outline. Shading Models. Brief primer on Color. Foundations of Computer Graphics (Spring 2010)

Spring 2011 Prof. Hyesoon Kim

High-Quality Surface Splatting on Today s GPUs

Point Sample Rendering

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Lecture 13: Reyes Architecture and Implementation. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Mattan Erez. The University of Texas at Austin

Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game

General-Purpose Computation on Graphics Hardware

frame buffer depth buffer stencil buffer

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

Lecture 2. Shaders, GLSL and GPGPU

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Performance Analysis and Culling Algorithms

Case 1:17-cv SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C

Drawing Fast The Graphics Pipeline

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

Pipeline Operations. CS 4620 Lecture 14

Profiling and Debugging Games on Mobile Platforms

Rendering approaches. 1.image-oriented. 2.object-oriented. foreach pixel... 3D rendering pipeline. foreach object...

Volume Graphics Introduction

Graphics Processing Unit Architecture (GPU Arch)

Threading Hardware in G80

Hardware Accelerated Volume Visualization. Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Real-time Ray Tracing on Programmable Graphics Hardware

CS452/552; EE465/505. Clipping & Scan Conversion

Jeremy W. Sheaffer 1 David P. Luebke 2 Kevin Skadron 1. University of Virginia Computer Science 2. NVIDIA Research

Monday Morning. Graphics Hardware

Introduction to Shaders.

Drawing Fast The Graphics Pipeline

PROFESSIONAL. WebGL Programming DEVELOPING 3D GRAPHICS FOR THE WEB. Andreas Anyuru WILEY. John Wiley & Sons, Ltd.

Tracking Graphics State for Network Rendering

Why modern versions of OpenGL should be used Some useful API commands and extensions

Shaders (some slides taken from David M. course)

The Graphics Pipeline

GPUs and GPGPUs. Greg Blanton John T. Lubia

Programmable GPUs. Real Time Graphics 11/13/2013. Nalu 2004 (NVIDIA Corporation) GeForce 6. Virtua Fighter 1995 (SEGA Corporation) NV1

Module 13C: Using The 3D Graphics APIs OpenGL ES

From Brook to CUDA. GPU Technology Conference

OpenGL Programmable Shaders

What s New with GPGPU?

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

Transcription:

The F-Buffer: A Rasterization-Order FIFO Buffer for Multi-Pass Rendering Bill Mark and Kekoa Proudfoot Stanford University http://graphics.stanford.edu/projects/shading/ Motivation for this work Two goals for real-time procedural shading: Q Q Hardware independence Support arbitrarily complex shaders Page 1

Must virtualize HW resources HW resources include: Q Functional units Q Instructions Q Texture units Q Interpolators Q Memory Q Vertex & fragment registers CPU Analogy ƒ Virtual Memory system virtualizes DRAM ƒ Perfect virtualization not necessary Current approaches to virtualization Q Virtualize functional units Q Q Q Examples: Most multi-texture HW 8 combiners @ 25% fill rate on NV20 Implemented with extra HW state Memory virtualization is still a problem Page 2

Current approaches to virtualization Q Multi-pass rendering ƒ ƒ ƒ ƒ Render several times from one viewpoint Each rendering pass performs one part of complete computation Framebuffer can store one temporary RGBA value. Use render-to-texture to store more temporary values. Multiple temporary values This shade tree needs two temporary variables * * + + * + over Page 3

Multiple temporary values This shade tree needs two temporary variables * * + + * + over Multiple temporary values This shade tree needs two temporary variables * * + + * + over Page 4

Conventional multi-pass rendering Rasterizer Framebuffer Conventional multi-pass rendering Texture #1 Framebuffer Page 5

Conventional multi-pass rendering Rasterizer Texture #1 Framebuffer Conventional multi-pass rendering Texture #2 Texture #1 Framebuffer Page 6

Conventional multi-pass rendering Rasterizer Texture #2 Texture #1 Framebuffer Problem #1: Transparency Incorrect results for partially-transparent surfaces that overlap Cause: Temporary storage is shared when it shouldn t be. Page 7

Problem #2: Wastes memory Each additional per-pixel temporary variable requires a screen-sized buffer Problem #2: Wastes memory Each additional per-pixel temporary variable requires a screen-sized buffer Page 8

Problem #2: Wastes memory Each additional per-pixel temporary variable requires a screen-sized buffer or, a bounding-box-sized buffer Problem #3: One output per pass NV20 has 9 RGBA registers, but only one RGBA output (and, it s lower precision/range!) STAGE n RGB combiner ALPHA combiner RGBA Registers STAGE n+1 RGB combiner ALPHA combiner Page 9

The F-Buffer A Rasterization-Order FIFO Buffer Rasterizer F-buffer The F-Buffer F-buffer #1 F-buffer Page 10

The F-Buffer Rasterizer F-buffer #1 F-buffer The F-Buffer F-buffer #1 F-buffer #2 F-buffer Page 11

The F-Buffer Rasterizer F-buffer #1 F-buffer #2 Framebuffer F-Buffer implementation is simple Requirements: ƒ Input: Address counter for texture read ƒ Output: Address counter for framebuffer write ƒ Software or HW to handle overflow Possibly: ƒ Guarantee of consistent rasterization order Page 12

Related ideas Stream Processing [Rixner98] - F-Buffer is a type of stream buffer A-Buffer [Carpenter84] - Associates storage with each fragment R-Buffer [Wittenbrink01] - Order-independent transparency Transparency works with F-Buffer Overlapping fragments get their own storage locations Page 13

F-Buffer can avoid wasting memory F-Buffer Buffer does not have to be screen-sized Multiple outputs per pass with F-Buffer F-Buffer makes it simpler to support multiple outputs ƒ FIFO access ƒ No read-modify-write blend ops ƒ Memory-usage efficiency Extended-precision output is easier for same reasons Page 14

Many variations of F-Buffer Where is F-Buffer stored? How is overflow handled? Is geometry re-rasterized on every pass? (if so, rasterization order must be consistent) When are conventional framebuffer ops performed? Where is F-Buffer stored? On-chip Off-chip graphics DRAM Main system memory Buffer access is linear Æ hybrids are relatively easy Page 15

Options for overflow Q Q Q HW-supported virtual memory ƒ Removes burden from software ƒ Linear access makes it simple Pass burden to software just avoid overflow HW-supported batching of geometry ƒ Break geometry into batches ƒ Render all passes for a batch, then start next batch ƒ HW support for starting/stopping batches ƒ Fragment-granularity batching is simplest, but inefficient if overflow is common How frequent is overflow? Statistics from Quake III shaders* ƒ 10% of shader invokations overflow an F-Buffer sized to 10% of screen ƒ 0.1% of shader invokations overflow an F-Buffer sized to 85% of screen ƒ Rarely, shader invokations overflow an F-Buffer sized to 100% of screen! Note future applications are flexible * For shaders requiring two passes on a single-texture pipeline. Two-pass shaders are used for 53% of fragments. Each shader invokation is counted separately. Page 16

Shading library can handle overflow Added F-Buffer to MesaGL Shading system manages F-Buffer overflows No changes to application Multi-pass rendering s future Functional-unit virtualization in HW is great ƒ Easy for software to use ƒ Trend is clear NV10, NV20, R200 But, multi-pass rendering will still be useful ƒ For further virtualization of functional units ƒ For virtualization of memory/registers Page 17

Conclusion F-Buffer can solve problems with multi-pass rendering ƒ Works with partial transparency ƒ Doesn t waste memory ƒ Can easily preserve multiple results per pass F-Buffer overflow can be handled in HW and/or SW F-Buffer could facilitate evolution towards more general stream-processing architecture Acknowledgements Stanford Shading Group ƒ Pat Hanrahan, Svetoslav Tzvetkov, Pradeep Sen, Ren Ng, Eric Chan, John Owens, David Ebert Sponsors ƒ ATI, NVIDIA, SONY, Sun ƒ DARPA, DOE Useful discussions ƒ Roger Allen, Matt Papakipos Page 18