A Real-Time Procedural Shading System for Programmable Graphics Hardware

Kekoa Proudfoot, William R. Mark, Pat Hanrahan
Stanford University
Svetoslav Tzvetkov
NVIDIA Corporation
http://graphics.stanford.edu/projects/shading/

Programmable graphics hardware
  Register-machine based processing units
  Programmable vertex hardware
    e.g. NV vertex programs
    e.g. DirectX8 vertex shaders
    e.g. ATI vertex shaders
    100s of instructions
    Floating-point
    Complex math ops
  Programmable fragment hardware
    e.g. NV register combiners
    e.g. DirectX8 pixel shaders
    e.g. ATI fragment shaders
    10s of instructions
    Fixed-point
    Texture mapping
  Many interesting effects possible

Hardware is difficult to use
  Two problems:
    Programming interfaces are low-level
    Functionality varies between chipsets
  User must support:

  NV vertex programs
    DP4 o[HPOS].x, v[OPOS], c[0];
    DP4 o[HPOS].y, v[OPOS], c[1];
    DP4 o[HPOS].z, v[OPOS], c[2];
    DP4 o[HPOS].w, v[OPOS], c[3];
    DP3 R0.x, v[NRML], c[4];
    DP3 R0.y, v[NRML], c[5];
    DP3 R0.z, v[NRML], c[6];
    DP3 R0.w, R0, R0;
    RSQ R0.w, R0.w;
    MUL R0, R0, R0.w;
    DP3 R0.x, R0, c[7];
    LIT R0, R0;
    MUL o[COL0].xyz, R0.y, c[8];

  NV register combiners
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                      GL_TEXTURE0_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);
    glCombinerInputNV(...);
    glCombinerInputNV(...);
    glCombinerInputNV(...);
    glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB, GL_SPARE0_NV,
                       GL_DISCARD_NV, GL_DISCARD_NV, GL_NONE, GL_NONE,
                       GL_TRUE, GL_FALSE, GL_FALSE);
    ...

  ATI vertex shaders
    v0 = glGenSymbolsATI(...);
    glBeginVertexShaderATI(...);
    c0 = glGenSymbolsATI(...);
    glSetLocalConstant(...);
    r0 = glGenSymbolsATI(...);
    r1 = glGenSymbolsATI(...);
    glShaderOp2ATI(...);
    glShaderOp1ATI(...);
    glExtractComponentATI(...);
    glShaderOp2ATI(...);
    glShaderOp2ATI(...);
    ...
    glEndVertexShaderATI();

  ATI fragment shaders
    glBeginFragmentShaderATI(...);
    glPassTexCoordATIX(...);
    glPassTexCoordATIX(...);
    ...
    glColorFragmentOp2ATIX(GL_MUL_ATIX, GL_REG_0_ATIX, GL_RED, GL_NONE,
                           GL_REG_4_ATIX, GL_NONE, GL_NONE,
                           GL_REG_3_ATIX, GL_NONE, GL_NONE);
    glAlphaFragmentOp2ATIX(...);
    ...
    glEndFragmentShaderATI();

  Multipass OpenGL
    glActiveTextureARB(GL_TEXTURE0_ARB);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);
    glActiveTextureARB(GL_TEXTURE1_ARB);
    glTexEnvi(GL_TEXTURE_ENV, ...);
    glTexEnvi(GL_TEXTURE_ENV, ...);
    glTexEnvi(GL_TEXTURE_ENV, ...);
    glEnable(GL_BLEND);
    glBlendFunc(GL_DST_COLOR, GL_ZERO);
    ...

  Host vertex code
    for (i = 0; i < n_verts; i++) {
        // perform vertex processing
        ...
    }

  Next-generation hardware
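To make the "host vertex code" fallback concrete, here is a minimal Python sketch of the same per-vertex work that the NV vertex program on this slide performs: a position transform, a normal transform with normalization, and a clamped diffuse term scaled by a color. The function names and the use of a list `c` to stand in for the constant registers c[0]..c[8] are assumptions made for illustration; this is not code from the Stanford system.

```python
import math


def dot(a, b):
    """Dot product over the shorter of the two sequences."""
    return sum(x * y for x, y in zip(a, b))


def shade_vertex(opos, nrml, c):
    """Mirror the NV vertex program step by step for one vertex.

    opos: object-space position (x, y, z, w)
    nrml: object-space normal (x, y, z)
    c:    nine 4-tuples standing in for constant registers c[0]..c[8]
          (hypothetical layout: c[0..3] position matrix rows, c[4..6]
          normal matrix rows, c[7] light direction, c[8] diffuse color)
    """
    hpos = tuple(dot(opos, c[i]) for i in range(4))       # four DP4
    n = tuple(dot(nrml, c[i][:3]) for i in (4, 5, 6))     # three DP3
    inv_len = 1.0 / math.sqrt(dot(n, n))                  # DP3 + RSQ
    n = tuple(x * inv_len for x in n)                     # MUL
    diffuse = max(dot(n, c[7][:3]), 0.0)                  # DP3 + LIT (.y)
    col0 = tuple(diffuse * k for k in c[8][:3])           # MUL
    return hpos, col0


def shade_vertices(verts, c):
    """The host loop: process every (position, normal) pair in turn."""
    return [shade_vertex(p, n, c) for p, n in verts]
```

The host loop does per vertex what the hardware does in thirteen instructions, which is why a compiler that can emit either form from one shader description is attractive.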

Shading languages
  Solution: shading languages provide an easy-to-use abstraction of programmability
    e.g. RenderMan, Hanrahan and Lawson, SIGGRAPH 90
  Apply shading languages to real-time systems

Related work
  Olano and Lastra, SIGGRAPH 98
    Shading language specialized for PixelFlow
  id Software's Quake 3 shader scripts
    Scripting language for shading computations
  Peercy et al., SIGGRAPH 00
    SIMD processor abstraction for multipass rendering
  McCool's SMASH API
    API for specifying shading computations

SIMD processor model
  Single instruction per pass
  Many passes
  Fragments only
  [Diagram: Fragment Processing, 1 instruction (e.g. FB = TEX over FB) -> frame buffer]
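As a rough illustration of the SIMD processor model, the sketch below treats the frame buffer as a list of scalar fragment values and applies exactly one instruction to every fragment in each pass. The function name `run_passes` and the scalar representation are assumptions for illustration; the slide's own example operator (`over`) is replaced here by plain add and multiply to keep the sketch short.

```python
from operator import add, mul


def run_passes(framebuffer, passes):
    """Apply one instruction per pass to every fragment.

    framebuffer: list of scalar fragment values
    passes:      list of (op, operand) pairs, where operand holds one
                 value per fragment (for example a texture lookup result)
    """
    for op, operand in passes:
        # One rendering pass: FB = op(FB, operand) for all fragments.
        framebuffer = [op(fb, x) for fb, x in zip(framebuffer, operand)]
    return framebuffer


# Two passes: FB = FB + TEX0, then FB = FB * TEX1.
result = run_passes([0.0, 0.0], [(add, [0.5, 0.25]), (mul, [0.5, 2.0])])
```

Every intermediate result lives in the frame buffer, so a shader with n operations costs n full passes over the fragments.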

Programmable processor model
  Many instructions per pass
  Fewer passes, often just one
  Vertices and fragments
  [Diagram: Vertex Processing (~100 instructions) -> Fragment Processing (~10 instructions) -> frame buffer]

Programmable model advantages
  1. Support for vertex operations
  2. Reduced off-chip bandwidth
  3. Better match to current hardware
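The bandwidth advantage can be estimated with simple arithmetic. The sketch below assumes that every rendering pass reads and writes each fragment in the frame buffer exactly once and ignores texture and depth traffic; both the function name and that cost model are assumptions for illustration, not measurements from the system.

```python
import math


def frame_buffer_accesses(n_instructions, n_fragments, instructions_per_pass):
    """Return (passes, frame-buffer accesses) for one shaded object.

    Assumed cost model: each pass performs one read and one write
    per fragment, regardless of how many instructions the pass runs.
    """
    if instructions_per_pass < 1:
        raise ValueError("instructions_per_pass must be at least 1")
    passes = math.ceil(n_instructions / instructions_per_pass)
    return passes, 2 * passes * n_fragments


# A 10-instruction shader over 1000 fragments:
simd_model = frame_buffer_accesses(10, 1000, 1)            # one instruction per pass
programmable_model = frame_buffer_accesses(10, 1000, 10)   # all ten in one pass
```

Under this model the single-instruction-per-pass case touches the frame buffer ten times as often as the single-pass case for the same shader.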

System design goals
  1. Implement a shading language
  2. Support a variety of hardware
  3. Generate fast and efficient code
  4. Create a framework for future hardware design

Multiple computation frequencies
  Constant, Per Primitive Group, Per Vertex, Per Fragment
  Toward Constant: evaluated less often, more complex math, floating point
  Toward Per Fragment: evaluated more often, simpler math, fixed point
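To see why it pays to evaluate an operation at the lowest possible frequency, the sketch below counts how many operation evaluations one frame needs when a shader's operations are split across the four frequencies. The function name, the dictionary keys and the assumption that constant operations are folded once ahead of time (and so cost nothing per frame) are illustrative choices, not details taken from the system.

```python
def shading_cost(op_counts, n_groups, n_vertices, n_fragments):
    """Total operation evaluations for one frame.

    op_counts maps a frequency name to the number of shader
    operations assigned to that frequency.
    """
    evaluations = {
        "constant": 0,                 # assumption: folded ahead of time
        "primitive group": n_groups,
        "vertex": n_vertices,
        "fragment": n_fragments,
    }
    return sum(count * evaluations[freq] for freq, count in op_counts.items())


ops = {"constant": 5, "primitive group": 4, "vertex": 20, "fragment": 3}
cost = shading_cost(ops, n_groups=2, n_vertices=100, n_fragments=10000)
```

Even with only three per-fragment operations, the fragment term dominates the total, which matches the slide's point that the most frequently evaluated stage gets the simplest math.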

System overview
  Shading language abstraction: surface and light shaders
  Compiler front end:
    1. Combines surface and light shaders
    2. Maps shaders to intermediate abstraction
  Programmable pipeline abstraction: pipeline programs
  Compiler back end:
    1. Modular design
    2. Maps pipeline programs to hardware
  Hardware-specific shader object code

Programmable pipeline abstraction
  [Diagram: geometry w/ shader params -> Primitive Group Processing (unlimited instructions) -> Vertex Processing (unlimited instructions) -> Fragment Processing (unlimited instructions)]
  Extension of programmable processor model
  Unified framework for all computation frequencies
  Virtualization of hardware resources
  Conceptually only one rendering pass

Accommodating today's hardware
  1. No true fragment floating-point type
     Implement with fixed-point instead
  2. Not every operator is available at every computation frequency
     e.g. no vertex textures
     e.g. no complex per-fragment ops
  3. Not every operator is available on all hardware
     e.g. cubemaps and 3D textures
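The first point, implementing a fragment "float" with fixed-point, can be sketched as follows. The slide does not give the number format, so the signed representation with 8 fractional bits and the clamp to [-1, 1] are assumptions chosen for illustration, as are the function names.

```python
FRACTION_BITS = 8          # assumed precision, not stated on the slide
ONE = 1 << FRACTION_BITS   # fixed-point representation of 1.0


def to_fixed(x):
    """Clamp to [-1, 1] and quantize to the assumed fixed-point format."""
    clamped = max(-1.0, min(1.0, x))
    return round(clamped * ONE)


def to_float(q):
    """Convert a fixed-point value back to a Python float."""
    return q / ONE


def fixed_mul(a, b):
    """Multiply two fixed-point values and rescale the product."""
    return (a * b) >> FRACTION_BITS


half = to_fixed(0.5)
quarter = fixed_mul(half, half)
```

Values outside the representable range are silently clamped, so a shader that relies on intermediate results larger than 1 behaves differently on such hardware than it would with true floating point.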

Shading language example
  surface shader float4
  anisotropic_ball (texref anisotex, texref star)
  {
      // generate texture coordinates
      perlight float4 uv = { center(dot(B, E)), center(dot(B, L)), 0, 1 };
      // compute reflection coefficient
      perlight float4 fd = max(dot(N, L), 0);
      perlight float4 fr = fd * texture(anisotex, uv);
      // compute amount of reflected light
      float4 lightcolor = 0.2 * Ca + integrate(Cl * fr);
      // modulate reflected light color
      float4 uv_base = { center(Pobj[2]), center(Pobj[0]), 0, 1 };
      return lightcolor * texture(star, uv_base);
  }

Computation frequency analysis
  surface shader float4
  anisotropic_ball (texref anisotex, texref star)
  {
      // generate texture coordinates
      perlight float4 uv = { center(dot(B, E)), center(dot(B, L)), 0, 1 };
      // compute reflection coefficient
      perlight float4 fd = max(dot(N, L), 0);
      perlight float4 fr = fd * texture(anisotex, uv);
      // compute amount of reflected light
      float4 lightcolor = 0.2 * Ca + integrate(Cl * fr);
      // modulate reflected light color
      float4 uv_base = { center(Pobj[2]), center(Pobj[0]), 0, 1 };
      return lightcolor * texture(star, uv_base);
  }
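One plausible way to carry out this analysis is to give each expression the highest frequency among its operands, and to force texture lookups to per-fragment because vertex textures are unavailable. The sketch below applies that rule to two expressions from the shader. The rule as coded, the tuple encoding of expressions and the leaf frequencies in `LEAF_FREQ` are assumptions for illustration; the slides do not spell out the compiler's actual inference rules.

```python
from enum import IntEnum


class Freq(IntEnum):
    CONSTANT = 0
    PRIMITIVE_GROUP = 1
    VERTEX = 2
    FRAGMENT = 3


# Assumed frequencies for the leaves used below.
LEAF_FREQ = {
    "0": Freq.CONSTANT,
    "anisotex": Freq.CONSTANT,
    "N": Freq.VERTEX,
    "L": Freq.VERTEX,
    "uv": Freq.VERTEX,
}


def infer(expr, leaf_freq):
    """Infer the computation frequency of a nested-tuple expression."""
    if isinstance(expr, str):
        return leaf_freq[expr]
    op, *args = expr
    freq = max(infer(a, leaf_freq) for a in args)
    if op == "texture":
        freq = Freq.FRAGMENT   # textures can only be sampled per fragment
    return freq


fd = ("max", ("dot", "N", "L"), "0")
fr = ("mul", fd, ("texture", "anisotex", "uv"))
fd_freq = infer(fd, LEAF_FREQ)
fr_freq = infer(fr, LEAF_FREQ)
```

With these assumptions `fd` stays per-vertex while `fr` becomes per-fragment, because the texture lookup drags the product up to the fragment stage.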

Analysis and optimization
  Global optimization
    All functions are inlined
    Surfaces and lights are compiled together
  Single basic block
    No data-dependent loops or conditionals
  All shading computations are reduced to a pair of expressions

Retargetable compiler back end
  Two goals:
    Provide support for many hardware platforms
    Virtualize hardware resources

Virtualizing hardware resources
  Multipass rendering
    Arbitrarily-complex fragment computations
    Reduces vertex resource requirements
  Host processing
    Useful when vertex operations are very complex
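Multipass rendering virtualizes the per-pass instruction limit by splitting a long fragment computation across several passes. The sketch below shows the simplest possible version: a greedy split of a linear list of operations into fixed-size chunks. This is a deliberately naive stand-in; a real partitioner also has to handle expression DAGs and the cost of saving intermediate values between passes. The function name and the operation names are hypothetical.

```python
def partition_into_passes(ops, max_ops_per_pass):
    """Greedily split a linear operation list into rendering passes."""
    if max_ops_per_pass < 1:
        raise ValueError("max_ops_per_pass must be at least 1")
    return [ops[i:i + max_ops_per_pass]
            for i in range(0, len(ops), max_ops_per_pass)]


# Five fragment operations on hardware that fits two per pass.
passes = partition_into_passes(["tex0", "mul", "tex1", "add", "mul"], 2)
```

The shader author sees one conceptual pass; the back end decides how many hardware passes are needed.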

Back end module interactions
  1. Pass-related interactions
     e.g. partition vertex code by pass
  2. Resource constraint interactions
     e.g. fall back to host if vertex-processing resources insufficient
  3. Data flow interactions
     e.g. define data formats for computed values
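The resource-constraint interaction can be sketched as a simple check: if the compiled vertex code exceeds what the vertex hardware offers, the vertex computation is handed to the host back end instead. The function name, the `limits` dictionary and the two resources checked are assumptions for illustration; the limits in the usage lines are example values supplied by the caller, not values read from any driver.

```python
def choose_vertex_target(n_instructions, n_constants, limits):
    """Pick 'hardware' when the vertex code fits, otherwise 'host'."""
    fits = (n_instructions <= limits["instructions"]
            and n_constants <= limits["constants"])
    return "hardware" if fits else "host"


# Example limits chosen for illustration.
example_limits = {"instructions": 128, "constants": 96}
small_shader = choose_vertex_target(40, 12, example_limits)
large_shader = choose_vertex_target(300, 12, example_limits)
```

Because the decision is made by the back end, the same shader source runs on either path without changes.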

Current back end modules
  Host:
    C code with external compiler
    Internal x86 assembler
  Hardware:
    Multipass OpenGL with extensions
    NVIDIA vertex programs
    NVIDIA register combiners
    ATI vertex and fragment shaders*
    Stanford Imagine processor*
  * recent additions

Vertex back end experiences
  Current vertex processing architectures:
    Clean design
    Easy to target
  True for both ATI and NVIDIA
    Different APIs
    Equivalent functionality
    ATI hides register allocation
  See paper for compilation details

Fragment back end experiences
  Multipass OpenGL with extensions
    Relatively easy to target using tree-matching
  NVIDIA register combiners
    Tree matching does not work!
    Use VLIW compilation techniques instead
    Hardware is not orthogonal
    Compilation quality is still very good
    Current back end: single pass only
    See also: HWWS 01 paper
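For readers unfamiliar with tree matching, the sketch below shows one simple variant, top-down "maximal munch": at each node the largest pattern that matches is chosen, and the subtrees left uncovered are matched recursively. The slide does not say which tree-matching algorithm the multipass OpenGL back end uses, so this is only an illustration of the general technique; the pattern table and the instruction names (MAD, MUL, ADD) are hypothetical.

```python
# Patterns are tried in order, largest first. "_" matches any subtree,
# which is then covered by its own instructions.
PATTERNS = [
    ("MAD", ("add", ("mul", "_", "_"), "_")),
    ("MUL", ("mul", "_", "_")),
    ("ADD", ("add", "_", "_")),
]


def match(shape, tree, holes):
    """Return True if shape matches tree; collect wildcard subtrees."""
    if shape == "_":
        holes.append(tree)
        return True
    if isinstance(shape, str) or isinstance(tree, str):
        return shape == tree
    if len(shape) != len(tree):
        return False
    return all(match(s, t, holes) for s, t in zip(shape, tree))


def select(tree):
    """Return instruction names covering the tree, operands first."""
    if isinstance(tree, str):
        return []   # a leaf is an input that is already available
    for name, shape in PATTERNS:
        holes = []
        if match(shape, tree, holes):
            code = []
            for subtree in holes:
                code += select(subtree)
            return code + [name]
    raise ValueError(f"no pattern covers {tree!r}")


code = select(("add", ("mul", "a", "b"), ("mul", "c", "d")))
```

This approach assumes each pattern can be chosen independently of its neighbours, which fits an orthogonal instruction set. The slide's remark that register combiners are not orthogonal is consistent with tree matching failing there.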

Demo
  Demo machine info:
    866 MHz Pentium III
    NVIDIA GeForce3
  All shaders compiled using NVIDIA vertex program and NVIDIA register combiner back ends

Summary
  Hardware
    Very capable, but difficult to use
    Provide a shading language interface
  Stanford shading system
    Shading language designed for hardware
    Programmable pipeline abstraction
    Retargetable compiler back end
  System runs in real time on today's hardware

Final thoughts
  Past
    Fixed-function pipelines
    Feature-based interfaces
  Present
    Limited programmability
    Assembly-language interfaces
  Future
    Generalized programmability
    Shading language interfaces
    Hardware designed for compiled code

Acknowledgements
  Stanford
    Pradeep Sen, Ren Ng, Eric Chan, John Owens, Bill Dally, Philipp Slusallek
  Bowling pin data
    Anselmo Lastra, Lawrence Kesteloot, Frederik Fatemi
  Fish data
    Xiaoyuan Tu, Homan Igehy
  Sponsors
    ATI, SUN, SGI, SONY, NVIDIA
    DARPA
  Web site and downloads
    http://graphics.stanford.edu/projects/shading/