Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis

Size: px
Start display at page:

Download "Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis"

Transcription

1 Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

2 Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney and John D. Owens SIGGRAPH Asia 2008 (to appear)

3 Cinematic rendering looks good Image courtesy: Pixar

4 High geometric complexity Image courtesy: Pixar

5 High shading complexity Image courtesy: Pixar

6 Motion blur, Depth-of-field Image courtesy: Pixar

7 Complex lighting Image courtesy: Pixar

8 All this is slow Image courtesy: Pixar

9 Our Goal Make it faster As much as possible in real time Use the GPU Massive available parallelism Fixed-function texture/raster units High Programmability

10 Enter Reyes Developed in 1980s, offline Pixel-accurate smooth surfaces Eye-space shading Stochastic sampling Order-independent transparency

11 Reyes Direct3D 10 Bound/Split Vertex Shade Dice (tessellate) Geometry Shade Displace Rasterize Shade Fragment Shade Sample Depth-buffer Compose Blend

12 Reyes Bound/Split Dice (tessellate) Displace Shade Sample Compose High-quality geometry Convenient surface shading Cinematic quality Well-behaved (?)

13 Reyes Bound/Split Dice (tessellate) Displace Shade Input: Smooth surfaces Output: 0.5 x 0.5 pixel quads (micropolygons) Sample Compose

14 Outline Reyes Subdivision algorithm Subdivision on GPU implementation Results Limitations

15 Outline Reyes Subdivision algorithm Subdivision on GPU implementation Results Limitations

16 Split-Dice Recursively split a surface till dicing makes sense Uniformly sample it to form a grid of micropolygons

17 Split Bound/Cull Diceable? No: Split Yes: Dice Dice

18 Dice Post-split surface Grid Micropolygon

19 Split-Dice is hard Split Recursive, serial Rapid primitive generation/destruction Dice Huge memory demand

20 Can we do this in parallel?

21 Parallel Split Regular computation A lot of independent operations

22 Analogy: A Dynamic work queue A B C D E F G H I A A B C C E E F H Cull Split No Action

23 How can we do these efficiently? Creating new primitives How to dynamically allocate space? Culling unneeded primitives How to avoid fragmentation?

24 Our Choice keep it simple A B C D E F G H I A B C E F H A C E A child primitive is offset by the queue length

25 and get rid of the holes later A B C E F H A C E A B C E F H A C E Scan-based compact is fast! (Sengupta 07)

26 Storage Issues for Dice Too many micropolygons Cannot reject early Screen-space buckets (tiles) In parallel? Ideal bucket size?

27 Outline Reyes Subdivision algorithm Subdivision on GPU implementation Results Limitations

28 Platform NVIDIA GeForce 8800 GTX 16 SIMD Multiprocessors 16KB shared memory 768 MB memory (no cache) NVIDIA CUDA 1.1 Grids/Blocks/Threads OpenGL interface

29 Implementation Details Input choice: Bicubic Bézier Surfaces Only affects implementation View Dependent Subdivision every frame Single CPU-GPU transfer Final micropolygons written to a VBO Flat-shaded and displayed

30 Kernels Implemented Dice Regular, symmetric, parallel 256 threads per patch Primitive information in shared memory Bound/Split Hard to ensure efficiency

31 Bound/Split: Efficiency Goals Memory Coherence Off-chip memory accesses must be efficient Computational Efficiency Hardware SIMD must be maximally utilized

32 Memory Coherence during Split Compact work-queue after each iteration Primitives always contiguous in memory Structure-Of-Arrays representation Primitive attributes adjacent in memory 99.5% of all accesses fully coalesced

33 SIMD utilization during split Intra-Primitive parallelism Independent control points Negligible divergence Vectorized Split 16 Threads per patch 32 threads (1 warp) 90.16% of all branches SIMD coherent

34 Outline Reyes Subdivision algorithm Subdivision on GPU implementation Results Limitations

35 Results - Killeroo grids 5 levels of subdivision Bound/Split: 6.99 ms Dice: 7.21ms frames/second (19.92 with 16x AA) Killeroo model courtesy: Headus Inc.

36 Results - Teapot 4823 grids 11 levels of subdivision Bound/Split: 3.46 ms Dice: 2.42 ms frames/second (30.02 with 16x AA)

37 Time (ms) Results Random scenes Subdivision time proportional to number 25 of micropolygons Number of grids

38 Render time Breakdown 100% 75% Rendering lots of small triangles Rendering 50% 25% CUDA-OpenGL transfer Normal generation Subdivision 0% Teapot Killeroo Random Model (30 patches) Should ideally be zero: CUDA limitation

39 Subdivision Time (ms) Memory Usage (MB) Results - Screen-space buckets Acceptable performance, Small memory footprint Bucket width (pixels) Subdivision time Memory Usage (max.) Memory Usage (avg.)

40 Outline Reyes Subdivision algorithm Subdivision on GPU implementation Results Limitations

41 Limitations Can t split and dice in parallel Uniform dicing Cracks

42 Limitations Uniform dicing

43 Limitations - Cracks

44 Limitations - Cracks

45 Conclusions Breadth-first recursive subdivision Suits GPUs Works fast Dicing Programmable tessellation is fast 500M micropolygons/sec It is time to experiment with alternate graphics pipelines

46 Reyes in real-time rendering Visually superior to polygon pipeline Regular, highly parallel workload Extremely well-studied Good candidate for a real-time system?

47 Future Work Cracks, Displacement mapping Rest of Reyes Offline quality shading Interactive lighting Parallel Stochastic Sampling (Wei 2008) A-buffer (Myers 2007)

48 Thanks to Feedback and suggestions Per Christensen, Charles Loop, Dave Luebke, Matt Pharr and Daniel Wexler Financial support DOE Early Career PI Award National Science Foundation SciDAC Institute for Ultrascale Visualization Equipment support from NVIDIA

49 Teapot Video Real-time footage

50 Killeroo Video Real-time footage

51 Real-Time Reyes-Style Adaptive Surface Subdivision BACKUP SLIDES

52 CUDA Thread Structure Image courtesy: NVIDIA CUDA Programming Guide, 1.1

53 CUDA Memory Architecture Image courtesy: NVIDIA CUDA Programming Guide, 1.1

54 Displacement Mapping Fairly simple if cracks can be avoided Displace adjacent grids together

55 Shading & Lighting Interactive preview Lpics / Lightspeed Shaders Intelligent textures File access Shadows

56 Composite/Filter Stuff Stochastic sampling 20x speedup on a GPU (Wei 2008) A-buffer

57 Special Effects Motion Blur and DOF Global Illumination Ambient Occlusion

Real-Time Reyes Programmable Pipelines and Research Challenges

Real-Time Reyes Programmable Pipelines and Research Challenges Real-Time Reyes Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis This talk Parallel Computing for Graphics: In Action What does it take to write a programmable

More information

Fragment-Parallel Composite and Filter. Anjul Patney, Stanley Tzeng, and John D. Owens University of California, Davis

Fragment-Parallel Composite and Filter. Anjul Patney, Stanley Tzeng, and John D. Owens University of California, Davis Fragment-Parallel Composite and Filter Anjul Patney, Stanley Tzeng, and John D. Owens University of California, Davis Parallelism in Interactive Graphics Well-expressed in hardware as well as APIs Consistently

More information

Real-Time Reyes-Style Adaptive Surface Subdivision

Real-Time Reyes-Style Adaptive Surface Subdivision Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney University of California, Davis John D. Owens University of California, Davis Figure 1: Flat-shaded OpenGL renderings of Reyes-subdivided

More information

Lecture 13: Reyes Architecture and Implementation. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 13: Reyes Architecture and Implementation. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 13: Reyes Architecture and Implementation Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) A gallery of images rendered using Reyes Image credit: Lucasfilm (Adventures

More information

A Real-time Micropolygon Rendering Pipeline. Kayvon Fatahalian Stanford University

A Real-time Micropolygon Rendering Pipeline. Kayvon Fatahalian Stanford University A Real-time Micropolygon Rendering Pipeline Kayvon Fatahalian Stanford University Detailed surfaces Credit: DreamWorks Pictures, Shrek 2 (2004) Credit: Pixar Animation Studios, Toy Story 2 (1999) Credit:

More information

Comparing Reyes and OpenGL on a Stream Architecture

Comparing Reyes and OpenGL on a Stream Architecture Comparing Reyes and OpenGL on a Stream Architecture John D. Owens Brucek Khailany Brian Towles William J. Dally Computer Systems Laboratory Stanford University Motivation Frame from Quake III Arena id

More information

Reyes Rendering on the GPU

Reyes Rendering on the GPU Reyes Rendering on the GPU Martin Sattlecker Graz University of Technology Markus Steinberger Graz University of Technology Abstract In this paper we investigate the possibility of real-time Reyes rendering

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Analyzing a 3D Graphics Workload Where is most of the work done? Memory Vertex

More information

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp Next-Generation Graphics on Larrabee Tim Foley Intel Corp Motivation The killer app for GPGPU is graphics We ve seen Abstract models for parallel programming How those models map efficiently to Larrabee

More information

Real-Time Patch-Based Sort-Middle Rendering on Massively Parallel Hardware

Real-Time Patch-Based Sort-Middle Rendering on Massively Parallel Hardware Real-Time Patch-Based Sort-Middle Rendering on Massively Parallel Hardware Charles Loop 1 and Christian Eisenacher 2 1 Microsoft Research 2 University of Erlangen-Nuremberg May 2009 Technical Report MSR-TR-2009-83

More information

Lecture 12: Advanced Rendering

Lecture 12: Advanced Rendering Lecture 12: Advanced Rendering CSE 40166 Computer Graphics Peter Bui University of Notre Dame, IN, USA November 30, 2010 Limitations of OpenGL Pipeline Rendering Good Fast, real-time graphics rendering.

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Scheduling the Graphics Pipeline on a GPU

Scheduling the Graphics Pipeline on a GPU Lecture 20: Scheduling the Graphics Pipeline on a GPU Visual Computing Systems Today Real-time 3D graphics workload metrics Scheduling the graphics pipeline on a modern GPU Quick aside: tessellation Triangle

More information

GPU Task-Parallelism: Primitives and Applications. Stanley Tzeng, Anjul Patney, John D. Owens University of California at Davis

GPU Task-Parallelism: Primitives and Applications. Stanley Tzeng, Anjul Patney, John D. Owens University of California at Davis GPU Task-Parallelism: Primitives and Applications Stanley Tzeng, Anjul Patney, John D. Owens University of California at Davis This talk Will introduce task-parallelism on GPUs What is it? Why is it important?

More information

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V: Principles in Computer Architecture Parallelism and Locality Fall 2008 Lecture 10 The Graphics Processing Unit Mattan Erez The University of Texas at Austin Outline What is a GPU? Why should we

More information

RenderAnts: Interactive REYES Rendering on GPUs

RenderAnts: Interactive REYES Rendering on GPUs RenderAnts: Interactive REYES Rendering on GPUs Kun Zhou Qiming Hou Zhong Ren Minmin Gong Xin Sun Baining Guo Zhejiang University Tsinghua University Microsoft Research Asia Abstract We present RenderAnts,

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Direct Rendering of Trimmed NURBS Surfaces

Direct Rendering of Trimmed NURBS Surfaces Direct Rendering of Trimmed NURBS Surfaces Hardware Graphics Pipeline 2/ 81 Hardware Graphics Pipeline GPU Video Memory CPU Vertex Processor Raster Unit Fragment Processor Render Target Screen Extended

More information

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics Why GPU? Chapter 1 Graphics Hardware Graphics Processing Unit (GPU) is a Subsidiary hardware With massively multi-threaded many-core Dedicated to 2D and 3D graphics Special purpose low functionality, high

More information

A Trip Down The (2011) Rasterization Pipeline

A Trip Down The (2011) Rasterization Pipeline A Trip Down The (2011) Rasterization Pipeline Aaron Lefohn - Intel / University of Washington Mike Houston AMD / Stanford 1 This talk Overview of the real-time rendering pipeline available in ~2011 corresponding

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V (17325): Principles in Computer Architecture Parallelism and Locality Fall 2007 Lecture 11 The Graphics Processing Unit Mattan Erez The University of Texas at Austin Outline What is a GPU? Why should

More information

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016 Ray Tracing Computer Graphics CMU 15-462/15-662, Fall 2016 Primitive-partitioning vs. space-partitioning acceleration structures Primitive partitioning (bounding volume hierarchy): partitions node s primitives

More information

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside CS230 : Computer Graphics Lecture 4 Tamar Shinar Computer Science & Engineering UC Riverside Shadows Shadows for each pixel do compute viewing ray if ( ray hits an object with t in [0, inf] ) then compute

More information

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

Real-Time Graphics Architecture

Real-Time Graphics Architecture Real-Time Graphics Architecture Kurt Akeley Pat Hanrahan http://www.graphics.stanford.edu/courses/cs448a-01-fall Geometry Outline Vertex and primitive operations System examples emphasis on clipping Primitive

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

Memory-efficient Adaptive Subdivision for Software Rendering on the GPU

Memory-efficient Adaptive Subdivision for Software Rendering on the GPU Memory-efficient Adaptive Subdivision for Software Rendering on the GPU Marshall Plan Scholarship End Report Thomas Weber Vienna University of Technology April 30, 2014 Abstract The adaptive subdivision

More information

The GPGPU Programming Model

The GPGPU Programming Model The Programming Model Institute for Data Analysis and Visualization University of California, Davis Overview Data-parallel programming basics The GPU as a data-parallel computer Hello World Example Programming

More information

Chapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming

Chapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming Chapter IV Fragment Processing and Output Merging Fragment Processing The per-fragment attributes may include a normal vector, a set of texture coordinates, a set of color values, a depth, etc. Using these

More information

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside CS130 : Computer Graphics Tamar Shinar Computer Science & Engineering UC Riverside Raster Devices and Images Raster Devices Hearn, Baker, Carithers Raster Display Transmissive vs. Emissive Display anode

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V (17325): Principles in Computer Architecture Parallelism and Locality Fall 2007 Lecture 12 GPU Architecture (NVIDIA G80) Mattan Erez The University of Texas at Austin Outline 3D graphics recap and

More information

OpenGL and RenderMan Rendering Architectures

OpenGL and RenderMan Rendering Architectures HELSINKI UNIVERSITY OF TECHNOLOGY 16.4.2002 Telecommunications Software and Multimedia Laboratory T-111.500 Seminar on Computer Graphics Spring 2002: Advanced Rendering Techniques OpenGL and RenderMan

More information

Ciril Bohak. - INTRODUCTION TO WEBGL

Ciril Bohak. - INTRODUCTION TO WEBGL 2016 Ciril Bohak ciril.bohak@fri.uni-lj.si - INTRODUCTION TO WEBGL What is WebGL? WebGL (Web Graphics Library) is an implementation of OpenGL interface for cmmunication with graphical hardware, intended

More information

REYES REYES REYES. Goals of REYES. REYES Design Principles

REYES REYES REYES. Goals of REYES. REYES Design Principles You might be surprised to know that most frames of all Pixar s films and shorts do not use a global illumination model for rendering! Instead, they use Renders Everything You Ever Saw Developed by Pixar

More information

Shaders. Slide credit to Prof. Zwicker

Shaders. Slide credit to Prof. Zwicker Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?

More information

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline sequence of operations to generate an image using object-order processing primitives processed one-at-a-time

More information

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline sequence of operations to generate an image using object-order processing primitives processed one-at-a-time

More information

GPU Architecture and Function. Michael Foster and Ian Frasch

GPU Architecture and Function. Michael Foster and Ian Frasch GPU Architecture and Function Michael Foster and Ian Frasch Overview What is a GPU? How is a GPU different from a CPU? The graphics pipeline History of the GPU GPU architecture Optimizations GPU performance

More information

EE 4702 GPU Programming

EE 4702 GPU Programming fr 1 Final Exam Review When / Where EE 4702 GPU Programming fr 1 Tuesday, 4 December 2018, 12:30-14:30 (12:30 PM - 2:30 PM) CST Room 226 Tureaud Hall (Here) Conditions Closed Book, Closed Notes Bring one

More information

Ray Casting of Trimmed NURBS Surfaces on the GPU

Ray Casting of Trimmed NURBS Surfaces on the GPU Ray Casting of Trimmed NURBS Surfaces on the GPU Hans-Friedrich Pabst Jan P. Springer André Schollmeyer Robert Lenhardt Christian Lessig Bernd Fröhlich Bauhaus University Weimar Faculty of Media Virtual

More information

A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization

A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization Jordi Roca Victor Moya Carlos Gonzalez Vicente Escandell Albert Murciego Agustin Fernandez, Computer Architecture

More information

Efficient and Scalable Shading for Many Lights

Efficient and Scalable Shading for Many Lights Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL

More information

3D Modeling: Surfaces

3D Modeling: Surfaces CS 430/536 Computer Graphics I 3D Modeling: Surfaces Week 8, Lecture 16 David Breen, William Regli and Maxim Peysakhov Geometric and Intelligent Computing Laboratory Department of Computer Science Drexel

More information

Lecture 4: Geometry Processing. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 4: Geometry Processing. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 4: Processing Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today Key per-primitive operations (clipping, culling) Various slides credit John Owens, Kurt Akeley,

More information

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices About libgcm Using the SPUs with the RSX Brief overview of GCM Replay December 7 th, 2004

More information

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation

More information

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1 X. GPU Programming 320491: Advanced Graphics - Chapter X 1 X.1 GPU Architecture 320491: Advanced Graphics - Chapter X 2 GPU Graphics Processing Unit Parallelized SIMD Architecture 112 processing cores

More information

The Application Stage. The Game Loop, Resource Management and Renderer Design

The Application Stage. The Game Loop, Resource Management and Renderer Design 1 The Application Stage The Game Loop, Resource Management and Renderer Design Application Stage Responsibilities 2 Set up the rendering pipeline Resource Management 3D meshes Textures etc. Prepare data

More information

The Need for Programmability

The Need for Programmability Visual Processing The next graphics revolution GPUs Graphics Processors have been engineered for extreme speed - Highly parallel pipelines exploits natural parallelism in pixel and vertex processing -

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

GPU Computation Strategies & Tricks. Ian Buck NVIDIA GPU Computation Strategies & Tricks Ian Buck NVIDIA Recent Trends 2 Compute is Cheap parallelism to keep 100s of ALUs per chip busy shading is highly parallel millions of fragments per frame 0.5mm 64-bit

More information

Efficient GPU Rendering of Subdivision Surfaces. Tim Foley,

Efficient GPU Rendering of Subdivision Surfaces. Tim Foley, Efficient GPU Rendering of Subdivision Surfaces Tim Foley, 2017-03-02 Collaborators Activision Wade Brainerd Stanford Matthias Nießner NVIDIA Manuel Kraemer Henry Moreton 2 Subdivision surfaces are a powerful

More information

2.11 Particle Systems

2.11 Particle Systems 2.11 Particle Systems 320491: Advanced Graphics - Chapter 2 152 Particle Systems Lagrangian method not mesh-based set of particles to model time-dependent phenomena such as snow fire smoke 320491: Advanced

More information

Deus Ex is in the Details

Deus Ex is in the Details Deus Ex is in the Details Augmenting the PC graphics of Deus Ex: Human Revolution using DirectX 11 technology Matthijs De Smedt Graphics Programmer, Nixxes Software Overview Introduction DirectX 11 implementation

More information

3/1/2010. Acceleration Techniques V1.2. Goals. Overview. Based on slides from Celine Loscos (v1.0)

3/1/2010. Acceleration Techniques V1.2. Goals. Overview. Based on slides from Celine Loscos (v1.0) Acceleration Techniques V1.2 Anthony Steed Based on slides from Celine Loscos (v1.0) Goals Although processor can now deal with many polygons (millions), the size of the models for application keeps on

More information

The NVIDIA GeForce 8800 GPU

The NVIDIA GeForce 8800 GPU The NVIDIA GeForce 8800 GPU August 2007 Erik Lindholm / Stuart Oberman Outline GeForce 8800 Architecture Overview Streaming Processor Array Streaming Multiprocessor Texture ROP: Raster Operation Pipeline

More information

CS452/552; EE465/505. Clipping & Scan Conversion

CS452/552; EE465/505. Clipping & Scan Conversion CS452/552; EE465/505 Clipping & Scan Conversion 3-31 15 Outline! From Geometry to Pixels: Overview Clipping (continued) Scan conversion Read: Angel, Chapter 8, 8.1-8.9 Project#1 due: this week Lab4 due:

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Rendering Subdivision Surfaces Efficiently on the GPU

Rendering Subdivision Surfaces Efficiently on the GPU Rendering Subdivision Surfaces Efficiently on the GPU Gy. Antal, L. Szirmay-Kalos and L. A. Jeni Department of Algorithms and their Applications, Faculty of Informatics, Eötvös Loránd Science University,

More information

Programmable GPUs. Real Time Graphics 11/13/2013. Nalu 2004 (NVIDIA Corporation) GeForce 6. Virtua Fighter 1995 (SEGA Corporation) NV1

Programmable GPUs. Real Time Graphics 11/13/2013. Nalu 2004 (NVIDIA Corporation) GeForce 6. Virtua Fighter 1995 (SEGA Corporation) NV1 Programmable GPUs Real Time Graphics Virtua Fighter 1995 (SEGA Corporation) NV1 Dead or Alive 3 2001 (Tecmo Corporation) Xbox (NV2A) Nalu 2004 (NVIDIA Corporation) GeForce 6 Human Head 2006 (NVIDIA Corporation)

More information

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský Real - Time Rendering Pipeline optimization Michal Červeňanský Juraj Starinský Motivation Resolution 1600x1200, at 60 fps Hw power not enough Acceleration is still necessary 3.3.2010 2 Overview Application

More information

GRAPHICS PROCESSING UNITS

GRAPHICS PROCESSING UNITS GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

More information

Rendering Objects. Need to transform all geometry then

Rendering Objects. Need to transform all geometry then Intro to OpenGL Rendering Objects Object has internal geometry (Model) Object relative to other objects (World) Object relative to camera (View) Object relative to screen (Projection) Need to transform

More information

The Graphics Pipeline

The Graphics Pipeline The Graphics Pipeline Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel But you really want shadows, reflections, global illumination, antialiasing

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010

Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 1 Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 Presentation by Henrik H. Knutsen for TDT24, fall 2012 Om du ønsker, kan du sette inn navn, tittel på foredraget, o.l.

More information

Approximate Catmull-Clark Patches. Scott Schaefer Charles Loop

Approximate Catmull-Clark Patches. Scott Schaefer Charles Loop Approximate Catmull-Clark Patches Scott Schaefer Charles Loop Approximate Catmull-Clark Patches Scott Schaefer Charles Loop Catmull-Clark Surface ACC-Patches Polygon Models Prevalent in game industry Very

More information

UC Davis UC Davis Previously Published Works

UC Davis UC Davis Previously Published Works UC Davis UC Davis Previously Published Works Title Parallel Reyes-style Adaptive Subdivision with Bounded Memory Usage Permalink https://escholarship.org/uc/item/8kn7c65q Authors Weber, Thomas Wimmer,

More information

The Traditional Graphics Pipeline

The Traditional Graphics Pipeline Last Time? The Traditional Graphics Pipeline Participating Media Measuring BRDFs 3D Digitizing & Scattering BSSRDFs Monte Carlo Simulation Dipole Approximation Today Ray Casting / Tracing Advantages? Ray

More information

Why modern versions of OpenGL should be used Some useful API commands and extensions

Why modern versions of OpenGL should be used Some useful API commands and extensions Michał Radziszewski Why modern versions of OpenGL should be used Some useful API commands and extensions Timer Query EXT Direct State Access (DSA) Geometry Programs Position in pipeline Rendering wireframe

More information

EECS 487: Interactive Computer Graphics

EECS 487: Interactive Computer Graphics EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

Motivation MGB Agenda. Compression. Scalability. Scalability. Motivation. Tessellation Basics. DX11 Tessellation Pipeline

Motivation MGB Agenda. Compression. Scalability. Scalability. Motivation. Tessellation Basics. DX11 Tessellation Pipeline MGB 005 Agenda Motivation Tessellation Basics DX Tessellation Pipeline Instanced Tessellation Instanced Tessellation in DX0 Displacement Mapping Content Creation Compression Motivation Save memory and

More information

Accelerating CFD with Graphics Hardware

Accelerating CFD with Graphics Hardware Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery

More information

LOD and Occlusion Christian Miller CS Fall 2011

LOD and Occlusion Christian Miller CS Fall 2011 LOD and Occlusion Christian Miller CS 354 - Fall 2011 Problem You want to render an enormous island covered in dense vegetation in realtime [Crysis] Scene complexity Many billions of triangles Many gigabytes

More information

Hardware Accelerated Volume Visualization. Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences

Hardware Accelerated Volume Visualization. Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences Hardware Accelerated Volume Visualization Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences A Real-Time VR System Real-Time: 25-30 frames per second 4D visualization: real time input of

More information

TSBK03 Screen-Space Ambient Occlusion

TSBK03 Screen-Space Ambient Occlusion TSBK03 Screen-Space Ambient Occlusion Joakim Gebart, Jimmy Liikala December 15, 2013 Contents 1 Abstract 1 2 History 2 2.1 Crysis method..................................... 2 3 Chosen method 2 3.1 Algorithm

More information

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies How to Work on Next Gen Effects Now: Bridging DX10 and DX9 Guennadi Riguer ATI Technologies Overview New pipeline and new cool things Simulating some DX10 features in DX9 Experimental techniques Why This

More information

Rendering Grass with Instancing in DirectX* 10

Rendering Grass with Instancing in DirectX* 10 Rendering Grass with Instancing in DirectX* 10 By Anu Kalra Because of the geometric complexity, rendering realistic grass in real-time is difficult, especially on consumer graphics hardware. This article

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Fall 2015 Lecture 9 October 1, 2015 What I was going to say last time How are the ideas we ve learned about implemented in hardware so they are fast. Important:

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Spring 2016 Lecture 10 February 25, 2016 1. Put a 3D primitive in the World Modeling Get triangles 2. Figure out what color it should be Do ligh/ng 3. Position

More information

Advanced Shading and Texturing

Advanced Shading and Texturing Real-Time Graphics Architecture Kurt Akeley Pat Hanrahan http://www.graphics.stanford.edu/courses/cs448a-01-fall Advanced Shading and Texturing 1 Topics Features Bump mapping Environment mapping Shadow

More information

Current Trends in Computer Graphics Hardware

Current Trends in Computer Graphics Hardware Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)

More information

Point based Rendering

Point based Rendering Point based Rendering CS535 Daniel Aliaga Current Standards Traditionally, graphics has worked with triangles as the rendering primitive Triangles are really just the lowest common denominator for surfaces

More information

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Executive Summary The NVIDIA Quadro2 line of workstation graphics solutions is the first of its kind to feature hardware support for

More information

B. Tech. Project Second Stage Report on

B. Tech. Project Second Stage Report on B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic

More information

Parallel Programming for Graphics

Parallel Programming for Graphics Beyond Programmable Shading Course ACM SIGGRAPH 2010 Parallel Programming for Graphics Aaron Lefohn Advanced Rendering Technology (ART) Intel What s In This Talk? Overview of parallel programming models

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Fall 2016 Lectures 10 & 11 October 10th & 12th, 2016 1. Put a 3D primitive in the World Modeling 2. Figure out what color it should be 3. Position relative to the

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic.

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic. Shading Models There are two main types of rendering that we cover, polygon rendering ray tracing Polygon rendering is used to apply illumination models to polygons, whereas ray tracing applies to arbitrary

More information

COMPUTER GRAPHICS COURSE. Rendering Pipelines

COMPUTER GRAPHICS COURSE. Rendering Pipelines COMPUTER GRAPHICS COURSE Rendering Pipelines Georgios Papaioannou - 2014 A Rendering Pipeline Rendering or Graphics Pipeline is the sequence of steps that we use to create the final image Many graphics/rendering

More information

NVIDIA Parallel Nsight. Jeff Kiel

NVIDIA Parallel Nsight. Jeff Kiel NVIDIA Parallel Nsight Jeff Kiel Agenda: NVIDIA Parallel Nsight Programmable GPU Development Presenting Parallel Nsight Demo Questions/Feedback Programmable GPU Development More programmability = more

More information

Other Rendering Techniques CSE 872 Fall Intro You have seen Scanline converter (+z-buffer) Painter s algorithm Radiosity CSE 872 Fall

Other Rendering Techniques CSE 872 Fall Intro You have seen Scanline converter (+z-buffer) Painter s algorithm Radiosity CSE 872 Fall Other Rendering Techniques 1 Intro You have seen Scanline converter (+z-buffer) Painter s algorithm Radiosity 2 Intro Some more Raytracing Light maps Photon-map Reyes Shadow maps Sahdow volumes PRT BSSRF

More information