Interactive Ray Tracing: Higher Memory Coherence


Interactive Ray Tracing: Higher Memory Coherence http://gamma.cs.unc.edu/rt Dinesh Manocha (UNC Chapel Hill) Sung-Eui Yoon (Lawrence Livermore Labs)

Interactive Ray Tracing. Ray tracing is naturally sub-linear in scene size. Ray tracing naturally supports good shading. Ray tracing maps well to multi-core architectures. Moore's Law is a natural boon for ray tracing: the 2015 prediction is 2048^2 pixels with 16 samples per pixel [Shirley 2006]. But...

Low Growth Rate of Memory Bandwidth. [Chart: growth rate during 1993-2005 of disk access speed, RAM access speed, and CPU speed.] Processor speed improvements are not sufficient. Courtesy: http://www.hcibook.com/e3/online/moores-law/

Applications need to have high memory coherence Memory hierarchies

One Driving Application: Massive Models. Model: a geometric representation of an object. Many sources: scientific simulation, scanned objects, CAD.

Massive Models: Memory Overhead. Size: tens or hundreds of millions of triangles (previous slide: 100M, 372M, 82M); that's 13GB of raw data alone! Datasets with billions of polygons are becoming available. Naïve rendering is not fast enough, yet we still want to display these models in real time.

Rasterization. Standard method for rendering: draw all triangles on a raster.

Rasterization. Advantage: uses graphics hardware / GPUs (fast, growing faster than Moore's Law), 1-2 orders of magnitude faster than ray tracing. Disadvantages: local illumination only; performance is roughly linear in the number of triangles. Improved algorithms give sub-linear performance.

Rasterization. Current GPUs can render 100-400M triangles per second. This assumes the triangles are in GPU memory; CPU-GPU bandwidth is a limitation. Real-time rasterization of massive models becomes a data management problem: deliver the right set of triangles to the GPU for each frame.

Rasterization: Acceleration. Use multi-resolution representations: static LODs, view-dependent rendering. Visibility and occlusion culling. Out-of-core rendering. [Hundreds of papers.] Develop an integrated solution!

Towards Scalable View-Dependent Rendering. View-dependent rendering. Uses dynamic simplification. New multi-resolution hierarchy (CHPM). Occlusion culling using BVHs. Out-of-core rendering. Improved layouts for high cache throughput. Integrate with low-error shadow maps [Lloyd et al. 2006]. [Yoon et al. 04, Yoon et al. 2005]

Video Demonstration Quick-VDR System

Interactive View-Dependent Shadow Generation Video

Ray Tracing. Well studied for 25+ years. 1-2 orders of magnitude slower than rasterization. But: asymptotic performance is roughly logarithmic. A good choice for massive models?

Ray Tracing for Massive Models. Logarithmic asymptotic behavior. Very useful for dealing with massive models. Mainly due to its hierarchical data structures. BUT: observed only for in-core datasets.

Ray Tracing: Performance. Measured with 2GB main memory. [Plot: render time (log scale) and working set size versus model complexity in millions of triangles (log scale), with the 2GB mark indicated.] Memory thrashing!

Low Growth Rate of Memory Bandwidth. [Chart: growth rate during 1993-2005 of disk access speed, RAM access speed, and CPU speed.] Recent hardware improvements may not provide an efficient solution to our problem! Courtesy: http://www.hcibook.com/e3/online/moores-law/

Ray Coherence Techniques. Assume coherence between rays. Work well with CAD or architectural models, for primary rays and some secondary rays. Highly tessellated models: not much coherence between rays. [Figure labels: viewpoint; image plane; small triangles; rays for each pixel.]

Issues Design appropriate hierarchical representations: Should avoid access to lower levels in the tree Access should be coherent

Incoherent Memory Accesses. Scan of Michelangelo's St. Matthew: a model with 370M triangles. Assuming 512x512 resolution: hundreds of triangles per pixel, and less than 1% of the triangles visible. Each triangle is likely in a different area of memory.

Our approach. Add levels-of-detail to ray tracing. Main benefit: improved memory coherence. LOD: simplified versions of geometry, selected according to an LOD metric. Use ideas from the rasterization literature. Rasterization: selection per object. Ray tracing: selection per ray [Yoon et al. 2006].

LOD-based Ray Tracing: Issues. Compact and simple to compute: the LOD can be considered for each node and ray. Drastic simplification: a factor-of-two simplification gives only a one-level reduction in tree traversal. High quality and interactive rendering: error should be controllable.
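The remark about factor-of-two simplification follows from the logarithmic depth of a balanced hierarchy. The sketch below works through the arithmetic under an idealized assumption that is not from the slides: a balanced binary tree with one triangle per leaf. The function names are illustrative only.

```python
import math

def balanced_depth(num_triangles: int) -> int:
    """Depth of an idealized balanced binary tree with one triangle per leaf."""
    return math.ceil(math.log2(num_triangles))

def levels_saved(simplification_factor: float) -> float:
    """Tree levels removed when the triangle count shrinks by the given factor."""
    return math.log2(simplification_factor)

# A 370M-triangle model (the size quoted for St. Matthew) and its half-size version.
full = balanced_depth(370_000_000)
half = balanced_depth(370_000_000 // 2)

print(full, half)            # depths differ by a single level
print(levels_saved(1024))    # a 1024x simplification is needed to save ten levels
```

Under this idealization, halving the geometry shortens each ray's traversal by one node, which is why the slides ask for drastic simplification rather than gentle decimation.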

Our approach: R-LODs. Highly integrated with the kd-tree [Wald et al. 05]. Can also be integrated with BVHs. Simple but fast LOD metric. Works with shadows and reflections. Integrates ray and cache coherence.

Outline LOD-based ray tracing Results

Ray Tracing: Performance. Measured with 2GB main memory. [Plot: render time (log scale) and working set size versus model complexity in millions of triangles (log scale), with the memory-thrashing region and the 2GB mark indicated.] Achieved up to three orders of magnitude speedup!

Real-time Captured Video St. Matthew Model 512 by 512 and 2x2 super-sampling, 4 pixels-of-error

Related Work Interactive ray tracing LOD and out-of-core techniques LOD-based ray tracing

Interactive Ray Tracing Ray coherences [Heckbert and Hanrahan 84, Wald et al. 01, Reshetov et al. 05] Parallel computing [Parker et al. 99, DeMarle et al. 04, Dietrich et al. 05] Hardware acceleration [Purcell et al. 02, Schmittler et al. 04, Woop et al. 05] Large dataset [Pharr et al. 97, Wald et al. 04]

LOD and Out-of-Core. Widely researched techniques [Luebke et al. 02, Chiang et al. 03]. LOD methods combined with out-of-core techniques: point clouds [Rusinkiewicz and Levoy 00]; regular meshes [Hwa et al. 04, Losasso and Hoppe 04]; general meshes [Lindstrom 03, Cignoni et al. 04, Yoon et al. 04, Gobbetti and Marton 05].

LOD Methods for Rasterization. LOD selection difference: LOD selection per object versus LOD selection per ray. (Culling or LOD) hierarchy difference: coarse-grained hierarchy for rasterization versus fine-grained hierarchy for ray tracing. It is not clear whether LOD techniques for rasterization are applicable to ray tracing.

LOD-based Ray Tracing. Ray differentials [Igehy 99]. Subdivision meshes [Christensen et al. 03, Stoll et al. 06]. Point clouds [Wand and Straßer 03]. [Figure labels: viewpoint; image plane; footprint size of ray; ray beam for one pixel.]

Outline LOD-based ray tracing R-LOD representation LOD selection LOD and layout computations Results

R-LOD Representation. Tightly integrated with kd-nodes. Consists of a plane, material attributes, and a surface deviation. [Figure labels: rays; kd-node; no intersection; intersection; normal; plane; valid extent of the plane.]
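As a rough illustration of this representation, the sketch below stores a plane, a material reference, and a surface deviation, and intersects a ray with the plane only inside its valid extent. Everything beyond those three ingredients is an assumption made for the example: the field names, the use of a disc of radius `extent` around a center point as the valid extent, and the uncompressed layout (the deck's 4-byte encoding is not described).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

@dataclass
class RLOD:
    point: Vec3               # assumed: a point on the plane, used as its center
    normal: Vec3              # unit normal of the plane
    extent: float             # assumed: radius of the valid extent around `point`
    material_id: int          # hypothetical handle to the material attributes
    surface_deviation: float  # deviation of the plane from the original surface

def intersect_rlod(lod: RLOD, origin: Vec3, direction: Vec3,
                   t_min: float = 0.0, t_max: float = float("inf")) -> Optional[float]:
    """Return the ray parameter t of the hit, or None if the ray misses."""
    denom = dot(lod.normal, direction)
    if abs(denom) < 1e-12:            # ray parallel to the plane
        return None
    to_plane = (lod.point[0] - origin[0], lod.point[1] - origin[1], lod.point[2] - origin[2])
    t = dot(lod.normal, to_plane) / denom
    if t < t_min or t > t_max:        # hit lies outside the active ray segment
        return None
    hit = (origin[0] + t * direction[0], origin[1] + t * direction[1], origin[2] + t * direction[2])
    offset = (hit[0] - lod.point[0], hit[1] - lod.point[1], hit[2] - lod.point[2])
    if dot(offset, offset) > lod.extent ** 2:   # outside the valid extent
        return None
    return t

lod = RLOD(point=(0.0, 0.0, 5.0), normal=(0.0, 0.0, 1.0), extent=1.0,
           material_id=0, surface_deviation=0.1)
print(intersect_rlod(lod, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))   # hits inside the extent
print(intersect_rlod(lod, (2.0, 0.0, 0.0), (0.0, 0.0, 1.0)))   # crosses the plane outside it
```

A single plane test replaces the intersection work for every triangle below the node, which is where the drastic simplification comes from.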

LOD-based Runtime Traversal. Modification of efficient kd-tree traversal [Wald 04]. Traverse and evaluate the metric at each node. If the node satisfies the metric, intersect with its plane instead: if the ray hits, we're done; if not, go back up and try the other sub-tree. In any case, we don't need to go deeper!
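The traversal rule above can be sketched as a small recursive front-to-back kd-tree walk. This is an illustration, not the optimized traversal of [Wald 04]: the `KdNode` fields, the intersector callbacks, the `lod_ok` predicate standing in for the LOD metric, and the demo tree with fixed hit distances are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Hit = Optional[float]
# Hypothetical intersector signature: (origin, direction, t_min, t_max) -> hit distance or None.
Intersector = Callable[[Sequence[float], Sequence[float], float, float], Hit]

@dataclass
class KdNode:
    axis: int = 0
    split: float = 0.0
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None
    hit_triangles: Optional[Intersector] = None   # set on leaves
    hit_rlod: Optional[Intersector] = None        # set on nodes that carry an R-LOD

def traverse(node: Optional[KdNode], origin, direction, t_min: float, t_max: float,
             lod_ok: Callable[[KdNode], bool], stats: Dict[str, int]) -> Hit:
    if node is None:
        return None
    stats["visited"] += 1
    # Metric satisfied: test the plane and stop descending, hit or miss.
    if node.hit_rlod is not None and lod_ok(node):
        return node.hit_rlod(origin, direction, t_min, t_max)
    if node.left is None and node.right is None:   # leaf: test the real geometry
        if node.hit_triangles is None:
            return None
        return node.hit_triangles(origin, direction, t_min, t_max)
    o, d = origin[node.axis], direction[node.axis]
    below = o < node.split or (o == node.split and d <= 0)
    near, far = (node.left, node.right) if below else (node.right, node.left)
    t_split = (node.split - o) / d if d != 0 else float("inf")
    if t_split > t_max or t_split <= 0:            # segment stays on the near side
        return traverse(near, origin, direction, t_min, t_max, lod_ok, stats)
    if t_split < t_min:                            # segment lies entirely on the far side
        return traverse(far, origin, direction, t_min, t_max, lod_ok, stats)
    hit = traverse(near, origin, direction, t_min, t_split, lod_ok, stats)
    if hit is not None:
        return hit
    # A miss (including an R-LOD miss) returns here: "go back up, try other sub-tree".
    return traverse(far, origin, direction, t_split, t_max, lod_ok, stats)

def fixed_hit(t: float) -> Intersector:
    """Stub intersector reporting a hit at distance t when t lies in the active segment."""
    return lambda origin, direction, t_min, t_max: t if t_min <= t <= t_max else None

def always_miss(origin, direction, t_min, t_max) -> Hit:
    return None

def demo_tree(rlod: Intersector) -> KdNode:
    """Root splits x at 1.0; its left child splits x at 0.5 and carries the R-LOD."""
    inner = KdNode(axis=0, split=0.5,
                   left=KdNode(hit_triangles=always_miss),
                   right=KdNode(hit_triangles=fixed_hit(0.75)),
                   hit_rlod=rlod)
    return KdNode(axis=0, split=1.0, left=inner, right=KdNode(hit_triangles=fixed_hit(1.5)))

def run(rlod: Intersector, use_lod: bool):
    stats = {"visited": 0}
    hit = traverse(demo_tree(rlod), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, 2.0,
                   lambda node: use_lod, stats)
    return hit, stats["visited"]

print(run(fixed_hit(0.7), use_lod=False))   # full descent to the leaves
print(run(fixed_hit(0.7), use_lod=True))    # stops at the R-LOD
print(run(always_miss, use_lod=True))       # R-LOD miss, continues in the sibling sub-tree
```

The visited-node counts show the intended effect: once a node's R-LOD is accepted, nothing below that node is touched, whether or not the plane is hit.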

Properties of R-LODs. Compact and efficient LOD representation: adds only 4 bytes to the 8-byte kd-node. Drastic simplification: useful for performance improvement.

Properties of R-LODs. Error-controllable LOD rendering: error is measured in screen space in terms of pixels-of-error (PoE). Provides an interactive rendering framework.

Outline LOD-based ray tracing R-LOD representation LOD selection LOD and layout computations Results

Two Main Design Criteria for LOD Metric. Controllability of visual errors. Efficiency: the LOD metric can be evaluated at many nodes for every single ray, which amounts to more than tens of millions of evaluations.

Visual Artifacts. Visibility difference. Illumination difference. Path difference for secondary rays. Surface deviation. Projected area. Curvature difference. [Figure labels: LODs; original mesh; view direction; ray with original mesh; ray with LODs; image plane.]

R-LOD Error Metric Consider two factors Projected screen-space area of a kd-node Surface deviation

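Conservative Projection Method. Measures the screen-space area affected by using an R-LOD. LOD metric: C · d_min > R, where d_min and R are the distance and extent marked in the figure for the bounding box B of the kd-node. [Figure labels: image plane; viewpoint; B; d_min; R; kd-node; PoE error bound; one ray beam.]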
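The slide gives only the form of the test, so the following is one plausible derivation rather than the authors' exact formula. Assume the R-LOD is treated as a disc of radius R facing the viewer at distance d_min, and the image plane sits at distance d_near with pixels of area pixel_area (both measured in world units; `d_near` and `pixel_area` are names introduced here). The disc projects to area π (R · d_near / d_min)^2, and requiring that to stay below PoE pixels rearranges to C · d_min > R with a constant C that depends only on the view setup and the PoE bound.

```python
import math

def lod_constant(poe: float, pixel_area: float, d_near: float) -> float:
    """C such that a viewer-facing disc of radius R at distance d projects to
    fewer than `poe` pixels exactly when C * d > R (assumed pinhole setup)."""
    return math.sqrt(poe * pixel_area / math.pi) / d_near

def use_rlod(c: float, d_min: float, radius: float) -> bool:
    """The per-node test: one multiply and one compare."""
    return c * d_min > radius

c = lod_constant(poe=4.0, pixel_area=1.0, d_near=1.0)
print(use_rlod(c, d_min=10.0, radius=5.0))   # far away: the R-LOD is acceptable
print(use_rlod(c, d_min=1.0, radius=5.0))    # close up: keep descending
```

Because C can be computed once per frame, the per-node cost stays tiny, which matters when the metric is evaluated tens of millions of times. Setting PoE to 0 gives C = 0, so no R-LOD is ever selected, matching the "PoE = 0 (No LOD)" setting used in the results.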

R-LODs with Different PoE Values. [Images: original, PoE = 1.85, PoE = 5, PoE = 10; 512x512, no anti-aliasing.]

R-LODs with Different PoE Values. [Images: original, PoE = 40, PoE = 80; 512x512 image resolution.]

LOD Metric for Secondary Rays. Applicable to any linear transformation: shadows, planar reflection. Not applicable to non-linear transformations: refraction and non-planar reflection. For those, use the more general but more expensive ray differentials [Igehy 99].

C0 Discontinuity between R-LODs. Possible solutions: posing dependencies [Lindstrom 03, Hwa et al. 04, Yoon et al. 04, Cignoni et al. 05]; implicit surfaces [Wald and Seidel 05]. [Figure label: ray.]

Expansion of R-LODs. Expand the extent of the plane. Inspired by hole-free point cloud rendering [Kalaiah and Varshney 03]. The expansion is a function of the surface deviation (20% of the surface deviation). [Figure label: ray.]

Impact of Expansions of R-LODs. [Images: original model; before expansion, with a hole; after expansion. PoE = 5 at 512 by 512.]

Outline LOD-based ray tracing R-LOD representation LOD selection LOD and layout computations Results

R-LOD Construction. Principal component analysis (PCA): compute the covariance matrix for the plane of each R-LOD; the normal is an eigenvector. Hierarchical PCA computation: has linear time complexity and accesses the original data only once, with virtually no memory overhead.
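One way to see why a hierarchical PCA touches the data only once: the covariance of a node can be recovered from running sums (count, coordinate sums, sums of products), and the sums of a parent are just the sums of its children added together. The sketch below illustrates that idea on raw points. It is an assumption-laden illustration, not the authors' code: it uses unweighted points, plain Python lists, and a simple power iteration to find the eigenvector of the smallest eigenvalue, which is the best-fit plane normal.

```python
import math

def leaf_moments(points):
    """Running sums for a leaf: (count, sum of coordinates, sum of outer products)."""
    n = len(points)
    s = [sum(p[i] for p in points) for i in range(3)]
    ss = [[sum(p[i] * p[j] for p in points) for j in range(3)] for i in range(3)]
    return n, s, ss

def merge(a, b):
    """Moments of a parent node from its two children, without revisiting any point."""
    n = a[0] + b[0]
    s = [x + y for x, y in zip(a[1], b[1])]
    ss = [[a[2][i][j] + b[2][i][j] for j in range(3)] for i in range(3)]
    return n, s, ss

def covariance(moments):
    n, s, ss = moments
    mean = [x / n for x in s]
    cov = [[ss[i][j] / n - mean[i] * mean[j] for j in range(3)] for i in range(3)]
    return mean, cov

def plane_normal(cov, iterations=200):
    """Eigenvector of the smallest eigenvalue, via power iteration on trace*I - cov.
    Caveat: convergence is slow when eigenvalues are close, and the start vector
    must not be orthogonal to the sought eigenvector."""
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    m = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)] for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Two leaves whose points all lie in the plane z = 0.
leaf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
leaf_b = [(1.0, 1.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
parent = merge(leaf_moments(leaf_a), leaf_moments(leaf_b))
mean, cov = covariance(parent)
normal = plane_normal(cov)
print(mean, normal)   # the normal points along the z axis
```

Each node stores a constant amount of summary data and is merged once, so building the planes for the whole hierarchy is linear in the number of nodes.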

Ray Coherence. Using LODs improves the utilization of SIMD functionality: it maintains spatial coherence between rays and keeps ray groups bigger.

Cache Coherence. Cache misses can be a major bottleneck, especially for massive models. Use cache-oblivious layouts [Yoon and Manocha 06, Yoon et al. 05]: they work well with various caches (L1, L2, memory, disk) and do not require any code modification. 10%-60% improvement for the LOD-based ray tracer. 3X improvement for ray tracing, collision detection, GPU-based rendering, and iso-surface extraction.

Layout Computation. [Figure: input graph (weights) on vertices va, vb, vc, vd; multilevel optimization; cache-oblivious metric; local permutations; resulting 1D layout va, vb, vd, vc.] The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

OpenCCL http://gamma.cs.unc.edu/openccl

Specialization to kd-trees and BVHs. What is the input graph? The hierarchy itself? Parent-child and spatial localities are implicitly considered given the input hierarchy. Weights indicate coherence levels between two nodes and are computed based on geometric relationships.

Probability Function for Layout Computation. How likely is a node to be accessed? [Figure labels: bounding box of a node; point; bounding box of a second object; sphere; rectangular; ray beam.] Equivalent to the surface area heuristic [MacDonald and Booth 90, Havran 00].
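The link to the surface area heuristic rests on a standard geometric fact: for rays distributed uniformly in space, the probability that a ray hitting a convex parent box also hits a convex child box inside it equals the ratio of their surface areas. The sketch below computes that ratio for axis-aligned boxes. The function names are illustrative, and the exact probability function used for the layouts is not spelled out in the slides.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def access_probability(child_box, parent_box):
    """Estimated probability of visiting the child, given that the parent is visited."""
    return surface_area(*child_box) / surface_area(*parent_box)

parent = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
left = ((0.0, 0.0, 0.0), (0.5, 1.0, 1.0))     # parent split in half along x
print(access_probability(left, parent))
```

Nodes with a high conditional probability of being visited right after their parent are good candidates to be stored next to it in memory.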

Layout Algorithms. Recursively divide the hierarchy and lay out the resulting sub-trees (multi-scale approach), based on the probability function. Works well with various cache block sizes [Yoon and Lindstrom 06].
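To give a feel for recursive sub-tree layouts, the sketch below implements the classic van Emde Boas layout of a complete binary tree. This is a stand-in chosen for brevity, not the probability-driven algorithm described in the slides: it splits purely by height and ignores access probabilities. Nodes are identified by breadth-first indices (root 1, children 2i and 2i+1), which is an assumption of this example.

```python
def veb_layout(root: int, height: int) -> list:
    """Memory order of a complete binary sub-tree of the given height rooted at
    breadth-first index `root`, using the van Emde Boas recursive split."""
    if height == 1:
        return [root]
    top_height = height // 2
    bottom_height = height - top_height
    order = veb_layout(root, top_height)          # lay out the top sub-tree first
    # Bottom level of the top sub-tree: relative depth top_height - 1 under `root`.
    first = root << (top_height - 1)
    for node in range(first, first + (1 << (top_height - 1))):
        # Then each bottom sub-tree, left to right, contiguously.
        order.extend(veb_layout(2 * node, bottom_height))
        order.extend(veb_layout(2 * node + 1, bottom_height))
    return order

print(veb_layout(1, 3))
print(veb_layout(1, 4))
```

Because every sub-tree at every scale ends up contiguous, a root-to-leaf walk touches few blocks regardless of the block size, which is the property that makes such layouts cache-oblivious.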

Outline R-LODs for ray tracing Results

Implementation. Uses common optimized kd-tree construction methods based on the surface area heuristic [MacDonald and Booth 90, Havran 00]. Out-of-core computation: decompose an input model into a set of clusters [Yoon et al. 04].

Preprocessing. Construction speed: very fast due to its linear complexity (3M triangles per minute). Memory overhead: requires 33% more storage than the optimized kd-tree representation [Wald 04]. Runtime overhead: 5% compared to the non-LOD version of an efficient ray tracer.

Impacts of R-LODs. [Plots: number of intersected nodes per ray, render time, and working set size, for PoE = 0 (no LOD) and PoE = 2.5.] 10X speedup.

Real-time Captured Video St. Matthew Model 512 x 512, 2 x 2 anti-aliasing, PoE = 4

Image Quality Comparison. Forest model (32M triangles). [Images: PoE = 0 (no LOD); PoE = 4 with a cache-oblivious layout of the kd-tree; shading difference.] 4X speedup.

Results. CAD model: Double Eagle tanker, 82M triangles. 2 fps, 2 times speedup.

Pros and Cons. Limitations: does not handle advanced materials (BRDF); our metric works well only with linear transformations; no guarantee that there are no holes. Advantages: simplicity, interactivity, efficiency.

Ongoing and Future Work Investigate an efficient use of implicit surfaces Allow approximate visibility Extend to global illumination Design an efficient layout algorithm for deforming models

Conclusions Massive model rendering limited by memory access and bus bandwidth It is becoming a data and memory management problem LOD-based ray tracing Main improvement due to working set size reduction 20-1000% speedups Integrate cache and ray coherence techniques

UCRL-PRES-223086 Some part of this work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48.