EFFICIENT BVH CONSTRUCTION VIA APPROXIMATE AGGLOMERATIVE CLUSTERING. Yan Gu, Yong He, Kayvon Fatahalian, Guy Blelloch, Carnegie Mellon University

EFFICIENT BVH CONSTRUCTION VIA APPROXIMATE AGGLOMERATIVE CLUSTERING Yan Gu, Yong He, Kayvon Fatahalian, Guy Blelloch Carnegie Mellon University

BVH CONSTRUCTION GOALS
High quality: produce BVHs of comparable (or better) quality to full-sweep SAH algorithms.
High performance: faster construction than widely used SAH-based algorithms that use binning.
OUR APPROACH
A bottom-up construction algorithm based on agglomerative clustering, motivated by [Walter et al. 2008].

HIERARCHICAL CLUSTERING EXAMPLE Source data points, with d(i, j) = distance from cluster i to cluster j. [Figure: source data points and the resulting cluster hierarchy.]

HIERARCHICAL CLUSTERING IS A GENERAL TECHNIQUE FOR ORGANIZING DATA.
Domain            Clustered primitives
Linguistic        Languages
Image retrieval   Images
Anthropology      Surnames / races
Biology           Genes / species
Social network    People / behaviors

BVH BUILD USING AGGLOMERATIVE CLUSTERING [WALTER ET AL. 2008] Elements to cluster = scene primitives Distance = surface area of aggregate bounding box

BVH BUILD USING AGGLOMERATIVE CLUSTERING [WALTER ET AL. 2008] Compute the nearest neighbor to each primitive.

BVH BUILD USING AGGLOMERATIVE CLUSTERING [WALTER ET AL. 2008] Find the closest pair of primitives and combine them into a cluster.

BVH BUILD USING AGGLOMERATIVE CLUSTERING [WALTER ET AL. 2008] Update nearest neighbor links.

BVH BUILD USING AGGLOMERATIVE CLUSTERING [WALTER ET AL. 2008] Repeat: combine closest remaining clusters.

BVH BUILD USING AGGLOMERATIVE CLUSTERING [WALTER ET AL. 2008] Continue until one cluster remains (BVH root).
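The steps above can be sketched in a few lines of Python. This is only a brute-force illustration of the idea, not the implementation of [Walter et al. 2008]: it rescans all cluster pairs at every step instead of maintaining nearest-neighbor links with a KD-tree. The names `union`, `surface_area`, `build_bvh_agglomerative`, and `leaf_ids`, and the dictionary-based node layout, are assumptions made for this example.

```python
from itertools import combinations

def union(a, b):
    """Smallest axis-aligned box enclosing boxes a and b, each ((minx, miny, minz), (maxx, maxy, maxz))."""
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)

def surface_area(box):
    """Surface area of an axis-aligned box."""
    dx, dy, dz = (h - l for l, h in zip(box[0], box[1]))
    return 2 * (dx * dy + dy * dz + dz * dx)

def build_bvh_agglomerative(boxes):
    """Brute-force bottom-up build: repeatedly merge the closest pair of clusters.

    Distance between two clusters = surface area of their aggregate bounding box.
    """
    clusters = [{"box": b, "prim": i, "children": None} for i, b in enumerate(boxes)]
    while len(clusters) > 1:
        # Find the closest pair by scanning every pair (illustrative, not efficient).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: surface_area(union(clusters[ij[0]]["box"], clusters[ij[1]]["box"])),
        )
        a, b = clusters[i], clusters[j]
        merged = {"box": union(a["box"], b["box"]), "prim": None, "children": (a, b)}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]  # the last remaining cluster is the BVH root

def leaf_ids(node):
    """Primitive indices stored under a node."""
    if node["children"] is None:
        return [node["prim"]]
    left, right = node["children"]
    return leaf_ids(left) + leaf_ids(right)
```

For example, given two unit boxes near the origin and two near x = 10, the root's children group the primitives as {0, 1} and {2, 3}.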

BVH BUILD USING AGGLOMERATIVE CLUSTERING [WALTER ET AL. 2008] Good: often produces a higher quality BVH than sweep builds. Bad: lower build performance than binned builds, because each clustering step requires a KD-tree search and update, and parallel execution is data-dependent.

OBSERVATION Most computation occurs at the lowest levels of the BVH (near the leaves), where the number of clusters is large. Top $h/2$ levels: number of nodes is $O(\sqrt{n})$. Bottom $h/2$ levels: number of nodes is $O(n)$.

CONTRIBUTION Approximate Agglomerative Clustering (AAC): a new algorithm for BVH construction that is work-efficient, parallelizable, and produces high-quality trees.

OUR MAIN IDEA Restrict nearest neighbor search to a small subset of neighboring scene elements.

PHASE 1: PRIMITIVE PARTITIONING (DOWNWARD/DIVIDE PHASE) [Figure: computation graph and the corresponding primitive partitioning, shown for δ = 4.]
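A minimal sketch of the downward phase, assuming primitives are ordered by the Morton code of their centroids (a later slide notes that AAC uses Morton codes to define constraints on clustering) and that each region is bisected on successive code bits until it holds at most δ primitives. The function names, the 10-bit quantization, and the median-split fallback for identical codes are assumptions for illustration.

```python
def morton3(x, y, z, bits=10):
    """Interleave the low `bits` bits of integer coordinates x, y, z (x is most significant)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

def partition(codes, ids, bit, delta):
    """Recursively bisect ids (sorted by Morton code) on successive code bits.

    Leaves are lists of at most `delta` primitive indices; internal nodes are (left, right) tuples.
    """
    if len(ids) <= delta:
        return list(ids)
    while bit >= 0:
        # ids are sorted, so primitives with this bit clear come first.
        mid = sum(1 for i in ids if not (codes[i] >> bit) & 1)
        if 0 < mid < len(ids):
            break
        bit -= 1  # this bit does not separate the region; try the next one
    else:
        mid = len(ids) // 2  # assumed fallback when all codes are identical
    return (partition(codes, ids[:mid], bit - 1, delta),
            partition(codes, ids[mid:], bit - 1, delta))

def downward_phase(centroids, delta, bits=10):
    """Quantize centroids to a grid, sort by Morton code, and build the partition tree."""
    lo = [min(c[a] for c in centroids) for a in range(3)]
    hi = [max(c[a] for c in centroids) for a in range(3)]
    top = (1 << bits) - 1

    def quantize(c):
        return [
            min(top, int((c[a] - lo[a]) / (hi[a] - lo[a]) * (1 << bits))) if hi[a] > lo[a] else 0
            for a in range(3)
        ]

    codes = [morton3(*quantize(c), bits=bits) for c in centroids]
    ids = sorted(range(len(centroids)), key=codes.__getitem__)
    return partition(codes, ids, 3 * bits - 1, delta)
```

With eight centroids spaced evenly along the x axis and δ = 4, this yields two leaf regions of four primitives each; with δ = 2 it yields four regions of two.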

PHASE 2: AGGLOMERATIVE CLUSTERING (UPWARD PHASE) Each node of the computation graph combines its input into f(n) clusters. [Figure: computation graph and the corresponding primitive partitioning.]
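The upward phase can be sketched as a recursion over the partition tree: each node gathers the clusters returned by its children, merges closest pairs until only f(n) remain, and passes those up. This is a sketch under assumptions rather than the authors' implementation: the partition tree is assumed to be nested `(left, right)` tuples with lists of primitive indices at the leaves, the closest pair is found by brute force, and the names `combine_clusters`, `build_tree`, `aac`, and `leaf_ids` are hypothetical.

```python
from itertools import combinations

def union(a, b):
    """Smallest axis-aligned box enclosing boxes a and b."""
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)

def surface_area(box):
    dx, dy, dz = (h - l for l, h in zip(box[0], box[1]))
    return 2 * (dx * dy + dy * dz + dz * dx)

def combine_clusters(clusters, target):
    """Greedily merge the closest pair until at most `target` clusters remain."""
    clusters = list(clusters)
    while len(clusters) > max(1, target):
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: surface_area(union(clusters[ij[0]]["box"], clusters[ij[1]]["box"])),
        )
        a, b = clusters[i], clusters[j]
        merged = {"box": union(a["box"], b["box"]), "prim": None, "children": (a, b)}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

def build_tree(region, boxes, f):
    """Return (clusters, n) for one node of the computation graph."""
    if isinstance(region, list):  # leaf region: one cluster per primitive
        clusters = [{"box": boxes[i], "prim": i, "children": None} for i in region]
        n = len(region)
    else:  # internal node: union of the clusters produced by both children
        (cl, nl), (cr, nr) = (build_tree(r, boxes, f) for r in region)
        clusters, n = cl + cr, nl + nr
    return combine_clusters(clusters, f(n)), n

def aac(region, boxes, f):
    """Run the upward phase, then merge whatever reaches the root into a single BVH."""
    clusters, _ = build_tree(region, boxes, f)
    return combine_clusters(clusters, 1)[0]

def leaf_ids(node):
    """Primitive indices stored under a node."""
    if node["children"] is None:
        return [node["prim"]]
    left, right = node["children"]
    return leaf_ids(left) + leaf_ids(right)
```

Passing `f = lambda n: 1` forces every node to merge down to one cluster, so the result mirrors the partition tree; passing `f = lambda n: n` defers all merging to the root, which reduces to the unrestricted greedy clustering.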

AAC IS AN APPROXIMATION TO THE TRUE AGGLOMERATIVE CLUSTERING SOLUTION. Computation graph: Primitive partitioning:

AAC HAS TWO PARAMETERS δ: stopping criterion for partitioning (maximum number of primitives in a leaf region). f(n): function that determines the number of clusters to generate in each graph node (n is the number of primitives in the corresponding region).

DETERMINING HOW MUCH TO CLUSTER
$f(n) = 1$: close to a spatial bisection BVH.
$f(n) = n$: all primitives are pushed to the top of the computation graph, and the AAC solution is the same as true agglomerative clustering.
We use $f(n) = c\,n^{\alpha}$, where $0 < \alpha < 0.5$.

AAC HAS LINEAR TIME COMPLEXITY The downward phase is linear. Upward clustering phase: let $f(n) = c\,n^{\alpha}$, where $0 < \alpha < 0.5$. Assumptions: δ is a small constant; the time complexity at each graph node is $O(n^2)$ [Olson 1995], where $n$ is the number of input primitives in this node.

COMPLEXITY ANALYSIS Let $f(n) = c\,n^{\alpha}$, where $0 < \alpha < 0.5$. Let $C = N\delta^{2\alpha-1}$ and $r = 2^{2\alpha-1} < 1$.
Work done at leaves: $N/\delta$ nodes, $O(f(\delta)^2) = O(\delta^{2\alpha})$ computation each, $O(N\delta^{2\alpha-1}) = O(C)$ work total.
One level up: $N/(2\delta)$ nodes, $O(f(2\delta)^2) = O((2\delta)^{2\alpha})$ computation each, $O(N(2\delta)^{2\alpha-1}) = O(Cr)$ work total.
Two levels up: $O(Cr^2)$ work total.
LINEAR TIME COMPLEXITY The work per level is geometrically decreasing from the leaves to the root.
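The step from "geometrically decreasing" to "linear" can be written out as follows, using the slide's definitions $C = N\delta^{2\alpha-1}$ and $r = 2^{2\alpha-1}$. The level index $k$ (with $k = 0$ at the leaves) and the symbol $W_k$ for the work at level $k$ are notation introduced here for illustration.

```latex
W_k = O\!\left(\frac{N}{2^k\delta}\,(2^k\delta)^{2\alpha}\right)
    = O\!\left(N\delta^{2\alpha-1}\,2^{k(2\alpha-1)}\right)
    = O\!\left(C\,r^k\right),
\qquad
\sum_{k \ge 0} W_k = O\!\left(C \sum_{k \ge 0} r^k\right)
                   = O\!\left(\frac{C}{1-r}\right)
                   = O(N).
```

The last equality holds because $\alpha < 0.5$ gives $r < 1$, and because $\delta$ and $\alpha$ are treated as constants, so $C = N\delta^{2\alpha-1}$ is proportional to $N$.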

THEORY MEETS PRACTICE: WE OBSERVE LINEAR SCALING WITH SCENE SIZE [Figure: AAC-HQ BVH construction time (ms, single core) versus scene triangle count (0 to 8M) for Sponza, Fairy, Conference, Buddha, Half-Life, and San Miguel.]

PARAMETERS Trade off BVH quality against construction speed by changing δ and f in the algorithm. We propose two sets of parameters: AAC-HQ (high quality): δ = 20, $f(n) = \frac{\delta^{0.6}}{2}\,n^{0.4}$; AAC-Fast: δ = 4, $f(n) = \frac{\delta^{0.7}}{2}\,n^{0.3}$.
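As a quick arithmetic check, the two parameter sets can be evaluated numerically. The sketch below reads both formulas as $f(n) = c\,n^{\alpha}$ with $c = \delta^{1-\alpha}/2$; that general form is an inference from the two listed settings, not something the slides state, and `cluster_count_fn` is a hypothetical helper name.

```python
def cluster_count_fn(delta, alpha):
    """Return (c, f) with f(n) = c * n**alpha and c = delta**(1 - alpha) / 2.

    The general form of c is inferred from the AAC-HQ and AAC-Fast settings.
    """
    c = delta ** (1.0 - alpha) / 2.0
    return c, (lambda n: c * n ** alpha)

# AAC-HQ: delta = 20, alpha = 0.4; AAC-Fast: delta = 4, alpha = 0.3.
c_hq, f_hq = cluster_count_fn(20, 0.4)
c_fast, f_fast = cluster_count_fn(4, 0.3)

# Rounded constants for comparison with the evaluation setup.
print(round(c_hq, 1), round(c_fast, 1))
# Number of clusters kept for a region of exactly delta primitives.
print(f_hq(20), f_fast(4))
```

The constants round to 3 and 1.3, which agrees with the $3n^{0.4}$ and $1.3n^{0.3}$ settings listed in the evaluation setup. Under this reading, a region holding exactly δ primitives is reduced to δ/2 clusters in both configurations.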

IMPLEMENTATION DETAILS Parallelization: the algorithm is divide-and-conquer, so it is easy to parallelize. Key optimizations: reducing redundant computation of cluster distances; reducing data movement; subtree flattening for improved tree quality.

EVALUATION

SETUP We compared five CPU implementations:
SAH         A standard top-down full-sweep SAH build [MacDonald and Booth 1990]
SAH-BIN     A top-down binned SAH build using at most 16 bins along the longest axis [Wald 2007]
Local-Ord   Locally-ordered agglomerative clustering [Walter et al. 2008]
AAC-HQ      AAC with high quality settings: δ = 20, $f(n) = 3n^{0.4}$
AAC-Fast    AAC configured for performance: δ = 4, $f(n) = 1.3n^{0.3}$

SCENES Sponza Half-Life Conference Fairy San Miguel Buddha

TREE COST COMPARISON Cost = number of traversal steps + intersection tests during ray tracing. [Figure: ray-tracing cost per scene (Sponza, Fairy, Conference, Buddha, Half-Life, San Miguel) for SAH, SAH-BIN, Local-Ord, AAC-HQ, and AAC-Fast.]
AAC-HQ produces BVHs with cost similar to those produced by true agglomerative clustering builds.
AAC-Fast produces BVHs with equal or lower cost than the full-sweep build in all cases except Buddha.

AAC IS ABLE TO MAKE PARTITIONS THAT ARE NOT DETERMINED BY PARTITION PLANES.

BVH CONSTRUCTION TIME (SINGLE CORE) [Figure: normalized BVH build time (single core) per scene (Sponza, Fairy, Conference, Buddha, Half-Life, San Miguel) for SAH, SAH-BIN, Local-Ord, AAC-HQ, and AAC-Fast.]
AAC-HQ build times are five to six times lower than Local-Ord (while maintaining comparable BVH quality).
AAC-HQ build times are comparable to SAH-BIN.
AAC-Fast builds are up to four times faster than SAH-BIN.

AAC PARALLEL EXECUTION SPEEDUP AAC-HQ achieves nearly linear speedup out to 16 cores, and a 34× speedup on 40 cores.

AAC 32-CORE SPEEDUP AAC build execution times (milliseconds), with parallel speedup in parentheses
Scene        Tri count   AAC-HQ 1 core   AAC-HQ 32 cores   AAC-Fast 1 core   AAC-Fast 32 cores
Sponza       67 K        52              2 (24.0)          20                1 (21.5)
Fairy        174 K       117             5 (24.5)          44                2 (22.4)
Conference   283 K       225             10 (23.6)         70                4 (19.4)
Buddha       1.1 M       1,101           43 (25.8)         397               16 (24.0)
Half-Life    1.2 M       1,080           42 (25.7)         359               15 (22.8)
San Miguel   7.9 M       7,350           298 (24.6)        2,140             99 (21.6)

SUMMARY
AAC algorithm: BVH construction via an approximation to agglomerative clustering of scene primitives.
Comparable quality BVH to full-sweep SAH build.
Up to four times faster than binned SAH build.
Amenable to parallelism on many-core CPUs.

SIMILARITY TO KARRAS13 (NEXT TALK)
Fast initial organization of scene primitives via Morton codes. AAC: to define constraints on clustering. Karras13: to define the initial BVH.
Brute-force optimization of local sub-structures. AAC: brute-force local clustering in each node. Karras13: brute-force enumeration of treelet structures.
In both: more flexible partitions than those defined by a spatial partition plane.
AAC does not address triangle splitting.

LOOKING FORWARD
We have not yet explored parallelization of AAC on GPUs.
Post-process BVH optimizations can be applied to the smaller set of clusters generated by AAC.
Clustering in low-dimensional space has many other applications in computer graphics, including lighting (e.g., Lightcuts), N-body simulation, and collision detection.

Thank you. We acknowledge the support of the National Science Foundation (CCF-1018188), the Intel Labs Academic Research Office, and NVIDIA Corporation.

BVHs produced by AAC methods realize greater benefit for shadow rays than for diffuse bounce rays. [Figure: AAC-HQ BVH ray-tracing cost (normalized to full-sweep SAH) per scene (Sponza, Fairy, Conference, Buddha, Half-Life, San Miguel), for diffuse rays and shadow rays.]

WHY AAC PERFORMS WORSE FOR BUDDHA.