Interactive Walkthroughs using Multiple GPUs

Dinesh Manocha
University of North Carolina-Chapel Hill
http://www.cs.unc.edu/~walk
SIGGRAPH Course #11, 2003

Goal
  Interactive walkthrough of complex 3D environments at high fidelity
  Models from CAD, VR
    High primitive count
    Heterogeneous geometry
    Irregular distribution
  DoubleEagle Tanker model: 82 million triangles, 127,000 objects
  Boeing 777: 470 million triangles

Boeing 777
  470 million triangles
  470 million triangles: zoomed-in view

Simulation of Richtmyer-Meshkov instability
  2048 x 2048 x 1920 grid and 27,000 time steps (5832 processors, IBM SST system): over 3 TB of multivariate data generated
  Each time step: ~500 million triangles for each iso-surface
  Close view of a small detail
  Data: Lawrence Livermore Labs

Commodity Hardware Performance
  Current GPUs can render millions of polygons per second
  Vertex arrays and triangle stripping are required to achieve these rates
  Issues
    Large polygon count
    Memory limitations
  Solution
    Reduce polygon count

Acceleration Techniques
  Geometric simplification
  Occlusion culling
  Hybrid approaches

Geometric Simplification
  Static [Clark 1976; Hoppe 96; Erikson et al. 01; Garland et al. 97]
  View-dependent [Hoppe 97; Luebke et al. 97]
  Surveyed in LOD book [Luebke et al. 02]
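To make the "reduce polygon count" argument concrete, here is a small back-of-the-envelope sketch. The throughput figure is a purely hypothetical assumption (the slides only say "millions of polygons per second"), and the function names are illustrative; the triangle counts are the ones quoted in this talk.

```python
# Assumed sustained throughput in triangles per second. This is a
# hypothetical value chosen for illustration, not a measured number.
ASSUMED_TRIANGLES_PER_SECOND = 50e6

# Triangle counts quoted in the talk.
MODEL_TRIANGLES = {
    "Powerplant": 13_000_000,
    "DoubleEagle Tanker": 82_000_000,
    "Boeing 777": 470_000_000,
}


def brute_force_fps(triangles, triangles_per_second=ASSUMED_TRIANGLES_PER_SECOND):
    """Frame rate if every triangle of the model is drawn each frame."""
    return triangles_per_second / triangles


def triangle_budget(target_fps, triangles_per_second=ASSUMED_TRIANGLES_PER_SECOND):
    """Maximum triangles per frame that still reaches the target frame rate."""
    return int(triangles_per_second / target_fps)


for name, count in MODEL_TRIANGLES.items():
    print(f"{name}: {brute_force_fps(count):.2f} frames/s without culling or LODs")
print(f"Budget at 20 frames/s: {triangle_budget(20):,} triangles per frame")
```

Under this assumed rate, the per-frame budget is a small fraction of any of the three models, which is why simplification and culling are both needed.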

Occlusion Culling
  Cells and portals [Airey et al. 90; Teller 92; Luebke et al. 95]
  Urban datasets composed of large occluders [Coorg et al. 97; Hudson et al. 97; Schaufler et al. 00; Wonka et al. 01]
  Recent survey in [Cohen-Or et al. 02]
  Further classified as
    Conservative [Greene et al. 93; Zhang et al. 97]
    Approximate [Bartz et al. 99; Klosowski and Silva 00]

Hybrid Approaches
  Combine LOD and occlusion culling techniques
  UC Berkeley Walkthrough [Funkhouser et al. 96]
  Synthetic convex occluders [Andujar et al. 00]
  Approximate visibility using prioritized layer projections with view-dependent rendering [El-Sana et al. 01]
  UNC MMR system [Aliaga et al. 99]
  GigaWalk [Baxter et al. 02]

Motivation
  An algorithm that works on generic models
  Image-based culling algorithms
  Current approaches involve frame buffer readbacks [Greene et al. 93; Zhang et al. 97; Baxter et al. 02]
  [Chart: GPU growth rate, CPU growth rate, AGP bandwidth growth rate]

Frame Buffer Readbacks
  GPU growth rate is higher than Moore's law
  Bandwidth growth rate is lower than Moore's law
  Result
    Frame buffer readbacks are expensive
    Readbacks also stall the graphics pipeline
  Require a solution that
    Leverages high GPU power
    Avoids frame buffer readbacks

Outline
  Parallel algorithm using multiple GPUs
  Interactive walkthroughs
  Interactive shadow generation

Occlusion-Switch
  An occlusion-switch has 2 GPUs
  It takes a camera as input
  The output is a PVS and a camera
  PVS: potentially visible set

Occlusion-Switch
  Parallel image-based culling algorithm
  A novel switching mechanism to eliminate frame-buffer readbacks
  Exploits temporal coherence
    Occluder selection
    Lower bandwidth requirements
  Integrates with hierarchical LODs

Occlusion-Switch: Parallel Algorithm and Architecture
  Three processes run in parallel
    1. Occlusion Representation (OR)
    2. Occlusion Culling (OC)
    3. Render Visible Geometry (RVG)
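The role switching can be sketched as a simple schedule. Here OR, OC and RVG abbreviate occlusion representation, occlusion culling and render visible geometry. The assignment below is an assumption inferred from the frame indices on the timeline slides, not a confirmed description of the system: in each time slot one GPU culls frame k while the other builds the occlusion representation for frame k+1, then the two swap, so a GPU always culls against a depth buffer it rendered itself. Placing RVG for frame k in the same slot is a simplification, and the function name is hypothetical.

```python
def occlusion_switch_schedule(first_frame, num_slots):
    """Return, per time slot, which task each GPU runs and for which frame.

    Assumed layout: GPU1 and GPU2 form the occlusion-switch and
    GPU3 renders the visible geometry.
    """
    schedule = []
    oc_gpu, or_gpu = 1, 2  # initial roles (arbitrary choice)
    for slot in range(num_slots):
        frame = first_frame + slot
        schedule.append({
            "slot": slot,
            f"GPU{oc_gpu}": ("OC", frame),      # cull against its own depth buffer
            f"GPU{or_gpu}": ("OR", frame + 1),  # render occluders for the next frame
            "GPU3": ("RVG", frame),             # display the visible geometry
        })
        # The switch: the GPU that just built the occlusion representation
        # for frame + 1 performs the culling for that frame next.
        oc_gpu, or_gpu = or_gpu, oc_gpu
    return schedule


for entry in occlusion_switch_schedule(first_frame=0, num_slots=3):
    print(entry)
```

Because each GPU performs OC on the frame for which it previously performed OR, the depth buffer never has to be read back or copied between GPUs in this sketch.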

[Figure: animated timeline of the occlusion-switch over frames i, i+1 and i+2, with cameras Cam i+1, Cam i+2 and Cam i+3. The two GPUs of the switch work on successive frames and exchange roles at each SWITCH, while GPU 3 runs RVG and displays the geometry for frames i, i+1 and i+2 in turn.]

Bandwidth Requirements
  The potentially visible set (PVS) and camera parameters are transmitted over the network
  Temporal coherence is used to minimize bandwidth requirements
    Each GPU maintains the current frame's geometry
    Only the differences in visible geometry across successive frames are transmitted

Static LOD Selection: Error Metric
  Pixel error metric: maximum normal deviation of the silhouette in the image
  Traverse down the scene graph until the error bound is satisfied
  Upper bound: highly conservative
  [Images: DE Engine Room, 1K x 1K, 20 PE; error image]
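The bandwidth point above (transmit only the differences in visible geometry between successive frames) can be illustrated with plain set operations. This is a minimal sketch that assumes each scene-graph node has an integer id and that both ends keep the previous frame's PVS; the function names and the message layout are hypothetical.

```python
def pvs_delta(previous_pvs, current_pvs):
    """Return the node ids to add and to remove to go from previous to current."""
    added = current_pvs - previous_pvs
    removed = previous_pvs - current_pvs
    return added, removed


def apply_delta(previous_pvs, added, removed):
    """Rebuild the current PVS on the receiving side from the delta."""
    return (previous_pvs - removed) | added


# Example: with temporal coherence most of the PVS is unchanged,
# so the delta is much smaller than the full set.
frame_i = set(range(1000))                            # nodes visible in frame i
frame_i1 = (frame_i - {3, 4, 5}) | {2000, 2001}       # nodes visible in frame i+1

added, removed = pvs_delta(frame_i, frame_i1)
print(len(frame_i1), "ids in the full PVS;", len(added) + len(removed), "ids in the delta")
```

The receiver reconstructs the same PVS by applying the delta to its own copy, so the full node list only needs to be sent once.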

Occlusion Culling
  Use NV_OCCLUSION_QUERY to determine visible pixels
  Traverse the scene hierarchy, rendering bounding boxes of nodes

Occlusion Culling Optimizations
  Multiple occlusion tests
    Occlusion query counter for each node
    Traverse the scene graph breadth-first
    Batch the queries for all nodes at a level
    40% faster than testing one node at a time with HP_OCCLUSION_TEST

LOD Selection: Visibility Metric
  Visibility for LOD selection
    Visible pixels of bounding box >= visible pixels of geometry
    A visible pixel count below the error metric provides an early termination condition
    Provides looser bounds and reduces polygon count

Conservative Occlusion Culling
  The underlying occlusion culling algorithm is conservative to image precision
  Exactly the same set of LODs is used for both the OR and OC stages
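The breadth-first traversal with batched queries and the visibility-based early termination can be sketched on the CPU as follows. This is only an illustration under stated assumptions: `query_visible_pixels` is a hypothetical stand-in for issuing an occlusion query on a node's bounding box and reading back its pixel count, the `Node` class is invented for the example, and no real OpenGL calls are made.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def cull_breadth_first(root: Node,
                       query_visible_pixels: Callable[[Node], int],
                       pixel_threshold: int) -> List[str]:
    """Return the names of the nodes selected for rendering (the PVS)."""
    pvs = []
    level = [root]
    while level:
        # Issue the queries for every node of this level as one batch,
        # then consume all the results together.
        results = [query_visible_pixels(node) for node in level]
        next_level = []
        for node, visible in zip(level, results):
            if visible == 0:
                continue  # bounding box fully occluded: cull the whole subtree
            if not node.children or visible < pixel_threshold:
                # Leaf, or so few visible pixels that refining further cannot
                # matter: stop here and render this node's LOD.
                pvs.append(node.name)
            else:
                next_level.extend(node.children)
        level = next_level
    return pvs


# Tiny example hierarchy with made-up pixel counts.
a1 = Node("a1", [Node("a1x")])
a = Node("a", [a1, Node("a2")])
root = Node("root", [a, Node("b")])
pixels = {"root": 1000, "a": 500, "b": 0, "a1": 5, "a2": 200, "a1x": 5}
queried = []


def fake_query(node):
    queried.append(node.name)
    return pixels[node.name]


selected = cull_breadth_first(root, fake_query, pixel_threshold=10)
print(selected)
```

In the example, node b is culled because its box has no visible pixels, and a1 is rendered at its own LOD without visiting its child because its visible pixel count is below the threshold.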

Interactive Walkthroughs
  3 Dell Precision workstations with dual 2 GHz Pentium IV CPUs, GeForce 4 GPU, and 2 GB main memory: immediate mode rendering

Test Model: Powerplant
  Original: 0.5 gigabyte dataset, 13 million polygons, 1,200 objects
  Preprocessing: 7 hours
  After preprocessing: 1.2 gigabytes, 13 million polygons, 38,000 objects

Test Model: DoubleEagle Tanker
  Original: 4 gigabyte dataset, 82 million polygons, 127,000 objects
  Preprocessing: 34 hours
  After preprocessing: 8 gigabytes, 82 million polygons, 61,000 objects

Interactive Walkthrough: Powerplant
  13M triangles: 3 Dell PCs with NVIDIA GeForce 4 cards (2002)

Interactive Walkthrough: Double Eagle Tanker
  82M triangles: 3 Dell PCs with NVIDIA GeForce 4 cards (2002)

Recent Work: Use Single GPU for 3 Processes
  Use vertex arrays
  Perform OR, OC and RVG on the GPU
  Improved support for occlusion queries
  Lower performance compared to the 3-GPU solution
  Fewer latency problems

Live Demo
  Dell M50 laptop, 2.4 GHz Pentium IV-M CPU, NVIDIA Quadro4 700 GoGL GPU, 1 GB memory, running Windows XP: vertex arrays
  Powerplant model (13M triangles) [July 2003]

View-Dependent Rendering
  Use of static LODs leads to popping artifacts
  Compute a view-dependent simplification
  Precompute a vertex hierarchy and a cluster hierarchy
  Runtime: perform occlusion culling using the cluster hierarchy
  Runtime: use the vertex hierarchy to compute the view-dependent simplification

Integrated Occlusion Culling with View-Dependent Simplification
  Single Dell PC with NVIDIA GeForce FX 5800 Ultra GPU [Yoon, Salomon and Manocha: IEEE Visualization 2003]

Occlusion-Switch: Advantages
  Generic models
    No assumptions on model, distribution
  Conservative occlusion culling
  Significant occlusion culling
  Low bandwidth
  Computation performed on GPUs

Interactive Shadows
  Shadows: portions of the scene that are occluded from the light
  Goal
    Generate hard-edged shadows for interactive walkthroughs
    Models with millions of polygons
    Scenes with a moving light source
  [Image: hard-edged shadows]

Shadows: Importance
  Realism
    Real-world scenes have shadows
  Provide a sense of depth to the user
    Better navigation
  Enhance user experience in interactive games and VR applications

Shadows: Importance
  [Images: view in the Powerplant model, with shadows and with no shadows]

Interactive Shadows
  N. Govindaraju et al., Interactive Shadow Generation in Complex Environments, Proc. of ACM SIGGRAPH 2003
  Talk scheduled on Tuesday at 3:40pm (Shadows section)

Goal
  Generate high-fidelity shadows
    Requires object-precision techniques
    Compute shadow casters and shadow receivers
  Interactive performance
    Use visibility culling and LOD techniques
    Reduce the number of potential shadow casters and shadow receivers

Interactive Shadows
  [Diagram: Stage I, PVS Computation; Stage II, Cross Culling; Stage III, Hybrid Shadow Generation]

Overview
  The algorithm proceeds in three stages
    Stage I: PVS computation
      Compute potential shadow casters and receivers
    Stage II: Cross culling (shadow culling)
      Compute shadow casters and receivers that result in aliasing
    Stage III: Hybrid shadow generation

Cross-Culling
  Powerplant model: 12.7M triangles
  [Figure sequence: light and eye views of the scene; cross-culling classifies receivers as fully lit, fully shadowed, or partially shadowed]

Shadow Generation
  [Figure: light, eye, shadow casters, partially shadowed receivers, and shadow polygons]
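The three receiver classes named in the cross-culling figures can be illustrated with a small classification sketch. It rests on an assumption that is not spelled out in the slides: for each receiver in the eye's PVS, two pixel counts from the light's view are available, one for the object drawn alone and one after occlusion by the rest of the scene. The function names and the input layout are hypothetical.

```python
def classify_receiver(light_visible_pixels, light_total_pixels):
    """Classify one receiver by how much of it the light can see."""
    if light_visible_pixels == 0:
        return "fully shadowed"
    if light_visible_pixels == light_total_pixels:
        return "fully lit"
    return "partially shadowed"


def cross_cull(pvs_eye, light_counts):
    """Group the receivers visible from the eye into the three classes.

    light_counts maps an object id to (visible_pixels, total_pixels)
    as seen from the light.
    """
    groups = {"fully lit": [], "fully shadowed": [], "partially shadowed": []}
    for obj in pvs_eye:
        visible, total = light_counts[obj]
        groups[classify_receiver(visible, total)].append(obj)
    return groups


# Made-up example: three objects visible from the eye.
counts = {"pipe": (400, 400), "tank": (0, 900), "walkway": (120, 600)}
groups = cross_cull(["pipe", "tank", "walkway"], counts)
print(groups)
```

Under this reading, only the partially shadowed receivers need shadow polygons; fully lit and fully shadowed receivers can be shaded uniformly.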

Process-Parallel Algorithm
  Utilize multiple GPUs to improve performance

Architecture
  [Diagram: GPU1, GPU2, GPU3 and the CPU; labels PVS E, PVS L, Cross-Culling, Render]

Data Flow
  [Diagram: timeline over frames i, i+1 and i+2. "Compute PVS E" and "Compute PVS L" run for frame i and then frame i+1; "Shadow Generation" runs for frame i and then frame i+1.]

Conclusions
  Interactive display and shadows
    Massive models
  Use multiple GPUs for visibility computations
  Application to complex models

Collaborators
  Naga Govindaraju
  Brandon Lloyd
  Brian Salomon
  Avneesh Sud
  Sungeui Yoon
  http://www.cs.unc.edu/~walk

Acknowledgements
  Army Research Office
  Department of Energy
  Intel Corporation
  Office of Naval Research
  National Science Foundation
  NNS for the DoubleEagle model
  Anonymous donor for the Powerplant model
  Boeing Corporation
  NVIDIA Corporation (occlusion queries)
  UNC Walkthrough group

The End