Deferred Splatting. Gaël GUENNEBAUD Loïc BARTHE Mathias PAULIN IRIT UPS CNRS TOULOUSE FRANCE.

Transcription:

Deferred Splatting Gaël GUENNEBAUD Loïc BARTHE Mathias PAULIN IRIT UPS CNRS TOULOUSE FRANCE http://www.irit.fr/~gael.guennebaud

Plan. Complex scenes: triangles or points? High-quality splatting: really efficient? Deferred splatting: accurate point selection, temporal coherency. Applications: occlusion culling and SPT. Results. Future work.

Motivations. Real-time rendering of complex scenes. Triangles are fully supported by graphics hardware, but tiny triangles are inefficient and multi-resolution can be very tedious. One solution is points: no connectivity, no texture map, no...; multi-resolution rendering is simple and efficient, but...

Motivations. One solution is points, but large magnification gives low quality and flat surfaces are handled inefficiently => hybrid rendering: triangles and points are complementary, so use triangles when points become less efficient. High-quality point rendering is expensive => deferred splatting! IRIT, University of Toulouse, France.

Efficient Point Rendering. Two issues: how to select the points that have to be rendered, and how to render them?

Efficient Point Rendering. How to select the points that have to be rendered? Store the points in a hierarchical data structure (kd-tree, octree, hierarchy of bounding spheres, ...). Traverse it recursively with visibility culling (view frustum, back face, occlusion, ...). Apply LOD selection (local density estimation, removal of superfluous points).
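The high-level selection described on this slide can be sketched as a recursive traversal of a bounding-sphere hierarchy. This is only an illustration under stated assumptions, not the authors' implementation: the `Node` class, the plane convention, and the `focal_px` and `max_pixels` parameters are hypothetical names, and only view-frustum culling plus a projected-size LOD test are shown.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical plane convention: (nx, ny, nz, d), a point p is inside when n.p + d >= 0.
Plane = Tuple[float, float, float, float]


@dataclass
class Node:
    center: Vec3
    radius: float
    point_ids: List[int] = field(default_factory=list)   # points representing this node
    children: List["Node"] = field(default_factory=list)


def select_points(node, planes, eye, focal_px, max_pixels, out):
    """Append to `out` the point ids kept by culling and LOD selection."""
    cx, cy, cz = node.center
    # View-frustum culling: drop the node if its sphere lies fully outside a plane.
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -node.radius:
            return
    # LOD selection: stop descending once the node projects to a small footprint.
    dist = max(math.dist(eye, node.center), 1e-9)
    projected = focal_px * node.radius / dist
    if not node.children or projected <= max_pixels:
        out.extend(node.point_ids)
        return
    for child in node.children:
        select_points(child, planes, eye, focal_px, max_pixels, out)


root = Node((0.0, 0.0, -10.0), 5.0, [0], [
    Node((2.0, 0.0, -10.0), 1.0, [1, 2]),
    Node((-3.0, 0.0, -10.0), 1.0, [3, 4]),   # lies outside the half-space x >= 0
])
half_space = [(1.0, 0.0, 0.0, 0.0)]

fine = []
select_points(root, half_space, (0.0, 0.0, 0.0), 100.0, 1.0, fine)
coarse = []
select_points(root, half_space, (0.0, 0.0, 0.0), 100.0, 100.0, coarse)
```

With a strict pixel threshold the traversal descends to the leaves and culls the second child; with a loose threshold it stops at the root and emits its single representative point.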

Efficient Point Rendering. How to select the points that have to be rendered? How to render the points? Efficiency => graphics hardware, with a splatting approach.

GPU Point Rendering: quality and performance issues. Throughput in millions of points per second (GeForce FX 5900 under Linux), versus 44 M small triangles per second:
Standard GL_POINTS (rendering a disk instead of a square is almost free): 60-85
Opaque ellipses: 35-45
High-quality splatting (accumulation of elliptic Gaussians, e.g. EWA surface splatting): 6-10

Complex Scenes: Example. Scene: ~6800 trees; 1 tree: ~750k points => ~5000 million points. After high-level culling and LOD, ~4 M points are still potentially visible and have to be rendered. But in fact only 150k are really visible!
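A quick check of the numbers on this slide shows how much of the rendering work is wasted when every potentially visible point goes through the expensive passes. The variable names are illustrative; the figures are the approximate ones quoted on the slide.

```python
trees = 6800
points_per_tree = 750_000

total_points = trees * points_per_tree            # about 5 billion points in the scene
potentially_visible = 4_000_000                   # after high-level culling and LOD
really_visible = 150_000                          # points that end up on screen

visible_fraction = really_visible / potentially_visible
wasted_fraction = 1.0 - visible_fraction          # share of selected points that is hidden

print(f"total points: {total_points:,}")
print(f"visible fraction of the selected subset: {visible_fraction:.2%}")
print(f"wasted fraction: {wasted_fraction:.2%}")
```

Roughly 96% of the selected points are hidden, which is in line with the 90-97% of culled points reported for the forest scene in the results.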

Our Solution: Deferred Splatting. It is similar to deferred shading: defer the expensive rendering computations to visible points only. It is based on an accurate point selection and on temporal coherency.

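High-Quality Splatting on GPU: a multi-pass algorithm. [Diagram: on the CPU, a hierarchical, multi-resolution data structure feeds the high-level point selection (culling, LOD, ...), which outputs a list of selected points (indices, list of ranges, ...). This list designates a subset of the data set stored on the GPU, next to the Z-buffer and the color buffer.]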

High-Quality Splatting on GPU: a multi-pass algorithm. Visibility splatting (1): in order to accumulate visible splats only, pre-compute the depth buffer by rendering the selected subset with the standard GL_POINTS primitive plus a per-fragment shape and depth correction.

High-Quality Splatting on GPU: a multi-pass algorithm. Splatting (2), after visibility splatting (1): standard GL_POINTS primitive + per-fragment Gaussian weight + accumulation in the color buffer.

High-Quality Splatting on GPU: a multi-pass algorithm. Normalization (3), after visibility splatting (1) and splatting (2): required because the accumulated weights do not sum to 1.
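The three passes can be illustrated with a small software rasterizer. This is a simplified sketch, not the GPU implementation from the slides: it assumes circular Gaussian splats instead of elliptic ones, a scalar color, a hypothetical depth tolerance `eps`, and a Gaussian standard deviation of half the splat radius.

```python
import math


def covered(cx, cy, r, width, height):
    """Yield the integer pixels inside the disk of radius r centered at (cx, cy)."""
    for y in range(max(0, math.floor(cy - r)), min(height, math.ceil(cy + r) + 1)):
        for x in range(max(0, math.floor(cx - r)), min(width, math.ceil(cx + r) + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                yield x, y


def splat_image(splats, width, height, eps=0.1):
    """splats: list of (cx, cy, radius, depth, color). Returns rows of color or None."""
    zbuf = [[math.inf] * width for _ in range(height)]
    # Pass 1, visibility splatting: fill the depth buffer only.
    for cx, cy, r, z, _ in splats:
        for x, y in covered(cx, cy, r, width, height):
            zbuf[y][x] = min(zbuf[y][x], z)
    # Pass 2, splatting: accumulate weighted colors of fragments close to the front surface.
    acc_color = [[0.0] * width for _ in range(height)]
    acc_weight = [[0.0] * width for _ in range(height)]
    for cx, cy, r, z, color in splats:
        sigma = r / 2.0
        for x, y in covered(cx, cy, r, width, height):
            if z <= zbuf[y][x] + eps:
                w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma * sigma))
                acc_color[y][x] += w * color
                acc_weight[y][x] += w
    # Pass 3, normalization: the weights do not sum to 1, so divide by their sum.
    return [[acc_color[y][x] / acc_weight[y][x] if acc_weight[y][x] > 0.0 else None
             for x in range(width)] for y in range(height)]


image = splat_image([
    (1, 2, 2, 1.0, 0.0),   # front splat, color 0
    (3, 2, 2, 1.0, 1.0),   # front splat, color 1, overlapping the first one
    (2, 2, 2, 5.0, 9.0),   # back splat, hidden wherever a front splat covers the pixel
], width=5, height=5)
```

At pixel (2, 2) both front splats contribute with equal weight and the back splat is rejected by the depth test, so the normalized color is the average of the two front colors.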

High-Quality Splatting on GPU: analysis. The high-level point selection on the CPU is coarse, so the selected subset could be huge (> 4 M points). Visibility splatting (1) and splatting (2) are expensive and slow: 12-20 M points/s. Normalization (3) follows.

The Deferred Splatting Algorithm

Accurate Point Selection. [Diagram: the high-level point selection (culling, LOD, ...) on the CPU sends a subset of the data set directly to visibility splatting (1), splatting and normalization on the GPU.]

Accurate Point Selection. Break the direct path from visibility splatting (1) to splatting: add an accurate point selection. Only visible points should pass the new test.

Accurate Point Selection. Index pass (2), after visibility splatting (1) has filled the Z-buffer: render the points as fast as possible (no shading, no blending, no...) as GL_POINTS of size 1 pixel, with color = handle of the point = comb(object's id, point's id). Afterwards, the color buffer = {handles of visible points}.
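One possible form of comb(object's id, point's id) is a bit-packed 32-bit handle written as an RGBA8 color. The slides do not give the actual layout, so the 8/24 bit split, the reserved background value, and all function names below are assumptions made for illustration.

```python
OBJECT_BITS = 8            # assumed: up to 256 objects
POINT_BITS = 24            # assumed: up to ~16.7 M points per object
BACKGROUND = 0xFFFFFFFF    # assumed clear color meaning "no point at this pixel"


def make_handle(object_id: int, point_id: int) -> int:
    """Combine an object id and a point id into one 32-bit handle."""
    if not 0 <= object_id < (1 << OBJECT_BITS):
        raise ValueError("object id out of range")
    if not 0 <= point_id < (1 << POINT_BITS):
        raise ValueError("point id out of range")
    handle = (object_id << POINT_BITS) | point_id
    if handle == BACKGROUND:
        raise ValueError("handle collides with the background value")
    return handle


def handle_to_rgba(handle: int) -> tuple:
    """Split a handle into four 8-bit color components (R is the most significant byte)."""
    return ((handle >> 24) & 0xFF, (handle >> 16) & 0xFF, (handle >> 8) & 0xFF, handle & 0xFF)


def rgba_to_handle(rgba: tuple) -> int:
    r, g, b, a = rgba
    return (r << 24) | (g << 16) | (b << 8) | a


def split_handle(handle: int) -> tuple:
    """Recover (object id, point id) from a handle."""
    return handle >> POINT_BITS, handle & ((1 << POINT_BITS) - 1)


color = handle_to_rgba(make_handle(3, 70_000))
recovered = split_handle(rgba_to_handle(color))
```

Because the handle is stored exactly in the color buffer, the pass has to run without blending, dithering or antialiasing, which matches the "no shading, no blending" requirement on the slide.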

Accurate Point Selection. Read & sort (2') on the CPU, after the index pass (2): read the color buffer, extract the indices from the handles, and sort the point indices by object => index arrays B_i.
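The read-and-sort step can be sketched on the CPU side as follows. The pixel list stands in for the result of reading the color buffer back; the handle layout (object id in the top 8 bits, point id in the low 24 bits) and the background value are the same kind of assumption as above, not something the slides specify.

```python
from collections import defaultdict

POINT_BITS = 24            # assumed handle layout: object id above, point id below
BACKGROUND = 0xFFFFFFFF    # assumed clear value for pixels not covered by any point


def build_index_arrays(pixels):
    """Turn the handles read from the color buffer into one sorted index array per object."""
    per_object = defaultdict(set)
    point_mask = (1 << POINT_BITS) - 1
    for handle in pixels:
        if handle == BACKGROUND:
            continue                       # empty pixel
        per_object[handle >> POINT_BITS].add(handle & point_mask)
    # Sorting by object, then by index, gives index arrays that can be drawn in order.
    return {obj: sorted(ids) for obj, ids in sorted(per_object.items())}


readback = [
    BACKGROUND,
    (1 << POINT_BITS) | 42,
    (0 << POINT_BITS) | 7,
    (1 << POINT_BITS) | 5,
    BACKGROUND,
    (0 << POINT_BITS) | 3,
]
index_arrays = build_index_arrays(readback)
```

Each resulting array plays the role of B_i for one object and can be handed to the expensive splatting passes as an index list.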

Accurate Point Selection. The pipeline becomes: visibility splatting (1), index pass (2), read & sort (2') on the CPU producing B_i, splatting (3) of B_i, normalization (4).

Accurate Point Selection. The selected subset still goes directly through visibility splatting (1): break this direct path by taking advantage of temporal coherency. The remaining steps are unchanged: index pass (2), read & sort (2') giving B_i, splatting (3), normalization (4).

Accurate Point Selection. Visibility splatting (1): render only the points that were visible in the previous frame, B_{i-1}. But B_{i-1} ≠ B_i => holes. The remaining steps are unchanged: index pass (2), read & sort (2') giving B_i, splatting (3), normalization (4).

Temporal Coherency: Artifacts. [Images: frame i, frame i+1.] The temporal coherency approximation leads to artifacts.

Temporal Coherency. Visibility splatting (1) renders only the points that were visible in the previous frame, B_{i-1}; since B_{i-1} ≠ B_i, the Z-buffer has holes. Then: index pass (2) of the selected subset, read & sort (2') on the CPU giving B_i, splatting (4), normalization (5).

Temporal Coherency. Compute B_i from the incomplete Z-buffer: visibility splatting (1) of B_{i-1}, index pass (2), read & sort (2'). Also compute B_i \ B_{i-1}. Update the Z-buffer with a second visibility splatting (3): render B_i \ B_{i-1}. Then splatting (4) and normalization (5).

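The Complete Algorithm: summary, step by step. The high-level point selection (culling, LOD, ...) on the CPU provides the subset; B_{i-1} is kept from the previous frame.
(1) Visibility splatting of B_{i-1} fills the Z-buffer.
(2) Index pass: render the subset as 1-pixel points carrying their handles.
(2') Read & sort on the CPU: obtain B_i and B_i \ B_{i-1}.
(3) Visibility splatting of B_i \ B_{i-1} updates the Z-buffer.
(4) Splatting of B_i.
(5) Normalization.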
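The data flow of steps (1) to (3) can be simulated on a one-dimensional screen. This is a toy model under several assumptions that are not in the slides: each point is a hypothetical tuple (pixel, depth, radius in pixels), visibility splatting writes depths pushed back by `eps`, and the index pass uses an ordinary "less than" depth test with depth writes so that the nearest point wins each pixel. Steps (4) and (5) would then splat B_i and are omitted.

```python
import math


def visibility_splat(points, ids, zbuf, eps):
    """Write the depth of each splat over its footprint, pushed back by eps."""
    for pid in ids:
        pixel, depth, radius = points[pid]
        for x in range(max(0, pixel - radius), min(len(zbuf), pixel + radius + 1)):
            zbuf[x] = min(zbuf[x], depth + eps)


def index_pass(points, candidates, zbuf):
    """Render each candidate as a 1-pixel point; keep at most one point id per pixel."""
    index_buf = [None] * len(zbuf)
    for pid in candidates:
        pixel, depth, _ = points[pid]
        if depth < zbuf[pixel]:
            zbuf[pixel] = depth
            index_buf[pixel] = pid
    return index_buf


def render_frame(points, candidates, prev_visible, width, eps=0.01):
    zbuf = [math.inf] * width
    visibility_splat(points, prev_visible, zbuf, eps)        # (1) splat B_{i-1}
    index_buf = index_pass(points, candidates, zbuf)         # (2) index pass on the subset
    b_i = {pid for pid in index_buf if pid is not None}      # (2') read back: B_i
    newly_visible = b_i - prev_visible                       # (2') B_i \ B_{i-1}
    visibility_splat(points, newly_visible, zbuf, eps)       # (3) update the Z-buffer
    return b_i, newly_visible


points = {
    0: (3, 1.0, 2),   # large front splat covering pixels 1..5
    1: (4, 5.0, 1),   # point hidden behind splat 0
    2: (7, 2.0, 0),   # isolated point
}
subset = {0, 1, 2}

b1, new1 = render_frame(points, subset, set(), width=8)   # first frame: empty Z-buffer
b2, new2 = render_frame(points, subset, b1, width=8)      # second frame reuses b1
b3, new3 = render_frame(points, subset, {0}, width=8)     # point 2 becomes visible
```

In this toy model the first frame has no previous visible set, so the hidden point 1 slips through the incomplete Z-buffer; the second frame rejects it, and the third call shows a newly visible point being added through the set difference.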

One point per pixel... Deferred splatting allows only one point per pixel. Advantages: it removes superfluous points (LOD selection) and solves the color buffer overflow (only 8 bits per component).

One point per pixel... Deferred splatting allows only one point per pixel. Drawbacks: we may lose texture information. High-frequency textured models + coarse high-level LOD selection => flickering artifacts. This can be solved using surfel mipmaps [Pfister et al. 00].

Deferred Splatting Applications: occlusion culling, Sequential Point Trees.

High-Level Occlusion Culling. [Diagram: visibility splatting (1) of B_{i-1} fills the Z-buffer on the GPU; asynchronous hardware occlusion queries are issued against it; on the CPU, occluded nodes are removed by the high-level point selection (culling, LOD, ...) before it outputs the subset.]

Sequential Point Trees [Dachsbacher03]. Preprocessing: build a sequential version of the hierarchy.

Sequential Point Trees [Dachsbacher03]. Preprocessing: build a sequential version of the hierarchy. Rendering: on the CPU, fast and coarse selection of a prefix; on the GPU, fine LOD selection at the point level.
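The two-level selection can be sketched as follows. This reflects one common reading of the sequential point tree idea and is not taken from the slides: each node is assumed to carry a hypothetical distance interval [r_min, r_max) in which it should be drawn, the sequential list is sorted by decreasing r_max, the CPU picks a prefix with a binary search, and the per-point test stands in for what the vertex program would do.

```python
import bisect
import math

# Hypothetical sequentialized hierarchy: (point id, r_min, r_max), sorted by decreasing r_max.
spt = [
    (0, 10.0, math.inf),                 # root: drawn when viewed from far away
    (1, 4.0, 10.0), (2, 4.0, 10.0),      # inner nodes
    (3, 0.0, 4.0), (4, 0.0, 4.0), (5, 0.0, 4.0), (6, 0.0, 4.0),   # leaves
]
neg_r_max = [-r_max for _, _, r_max in spt]   # ascending keys for bisect


def coarse_prefix_length(min_view_distance: float) -> int:
    """CPU side: number of leading nodes whose r_max exceeds the closest view distance."""
    return bisect.bisect_left(neg_r_max, -min_view_distance)


def fine_select(prefix_length: int, distance_of) -> list:
    """GPU side (simulated): keep a point only if its own distance lies in [r_min, r_max)."""
    selected = []
    for point_id, r_min, r_max in spt[:prefix_length]:
        d = distance_of(point_id)
        if r_min <= d < r_max:
            selected.append(point_id)
    return selected


near_prefix = coarse_prefix_length(6.0)
near_points = fine_select(near_prefix, lambda pid: 6.0)
far_prefix = coarse_prefix_length(20.0)
far_points = fine_select(far_prefix, lambda pid: 20.0)
```

Every point of the prefix still goes through the per-point test, which is why the cost of the vertex program that performs it matters.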

Sequential Point Trees. Preprocessing: build a sequential version of the hierarchy. Rendering: on the CPU, fast and coarse selection of a prefix; on the GPU, fine LOD selection at the point level. Classical high-quality splatting: all points of the coarse prefix are processed by 2 complex vertex programs (visibility splatting (1) + SPT fine selection, then splatting (2) + SPT fine selection) => inefficient.

Sequential Point Trees. Preprocessing: build a sequential version of the hierarchy. Rendering: on the CPU, fast and coarse selection of a prefix; on the GPU, fine LOD selection at the point level. Deferred splatting: all points of the coarse prefix are processed by 1 very simple vertex program (index pass (2) + SPT fine selection) => efficient.

Results: classical GPU-based high-quality splatting versus deferred splatting.

Results: Simple Head. 285k points. Average FPS: EWA splatting 34, deferred splatting 41. Speed-up: x1.2. Culled points: 50-70%. [Bar chart: time per frame, with DS versus classic, split into EWA splatting, reading buffer + sort, render indexes, visibility splatting.]

Results: 200 Hugo. 1 Hugo = 450k points; scene = 200 Hugo in motion. Average FPS: EWA splatting 11.5, deferred splatting 34.5. Speed-up: x3. Culled points: 90%. [Bar chart: time per frame, with DS versus classic, split into EWA splatting, reading buffer + sort, render indexes, visibility splatting.]

Results: Forest. 1 tree = 750k points; scene = 6800 trees. Average FPS: EWA splatting 1.1-1.8, deferred splatting 11-20. Speed-up: x10. Culled points: 90-97%. [Bar chart: time per frame, with DS versus classic, for the full scene and for 1 tree, split into EWA splatting, reading buffer + sort, render indexes, visibility splatting.]

What about screen resolutions? When the screen size increases, the rendering time increases linearly and the speed-up of deferred splatting remains constant. Large resolution => reading the color buffer becomes expensive: 1024² => 25 ms! AGP limitation -> PCI Express? [Bar chart: time per frame at 512x512, 724x724 and 1024x1024, split into EWA splatting, reading buffer + sort, render indexes, visibility splatting.]
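The readback figure on this slide can be turned into an effective transfer rate with a short calculation. The 4 bytes per pixel are an assumption (an RGBA buffer with 8 bits per component); the 1024² resolution and the 25 ms come from the slide.

```python
width = height = 1024
bytes_per_pixel = 4          # assumption: RGBA, 8 bits per component
readback_ms = 25.0           # measured readback time quoted on the slide

buffer_bytes = width * height * bytes_per_pixel
bandwidth_mib_per_s = buffer_bytes / (1024 * 1024) / (readback_ms / 1000.0)
fps_cap = 1000.0 / readback_ms   # upper bound on FPS if readback were the only cost

print(f"buffer size: {buffer_bytes / (1024 * 1024):.0f} MiB")
print(f"effective readback rate: {bandwidth_mib_per_s:.0f} MiB/s")
print(f"frame rate ceiling from readback alone: {fps_cap:.0f} FPS")
```

Under that assumption the readback alone limits the frame rate to 40 FPS at 1024², which is why the slide raises the question of a faster bus.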

Usability. Unsuitable for simple scenes (< ~300k points). Based on the assumption that a point is either visible or not, which is true for small points only (< ~10 pixels). In our initial context this is always true: large points are inefficient => use triangles. If you don't have a polygonal representation, render the large points anyway.

Conclusion. Deferred splatting works at the point level and does view frustum culling, occlusion culling (and back-face culling) and LOD selection => high-quality splatting on highly complex scenes. It is suitable for dynamic scenes and point clouds, makes no assumption on the high-level data structure, needs no additional preprocessing, and is simple and efficient.

Future Work. Full hardware implementation: keeps the CPU free; no slow reading from the GPU to the CPU. More efficient/accurate high-level point selection: new data structures, new algorithms.

Questions?