OpenCL Implementation Of A Heterogeneous Computing System For Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data


 Arthur Morgan
 5 days ago
 Views:
Transcription
1
2 OpenCL Implementation Of A Heterogeneous Computing System For Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data Andrew Miller Computer Vision Group Research Developer
3 3D TERRAIN RECONSTRUCTION FROM IMAGES 3 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
4 APPLICATIONS Augmented Reality Change Detection Geo Referencing Geo Positioning Visibility app Geo Reference the input image Geocoordinates (Long=?,lat=?, ele=?) Vantage Point 2d Vantage point 3d model 3d Planar model Vantage point 4 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
5 GEOPOSITIONING Input Output Geocoordinates (Long=?,lat=?, ele=?) 3d model Planar model 5 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
6 APPLICATIONS Example: Change Detection: Change detection on planar model Change detection on 3d model 6 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
7 WHY IS IT CHALLENGING? 7 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
8 EXISTING METHODS 3d Method Pros Cons References SurfaceBased Methods Output is a mesh. Cannot handle ambiguity, uncertainty, complex topology Cipolla et al. TPAMI Cipolla et al. TPAMI Zaharescu et al. ACCV Volumetric Methods Handles ambiguity and occlusion. Large amount of data and computationally expensive. Broadhurst et al. ICCV Pollard et al. CVPR 2007 Crispell 2009 Viola et al. ICCV 1999 Furukawa 2008 Crispell Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
9 VOLUMETRIC MODELING 9 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
10 VOLUMETRIC MODELING Basics Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 10 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
11 VOLUMETRIC MODEL Given a volume, divide into volumetric pixels, called voxels 11 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
12 VOLUMETRIC MODEL Given a volume, divide into volumetric pixels, called voxels 12 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
13 VOXEL DATA Two probability distributions stored in each voxel: Occupancy probability (alpha) Appearance model (Mixture of 3 Gaussians) P ( X S ) Probability that voxel X is in a surface I (intensity ) μ 0 σ 0 w 0 μ 1 σ 1 w 1 μ 2 σ 2 P ( I X S ) = i = 0 13 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, w i N ( μ, σ i 2 i )
14 MODEL MANIPULATION Divide volume into voxels that store two distributions Need to learn the correct (best) parameters for these two distributions Two main procedures to manipulate our model Update (model reconstruction) Given a set of observations and cameras, learn distributions in each voxel Render (expected images) Given a novel viewpoint, render expected image of the volume Both of these procedures are accomplished by casting rays 14 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
15 RAY CASTING Cast ray through the image plane into volume Sum relevant statistics as rays intersect voxels Camera Center Image Plane 15 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
16 MODEL ALONG A RAY Assume one voxel along a ray produces observed intensity Must be unoccluded, and have high surface probability How likely some voxel Xα is responsible for observation at r: P( V = X α ) = P( X α S ) [1 P( X α ' S )] r α ' < α Surface likelihood Visibility of voxel alpha 16 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
17 UPDATING Use an online algorithm to update surface probabilities: P N +1 ( X S ) = P N ( X S ) p N (I N +1 X S ) p N (I) Updated surface likelihood Previous surface likelihood Bayes update factor 17 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
18 UPDATING: Downtown Providence 18 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
19 UPDATING: Capitol Building in Providence 19 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
20 RENDERING Marginalize appearance across ray Calculate visibility of each cell along ray Keep running sum of the expected intensity: E[I r ] = α P(V r = X α )E[ p(i X S)] Expected image sequence of downtown Providence, RI 20 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
21 MEMORY FOOTPRINT 21 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
22 VOLUMETRIC MODELING Basics Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 22 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
23 MEMORY FOOTPRINT Existing scenes (downtown and capitol) are 1536 by 1536 by billion voxels Each voxel carries at least 12 bytes of information: 4 bytes for occupancy probability 8 bytes for appearance model Would require about 14 GB of information! Reducing Memory in two ways Spatial subdivision Efficiently encoded octree structure 23 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
24 SPATIAL SUBDIVISION Fixed grid of shallow octrees Trees have depth limit 4 (512 total voxels) Designed for optimal performance with GPU architecture An octree is a data structure in which each internal node has exactly eight children. Octrees are used to partition a three dimensional space by subdividing it into eight octants. 24 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
25 BITENCODED OCTREE Represent tree structure as implicit octree String of bits, each bit indicates if cell has children 73 bits are sufficient for depth four tree (10 bytes) 4 reserved for data pointer (and align to 16) Tree bits [09] Data Ptr [10,11,12,13] [ ] [ ] Simple formulas to find child and parent bits: parent (i) = i 1 8 child (i) = 8 i +1 Tree structure can be stored in local memory (shared memory) per thread Accelerates tree traversal 25 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
26 BITENCODED OCTREE Data stored in BFS order, tree points to root cell Simple sum calculate data index: Binary example: parent ( i ) 1 data _ index (i) = 8 tree _ bit ( j) + 1 j = 0 (a) (c) (b) a) Bits encoding octree found in (b). b) Binary tree where each node s data is a letter c) Location of each letter in memory 26 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
27 DATA STORAGE Modular data storage Voxel data stored separately from tree structure Minimizes data transfer for kernels that only need some data (i.e. rendering, update passes) Indexing identical across data buffers 27 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
28 REFINE Once the surface probability passes a certain threshold, cell is refined into 8 octants 28 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
29 REFINE Unrefined model 29 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
30 REFINE Refined to depth four 30 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
31 REFINE Refining one tree Surface Probability Buffer p p p 0 1 New data cells p p p 2 p 3 p 4 p 5 p Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
32 REFINE Difficult to parallelize Cannot refine all trees in parallel in place as race conditions may occur (data written over other new data) Refine in three steps: Compute and save new tree structure, size of each new tree Run a prefix sum on tree sizes to determine new location of each tree's data Allocate new, sufficiently large data buffer, swap data into new locations 32 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
33 REFINE Three parallel steps: 1. Compute new Tree Structure, sizes 2. Compute cumulative sum of size buffer using parallel prefix sum (determines new data indices) s 0 s 1 s 2 s 3 3. Swap data into new location (initializing new cells) α s 0 s 1 s 2 s 3 s 0 0 s i 1 i=0 2 i=0 s i α α 0 α 1 α 2 α 3 33 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
34 GPU IMPLEMENTATION 34 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
35 VOLUMETRIC MODELING Basics: Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 35 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
36 GPU IMPLEMENTATION Ray tracing is expensive on volumetric models Each voxel along the path of a ray needs to be intersected and examined Very parallel algorithm Can use GPU to process multiple rays simultaneously Note that now we are ray tracing a hybrid gridoctree structure 36 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
37 RAY CASTING REVISITED Workgroups are 8x8 patches of image (64 rays) Camera Center Image Plane 37 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
38 RAY CASTING Tracing through a fixed grid of octrees Two approaches: Two nested while loops (one traversing the grid, the other traversing the current octree) Single while loop, each ray caches it's own current tree 38 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
39 Double Loop Method Outer Loop: Iterating over regulargrid 39 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
40 Double Loop Method Outer Loop: Iterating over regulargrid 40 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
41 Double Loop Method Outer Loop: Iterating over regulargrid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 41 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
42 Double Loop Method Outer Loop: Iterating over regulargrid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 42 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
43 Double Loop Method Outer Loop: Iterating over regulargrid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 43 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
44 Double Loop Method Outer Loop: Iterating over regulargrid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 44 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
45 Double Loop Method Outer Loop: Iterating over regulargrid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 45 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
46 Double Loop Method Outer Loop: Iterating over regulargrid 46 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
47 Double Loop Method Outer Loop: Iterating over regulargrid New octree loaded into local memory Inner Loop: Iterating over Octree 47 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
48 Double Loop Method Outer Loop: Iterating over regulargrid New octree loaded into local memory Inner Loop: Iterating over Octree 48 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
49 Double Loop Method Outer Loop: Iterating over regulargrid New octree loaded into local memory Note how the two upper rays must wait for the lower ray to finish (due to OpenCL branching) Inner Loop: Iterating over Octree 49 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
50 Outer Loop: Iterating over regulargrid Note how the two upper rays must wait for the lower ray to finish (due to OpenCL branching) 50 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
51 Single Loop Method Rays traverse fixed grid and octrees 51 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
52 Single Loop Method Rays traverse fixed grid and octrees 52 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
53 Single Loop Method Rays traverse fixed grid and octrees Each ray reads octree (16 bytes) into local memory 53 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
54 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 54 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
55 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 55 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
56 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 56 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
57 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 57 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
58 RAY CASTING Single while loop tracing performs better Threads do not have to wait for neighbors to continue processing Global memory transfer time is hidden by scheduler Operations on tree in local memory are fast 58 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
59 UPDATING Complications with update: Multiple rays can pass through the same cell; creates race conditions Need atomic operations to update global memory Global writes (especially atomics) are very expensive Two rays (in red) update the same cell need atomic (serial) operations. 59 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
60 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per workitem (thread) Determine connected components in two passes: Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
61 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per workitem (thread) Determine connected components in two passes: 1)Each thread in the work group connects to the right Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
62 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per workitem (thread) Determine connected components in two passes: 1)Each thread in the work group connects to the right 2)Each tail looks at the next row, continues linked list Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
63 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per workitem (thread) The head of each list sums up their contribution to their cell The head of each list then makes an atomic call to update global memory Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
64 SPEED AND MEMORY ANALYSIS 64 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
65 SPEED AND MEMORY ANALYSIS Tested multiple processes: Render top down (fewer intersections), Render oblique (more intersections), Update empty scene, Update near converged scene, Refine Processes tested in OpenCL on: multicore (Gulftown) CPU 200 series NVIDIA card Fermi generation NVIDIA card ATI 5870 series OpenCL code is currently optimized for NVIDIA Fermi generation cards Unrolled 3d rays/points in float4 vectors to three floats to save register space/boost occupancy Using 8x8 workgroup size 65 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
66 RENDER VS OCCUPANCY 900 Render time vs kernel occupancy for two generations of NVIDIA cards. In our ray trace implementation, lower occupancy yielded faster results, with a steeper climb for the previous generation hardware. Render Time (ms) GPU time for GTX280 GPU time for GTX Occupancy 66 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
67 UPDATE VS WORKSPACE SIZE Update time vs workgroup size Optimal workgroup size appears to be 8x8 for both current and previous generation NVIDIA cards. Time (ms) Gtx280 GTX x2 6x6 10x10 14x14 workgroup size 67 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
68 SPEED AND MEMORY ANALYSIS Render, update and refine benchmarks: Render Top down Render Oblique Update Empty CPU (intel 3.33 GHz) ATI 5870 NVIDIA GTX 280 NVIDIA GTX 480 NVIDIA GTX s.15s.118s.062s.039s 2.23s.27s.213s.114s.065ms 8.11s 4.72s 4.37s (cache) 13.99s 1.81s (cache).934s.795s (cache).790s Optimized OpenCL code for NVIDIA architecture. Benchmarks run out of the box on ATI card. ATI does vector ops in parallel, our implementation uses scalar operations. Update Converged 18.92s n/a n/a 1.363s 1.990s (cache).999s 68 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
69 FURTHER ANALYSIS Size of GPU memory currently restricts model size. Larger scenes stream blocks in and out of the GPU Current render processing is faster than 5 gb/s (PCIe bandwidth) Block streaming from host to device memory is a bottleneck Larger scenes would benefit from an integrated CPU/GPU with higher bandwidth 69 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
70 FUTURE WORK 70 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
71 FUTURE WORK Goals: More accurately model 3d space Volumetric (cone) ray tracing. Casting 3d cone rays, instead of 2d lines Consider spatial continuity. Filtering voxels, accounting for neighboring joint distribution Model underlying physical properties (surface normals, textures, non Lambertian properties) to make models robust to many surface types and lighting conditions Build bigger models Multi resolution scenes (using octree) Streaming cache system (disk to RAM to GPU) for many blocks 71 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
72 CONE RAY TRACE Volumetric ray trace Goal is to cover the entire volume when updating Camera Center Image Plane 72 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
73 CONE RAY TRACE More ray coverage removes some aliasing effects: Cone Update Ray Update 73 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
74 DATA STREAMING Streaming models for larger scenes Begun implementing a streaming model Asynchronous transfers between three levels of memory 74 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
75 QUESTIONS? 75 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
76 CODE PUBLICLY AVAILABLE AT: 76 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011
77 Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied. 77 Realtime Rendering And Dynamic Updating Of Dense 3d Volumetric Data May 23, 2011