OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data"

Transcription

1

2 OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data Andrew Miller Computer Vision Group Research Developer

3 3-D TERRAIN RECONSTRUCTION FROM IMAGES 3 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

4 APPLICATIONS Augmented Reality Change Detection Geo Referencing Geo Positioning Visibility app Geo Reference the input image Geocoordinates (Long=?,lat=?, ele=?) Vantage Point 2-d Vantage point 3-d model 3-d Planar model Vantage point 4 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

5 GEO-POSITIONING Input Output Geo-coordinates (Long=?,lat=?, ele=?) 3-d model Planar model 5 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

6 APPLICATIONS Example: Change Detection: Change detection on planar model Change detection on 3-d model 6 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

7 WHY IS IT CHALLENGING? 7 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

8 EXISTING METHODS 3-d Method Pros Cons References Surface-Based Methods Output is a mesh. Cannot handle ambiguity, uncertainty, complex topology Cipolla et al. TPAMI Cipolla et al. TPAMI Zaharescu et al. ACCV Volumetric Methods Handles ambiguity and occlusion. Large amount of data and computationally expensive. Broadhurst et al. ICCV Pollard et al. CVPR 2007 Crispell 2009 Viola et al. ICCV 1999 Furukawa 2008 Crispell Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

9 VOLUMETRIC MODELING 9 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

10 VOLUMETRIC MODELING Basics Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 10 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

11 VOLUMETRIC MODEL Given a volume, divide into volumetric pixels, called voxels 11 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

12 VOLUMETRIC MODEL Given a volume, divide into volumetric pixels, called voxels 12 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

13 VOXEL DATA Two probability distributions stored in each voxel: Occupancy probability (alpha) Appearance model (Mixture of 3 Gaussians) P ( X S ) Probability that voxel X is in a surface I (intensity ) μ 0 σ 0 w 0 μ 1 σ 1 w 1 μ 2 σ 2 P ( I X S ) = i = 0 13 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, w i N ( μ, σ i 2 i )

14 MODEL MANIPULATION Divide volume into voxels that store two distributions Need to learn the correct (best) parameters for these two distributions Two main procedures to manipulate our model Update (model reconstruction) Given a set of observations and cameras, learn distributions in each voxel Render (expected images) Given a novel viewpoint, render expected image of the volume Both of these procedures are accomplished by casting rays 14 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

15 RAY CASTING Cast ray through the image plane into volume Sum relevant statistics as rays intersect voxels Camera Center Image Plane 15 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

16 MODEL ALONG A RAY Assume one voxel along a ray produces observed intensity Must be un-occluded, and have high surface probability How likely some voxel Xα is responsible for observation at r: P( V = X α ) = P( X α S ) [1 P( X α ' S )] r α ' < α Surface likelihood Visibility of voxel alpha 16 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

17 UPDATING Use an online algorithm to update surface probabilities: P N +1 ( X S ) = P N ( X S ) p N (I N +1 X S ) p N (I) Updated surface likelihood Previous surface likelihood Bayes update factor 17 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

18 UPDATING: Downtown Providence 18 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

19 UPDATING: Capitol Building in Providence 19 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

20 RENDERING Marginalize appearance across ray Calculate visibility of each cell along ray Keep running sum of the expected intensity: E[I r ] = α P(V r = X α )E[ p(i X S)] Expected image sequence of downtown Providence, RI 20 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

21 MEMORY FOOTPRINT 21 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

22 VOLUMETRIC MODELING Basics Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 22 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

23 MEMORY FOOTPRINT Existing scenes (downtown and capitol) are 1536 by 1536 by billion voxels Each voxel carries at least 12 bytes of information: 4 bytes for occupancy probability 8 bytes for appearance model Would require about 14 GB of information! Reducing Memory in two ways Spatial subdivision Efficiently encoded octree structure 23 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

24 SPATIAL SUBDIVISION Fixed grid of shallow octrees Trees have depth limit 4 (512 total voxels) Designed for optimal performance with GPU architecture An octree is a data structure in which each internal node has exactly eight children. Octrees are used to partition a three dimensional space by subdividing it into eight octants. 24 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

25 BIT-ENCODED OCTREE Represent tree structure as implicit octree String of bits, each bit indicates if cell has children 73 bits are sufficient for depth four tree (10 bytes) 4 reserved for data pointer (and align to 16) Tree bits [0-9] Data Ptr [10,11,12,13] [ ] [ ] Simple formulas to find child and parent bits: parent (i) = i 1 8 child (i) = 8 i +1 Tree structure can be stored in local memory (shared memory) per thread Accelerates tree traversal 25 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

26 BIT-ENCODED OCTREE Data stored in BFS order, tree points to root cell Simple sum calculate data index: Binary example: parent ( i ) 1 data _ index (i) = 8 tree _ bit ( j) + 1 j = 0 (a) (c) (b) a) Bits encoding octree found in (b). b) Binary tree where each node s data is a letter c) Location of each letter in memory 26 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

27 DATA STORAGE Modular data storage Voxel data stored separately from tree structure Minimizes data transfer for kernels that only need some data (i.e. rendering, update passes) Indexing identical across data buffers 27 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

28 REFINE Once the surface probability passes a certain threshold, cell is refined into 8 octants 28 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

29 REFINE Unrefined model 29 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

30 REFINE Refined to depth four 30 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

31 REFINE Refining one tree Surface Probability Buffer p p p 0 1 New data cells p p p 2 p 3 p 4 p 5 p Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

32 REFINE Difficult to parallelize Cannot refine all trees in parallel in place as race conditions may occur (data written over other new data) Refine in three steps: Compute and save new tree structure, size of each new tree Run a prefix sum on tree sizes to determine new location of each tree's data Allocate new, sufficiently large data buffer, swap data into new locations 32 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

33 REFINE Three parallel steps: 1. Compute new Tree Structure, sizes 2. Compute cumulative sum of size buffer using parallel prefix sum (determines new data indices) s 0 s 1 s 2 s 3 3. Swap data into new location (initializing new cells) α s 0 s 1 s 2 s 3 s 0 0 s i 1 i=0 2 i=0 s i α α 0 α 1 α 2 α 3 33 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

34 GPU IMPLEMENTATION 34 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

35 VOLUMETRIC MODELING Basics: Voxel data and probabilistic framework Ray tracing Updating Rendering Memory Footprint Spatial subdivisions Efficient spatial encoding and Data storage Refining GPU Implementation Ray Tracing Updating in parallel, with GPU optimizations 35 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

36 GPU IMPLEMENTATION Ray tracing is expensive on volumetric models Each voxel along the path of a ray needs to be intersected and examined Very parallel algorithm Can use GPU to process multiple rays simultaneously Note that now we are ray tracing a hybrid grid-octree structure 36 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

37 RAY CASTING REVISITED Workgroups are 8x8 patches of image (64 rays) Camera Center Image Plane 37 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

38 RAY CASTING Tracing through a fixed grid of octrees Two approaches: Two nested while loops (one traversing the grid, the other traversing the current octree) Single while loop, each ray caches it's own current tree 38 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

39 Double Loop Method Outer Loop: Iterating over regular-grid 39 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

40 Double Loop Method Outer Loop: Iterating over regular-grid 40 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

41 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 41 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

42 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 42 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

43 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 43 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

44 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 44 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

45 Double Loop Method Outer Loop: Iterating over regular-grid Each ray reads octree (16 bytes) into local memory Inner Loop: Iterating over Octree 45 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

46 Double Loop Method Outer Loop: Iterating over regular-grid 46 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

47 Double Loop Method Outer Loop: Iterating over regular-grid New octree loaded into local memory Inner Loop: Iterating over Octree 47 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

48 Double Loop Method Outer Loop: Iterating over regular-grid New octree loaded into local memory Inner Loop: Iterating over Octree 48 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

49 Double Loop Method Outer Loop: Iterating over regular-grid New octree loaded into local memory Note how the two upper rays must wait for the lower ray to finish (due to OpenCL branching) Inner Loop: Iterating over Octree 49 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

50 Outer Loop: Iterating over regular-grid Note how the two upper rays must wait for the lower ray to finish (due to OpenCL branching) 50 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

51 Single Loop Method Rays traverse fixed grid and octrees 51 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

52 Single Loop Method Rays traverse fixed grid and octrees 52 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

53 Single Loop Method Rays traverse fixed grid and octrees Each ray reads octree (16 bytes) into local memory 53 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

54 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 54 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

55 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 55 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

56 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 56 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

57 Single Loop Method Rays traverse fixed grid and octrees Bottom ray does not wait, loads next block into local memory 57 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

58 RAY CASTING Single while loop tracing performs better Threads do not have to wait for neighbors to continue processing Global memory transfer time is hidden by scheduler Operations on tree in local memory are fast 58 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

59 UPDATING Complications with update: Multiple rays can pass through the same cell; creates race conditions Need atomic operations to update global memory Global writes (especially atomics) are very expensive Two rays (in red) update the same cell need atomic (serial) operations. 59 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

60 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) Determine connected components in two passes: Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

61 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) Determine connected components in two passes: 1)Each thread in the work group connects to the right Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

62 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) Determine connected components in two passes: 1)Each thread in the work group connects to the right 2)Each tail looks at the next row, continues linked list Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

63 Example: 4x4 work group: Matrix of voxel IDs in local memory. One element per work-item (thread) The head of each list sums up their contribution to their cell The head of each list then makes an atomic call to update global memory Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

64 SPEED AND MEMORY ANALYSIS 64 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

65 SPEED AND MEMORY ANALYSIS Tested multiple processes: Render top down (fewer intersections), Render oblique (more intersections), Update empty scene, Update near converged scene, Refine Processes tested in OpenCL on: multi-core (Gulftown) CPU 200 series NVIDIA card Fermi generation NVIDIA card ATI 5870 series OpenCL code is currently optimized for NVIDIA Fermi generation cards Unrolled 3d rays/points in float4 vectors to three floats to save register space/boost occupancy Using 8x8 workgroup size 65 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

66 RENDER VS OCCUPANCY 900 Render time vs kernel occupancy for two generations of NVIDIA cards. In our ray trace implementation, lower occupancy yielded faster results, with a steeper climb for the previous generation hardware. Render Time (ms) GPU time for GTX280 GPU time for GTX Occupancy 66 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

67 UPDATE VS WORKSPACE SIZE Update time vs workgroup size Optimal workgroup size appears to be 8x8 for both current and previous generation NVIDIA cards. Time (ms) Gtx280 GTX x2 6x6 10x10 14x14 workgroup size 67 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

68 SPEED AND MEMORY ANALYSIS Render, update and refine benchmarks: Render Top down Render Oblique Update Empty CPU (intel 3.33 GHz) ATI 5870 NVIDIA GTX 280 NVIDIA GTX 480 NVIDIA GTX s.15s.118s.062s.039s 2.23s.27s.213s.114s.065ms 8.11s 4.72s 4.37s (cache) 13.99s 1.81s (cache).934s.795s (cache).790s Optimized OpenCL code for NVIDIA architecture. Benchmarks run out of the box on ATI card. ATI does vector ops in parallel, our implementation uses scalar operations. Update Converged 18.92s n/a n/a 1.363s 1.990s (cache).999s 68 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

69 FURTHER ANALYSIS Size of GPU memory currently restricts model size. Larger scenes stream blocks in and out of the GPU Current render processing is faster than 5 gb/s (PCIe bandwidth) Block streaming from host to device memory is a bottleneck Larger scenes would benefit from an integrated CPU/GPU with higher bandwidth 69 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

70 FUTURE WORK 70 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

71 FUTURE WORK Goals: More accurately model 3-d space Volumetric (cone) ray tracing. Casting 3-d cone rays, instead of 2-d lines Consider spatial continuity. Filtering voxels, accounting for neighboring joint distribution Model underlying physical properties (surface normals, textures, non Lambertian properties) to make models robust to many surface types and lighting conditions Build bigger models Multi resolution scenes (using octree) Streaming cache system (disk to RAM to GPU) for many blocks 71 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

72 CONE RAY TRACE Volumetric ray trace Goal is to cover the entire volume when updating Camera Center Image Plane 72 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

73 CONE RAY TRACE More ray coverage removes some aliasing effects: Cone Update Ray Update 73 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

74 DATA STREAMING Streaming models for larger scenes Begun implementing a streaming model Asynchronous transfers between three levels of memory 74 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

75 QUESTIONS? 75 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

76 CODE PUBLICLY AVAILABLE AT: 76 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011

77 Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied. 77 Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data May 23, 2011