for 3D Texture-based Volume Visualization on GPU Won-Jong Lee, Tack-Don Han Media System Laboratory (http://msl.yonsei.ac.k) Dept. of Computer Science, Yonsei University, Seoul, Korea
Contents Background & Proposed Octree-based Bandwidth Effective Volume Rendering al Results
Why Volume Rendering? Overcome limitation of polygon-based rendering Volume rendering generate photorealistic and nonphotorealistic scene 3D Texture VR Limitation Polygon Dataset Polygonal Rendering Rendering Volume Dataset Volume Rendering Real Object
Application#1 Visualization 3D Texture VR Limitation material sciences geology medical visualization
Application#2 - Entertainment atmospheric effects 3D Texture VR Limitation explosions, FX game
3D Texture-base Volume Rendering Texture Based Volume-Rendering Low cost & high quality method 3D Texture VR Limitation X-ray like shading(1996) Shading with OpenGL extensions(1998) Shading with Register Combiner & Multi-Texture Units(1999) Classification with Multi-dimensional T/F(2001) GPU ray-casting (2003) Real-time decompression & rendering (2003) NPR for VR (2003) Point-based VR (2004)
3D Texture-base Volume Rendering Typical GPU-based volume rendering flow 3D Texture VR Limitation Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
3D Texture-base Volume Rendering Typical GPU-based volume rendering flow 3D Texture VR Limitation Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
3D Texture-base Volume Rendering Typical GPU-based volume rendering flow 3D Texture VR Limitation Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
3D Texture-base Volume Rendering Typical GPU-based volume rendering flow 3D Texture VR Limitation Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
3D Texture-base Volume Rendering Typical GPU-based volume rendering flow 3D Texture VR Limitation Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
3D Texture-base Volume Rendering Typical GPU-based volume rendering flow 3D Texture VR Limitation Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
Limitation of 3D Texture Mapping The size of volume data sets can be processed is still limited usually, 64-256MB graphic memory, middle sized(512 3 ) 8 16bit volume data, 256MB 512MB Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation 3D Texture VR Limitation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
Limitation of 3D Texture Mapping Bottleneck in graphic memory bus lots of texture & pixel traffic (interpolation, α-blending) Lower localities in texture mapping & blending operation 3D Texture VR Limitation 1, request Pixel Gradient Computation Shader2,3,4 Hit! Shaders and T/F Creation 5,6,7,8, Miss!! Texture Threshing! 1,2,3,4 load u Main Graphic Original Volume Data CPU Disk t 2 3 6 1 7 4 5 8 Vertex & Texture Coordinates Volume, Gradients, T/F, s Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
Limitation of 3D Texture Mapping Bottleneck in graphic memory bus lots of texture & pixel traffic (interpolation, α- blending) Lower localities in texture mapping & blending operation GPU Blending Unit Gradient Computation Pixel Shaders and T/F Creation Hit! Main Miss!! Original Volume Data CPU Disk Graphic Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation 3D Texture VR Limitation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
Limitation of 3D Texture Mapping Brute-forced approach classical optimization technique cannot be utilized (empty-space skipping, early-ray termination) 3D Texture VR Limitation unable to skip space Gradient Computation Shaders and T/F Creation Main Original Volume Data CPU Disk Vertex & Texture Coordinates Volume, Gradients, T/F, Graphic Pre-TnL Texture Pixel GPU Vertex Shader Post-TnL Triangle Setup And Rasterization Pixel Shader Raster Operation Coordinates Transform Classification, 3D Texture Mapping, Lighting Blending
Contributions of This Research Propose an sub-division based rendering algorithm for GPU bandwidth-effectiveness Enabling empty space skipping for optimized rendering Utilizing modern GPU features for low-cost texture coordinate computation Build-up for effective H/W simulation environments for GPU based rendering
I.Boada, I.Navazo, and R.Scopigno. Multiresolution Volume Visualization with a Texture-Based Octree. The Visual Computer, 17(3), 185-197, 2001. VR on GPU Multi-resolution Empty Space Skipping AMR tree level (a) level (b) level (c)
W. Li, K. Mueller, and A. Kaufman, Empty Space Skipping and Occlusion Clipping for Texture-Based Volume Rendering, In Proc. of IEEE Visualization 2003, pages 317-324, 2003. VR on GPU Multi-resolution Empty Space Skipping AMR tree partitioned by the growing boxes converted into a BSP tree Rendering with mixed boxes and texture
R. Kauhler, M. Simon, and H. C. Hege, Interactive Volume Rendering of Large Sparse Data Sets Using Adaptive Mesh Refinement Hierarchies, IEEE Transactions on Visualization and Computer Graphics, VOL. 9, NO. 3, JULY-SEPTEMBER 2003, pages 341-351. VR on GPU Multi-resolution Empty Space Skipping AMR tree Standard w/ Octree w/ AMR tree
Proposed Method
Proposed Octree-based V/R Octree based Empty Space Skipping Octree Hierarchy Rendering unable to skip empty space Sub-division enabling empty empty space space checking skipping standard method proposed method
Proposed Octree-based V/R Efficiency Octree Hierarchy Rendering standard method texture cache pixel cache proposed method texture cache pixel cache HIT HIT for each sub-volume MISS only cold MISS on switching sub-volume
Proposed Octree-based V/R Rendering Octree in GPU Utilizing the property of regular sized sub-volumes Easy to detect visibility order of each sub-volume Octree Hierarchy Rendering Incremental texture coordinates computations in GPU
al Results
Benchmarks Environment Analysis Texture Pixel Total Bandwidth Frame Rates Lobster (255x255x56) Engine (255x255x110) Foot (255x255x255) CTdata (255x255x255)
GPU Simulation Environment #1 CPU-based software simulation For evaluating cache efficiency, memory bandwidth compile with Cg compiler (cgc) Cg vertex & fragment shader NV_vertex_program NV_fragment_program GL function & shader assembly call Mesa 6.0 (fully compatible with OpenGL 1.5) Environment Analysis Texture Pixel Total Bandwidth Frame Rates C++ OpenGL application Dinero III scene cache simulation results Make comparison with standard and proposed method Under perfectly same cache configuration fully associative cache, 32Bytes block, 1~128KBytes cache 512x512 screen size, 10 frame rendered
GPU Simulation Environment #2 GPU-based hardware experiment For evaluating rendering speedup Cg vertex & fragment shader compile with Cg compiler (cgc) GL function call Environment Analysis Texture Pixel Total Bandwidth Frame Rates NV_vertex_program NV_fragment_program Standard OpenGL library (opengl32.dll) C++ OpenGL application GPU scene & frame rates result load Shader assembly to GPU Make comparison with standard and proposed method Under perfectly same condition Pentium IV 2.80GHz PC with 512MB RDRAM & AGP8x bus Nvidia GeForce FX 5900 with 256MB graphic memory
Trade-Off Analysis Average number of vertices per frame (K). 160 140 120 100 80 60 40 20 0 lobster engine foot ctdata 256^3 128^3 64^3 32^3 16^3 8^3 The size of a sub-volume Non-empty region ratio. 1.2 1 0.8 0.6 0.4 0.2 0 lobster engine foot ctdata 256^3 128^3 64^3 32^3 16^3 8^3 The size of a sub-volume Environment Analysis Texture Pixel Total Bandwidth Frame Rates Average # of Vertices per Frame Non-Empty Region Ratio
Miss Rates Texture 20% NSD OSD(64^3) OSD(32^3) OSD(16^3) 20% NSD OSD(64^3) OSD(32^3) OSD(16^3) Miss Rate(%) 18% 16% 14% 12% 10% 8% 6% 4% 2% Miss Rate(%) 18% 16% 14% 12% 10% 8% 6% 4% 2% Environment Analysis Texture Pixel Total Bandwidth Frame Rates 0% 1 2 4 8 16 32 64 128 Texture Size(KBytes) Lobster (255x255x56) 0% 1 2 4 8 16 32 64 128 Texture Size(KBytes) Engine (255x255x110) NSD OSD(64^3) OSD(32^3) OSD(16^3) NSD OSD(64^3) OSD(32^3) OSD(16^3) 20% 20% 18% 18% 16% 16% 14% 14% Miss Rates(%) 12% 10% 8% Miss Rates(%) 12% 10% 8% 6% 6% 4% 4% 2% 2% 0% 1 2 4 8 16 32 64 128 Texture Size(KBytes) Foot (255x255x255) 0% 1 2 4 8 16 32 64 128 Texture Size(KBytes) CTdata (255x255x255) NSD : Non-Subdivided OSD : Octree Subdivided (number) : size of sub-volume
Miss Rates Pixel (Color) Miss Rate (%) NSD OSD(64^3) OSD(32^3) OSD(16^3) 5.5% 5.0% 4.5% 4.0% 3.5% 3.0% 2.5% 2.0% 1.5% 1.0% 0.5% Miss Rate (%) NSD OSD(64^3) OSD(32^3) OSD(16^3) 5.0% 4.5% 4.0% 3.5% 3.0% 2.5% 2.0% 1.5% 1.0% 0.5% Environment Analysis Texture Pixel Total Bandwidth Frame Rates 0.0% 1 2 4 8 16 32 64 128 0.0% 1 2 4 8 16 32 64 128 Color Size (KB) Lobster (255x255x56) Color Size (KB) Engine (255x255x110) 5.5% 5.0% 4.5% NSD OSD(64^3) OSD(32^3) OSD(16^3) 5.0% 4.5% 4.0% NSD OSD(64^3) OSD(32^3) OSD(16^3) 4.0% 3.5% Miss Rate (%) 3.5% 3.0% 2.5% 2.0% 1.5% 1.0% 0.5% 0.0% 1 2 4 8 16 32 64 128 Color Size (KB) Foot (255x255x255) Miss Rate (%) 3.0% 2.5% 2.0% 1.5% 1.0% 0.5% 0.0% 1 2 4 8 16 32 64 128 Color Size (KB) CTdata (255x255x255) NSD : Non-Subdivided OSD : Octree Subdivided (number) : size of sub-volume
Miss Rates Pixel (Depth) 10% NSD OSD(64^3) OSD(32^3) OSD(16^3) 10% NSD OSD(64^3) OSD(32^3) OSD(16^3) Miss Rate (%) 9% 8% 7% 6% 5% 4% 3% 2% 1% Miss Rate (%) 9% 8% 7% 6% 5% 4% 3% 2% 1% Environment Analysis Texture Pixel Total Bandwidth Frame Rates 0% 1 2 4 8 16 32 64 128 0% 1 2 4 8 16 32 64 128 Depth Size (KB) Lobster (255x255x56) Depth Size (KB) Engine (255x255x110) NSD OSD(64^3) OSD(32^3) OSD(16^3) NSD OSD(64^3) OSD(32^3) OSD(16^3) 10% 10% 9% 9% 8% 8% 7% 7% Miss Rate (%) 6% 5% 4% Miss Rate (%) 6% 5% 4% 3% 3% 2% 2% 1% 1% 0% 1 2 4 8 16 32 64 128 Depth Size (KB) Foot (255x255x255) 0% 1 2 4 8 16 32 64 128 Depth Size (KB) CTdata (255x255x255) NSD : Non-Subdivided OSD : Octree Subdivided (number) : size of sub-volume
Total Bandwidth Comparison #1 for 10 frames NSD : Non Sub-Divided standard method, OSD : Octree-based Sub-Divided method Environment Analysis Texture Pixel Total Bandwidth Frame Rates At best case, 29x reduced! Even worst case, 8Mbytes increased!
Total Bandwidth Comparison #2 for 10 frames Environment Analysis Texture Pixel Total Bandwidth Frame Rates
Rendering Performance Environment Analysis Texture Pixel Total Bandwidth Frame Rates NSD : Non Sub-Divided standard method, OSD : Octree-based Sub-Divided method
& Future work Octree based Bandwidth-Effective Volume Rendering Maximize the temporal & spacial locality of memory access Enable the empty space skipping 2~29x bandwidth reduction 2~6x faster rendering Application Volumetric Effects