Fast Stereoscopic Rendering on Mobile Ray Tracing GPU for Virtual Reality Applications SAMSUNG Advanced Institute of Technology Won-Jong Lee, Seok Joong Hwang, Youngsam Shin, Jeong-Joon Yoo, Soojung Ryu
What is ray tracing?
Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of its encounters with virtual objects. It naturally represents global effects such as shadows, reflection, and refraction.
Image source: Wikipedia
Ray Tracing and Mobile
Ray tracing provides a potential rendering technique for future mobile applications that require photorealistic graphics.
Ray tracing for VR
- Stereo Iray VR (NVIDIA, 2016)
- Foveated rendering (AMD, 2014)
- VRWorks Audio (NVIDIA, 2016)
Background: Ray Tracing
Early desktop CPU/GPU ('00~'09)
- Packet tracing [Gunther '07][Overbeck '08][Benthin '09]
HW specialization ('02~'06)
- SaarCOR [Schmittler '02], RPU, D-RPU [Woop '05, '06]
- Not commercialized
Modern GPUs and MICs ('10~)
- OptiX [Parker '10], Embree [Wald '14]
- For professional graphics
Mobile GPU and H/W revisit ('13~present)
- SGRT [Lee '13, '15], GR6500 [McCombe '14], RayCore [Nah '14]
- Targeted for real-time applications (Game, UX, AR/VR)
Background: Ray Tracing
Performance & Power Consumption
- Future graphics applications, e.g. VR and high-quality 3D games, will require higher resolution (4K or above).
- 3 Grays/s for a full ray-traced game engine, 100~250 Mrays/s for hybrid rendering
Figure: Ray tracing requirements (1080p, 60 fps), techniques vs. ray throughput
Reference: PowerVR Graphics Keynote, Imagination Developer Connection 2015
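To put the throughput figures above in per-pixel terms, the following sketch divides each ray rate by the pixel rate at 1080p and 60 fps. It assumes 1080p means 1920 x 1080; the function name `rays_per_pixel` is illustrative and does not come from the slides.

```python
# Ray budget per pixel implied by the quoted throughputs (1080p, 60 fps).
WIDTH, HEIGHT, FPS = 1920, 1080, 60  # assumption: 1080p = 1920 x 1080


def rays_per_pixel(rays_per_second: float) -> float:
    """Average number of rays available per pixel per frame."""
    pixels_per_second = WIDTH * HEIGHT * FPS  # 124,416,000 pixels/s
    return rays_per_second / pixels_per_second


full_rt = rays_per_pixel(3e9)      # full ray-traced game engine
hybrid_lo = rays_per_pixel(100e6)  # hybrid rendering, lower figure
hybrid_hi = rays_per_pixel(250e6)  # hybrid rendering, upper figure

print(f"full RT: {full_rt:.1f} rays/pixel")
print(f"hybrid:  {hybrid_lo:.1f} to {hybrid_hi:.1f} rays/pixel")
```

Under these assumptions a full ray-traced engine has roughly 24 rays per pixel per frame to spend, while hybrid rendering has about one to two.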
Motivation: stereoscopic reprojection
- Project the left eye's ray hit points onto the right eye [Adelson et al. 1992] 2)
- Four-way pixel classification [Badt et al. 1988] 1)
  - Good, Missed, Overlapped
  - Bad: projected but may be occluded
- Bad pixels are detected by examining reprojection indices
Figure: good and bad reprojection cases between the left (L) and right (R) eyes
1) Badt, "Two algorithms taking advantage of temporal coherence in ray tracing," The Visual Computer, 1988
2) Adelson and Hodges, "Visible Surface Ray-Tracing of Stereoscopic Images," Southeast Regional Conf., 1992
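The reprojection step can be sketched for a single scanline as follows. This is a minimal illustration, not the authors' kernel: it assumes a parallel-axis pinhole stereo pair, so a left-eye hit at depth z shifts horizontally by the disparity `focal_px * baseline / z` and stays on the same row. The names `reproject_row`, `focal_px`, and `baseline` are hypothetical. Only the good, missed, and overlapped classes are computed; the index-based detection of bad pixels is not reproduced here.

```python
from typing import List, Optional, Tuple


def reproject_row(
    depths: List[float], focal_px: float, baseline: float
) -> Tuple[List[Optional[int]], List[str]]:
    """Reproject one left-eye scanline onto the right-eye scanline.

    depths[xl] is the positive hit depth seen through left pixel xl.
    Returns (source, labels): source[xr] is the left pixel reused at right
    pixel xr (None if nothing landed there), and labels[xr] is one of
    'missed', 'good', or 'overlapped'.
    """
    width = len(depths)
    source: List[Optional[int]] = [None] * width
    count = [0] * width
    for xl, z in enumerate(depths):
        # Parallel-axis stereo: horizontal shift by the disparity f * b / z.
        xr = round(xl - focal_px * baseline / z)
        if not 0 <= xr < width:
            continue  # reprojected outside the right image
        count[xr] += 1
        # When several hits land on one pixel, keep the nearest one.
        if source[xr] is None or z < depths[source[xr]]:
            source[xr] = xl
    labels = [
        "missed" if c == 0 else "good" if c == 1 else "overlapped"
        for c in count
    ]
    return source, labels


# Background at depth 10 on the left, foreground at depth 5 on the right.
source, labels = reproject_row([10, 10, 10, 5, 5, 5], focal_px=10, baseline=1)
print(source)
print(labels)
```

In the usage example the foreground shifts further than the background, so one right-eye pixel receives two hits (overlapped) and the two rightmost pixels receive none (missed).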
Goal of this paper
Efficiently map the reprojection algorithm onto existing ray tracing GPUs.
Figure: Stereoscopic ray-traced rendering with the reprojection method. Except for the yellow pixels (which indicate bad pixels), most of the pixels (91.54%) in the right image can be reused from the results of the left image.
Proposed Framework
Target platform: a mobile ray tracing GPU (SGRT)
A mobile GPU based on ray tracing, which combines the advantages of programmable DSP cores and dedicated hardware
- T&I units: fast, compact H/W to accelerate traversal & intersection
- SRP: programmable shader core that supports flexible shading and ray generation
High-performance features: dual AABB test unit [Lee et al. 2014], reorder buffer [Lee et al. 2015], hybrid number representation [Hwang et al. 2015]
Figure: SGRT block diagram. Labels: Host CPUs (Core #1 to #4), Host System BUS, Host DRAM, AXI System BUS, External DRAM, four SGRT core blocks; T&I Unit (four Traversal Units and an Intersection Unit, each with an L1 cache, plus an L2 cache); SRP (VLIW Engine, Coarse Grained Reconfigurable Array, Internal SRAM, I-Cache, C-Mem, Texture Unit with L1 cache).
SGRT: T&I (Traversal and Intersection) Units
- Specialized H/W for BVH tree traversal and intersection operations for ray tracing
- MIMD parallel architecture [Lee et al. 2013], 2-AABB traversal unit [Lee et al. 2014], latency hiding [Shin et al. 2015], and hybrid number representation [Hwang et al. 2015]
- The T&I unit and the shader core in the GPU are connected via direct interfaces
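For readers unfamiliar with the test a BVH traversal unit repeats at every node, here is a minimal software sketch of the textbook ray/AABB slab test. It is not a description of the SGRT hardware, whose dual-AABB unit and hybrid number representation are not detailed on these slides; the function name `ray_aabb` is illustrative.

```python
import math
from typing import Sequence


def ray_aabb(
    origin: Sequence[float],
    direction: Sequence[float],
    box_min: Sequence[float],
    box_max: Sequence[float],
    t_max: float = math.inf,
) -> bool:
    """Slab test: does the ray origin + t * direction, 0 <= t <= t_max, hit the box?"""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Intersect this slab's interval with the running interval.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True


unit_box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
hit = ray_aabb((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box)
miss_side = ray_aabb((2.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box)
miss_behind = ray_aabb((0.0, 0.0, -5.0), (0.0, 0.0, -1.0), *unit_box)
```

A traversal step would run this test against the child boxes of a BVH node and descend only into the children that are hit.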
SGRT: SRP (Samsung Reconfigurable Processor)
- A flexible architecture template [Lee et al. 2011]
- The ISA covers arithmetic, special-function, and texture instructions.
- The VLIW engine is useful for general-purpose computations (function calls, control flow).
- The CGRA makes full use of software pipelining for loop acceleration.
Figure: SRP block diagram. Labels: Instruction, Data, VLIW, Central RF (register file), functional units (FU) with local register files (RF), CGA; a timeline alternating control processing and data processing, with `for` loops marked.
Overview: stereoscopic reprojection rendering on SGRT
- This is the first demonstration of stereoscopic rendering utilizing a mobile ray tracing GPU for VR applications.
- SGRT efficiently supports flexible real-time ray tracing by combining the advantages of hardware and software. The new software kernels (reprojection, validation, and reuse) were therefore easy to add.
- The H/W accelerator (T&I unit) efficiently performs fast ray traversal.
- Tile-based ray tracing: by conducting ray tracing on a per-tile basis, the G-buffer fits into the internal memory, which allows the reprojection and reuse kernels to run from the on-chip internal SRAM without accessing the external DRAM. The validation kernel is the exception.
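As a rough illustration of why tiling lets the G-buffer stay on chip, the sketch below enumerates tiles and estimates the G-buffer footprint of one tile. The attribute list (color, normal, texcoord, position) follows the Processing Detail diagram and the 2048 x 1024 resolution follows the evaluation setup, but the float32 layout, the 32 x 32 tile size, and the 64 KiB SRAM budget are assumptions made for illustration only; the slides do not state them.

```python
from typing import Iterator, Tuple

# Assumed layout: float32 color (3), normal (3), texcoord (2), position (3).
BYTES_PER_PIXEL = 4 * (3 + 3 + 2 + 3)


def tiles(
    width: int, height: int, tile_w: int, tile_h: int
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for each tile, clipping tiles at the image edges."""
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield x, y, min(tile_w, width - x), min(tile_h, height - y)


def gbuffer_bytes(tile_w: int, tile_h: int) -> int:
    """G-buffer footprint of one tile under the assumed layout."""
    return tile_w * tile_h * BYTES_PER_PIXEL


def fits_on_chip(tile_w: int, tile_h: int, sram_bytes: int) -> bool:
    return gbuffer_bytes(tile_w, tile_h) <= sram_bytes


SRAM_BYTES = 64 * 1024  # hypothetical on-chip budget
tile_count = len(list(tiles(2048, 1024, 32, 32)))
print(tile_count, gbuffer_bytes(32, 32), fits_on_chip(32, 32, SRAM_BYTES))
```

Under these assumptions a 32 x 32 tile needs 44 KiB and fits, while a 64 x 64 tile would not, so the tile size is the knob that trades tile count against on-chip footprint.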
Processing Detail
Figure: pipeline diagram, left-image (L) stage. Labels: Shader Core (SRP) with Ray Generation, Shading, and Re-projection; Ray tracing H/W (T&I Unit) with Traversal & Intersection; Rays; Hit Points; Internal Mem.; DRAM; Span G-Buffer (Color, Normal, Texcoord, Position).
Processing Detail
Figure: pipeline diagram, right-image (R) stage. Labels: Shader Core (SRP) with Validation Test ("Bad pixel?": yes leads to Ray Generation, no leads to Re-use) and Shading; Ray tracing H/W (T&I Unit) with Traversal & Intersection; Rays; Hit Points; Internal Mem.; DRAM; Span G-Buffer (prefetching a row by DMA); Tile Color-Buffer (updating tile colors); Frame Buffer. Bad pixels are shown in red.
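The right-image control flow in the diagram (validation test, then either re-use or a fresh ray) can be sketched per scanline as below. This is a simplified illustration under stated assumptions: it copies the left-eye color directly on re-use, it also traces pixels that received no reprojected hit, and `trace_and_shade` is a hypothetical callback standing in for ray generation, T&I traversal, and shading.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Color = Tuple[float, float, float]


def render_right_row(
    left_colors: Sequence[Color],
    source: Sequence[Optional[int]],
    bad: Sequence[bool],
    trace_and_shade: Callable[[int], Color],
) -> Tuple[List[Color], int]:
    """Fill one right-eye scanline, re-using left-eye results where valid.

    source[xr] is the left pixel reprojected onto right pixel xr (or None),
    and bad[xr] is the validation result. Returns the row of colors and the
    number of pixels that needed a fresh ray.
    """
    row: List[Color] = []
    traced = 0
    for xr, src in enumerate(source):
        if src is not None and not bad[xr]:
            row.append(left_colors[src])      # re-use path
        else:
            row.append(trace_and_shade(xr))   # bad or missed: trace a new ray
            traced += 1
    return row, traced


left = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
row, traced = render_right_row(
    left,
    source=[1, None, 2],
    bad=[False, False, True],
    trace_and_shade=lambda xr: (9.0, 9.0, 9.0),  # placeholder shading result
)
```

Only the pixels that fail validation reach the traversal hardware, which is where the reduction in T&I work comes from.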
Evaluation
Experimental Setup
Performance and energy simulation model
- Cycle-accurate simulators for SGRT, integrated with an energy model
- Energy and power model of [Lee et al. 2015], a custom model based on a database built from per-component power values from Synopsys PrimeTime PX [SYNOPSYS 2016] with the SAMSUNG 14nm LPP process technology [Samsung 2016]
- The configuration of the T&I unit is the same as in [Lee et al. 2014]: 4 TRV + 1 IST units, 500 MHz
Experimental Setup
Test application
- We used six datasets (Figure 1): Teapot (15K triangles), Chess (42K), BMW (55K), Chemical Lab. (98K), Music box (106K), and Provence (600K).
- Test scenes were all rendered at 2048 x 1024 resolution with enough secondary ray effects.
- We compared against the standard reference: ray tracing without reprojection on the same hardware platform.
Reused pixels
Figure: The results of the stereoscopic rendering for the six test scenes*. The pixels marked in yellow in the right image indicate the bad pixels. Most of the pixels (91.54% on average) in the left image could be reused, as shown in the figure.
* The barrel distortion correction filter has intentionally not been applied to these rendered scenes, to keep the focus on the reprojection effect.
Relative Performance
Overall, it achieved up to 1.64 times better performance compared with the reference platform. This is because it can substantially reduce the computing cost of the T&I unit. In terms of absolute performance, we obtained 131.3, 14.4, 18.2, 8.6, 28.9, and 44.5 fps for the test scenes, respectively.
Figure: bar chart of relative performance, Standard (1.00) vs. Reprojected, for Teapot, Chess, Musicbox, BMW, Chemical Lab., and Provence. Bar labels as extracted: 1.64, 1.28, 1.13, 1.42, 1.11, 1.51; the peak is 1.64x.
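For context on the measured speedup, a simple cost model gives an idealized upper bound. This is an assumption-laden sketch, not the authors' analysis: it supposes both eyes cost the same without reprojection, per-pixel cost is uniform, and the reprojection and validation kernels add a cost `overhead` expressed as a fraction of one eye's rendering cost. The name `ideal_speedup` is illustrative.

```python
def ideal_speedup(reuse_fraction: float, overhead: float = 0.0) -> float:
    """Stereo speedup when a fraction of right-eye pixels is re-used.

    Baseline cost is 2 (left + right). With reprojection the cost is
    1 (left) + (1 - reuse_fraction) (right pixels still traced) + overhead.
    """
    return 2.0 / (1.0 + (1.0 - reuse_fraction) + overhead)


bound = ideal_speedup(0.9154)  # average reuse reported on the slides
print(f"upper bound at 91.54% reuse: {bound:.3f}x")
```

With 91.54% reuse and zero overhead the model gives about 1.84x, so the reported peak of 1.64x sits below this bound, as expected once kernel overhead and non-uniform pixel cost are taken into account.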
Relative Performance
Regarding energy consumption, our implementation reduced it by up to 20% because it cut the workload in the hardware.
Figure: bar chart of relative energy consumption, Standard (1.00) vs. Reprojected, for Teapot, Chess, Musicbox, BMW, Chemical Lab., and Provence. Bar labels as extracted: 0.96, 1.02, 0.98, 0.97, 0.99, 0.80; the largest reduction is 20%.
Conclusion
Summary
- In this work, we present a solution for ray tracing based stereoscopic rendering utilizing a mobile ray tracing GPU.
- With the combination of reprojection and tile-based ray tracing, our approach could be a versatile solution for future VR applications, as it achieves up to 1.64 times better performance and 20% better energy efficiency compared with the state-of-the-art solution.
- Future work: apply more adaptive rendering, such as foveated rendering.
Thank you!