Fast Stereoscopic Rendering on Mobile Ray Tracing GPU for Virtual Reality Applications

SAMSUNG Advanced Institute of Technology
Won-Jong Lee, Seok Joong Hwang, Youngsam Shin, Jeong-Joon Yoo, Soojung Ryu

What is ray tracing?
- Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of its encounters with virtual objects.
- It naturally represents global effects such as shadows, reflection, and refraction.
Image source: Wikipedia

Ray Tracing and Mobile
- Ray tracing is a potential rendering technique for future mobile applications that require photorealistic graphics.

Ray tracing for VR
- Stereo Iray VR (NVIDIA, 2016)
- Foveated rendering (AMD, 2014)
- VRWorks Audio (NVIDIA, 2016)

Background: Ray Tracing
Early desktop CPU/GPU ('00~'09)
- Packet tracing [Gunther 07][Overbeck 08][Benthin 09]
HW specialization ('02~'06)
- SaarCOR [Schmittler '02], RPU, D-RPU [Woop '05, '06]
- Not commercialized
Modern GPUs and MICs ('10~)
- OptiX [Steven '10], Embree [Wald '14]
- For professional graphics
Mobile GPU and H/W revisit ('13~present)
- SGRT [Lee '13, '15], GR6500 [McCombe '14], RayCore [Nah '14]
- Targeted for real-time applications (Game, UX, AR/VR)

Background: Ray Tracing
Performance & Power Consumption
- Future GFX applications will require higher resolution (4K and above), e.g. VR and high-quality 3D games
- 3 Grays/s for a full RT game engine, 100~250 Mrays/s for hybrid rendering
[Figure: Ray tracing requirements (1080p, 60 fps), techniques vs. ray throughput]
Reference: PowerVR Graphics Keynote, Imagination Developer Connection 2015
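To put the throughput figures on this slide in perspective, the following back-of-the-envelope sketch converts them into rays per pixel per frame at 1080p and 60 fps. The function name and the one-primary-ray-per-pixel baseline are assumptions made for illustration; only the resolution, frame rate, and ray budgets come from the slide.

```python
def primary_rays_per_second(width, height, fps):
    """Rays per second when exactly one primary ray is cast per pixel per frame."""
    return width * height * fps


# 1080p at 60 fps, as in the figure caption on the slide.
primary = primary_rays_per_second(1920, 1080, 60)

# Ray budgets quoted on the slide, in rays per second.
full_rt_budget = 3e9
hybrid_budget_low, hybrid_budget_high = 100e6, 250e6

# Average number of rays available per pixel per frame under each budget.
full_rt_rays_per_pixel = full_rt_budget / primary
hybrid_rays_per_pixel = (hybrid_budget_low / primary, hybrid_budget_high / primary)

print(f"primary rays/s: {primary / 1e6:.1f} M")
print(f"full RT budget: {full_rt_rays_per_pixel:.1f} rays per pixel per frame")
print(f"hybrid budget: {hybrid_rays_per_pixel[0]:.1f} to {hybrid_rays_per_pixel[1]:.1f} rays per pixel per frame")
```

Under these assumptions, one primary ray per pixel already costs about 124 Mrays/s, so the hybrid budget covers roughly one to two rays per pixel, while the full ray-traced budget leaves room for about 24.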

Motivation: stereoscopic reprojection
- Project the left eye's ray hit points onto the right eye [Adelson et al. 1992] 2)
- Four-way pixel classification [Badt et al. 1988] 1)
- Good, Missed, Overlapped
- Bad: projected but may be occluded. Detected by examining the reprojection indices.
[Figure: examples of GOOD and BAD pixels between the left (L) and right (R) views]
1) Badt, Two algorithms taking advantage of temporal coherence in ray tracing, Visual Computer, 1988
2) Adelson and Larry, Visible Surface Ray-Tracing of Stereoscopic Images, Southeast Regional Conf., 1992
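The classification above can be illustrated with a minimal single-scanline sketch. It is not the authors' kernel: it assumes two parallel pinhole cameras separated horizontally by `baseline`, a focal length `focal_px` in pixels, and a positive depth for every left-eye hit point, so a hit point moves only along its own scanline. The rule used here for "bad" pixels (the first reprojected sample after a jump in the reprojection index) is one plausible reading of "detect by examining reprojection indices" and should be treated as an assumption, as should all names in the code.

```python
import math

GOOD, MISSED, OVERLAPPED, BAD = "good", "missed", "overlapped", "bad"


def reproject_scanline(depths, focal_px, baseline):
    """Reproject one left-eye scanline into the right eye and classify each right-eye pixel.

    depths[x] is the (positive) depth of the left-eye hit point at pixel x.
    Returns (labels, source): labels[x] is the class of right-eye pixel x and
    source[x] is the left-eye pixel whose result it would reuse (or None).
    """
    width = len(depths)
    labels = [MISSED] * width   # pixels that receive no sample stay "missed"
    source = [None] * width
    prev_target = None          # reprojection index of the previous left pixel
    for x_left, z in enumerate(depths):
        disparity = focal_px * baseline / z
        x_right = math.floor(x_left - disparity + 0.5)  # round to nearest pixel
        if 0 <= x_right < width:
            if source[x_right] is None:
                # Assumed rule: a jump in the reprojection index leaves a gap,
                # so the sample right after the gap may be occluded ("bad").
                gap = prev_target is not None and x_right - prev_target > 1
                labels[x_right] = BAD if gap else GOOD
                source[x_right] = x_left
            else:
                # Several left pixels land on the same right pixel: keep the nearest.
                labels[x_right] = OVERLAPPED
                if z < depths[source[x_right]]:
                    source[x_right] = x_left
        prev_target = x_right
    return labels, source


# A near surface (depth 2) followed by a far surface (depth 8).
labels, source = reproject_scanline([2, 2, 2, 2, 8, 8, 8, 8], focal_px=4, baseline=1)
print(labels)
```

In this example the two pixels uncovered behind the near surface are classified as missed, and the first far-surface pixel after the gap is flagged as bad; both kinds would need a fresh ray for the right eye.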

Goal of this paper
- Efficiently map the reprojection algorithm onto existing ray tracing GPUs
[Figure: stereoscopic ray-traced rendering with the reprojection method. Except for the yellow pixels (which indicate bad pixels), most of the pixels (91.54%) in the right image can reuse the results of the left image.]

Proposed Framework

Target platform: a mobile ray tracing GPU (SGRT)
- A mobile GPU based on ray tracing, which combines the advantages of programmable DSP cores and dedicated hardware
- T&I units: fast, compact H/W to accelerate traversal & intersection
- SRP: programmable shader core that supports flexible shading and ray generation
- High-performance features: dual AABB test unit [Lee et al. 2014], reorder buffer [Lee et al. 2015], hybrid number representation [Hwang et al. 2015]
[Figure: SGRT block diagram. Host CPUs (cores #1 to #4) and host DRAM on the host system bus; four SGRT cores on the AXI system bus with external DRAM. Each SGRT core pairs a T&I unit (traversal units and an intersection unit with L1 caches, plus an L2 cache) with an SRP (VLIW engine, coarse-grained reconfigurable array, internal SRAM, I-cache, C-Mem, texture unit with L1 cache).]

SGRT: T&I (Traversal and Intersection) Units
- Specialized H/W for BVH tree traversal and intersection operations for ray tracing
- MIMD parallel architecture [Lee et al. 2013], 2-AABB traversal unit [Lee et al. 2014], latency hiding [Shin et al. 2015], and hybrid number representation [Hwang et al. 2015]
- The T&I unit and the shader core in the GPU are connected via direct interfaces
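For readers unfamiliar with the work that the T&I unit accelerates, here is a minimal software sketch of stack-based BVH traversal with a slab ray/AABB test. It only illustrates the general technique: the `Node` layout and function names are hypothetical, it tests one box per step rather than two as the 2-AABB unit does, and it stops at collecting candidate primitives without the ray/triangle intersection stage.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                                        # minimum corner of the AABB
    hi: Vec3                                        # maximum corner of the AABB
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prims: List[int] = field(default_factory=list)  # non-empty only for leaves


def hits_aabb(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: intersect the ray's parametric interval with each axis slab."""
    t_near, t_far = 0.0, math.inf
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it misses unless the origin lies inside.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[int]:
    """Return the primitive ids of every leaf whose box the ray enters."""
    candidates: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not hits_aabb(origin, direction, node.lo, node.hi):
            continue
        if node.prims:
            candidates.extend(node.prims)
        else:
            stack.extend(c for c in (node.left, node.right) if c is not None)
    return candidates


# Two unit boxes side by side under one root.
leaf_a = Node((0, 0, 0), (1, 1, 1), prims=[0])
leaf_b = Node((2, 0, 0), (3, 1, 1), prims=[1])
root = Node((0, 0, 0), (3, 1, 1), left=leaf_a, right=leaf_b)
print(traverse(root, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0)))
```

A ray shot along +z through the first box reaches only that leaf, while a ray along +x through both boxes collects both primitives.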

SGRT: SRP (Samsung Reconfigurable Processor)
- A flexible architecture template [Lee et al. 2011]
- Instructions such as arithmetic, special function, and texture are implemented in the ISA.
- The VLIW engine is useful for general-purpose computation (functions, control flow).
- The CGRA makes full use of software pipelining for loop acceleration.
[Figure: SRP block diagram. Instruction and data paths feed a VLIW engine with a central register file (RF) and a coarse-grained array (CGA) of functional units (FU) with local register files; execution alternates between control processing and data processing of loops.]

Overview: stereoscopic reprojection rendering on SGRT
- This is the first demonstration of stereoscopic rendering utilizing a mobile ray tracing GPU for VR applications.
- SGRT efficiently supports flexible real-time ray tracing by combining the advantages of hardware and software. The new software kernels (Reprojection, Validation, and Reuse) were therefore easy to add.
- The H/W accelerator (T&I unit) performs the ray traversal processing efficiently.
- Tile-based ray tracing: by conducting ray tracing on a per-tile basis, the G-buffer fits into the internal memory, which allows the reprojection and reuse kernels to run from the on-chip internal SRAM without accessing the external DRAM. The validation kernel is the exception.
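The tile-based argument can be made concrete with a small sizing sketch. The G-buffer attributes (color, normal, texcoord, position) are the ones shown in the processing diagrams of this deck and the 2048 x 1024 frame size is the evaluation resolution, but the byte sizes per attribute, the 32 x 32 tile, and the 64 KiB SRAM budget are purely hypothetical values chosen for illustration.

```python
# Hypothetical per-pixel layout in bytes: RGBA8 color, three float32 values for
# normal and position, two float32 values for the texture coordinate.
BYTES_PER_PIXEL = {"color": 4, "normal": 12, "texcoord": 8, "position": 12}


def gbuffer_bytes(width, height, layout=BYTES_PER_PIXEL):
    """Size of a G-buffer covering width x height pixels with the given layout."""
    return width * height * sum(layout.values())


def fits_on_chip(tile_w, tile_h, sram_bytes, layout=BYTES_PER_PIXEL):
    """True if a tile's G-buffer fits in the given on-chip memory budget."""
    return gbuffer_bytes(tile_w, tile_h, layout) <= sram_bytes


SRAM_BUDGET = 64 * 1024                   # hypothetical internal SRAM budget
frame_size = gbuffer_bytes(2048, 1024)    # full frame at the evaluation resolution
tile_size = gbuffer_bytes(32, 32)         # hypothetical tile

print(f"full frame: {frame_size / 2**20:.0f} MiB, tile: {tile_size / 2**10:.0f} KiB")
print("tile fits on chip:", fits_on_chip(32, 32, SRAM_BUDGET))
print("frame fits on chip:", fits_on_chip(2048, 1024, SRAM_BUDGET))
```

With these assumed sizes a full-frame G-buffer would need 72 MiB, far beyond any on-chip memory, whereas a single tile needs 36 KiB, which is the kind of gap that makes per-tile processing attractive.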

Processing Detail: left image
[Figure: left-image pipeline, built up over several slides. Shader core (SRP): ray generation, shading, re-projection. Ray tracing H/W (T&I unit): traversal & intersection, exchanging rays and hit points with the shader core. Memories: internal memory and DRAM. Span G-buffer entries: color, normal, texcoord, position.]

Processing Detail: right image
[Figure: right-image pipeline, built up over several slides. Shader core (SRP): validation test; if the pixel is bad, ray generation and shading, otherwise re-use. Ray tracing H/W (T&I unit): traversal & intersection, exchanging rays and hit points with the shader core. Memories: internal memory and DRAM, with a row of the span G-buffer prefetched by DMA, a tile color buffer (updating tile colors), and the frame buffer. Bad pixels are shown in red.]
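The right-image control flow in the diagram (validation test, then either re-use or a fresh ray) can be sketched as follows. This is a simplified illustration rather than the SRP kernel: `render_right_tile`, `is_bad`, and `trace_and_shade` are hypothetical names, the validation test is reduced to a predicate, and pixels that received no reprojected sample are assumed to be traced like bad pixels.

```python
from typing import Callable, List, Optional, Tuple

Color = Tuple[int, int, int]


def render_right_tile(
    reprojected: List[Optional[Color]],
    is_bad: Callable[[int], bool],
    trace_and_shade: Callable[[int], Color],
) -> Tuple[List[Color], int]:
    """Fill one right-eye tile, reusing left-eye results wherever validation allows.

    reprojected[i] is the left-eye color that landed on pixel i, or None if no
    sample reached it. Returns the tile colors and the number of rays traced.
    """
    tile: List[Color] = []
    rays_traced = 0
    for pixel, color in enumerate(reprojected):
        if color is None or is_bad(pixel):
            # Validation failed (or nothing to reuse): generate a ray and shade it.
            tile.append(trace_and_shade(pixel))
            rays_traced += 1
        else:
            # Validation passed: reuse the reprojected left-eye result.
            tile.append(color)
    return tile, rays_traced


# Four-pixel tile: pixel 1 received no sample and pixel 2 fails validation.
samples = [(1, 1, 1), None, (3, 3, 3), (4, 4, 4)]
bad_pixels = {2}
tile, rays = render_right_tile(
    samples,
    is_bad=lambda p: p in bad_pixels,
    trace_and_shade=lambda p: (9, 9, 9),   # stand-in for the T&I unit plus shading
)
print(tile, rays)
```

Only two of the four pixels trigger a ray here, which mirrors how the reuse path keeps work away from the traversal hardware.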

Evaluation

Experimental Setup
Performance and energy simulation model
- Cycle-accurate simulators for SGRT, integrated with an energy model
- Energy and power model of [Lee et al. 2015], a custom model based on a database of per-component power values from Synopsys PrimeTime PX [SYNOPSYS 2016] with the SAMSUNG 14nm LPP process technology [Samsung 2016]
- The T&I unit configuration is the same as in [Lee et al. 2014]: 4 TRV + 1 IST units, 500 MHz

Experimental Setup
Test application
- We used six datasets (Figure 1): Teapot (15K triangles), Chess (42K), BMW (55K), Chemical Lab. (98K), Music box (106K), and Provence (600K).
- All test scenes were rendered at 2048 x 1024 resolution with sufficient secondary ray effects.
- The baseline is the standard reference: ray tracing without reprojection on the same hardware platform.

Reused pixels
- Results of the stereoscopic rendering for the six test scenes*
- The pixels marked in yellow in the right image indicate the bad pixels.
- Most of the pixels (91.54% on average) of the left image could be reused, as shown in the figure.
* The barrel distortion correction filter was intentionally not applied to these rendered scenes, in order to focus on the reprojection effect.

Relative Performance
- Overall, reprojection achieved up to 1.64 times better performance than the reference platform, because it substantially reduces the computing cost of the T&I unit.
- In terms of absolute performance, we obtained 131.3, 14.4, 18.2, 8.6, 28.9, and 44.5 fps for the test scenes, respectively.
[Figure: bar chart of relative performance, Standard vs. Reprojected, for Teapot, Chess, Musicbox, BMW, Chemical Lab., and Provence; bar labels as extracted: 1.64, 1.28, 1.13, 1.42, 1.11, 1.51; peak 1.64x]
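As a rough sanity check on the reported speedup, the sketch below computes an idealized upper bound from the average reuse rate of 91.54% reported for the test scenes. The cost model is an assumption made for illustration only: every pixel is taken to cost the same to ray trace, and the reprojection, validation, and reuse kernels are taken to be free.

```python
def ideal_stereo_speedup(reuse_fraction):
    """Upper bound on stereo speedup when a fraction of right-eye pixels is reused.

    Costs are in units of "one fully ray-traced eye image". The baseline traces
    both eyes (cost 2). With reprojection, the left eye is traced in full and
    only the non-reusable share of the right eye is traced again.
    """
    baseline_cost = 2.0
    reprojected_cost = 1.0 + (1.0 - reuse_fraction)
    return baseline_cost / reprojected_cost


bound = ideal_stereo_speedup(0.9154)
print(f"idealized bound: {bound:.2f}x")
```

Under these assumptions the bound is about 1.84x, so the measured peak of 1.64x sits below it, as expected once the overhead of the added software kernels and uneven per-pixel cost are taken into account.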

Relative Performance
- Regarding energy consumption, our implementation reduced energy by up to 20%, because it cuts the workload of the hardware.
[Figure: bar chart of relative energy consumption, Standard vs. Reprojected, for Teapot, Chess, Musicbox, BMW, Chemical Lab., and Provence; bar labels as extracted: 0.96, 1.02, 0.98, 0.97, 0.99, 0.80; largest reduction 20%]

Conclusion

Summary
- In this work, we presented a solution for ray tracing based stereoscopic rendering on a mobile ray tracing GPU.
- By combining reprojection with tile-based ray tracing, our approach could be a versatile solution for future VR applications, as it achieves up to 1.64 times better performance and 20% better energy efficiency than the state-of-the-art solution.
- Future work: apply more adaptive rendering techniques such as foveated rendering.

Thank you!