Ray Tracing with Multi-Core/Shared Memory Systems. Abe Stephens


Ray Tracing with Multi-Core/Shared Memory Systems
Abe Stephens
Real-time Interactive Massive Model Visualization Tutorial
Eurographics 2006, Vienna, Austria. Monday, September 4, 2006
http://www.sci.utah.edu/~abe/massive06/

Overview
- Reasons for ray tracing massive models.
- Examine multi-core/multi-processor systems today.
- Types of parallelism to look for.
- Considerations for ray tracing.
- Describe Manta, a general-purpose interactive ray tracing architecture.
- Miscellaneous issues.
- Simple parallel practices to adopt.
- Remote/collaborative visualization.

Massive Model Ray Tracing
- Basic ray tracing shading and intersection techniques are shared with serial renderers.
- No precomputed visibility: this allows transparent rendering and hiding or culling geometry on the fly.
- The largest datasets still fit well in core on moderately sized machines.
- Parallel systems are available on the desktop!
Tanker dataset courtesy of Northrop Grumman Newport News Shipbuilding. Boeing 777 dataset courtesy of The Boeing Company. All images rendered in the Manta Interactive Ray Tracer.

Ray Tracing in a nutshell
- Rasterization uses projection.
- Ray tracing finds the closest intersection to the image along a ray.

Ray Tracing in a nutshell
1. Find closest intersection.
2. Invoke material shader on hit point.
   - Send shadow rays.
   - Send secondary rays.
   - Repeat.
3. Return sample color.
Easy to change visibility. Easy shading effects. For example: transparency.
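Step 1 above can be sketched as a brute-force search over primitives. This is a minimal illustration, not Manta code: the sphere primitive and the names `hit_sphere` and `closest_hit` are assumptions made for the example, and a real renderer would traverse an acceleration structure instead of scanning every primitive.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter t for a sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-6:  # ignore hits behind or at the ray origin
            return t
    return None

def closest_hit(origin, direction, spheres):
    """Scan all primitives and keep the smallest t. Returns (t, index) or None."""
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best
```

The shader in step 2 would then be invoked at `origin + t * direction` for the returned primitive.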

Example: Transparency

Transparency is easy in a ray tracer!
One option:
- Find closest intersection.
- Shoot secondary ray.
- Find next intersection.
- Repeat.
- Blend shaded samples.

Transparency
- Find the first n intersection points.
- Sort and blend samples. (n depends on alpha.)
Sorting is necessary since triangles won't be intersected in order. (Each kd-tree leaf contains several triangles.)
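The sort-and-blend step can be sketched as front-to-back compositing. This is an illustrative sketch under assumptions: the function name, the `(t, rgb, alpha)` hit tuple, and the `opacity_cutoff` parameter are hypothetical, and the early exit is one plausible reading of "n depends on alpha" (more opaque surfaces need fewer samples).

```python
def blend_transparent_hits(hits, background=(0.0, 0.0, 0.0), opacity_cutoff=0.99):
    """Blend shaded samples along one ray.

    hits: list of (t, (r, g, b), alpha) in arbitrary order, as they might
    come out of kd-tree leaves that hold several triangles each.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    # Sort by distance along the ray so samples are blended front to back.
    for t, rgb, alpha in sorted(hits, key=lambda h: h[0]):
        weight = transmittance * alpha
        for i in range(3):
            color[i] += weight * rgb[i]
        transmittance *= 1.0 - alpha
        if 1.0 - transmittance >= opacity_cutoff:
            break  # accumulated opacity is high enough; later hits are invisible
    for i in range(3):
        color[i] += transmittance * background[i]
    return tuple(color)
```

For example, a half-transparent red surface in front of an opaque blue one yields an even mix of the two colors, regardless of the order in which the hits were found.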

Other techniques?
- Along-the-ray transparency is only one example.
- Other shading effects are possible with secondary rays, for example ambient occlusion.

Ambient Occlusion
Ambient occlusion increases contrast in areas of fine detail.
[Figure panels: Lambertian w/ shadows; ambient occlusion.]
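As a rough sketch of how ambient occlusion uses secondary rays: the shader casts several rays over the hemisphere around the surface normal and darkens the sample by the fraction that are blocked. The function names and the `is_occluded` callback below are hypothetical stand-ins for a real shadow-ray query, not part of Manta.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ambient_occlusion(point, normal, sample_dirs, is_occluded):
    """Return the fraction of hemisphere sample rays that escape unblocked.

    is_occluded(point, direction) is assumed to trace a secondary ray and
    report whether it hits nearby geometry.
    """
    visible = 0
    total = 0
    for direction in sample_dirs:
        if dot(direction, normal) <= 0.0:
            continue  # skip directions below the surface
        total += 1
        if not is_occluded(point, direction):
            visible += 1
    return visible / total if total else 1.0
```

The returned value (1.0 for fully open, 0.0 for fully enclosed) would typically scale the ambient term of the shaded color.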

Why multi-core systems?
- Large number of processors and large amount of memory.
- The same system is used for scientific computing and visualization.
- Systems are becoming smaller and less expensive.
- Faster multi-core clusters require fewer nodes.
[Figures: 16-core Opteron system (top); 16-processor SGI Itanium (half rack).]

Different types of parallelism
Per thread:
- Instruction-level parallelism. Many instructions from one thread. (SWP)
- SIMD: x2 or x4 or ???
Multiple threads per core:
- Hyper-threading. Simultaneous issue.
- Multiple threads. Switch at stall.
Multiple cores per processor:
- Shared cache.
Multiple processors per board:
- Shared main memory.

Multi-Processor Systems Today
Clusters:
- Independent operating systems.
- Separate hardware.
- (Possibly) less expensive.
- Explicit message passing/custom protocols.
Single System Image:
- Operating system manages all processors.
- Explicit or automatic control possible.
- Shared memory used for communication.

Single System Image
Multi-processor, external interconnect:
- e.g. SGI ccNUMA.
- Many rack-mounted devices, one OS.
Multi-processor, board-to-board interconnect:
- e.g. AMD HyperTransport.
- Other devices connect to the HT network.
Dense multi-core:
- 1 or 2 processors with many cores each.
- Multiple threads per core.
- Possible future direction.

Example multi-processor systems for ray tracing
[Diagrams: SGI Prism / Itanium2 computation brick (CPU, SHUB, memory); AMD HyperTransport / Opteron (CPU, memory).]

Example multi-processor systems for ray tracing
[Diagrams: AMD HyperTransport / Opteron (CPU, memory); dense multi-core (CPU, memory).]
Dense multi-core: how will it scale with traditional MP workloads?

What can we implement for massive model vis?
- Hiding objects.
- Cutting planes.
- Transparency.
All with hundreds of millions of triangles.
Users employed cutting planes and object hiding to locate a certain region of the model, then adjusted opacity to examine fine details and occluded structures.

Application Scenario
Quality engineers use the ray tracer to visualize problems with aircraft assembly.
A. Stephens, S. Boulos, J. Bigler, I. Wald, and S. G. Parker. An Application of Scalable Massive Model Interaction using Shared Memory Systems. Proceedings of the Eurographics Symposium on Parallel Graphics and Visualization, 2006.

Manta Interactive Ray Tracer
- Platform for implementing different ray tracer applications.
- Takes advantage of modern multi-core processors.
- Multi-threaded.
- Single-thread performance is derived from specific instruction stream optimizations.
  - SIMD, special cases, etc.

Design Philosophy
Obtain thread scalability:
- Parallel pipeline, communication constraints, state update w/ transactions.
Facilities for good single-thread performance:
- SIMD data layout.
- Wide ray packets w/ properties.
- Lazy evaluation & special-case code.
Build acceleration structures & shaders on top of these two goals.

More than just triangles
- Modular design allows Manta to be embedded in other programs.
- Supports multiple primitives:
  - Massive triangle models.
  - Massive volumes.
  - Sphere glyph (MPM) rendering.
- Python front-end.
- Open source.
- Highly portable.
[Figure: Material Point Method dataset.]

Pipeline and Rendering Stack
1. Modular components vary by application.
2. Synchronized multi-thread pipeline.
3. Asynchronous single-thread rendering stack.
4. Scene intersection on top of stack.
[Diagram labels: Thread 0 ... Thread n; Renderer; Pixel Sampler; Image Traverser; ...]

Parallel Pipeline
Manta pipeline:
- Modular and extensible components.
- Transaction state changes applied each stage.
- Barrier synchronization between stages.
[Diagram labels: Thread 0 ... Thread n; Frame Setup; Image Display; Ray Tracing; Pipeline Barrier; Transactions.]

Parallel Pipeline
Display of previous frame (frame i-1):
- Thread 0 calls OpenGL.
- All others return immediately.

Parallel Pipeline
While thread 0 is displaying frame i-1:
- All other threads start rendering frame i.
- Thread 0 joins as soon as it finishes image display.

Parallel Pipeline
- The load balancer is responsible for even work distribution.
- All threads synchronize at the barrier.

Parallel Pipeline
Display frame i. Repeat!
Tasks scheduled by category:
- Inherently balanced.
- Imbalanced.
- Actively load balanced.

Manta Pipeline Implementation
1. One thread per core (if non-hyper-threaded).
2. Stages in pipeline differentiated by subfunctions.
3. Stages ordered by load balance characteristics.
4. One pipeline barrier.
5. Additional synchronization for transactions.

    void Pipeline::inner_loop( int frame, int proc, int numprocs ) {
      // Global synchronization.
      pipeline_barrier.waitfor( numprocs );
      // Inherently load balanced.
      parallel_animation_callbacks();
      // Imbalanced.
      if (proc == display_proc)
        image_display->displayimage( buffer[frame-1] );
      // Dynamically balanced.
      image_traverser->render_image( buffer[frame], proc );
    }
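The same loop structure can be tried out in a few lines of Python. This is a toy sketch, not Manta's implementation: `run_pipeline` and its parameters are hypothetical, "rendering" just writes one tuple per thread, and one buffer per frame is allocated up front to keep the example short. It shows the single barrier per frame and the display thread showing frame i-1 before joining the work on frame i.

```python
import threading

def run_pipeline(num_threads, num_frames, display_proc=0):
    """Run a barrier-synchronized pipeline and return the displayed frames."""
    barrier = threading.Barrier(num_threads)
    # One slot per thread in each frame buffer, so writes never collide.
    buffers = [[None] * num_threads for _ in range(num_frames)]
    displayed = []

    def inner_loop(proc):
        # One extra iteration so the final frame also gets displayed.
        for frame in range(num_frames + 1):
            # Global synchronization: frame - 1 is complete once all arrive.
            barrier.wait()
            # Imbalanced stage: only the display thread shows the previous frame.
            if proc == display_proc and frame > 0:
                displayed.append(list(buffers[frame - 1]))
            # Every thread, including the display thread, renders its share.
            if frame < num_frames:
                buffers[frame][proc] = (frame, proc)

    threads = [threading.Thread(target=inner_loop, args=(p,))
               for p in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return displayed
```

Because the display stage only reads the buffer that was finished before the barrier, no lock is needed between display and rendering.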

Implementing Transactions
Transactions permit Manta to invoke a method in a foreign object to change some state at a safe time. For example, using Python:

    def OnAutoView(self, event):
        # Called by python thread on GUI event.
        # Add the transaction. Unsafe to actually change or access renderer state!
        self.engine.addtransaction("auto view",
            manta_new(createmantatransaction(self.MantaAutoView, ())))

    def MantaAutoView(self):
        # Called by Manta thread.
        # Find the bounding box of the scene.
        bounds = BBox()
        self.engine.getscene().getobject().computebounds( PreprocessContext(), bounds )
        # Invoke the autoview method.
        channel = 0
        self.engine.getcamera(channel).autoview( bounds )
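On the engine side, the idea can be sketched as a thread-safe queue of callbacks that is drained at one point in the pipeline. The class and method names below (`TransactionEngine`, `add_transaction`, `apply_transactions`) are hypothetical and are not Manta's API; the sketch only illustrates deferring a state change to a safe time.

```python
import queue

class TransactionEngine:
    """Toy engine whose state may only change when transactions are applied."""

    def __init__(self):
        self.camera = "default"
        self._transactions = queue.Queue()  # safe to fill from any thread

    def add_transaction(self, name, callback):
        """Called by a GUI or other foreign thread. Only enqueues the change."""
        self._transactions.put((name, callback))

    def apply_transactions(self):
        """Called by one render thread at a stage where no thread reads state."""
        applied = []
        while True:
            try:
                name, callback = self._transactions.get_nowait()
            except queue.Empty:
                return applied
            callback(self)  # the state change happens here, at a safe time
            applied.append(name)
```

A GUI handler would call `add_transaction("auto view", some_callback)` and return immediately; the camera changes only when the render loop next calls `apply_transactions()`.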

Wide Ray Packets
Contain:
- Ray origin, direction and hit info.
- Packet properties for special case, lazy evaluation, etc.
- Data layout for SIMD w/ vertical and horizontal accessors.
Packets are containers for data used to:
- Perform intersection.
- Store & access info about intersection.
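A structure-of-arrays packet along these lines might look like the sketch below. Everything here is an assumption for illustration: the packet width, the flag names, and the two accessor styles (one component across all rays, and all components of one ray), which only loosely correspond to the vertical and horizontal accessors mentioned on the slide.

```python
from dataclasses import dataclass, field

PACKET_SIZE = 4                  # hypothetical width, e.g. one 4-wide SIMD register
CONSTANT_ORIGIN = 1 << 0         # hypothetical property: all rays share one origin
NORMALIZED_DIRECTIONS = 1 << 1   # hypothetical property: directions are unit length

def _zeros():
    return [[0.0] * PACKET_SIZE for _ in range(3)]

@dataclass
class RayPacket:
    flags: int = 0
    origin: list = field(default_factory=_zeros)      # origin[axis][ray]
    direction: list = field(default_factory=_zeros)   # direction[axis][ray]
    min_t: list = field(default_factory=lambda: [float("inf")] * PACKET_SIZE)

    def set_ray(self, i, origin, direction):
        for axis in range(3):
            self.origin[axis][i] = origin[axis]
            self.direction[axis][i] = direction[axis]

    def origin_axis(self, axis):
        """One component for every ray: the contiguous run SIMD code loads."""
        return self.origin[axis]

    def ray_origin(self, i):
        """All three components of a single ray."""
        return tuple(self.origin[axis][i] for axis in range(3))

    def has(self, flag):
        """Test a packet property so callers can pick a special-case path."""
        return bool(self.flags & flag)
```

An intersection routine could check `packet.has(CONSTANT_ORIGIN)` once per packet and then skip per-ray origin arithmetic, which is the kind of special casing packet properties are meant to enable.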

Manta Scaling
- 128-processor 1.6 GHz Itanium2 system.
- 92% of linear scaling at 64 processors.
- 82% at 126 processors.
- Resolution 1024x768.
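Assuming the percentages are parallel efficiency relative to ideal linear speedup, the implied speedups over a single processor can be worked out directly. The helper name is hypothetical.

```python
def effective_speedup(processors, fraction_of_linear):
    """Speedup over one processor implied by a parallel-efficiency figure."""
    return processors * fraction_of_linear

for processors, fraction in [(64, 0.92), (126, 0.82)]:
    print(processors, "processors:", round(effective_speedup(processors, fraction), 1))
```

Under that reading, the figures correspond to roughly 59x at 64 processors and roughly 103x at 126 processors.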

Need for Remote/Collaborative Rendering
- Cost of a system is usually justified by multiple users.
- Collaborative visualization allows many users to interact with a large dataset, either locally or remotely.

Load Balancing
Coarse load balancing:
- Choose a strategy for assigning tiles to threads.
- Implement it as efficiently as possible (hardware intrinsics).
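One common strategy is dynamic assignment: threads repeatedly claim the next unrendered tile from a shared counter. The sketch below is illustrative only. `TileQueue` and `render_all` are hypothetical names, and the lock stands in for the atomic fetch-and-add intrinsic a native implementation would likely use.

```python
import threading

class TileQueue:
    """Shared work queue that hands out image tiles one at a time."""

    def __init__(self, width, height, tile_size):
        # Each tile is (x, y, tile_width, tile_height); edge tiles are clipped.
        self.tiles = [(x, y, min(tile_size, width - x), min(tile_size, height - y))
                      for y in range(0, height, tile_size)
                      for x in range(0, width, tile_size)]
        self._next = 0
        self._lock = threading.Lock()

    def next_tile(self):
        """Claim the next tile, or return None when the frame is finished."""
        with self._lock:
            if self._next >= len(self.tiles):
                return None
            tile = self.tiles[self._next]
            self._next += 1
            return tile

def render_all(num_threads, tile_queue):
    """Let each thread pull tiles until none remain; return tiles per thread."""
    done = [[] for _ in range(num_threads)]

    def worker(proc):
        while True:
            tile = tile_queue.next_tile()
            if tile is None:
                return
            done[proc].append(tile)  # a real renderer would trace the tile here

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Threads that draw expensive tiles simply claim fewer of them, so the work evens out without any up-front cost estimate.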

Load Balancing
Fine-grained task assignment:
- Share read data, avoid common write data.
- Complications: lazy evaluation policies, multi-thread/core scheduling.
How does ray coherence affect each level of parallelism for read/write?

Acceleration Structure Build
Parallel kd-tree build.
Strategies for offline full SAH build:
- Multi-thread sorting and merging.
- Evaluate split candidates in parallel.
- Build sub-trees in parallel.
Reduced 777 build time from one day to several hours.
Recent approaches vary heuristic & parallelization with tree depth.
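Evaluating split candidates in parallel can be sketched with the usual surface area heuristic (SAH) cost, scored independently per candidate plane. This is a simplified illustration, not the build used for the 777 model: the function names and cost constants are assumptions, primitives are axis-aligned boxes given as `(lo, hi)` corner pairs, and counting is brute force. The thread pool only shows the structure; in CPython it will not speed up this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor

def surface_area(lo, hi):
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(split, axis, node_lo, node_hi, prims,
             traversal_cost=1.0, intersect_cost=1.0):
    """SAH cost of splitting the node at `split` along `axis`."""
    left_hi = list(node_hi)
    left_hi[axis] = split
    right_lo = list(node_lo)
    right_lo[axis] = split
    # A primitive is counted on every side that it overlaps.
    n_left = sum(1 for lo, hi in prims if lo[axis] < split)
    n_right = sum(1 for lo, hi in prims if hi[axis] > split)
    area = surface_area(node_lo, node_hi)
    return traversal_cost + intersect_cost * (
        surface_area(node_lo, left_hi) / area * n_left
        + surface_area(right_lo, node_hi) / area * n_right)

def best_split(axis, node_lo, node_hi, prims, workers=4):
    """Score every candidate plane concurrently; return (cost, split) or None."""
    candidates = sorted({corner[axis] for lo, hi in prims for corner in (lo, hi)
                         if node_lo[axis] < corner[axis] < node_hi[axis]})
    if not candidates:
        return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(
            lambda s: sah_cost(s, axis, node_lo, node_hi, prims), candidates))
    return min(zip(costs, candidates))
```

Each candidate's cost depends only on read-only inputs, which is what makes this stage easy to parallelize; building sub-trees in parallel applies the same idea one level up.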

Parallel Ray Tracing Practices
Eliminate high-level bottlenecks:
- Synchronous display. Causes other threads to block! Easy solution: pipeline the display.
- Shared read/write data structures in high-performance code:
  - Lazy acceleration update.
  - Adaptive sampling structure.
  - Producer/consumer queues.

Parallel Ray Tracing Practices
Shared read/write data structures:
- Update during own pipeline stage.
- Per-thread copy of structure.
- Several threads share a copy in a neighborhood.
Choice depends on application. Unsafe practice is not an option.
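The per-thread copy option can be sketched as follows: each thread updates a private structure while rendering, and the copies are merged once all threads have finished, for example at the pipeline barrier. The function name and the per-tile sample counter are hypothetical stand-ins for something like an adaptive sampling structure.

```python
import threading

def count_samples_per_thread(num_threads, samples_per_thread, num_tiles=4):
    """Accumulate per-tile sample counts without any shared writes."""
    local_counts = [dict() for _ in range(num_threads)]  # one private copy each

    def worker(proc):
        counts = local_counts[proc]  # only this thread writes to this dict
        for s in range(samples_per_thread):
            tile = (proc + s) % num_tiles
            counts[tile] = counts.get(tile, 0) + 1

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge after all threads are done, when no one is writing any more.
    merged = {}
    for counts in local_counts:
        for tile, n in counts.items():
            merged[tile] = merged.get(tile, 0) + n
    return merged
```

The trade-off is memory and merge time against lock-free updates in the hot loop, which is why the choice depends on the application.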

Bottom Line
To achieve scalable multi-thread performance:
- Use a parallel pipeline with limited synchronization points (transactions).
- Use asynchronous display.
- Optimize for single-processor performance.
- Use packet properties for instruction optimization.
- Use at least a naïve sub-tree parallel kd-tree build.

Questions?
A. Stephens, S. Boulos, J. Bigler, I. Wald, and S. G. Parker. An Application of Scalable Massive Model Interaction using Shared Memory Systems. Proceedings of the Eurographics Symposium on Parallel Graphics and Visualization, 2006.
J. Bigler, A. Stephens, S. G. Parker. Design for Parallel Interactive Ray Tracing Systems. Scientific Computing and Imaging Institute, University of Utah. Technical Report No. UUSCI-2006-027. (submitted)
This work is supported by:
- U.S. Department of Energy through the Center for the Simulation of Accidental Fires and Explosions, under grant W-7405-ENG-48.
- Utah Center of Excellence for Interactive Ray-Tracing and Photo Realistic Visualization.
- National Science Foundation.
Additional support through internships:
- Silicon Graphics Inc.
- Intel Corporation.
Additional resources: http://www.sci.utah.edu/~abe/