Real-Time Graphics Architecture. Kurt Akeley Pat Hanrahan. Ray Tracing.


Real-Time Graphics Architecture
Kurt Akeley, Pat Hanrahan
http://www.graphics.stanford.edu/courses/cs448a-01-fall

Ray Tracing
with Tim Purcell

Topics
- Why ray tracing?
- Interactive ray tracing on multicomputers
- Ray tracing hardware
- The cost of ray tracing
- SHARP architecture
- Trends

Readings
Required
1. I. Wald, P. Slusallek, State-of-the-art in interactive ray tracing, Eurographics 2001

Ray Tracing Algorithm
[Figure: a point light, a camera, an occluder, a specular material, and two diffuse materials, with ray segments labeled P, S, R, and T]

Photorealism
- Direct light simulation vs. approximations
  - True shadows vs. shadow maps
  - True reflections vs. environment maps
- Interreflections
Henrik Wann Jensen, Marcos Fajardo
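To give a feel for how quickly the ray tree in the Ray Tracing Algorithm figure grows, here is a minimal sketch that counts rays per pixel. It assumes that the labels P, S, R, and T stand for primary, shadow, reflected, and transmitted rays (the transcript shows only the letters), and that every hit spawns one shadow ray per light plus one reflected and one transmitted ray. The function name and parameters are hypothetical.

```python
def rays_per_pixel(max_depth, num_lights=1):
    """Worst-case number of rays traced for one pixel.

    Assumption: every hit spawns one shadow ray per light, and every hit
    above max_depth also spawns one reflected and one transmitted ray.
    """
    def subtree(depth):
        count = 1 + num_lights                # the ray itself plus its shadow rays
        if depth < max_depth:
            count += 2 * subtree(depth + 1)   # reflected and transmitted children
        return count

    return subtree(0)


if __name__ == "__main__":
    for depth in range(4):
        print(depth, rays_per_pixel(depth))
```

Real scenes spawn far fewer rays than this bound, because most surfaces are neither reflective nor transparent.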

Target Implementation Spectrum
Specialization (most to least):
- Custom hardware: ART AR350
- Programmable graphics processor: NVIDIA/ATI
- Stream processor, MP on a single chip: Imagine, Smart Memories
- General-purpose CPU with specialized instructions: PC w/ SSE [Saarbrücken RTRT]
- General-purpose CPU: R10K, Utah IRT

Interactive Ray Tracing
- Embarrassingly parallel
- Global database
- Utah IRT on SGI Origin

Interactive Ray Tracing
- Optimized data structures for cache
  - Cluster data structures by access
  - Pad and block (32B)
  - Prefetch triangle data
- Optimized code for IA w/ SSE

  Computation
    Min    78    22    3.5
    Max   148    41    3.7

- Univ. of Saarbrücken IRT
  - Traces 4 rays at a time through BSP tree
  - Range of speeds: 200 KR/s to 1.5 MR/s (800 MHz PIII)

Advantages: Output Complexity
- Lazy evaluation: need not touch every triangle
- Logarithmic search for ray intersection
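For a sense of scale, the Saarbrücken throughput range quoted above (200 KR/s to 1.5 MR/s) can be converted into frame times. This is only a back-of-the-envelope sketch: the 1024x1024 resolution is borrowed from the measurement slides later in the lecture, and one eye ray per pixel with no secondary rays is an assumption.

```python
def seconds_per_frame(rays_per_second, width=1024, height=1024, rays_per_pixel=1):
    """Time to trace one frame at a given ray throughput (rays per second)."""
    return width * height * rays_per_pixel / rays_per_second


if __name__ == "__main__":
    for label, rate in (("200 KR/s", 200e3), ("1.5 MR/s", 1.5e6)):
        print(f"{label}: {seconds_per_frame(rate):.2f} s per frame")
```

Under these assumptions the slow end of the range needs about 5.2 seconds per frame and the fast end about 0.7 seconds.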

Advantages: Selective Sampling
- Adaptive sampling
- Frameless rendering (just-in-time rendering)
  [Figure from Bishop et al., Frameless Rendering, SIGGRAPH 94]
- RenderCache
  1. Reproject
  2. Ray trace in holes
  [Figure from Walter et al., RenderCache, Eurographics 97]

Advantages: Selective Sampling (continued)
- Holodeck
- Foveal rendering
- Rendering under pressure

AR250 (newer AR350)
- 0.35 um, 106 mm^2 die
- 650K gates
- 32 single-stage IEEE FPUs
- 50 MHz

ART AR250
[Block diagram with the following labels]
- Fine-grain distribution of rays between AR250s for load balancing
- DRAM
- Pipeline-parallel ray-geometry engine (permutes multiple geometry and rays): intersections, geometry, hierarchical geometry
- Programmable shading co-processor: shader
- Ray generation & control (CPU)
- On-chip data caches
- Host system interface, host bus interface
  - Geometry and shaders (broadcast)
  - Pixels (sequential read-back)

SHARP Architecture
[Pipeline diagram: Camera -> Ray Gen -> Traverser -> Ray/Vox -> Intersector -> Hits -> Shader -> Pixels; inputs: Grid, Triangles, Lights & Materials]
- Recirculating streams

Eye Ray Generator
Camera -> Ray Gen -> Traverser
- Computation: 10+, 13*, 5/, 3 SPC
- 100 MIPS R10K
- 13 VP1.0

Ray {
    vec3f O;
    vec3f D;
    float t;
    vec2f xy;
    vec3f weight;
    byte  type;
}  // 49 bytes

Traverser
Grid -> Traverser -> Ray+Voxel
- Acceleration data structure
  - Occupancy bitmap
  - No triangle data
- Ray-voxel grid traversal
  - 3D-DDA (line drawing)
- Computation
  - Setup: 15+, 6*, 9/, 30 CMP
  - Traverse: 4+, 3*, 8 CMP
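The ray-voxel grid traversal named on the Traverser slide can be sketched as follows. This is an illustrative 3D-DDA in the style of Amanatides and Woo, not the SHARP implementation. It assumes unit-sized voxels, a cubic grid spanning [0, grid_res) on each axis, and a ray origin already inside the grid. The function name and parameters are hypothetical.

```python
import math


def traverse_grid(origin, direction, grid_res, max_steps=1024):
    """Yield the integer voxel coordinates a ray visits, in order.

    Assumes unit voxels, a grid spanning [0, grid_res) per axis, and an
    origin inside the grid.
    """
    voxel = [int(math.floor(c)) for c in origin]
    step, t_max, t_delta = [], [], []
    for o, d, v in zip(origin, direction, voxel):
        if d > 0:
            step.append(1)
            t_max.append((v + 1 - o) / d)   # ray parameter at the next +axis boundary
            t_delta.append(1 / d)           # ray parameter per voxel along this axis
        elif d < 0:
            step.append(-1)
            t_max.append((v - o) / d)       # ray parameter at the next -axis boundary
            t_delta.append(-1 / d)
        else:
            step.append(0)
            t_max.append(math.inf)          # never crosses a boundary on this axis
            t_delta.append(math.inf)

    for _ in range(max_steps):
        if not all(0 <= v < grid_res for v in voxel):
            return                          # left the grid
        yield tuple(voxel)
        axis = t_max.index(min(t_max))      # nearest boundary decides the next step
        if math.isinf(t_max[axis]):
            return                          # zero-length direction vector
        voxel[axis] += step[axis]
        t_max[axis] += t_delta[axis]


if __name__ == "__main__":
    print(list(traverse_grid((0.5, 0.25, 0.5), (1.0, 1.0, 0.0), 2)))
```

The setup divides once per axis and the inner loop only compares and adds, which matches the general shape of the setup and per-step costs listed on the slide.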

Grid Acceleration Structure
- Reasonable: no clear best acceleration structure anyway
- Optimizations
  - Mailboxing (doesn't parallelize well, don't use)
  - Blocking (64 bits = 4x4x4)
  - Hierarchies (two-level blocked grid)
- Inserting triangles into grid
  - Amortize cost for static geometry (display lists)
  - Costly for time-varying geometry
  - Basically 3D rasterization

Intersector
Triangles, Ray+Voxel -> Intersector -> Hits
- Ray-triangle intersection
  - Möller-Trumbore algorithm
  - Basically point inside triangle: rasterization!
- Computation: 23+, 27*, 1/, 14 CMP

Triangle { vec3f P[3]; }  // 36 bytes
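The Möller-Trumbore test named on the Intersector slide can be sketched as below. This is a plain-Python illustration of the published algorithm, not the SHARP kernel. The helper names, the tuple-based vectors, and the epsilon value are assumptions made for the example.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Return (t, u, v) for a hit, or None for a miss.

    t is the distance along the ray; (u, v) are barycentric coordinates.
    """
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direc, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                  # ray is parallel to the triangle plane
    inv_det = 1.0 / det              # the single divide
    s = sub(orig, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None                  # outside the first edge test
    q = cross(s, e1)
    v = dot(direc, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None                  # outside the remaining edge tests
    t = dot(e2, q) * inv_det
    if t < eps:
        return None                  # intersection is behind the ray origin
    return (t, u, v)


if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))
```

The barycentric range checks are the "point inside triangle" step that the slide compares to rasterization.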

Total Operations
[Plot: Operations (Millions), 0 to 180, vs. Voxels (Millions), 0 to 2, for Bunny (69,415 triangles) and Quake 3 (35,468 triangles)]
Operations = Traversals + Intersections

Total Instructions
[Plot: Total Op Count (Billions), 0 to 40, vs. Voxels (Millions), 0 to 2, for Bunny (69,415 triangles) and Quake 3 (35,468 triangles)]
Operations = 50*Traversals + 200*Intersections
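The two formulas on these slides can be written out directly. The sketch below uses the slide's weights of 50 and 200. Reading them as instructions per traversal step and per intersection test is an interpretation, and the sample counts are invented for illustration rather than taken from the plots.

```python
def total_operations(traversals, intersections):
    """Unweighted count: every traversal step and intersection test costs 1."""
    return traversals + intersections


def total_instructions(traversals, intersections,
                       per_traversal=50, per_intersection=200):
    """Weighted count using the per-kernel weights from the slide."""
    return per_traversal * traversals + per_intersection * intersections


if __name__ == "__main__":
    # Hypothetical counts: a finer grid trades intersections for traversal steps.
    coarse = (10_000_000, 8_000_000)
    fine = (30_000_000, 2_000_000)
    for name, (trav, isect) in (("coarse", coarse), ("fine", fine)):
        print(name, total_operations(trav, isect), total_instructions(trav, isect))
```

With these weights one intersection test costs as much as four traversal steps, so the grid resolution that minimizes total instructions can be finer than the one that minimizes raw operation count.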

Shader
Lights & Materials, Hits -> Shader -> Pixels
- Diffuse shading: 19+, 21*
- Shadow and secondary rays generated here
- Weighted rays
[Figure with labels I, S, N, R, T]

Hit {
    vec3f O, D;
    vec2f xy;
    vec3f weight;
    float t;
    byte  type;
    long  tri;
    vec2f uv;
}  // 99 bytes

Cost Model
C = R T + S
With acceleration: C = RV + RT + S
- l = average number of ray-voxel traversals (mean free path of the ray)
- a = average number of triangles per voxel
- t = l a (average number of ray-triangle intersections)
C = R (l C_rv + t C_rt + C_s) = R [l (C_rv + a C_rt) + C_s]
- l is analogous to depth complexity
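As a quick check of the accelerated cost model above, the sketch below evaluates both forms of C and confirms that they agree once t = l a is substituted. All numeric inputs are hypothetical. The per-step costs of 50 and 200 are borrowed from the instruction weights on the Total Instructions slide, and the shading cost of 40 is a rough stand-in for the diffuse shader's 19 adds and 21 multiplies.

```python
import math


def ray_tracing_cost(num_rays, l, a, c_rv, c_rt, c_s):
    """C = R (l C_rv + t C_rt + C_s) with t = l * a."""
    t = l * a                                   # ray-triangle tests per ray
    expanded = num_rays * (l * c_rv + t * c_rt + c_s)
    factored = num_rays * (l * (c_rv + a * c_rt) + c_s)
    assert math.isclose(expanded, factored)     # both forms of the slide's equation
    return expanded


if __name__ == "__main__":
    # One ray, 10 voxels visited, 2 triangles per voxel (all invented values).
    print(ray_tracing_cost(num_rays=1, l=10, a=2, c_rv=50, c_rt=200, c_s=40))
```

In this example the intersection term (4000) dominates traversal (500) and shading (40), which is why the number of triangles per voxel matters so much.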

Cost Model
Ray-triangle intersections = r T = t R
- r = average number of rays tested per triangle
- t = average number of triangles tested per ray
When r < 1, ray tracing wins
Two effects:
- Depth complexity
- Small triangles (might still cost something)

Cost Model: Deferred Shading
R l (C_rt + C_s / l) = I d (C_fv + C_s)
When C_s is large, ray tracing wins
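The deferred-shading comparison above can be explored numerically. The slide does not define I, d, or C_fv. This sketch assumes I is the number of image pixels, d the depth complexity of the pipeline, and C_fv the per-fragment visibility cost, with one ray per pixel (R = I). All numbers are invented for illustration.

```python
import math


def ray_tracing_cost(R, l, c_rt, c_s):
    """Left side of the slide's equation: shading is paid once per ray."""
    return R * l * (c_rt + c_s / l)


def pipeline_cost(I, d, c_fv, c_s):
    """Right side: every fragment at every depth layer is shaded."""
    return I * d * (c_fv + c_s)


def break_even_shading_cost(l, c_rt, d, c_fv):
    """C_s at which both sides are equal, assuming R == I and d > 1."""
    return (l * c_rt - d * c_fv) / (d - 1)


if __name__ == "__main__":
    l, c_rt, d, c_fv = 10, 200, 4, 10
    print("break-even C_s:", break_even_shading_cost(l, c_rt, d, c_fv))
    for c_s in (10, 1000):
        print(c_s, ray_tracing_cost(1, l, c_rt, c_s), pipeline_cost(1, d, c_fv, c_s))
```

Below the break-even shading cost the pipeline is cheaper. Above it, shading once per ray instead of d times per pixel outweighs the intersection work, which is the slide's "when C_s is large, ray tracing wins".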

Architecture Measurements
Scene: 282K triangles, eye rays only, 1 frame, 1024x1024 pixels
- Triangles: 470 MB local
- Traverser: 21 Gops
- Intersector: 2.4 Gops

Architecture Measurements
Scene: 1.5M triangles, eye rays only, 1 frame, 1024x1024 pixels
- Triangles: 1.3 GB local
- Traverser: 1.9 Gops
- Intersector: 6.2 Gops

Caching on Smart Memories
- Cache policy
  - 4-way set associative
  - LRU replacement
- System configuration
  - 32 intersection tiles
  - 1024 triangle (44 KB) cache size each
  - Shared caches
[Diagram: Smart Memories chip and off-chip DRAM; legend: local cache, shared cache, off chip]

Small and Simple Caching Suffices
- Over 93% of all triangle references are on chip
- Stall time < 10% of execution time
[Bar chart: Pct. Triangle Accesses, 0% to 100%, split into local cache, shared cache, and off chip]
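The cache policy described on the Caching on Smart Memories slide (4-way set associative, LRU replacement, 1024 triangles per tile) can be modelled with a few lines of code. This is a toy simulator, not the Smart Memories design. It assumes one triangle per cache line and a set index of triangle ID modulo the number of sets, neither of which is stated in the slides.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement, keyed by triangle ID."""

    def __init__(self, capacity=1024, ways=4):
        self.ways = ways
        self.num_sets = capacity // ways
        # One OrderedDict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, tri_id):
        """Record one triangle reference; return True on a hit."""
        lines = self.sets[tri_id % self.num_sets]   # assumed index function
        if tri_id in lines:
            lines.move_to_end(tri_id)               # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)               # evict the least recently used
        lines[tri_id] = None
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = SetAssociativeCache(capacity=8, ways=4)
    for tri in (0, 2, 4, 6, 0, 8, 2, 0):
        cache.access(tri)
    print(cache.hits, cache.misses, cache.hit_rate())
```

Feeding the simulator the triangle IDs produced by an intersector would give a hit-rate estimate to compare against the on-chip figure reported on the slide.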

Architecture Measurements
Scene: 282K triangles, eye rays only, 1 frame, 1024x1024 pixels
- Triangles: 470 MB local, 47 MB shared, 11 MB off chip
- Traverser: 21 Gops
- Intersector: 2.4 Gops

Architecture Measurements
Scene: 1.5M triangles, eye rays only, 1 frame, 1024x1024 pixels
- Triangles: 1.3 GB local, 23 MB shared, 4 MB off chip
- Traverser: 1.9 Gops
- Intersector: 6.2 Gops
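The per-level triangle traffic in these measurements can be turned into an on-chip fraction. The sketch assumes that the local, shared, and off-chip figures are volumes of triangle data served from each level, and it reads 1.3 GB as 1300 MB. Both are interpretations of the slide, not stated facts.

```python
def on_chip_fraction(local_mb, shared_mb, off_chip_mb):
    """Fraction of triangle traffic served by the local and shared caches."""
    total = local_mb + shared_mb + off_chip_mb
    return (local_mb + shared_mb) / total


if __name__ == "__main__":
    print(f"282K-triangle scene: {on_chip_fraction(470, 47, 11):.1%}")
    print(f"1.5M-triangle scene: {on_chip_fraction(1300, 23, 4):.1%}")
```

Under these assumptions both scenes stay well above the 93% on-chip figure quoted on the caching slide, although that figure counts references rather than bytes.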

Performance Analysis
- Compute limited
  - 3:1 compute to internal bandwidth ratio for intersector
  - 10:1 compute to internal bandwidth ratio for traverser
  - Ray tracing generally thought to be bandwidth limited
- Tuning Smart Memories for ray tracing
  - Split FP units (64-bit -> 2x 32-bit) (2x)
  - Specialized and/or SSE-like instructions (2-5x)
  - Remote data pre-fetching (1.2x)
- Excellent performance!
  - 32 tiles can perform 1500M intersections/s
  - 30 tiles can perform 5000M traversals/s
  - 100 times faster than PC with SSE

Summary and Trends

Summary
- Ray tracing algorithm
  - Embarrassingly parallel
  - Global access to database
- Ray tracing kernels not unlike pipeline kernels
  - Ray-voxel traversal like line drawing
  - Ray-triangle intersection like rasterization
- Seems to map well to streaming processor

Evolution of the Graphics Pipeline
- Evolving the graphics pipeline to accommodate RT
- Hybrid algorithms (1st pass using conventional pipe)
  - Shadows
  - Ray-traced reflections
- Full ray tracing
  - Ray casting
  - Monte Carlo path tracing

Cross-Over
When does ray tracing compete with classic pipeline?
- High model complexity
- High depth complexity
- Small triangle size
- Efficient acceleration structure
- High shading complexity
- Deferred shading wins
- Exactly when?

Convergence of Geometry/Texture
- Different mechanisms
  - Define and bind textures
  - Render immediate mode geometry
- Complexity
  - Detail in texture
  - Detail in geometry
- Static vs. dynamic
  - Textures static
  - Geometry dynamic