Blue-Steel Ray Tracer

MIT 6.189 IAP 2007 Student Project: Blue-Steel Ray Tracer
Natalia Chernenko, Michael D'Ambrosio, Scott Fisher, Russel Ryan, Brian Sweatt, Leevar Williams
Game Developers Conference, March 7, 2007

Imperative Need for Parallel Programming Education
The Software Crisis
"To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem."
-- E. Dijkstra, 1972 Turing Award Lecture

Multicores are Here
[Figure: number of cores per chip (axis ticks 1 to 512) versus year (1970 to 20??). Chips plotted: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Athlon, Itanium, Itanium 2, Power4, PExtreme, Power6, Yonah, PA-8800, Opteron, Tanglewood, Xbox360, Xeon MP, Broadcom 1480, Opteron 4P, Cell, Cavium Octeon, Raza XLR, Intel Tflops, Cisco CSR-1, Niagara, Picochip PC102, Raw, Ambric AM2045.]

Teaching Parallel Programming
- Prof. Saman Amarasinghe (MIT) and Dr. Rodric Rabbah (IBM)
- Month-long intensive course with lectures, recitations, and labs
- Sponsored by Sony, Toshiba, and IBM
- Technical support from Sony, IBM, and Terra Soft
- Course outcomes
  - Know fundamental concepts of parallel programming (both hardware and software)
  - Understand issues of parallel performance
  - Able to synthesize a fairly complex parallel program
- Hands-on experience with the Cell processor
  - Sony PS3 consoles running YDL (Yellow Dog Linux)
  - IBM Cell SDK from developerWorks

Learning From Student Perspective
- Fun and challenging context attracted many students
  - Using PS3s as the platform for student projects
  - Programming the new Cell processor
- "PS3 attracted me but hearing about the future of parallel programming kept me around." (student quote)

Class Project Competition
- 7 ambitious projects
  - Ray Tracer
  - Global Illumination
  - Linear Algebra Pack
  - Molecular Dynamics Simulator
  - Speech Synthesizer
  - Soft Radio
  - Backgammon Tutor
- Presentations, including performance results, available online /competition.shtml
- Some source code will also be published

Our Project: Ray-Tracer Blue-Steel

The Idea: Realistic Graphics
A solution to the rendering equation
- Triangle Rasterization
  - Fast: possible in real time on a single core
  - Inaccurate or tedious for global effects such as shadows, reflection, refraction, or global illumination
  - Start with speed, try to get realism
- Ray Tracing
  - Slow unless done on multiple cores
  - Accurate and natural shadows, reflection, and refraction
  - Start with realism, try to get speed

The Idea: Realistic Graphics
- Real-time rasterization is done all the time!
- Instead, build a fast ray tracer from the ground up to take advantage of multiple cores.
- PS3 is perfect
  - 6 accessible cores for rendering
  - Fast XDR RAM for transferring scene data / frames
  - Practically a GPU on its own: no need for additional hardware
[Figure: two block diagrams, labeled "Modern graphics w/ GPU" and "Without GPU, using Blue-Steel". Components shown include CPU, PPE, six SPEs, GPU, and frame buffer.]

Ray Tracing
- Shoot a ray through each pixel on the screen
- Check for intersections with each object in the scene
- Keep the closest intersection
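The three steps above can be sketched in a few lines. This is a minimal illustration, not the Blue-Steel code: it assumes spheres as the only primitive, and the names `intersect_sphere` and `closest_hit` are hypothetical.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance t along the ray, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    sqrt_disc = math.sqrt(disc)
    # Try the nearer root first; skip hits behind (or at) the ray origin.
    for t in ((-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)):
        if t > 1e-6:
            return t
    return None

def closest_hit(origin, direction, spheres):
    """Test the ray against every sphere and keep the closest intersection."""
    best = None  # (distance, sphere index)
    for index, (center, radius) in enumerate(spheres):
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best

# Two spheres on the -z axis; the second one is nearer to the camera.
spheres = [((0.0, 0.0, -5.0), 1.0), ((0.0, 0.0, -3.0), 1.0)]
print(closest_hit((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), spheres))
```

A full renderer would call `closest_hit` once per pixel, with the ray direction derived from the pixel's position on the view plane.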

Ray Tracing
- Shade each point according to the material of the object, as well as the lights in the scene
- Stopping at this level achieves traditional scan-line rasterization quality
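As a rough illustration of shading a hit point from its material and the scene lights, the sketch below uses a plain Lambertian (diffuse) term. The slides do not say which shading model Blue-Steel used, so the model, the point-light representation, and the name `shade_diffuse` are assumptions.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def shade_diffuse(point, normal, albedo, lights):
    """Sum the Lambertian contribution of each point light at a surface point.

    lights is a list of (position, rgb_color) pairs; albedo is the material's
    diffuse rgb reflectance. Shadows are ignored at this level.
    """
    n = normalize(normal)
    color = [0.0, 0.0, 0.0]
    for light_pos, light_color in lights:
        to_light = normalize(tuple(l - p for l, p in zip(light_pos, point)))
        # Clamp at zero so lights behind the surface contribute nothing.
        cos_theta = max(0.0, sum(a * b for a, b in zip(n, to_light)))
        for i in range(3):
            color[i] += albedo[i] * light_color[i] * cos_theta
    return tuple(color)

# A light directly above an upward-facing orange surface.
print(shade_diffuse((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.5, 0.0),
                    [((0.0, 2.0, 0.0), (1.0, 1.0, 1.0))]))
```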

Ray Tracing
- Cast rays for shadows, reflection, and refraction
- Recursive rays are processed identically to primary rays
- Framework for global effects is built into ray tracing by design
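A secondary ray is just another (origin, direction) pair, which is why it can go through the same code path as a primary ray. The sketch below shows the standard mirror-reflection formula r = d - 2(d·n)n and how a reflected ray might be spawned; the function names and the small offset `eps` are illustrative assumptions.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def reflect(direction, normal):
    """Mirror a direction about a unit surface normal: r = d - 2 (d . n) n."""
    k = 2.0 * dot(direction, normal)
    return tuple(d - k * n for d, n in zip(direction, normal))

def spawn_reflection_ray(hit_point, direction, normal, eps=1e-4):
    """Build the secondary ray, nudged off the surface to avoid self-intersection."""
    origin = tuple(p + eps * n for p, n in zip(hit_point, normal))
    return origin, reflect(direction, normal)

# A ray travelling down and to the right bounces off a floor facing +y.
print(reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0)))
```

The returned pair can be handed straight back to whatever routine traces primary rays.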

Ray Tracing on the PS3: Design Challenges
- Bandwidth & latency of PPE / SPE communication
  - Mailboxes can only hold 128 bits at a time
- Limited size of local store
  - 256 KB for program, execution stack, scene, and frame data
- DMA latency
  - Two orders of magnitude slower than local store

Ray Tracing on the PS3: Design Challenges
- Inherent SIMD architecture of SPE
  - Scalar code, like most code today, is expensive
- No branch prediction
  - 'if' statements and loops are costly
- Load balancing
  - Splitting up computation so as to minimize communication / computation overhead

Ray Tracing on the PS3: High-Level Design
- Clump a set of SPEs together as one rendering engine
  - Each SPE holds a full set of scene data
  - Each SPE renders only part of the scene
  - Run a full ray tracer on every SPE
- Engine has a set of instructions just like any processor
  - Instructions are sent to this engine using SPE mailboxes
- SPE-centric framework
  - Each SPE has knowledge of what work it must do; PPE tells it what to render only at the start of the process

Ray Tracing on the PS3: Tackling the Challenges
- Bandwidth & latency of PPE / SPE communication
  - SPE-centric framework
  - No need for communication during the rendering process
- Limited size of local store
  - Pack data efficiently in vectors
  - Split scene into chunks that can be stored one at a time
- DMA latency
  - Hide latency through double-buffering
  - Work on one type of object while transferring another
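The control flow of double-buffering can be sketched without Cell hardware. In the illustration below a single worker thread stands in for the DMA engine, and `fetch_chunk` stands in for a DMA transfer into the local store; both are assumptions for the sake of a runnable example, not the SPE API. The key point is that the request for chunk i+1 is issued before work on chunk i begins.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(scene, index, chunk_size):
    """Stand-in for a DMA transfer: copy one chunk of the scene into a buffer."""
    return scene[index * chunk_size:(index + 1) * chunk_size]

def process_scene(scene, chunk_size, work):
    """Process the scene chunk by chunk, prefetching the next chunk meanwhile."""
    n_chunks = (len(scene) + chunk_size - 1) // chunk_size
    results = []
    if n_chunks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch_chunk, scene, 0, chunk_size)
        for i in range(n_chunks):
            current = pending.result()  # wait until this buffer has arrived
            if i + 1 < n_chunks:
                # Start the next transfer before computing on the current buffer.
                pending = dma.submit(fetch_chunk, scene, i + 1, chunk_size)
            results.extend(work(obj) for obj in current)
    return results

print(process_scene(list(range(10)), 4, lambda x: x * x))
```

With only two buffers alive at a time (`current` and the one behind `pending`), memory use stays bounded while transfer time overlaps with computation.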

Ray Tracing on the PS3: Tackling the Challenges
- No branch prediction
  - Only 3 explicit 'if' statements in code
  - Have compiler unroll loops
- Inherent SIMD architecture of SPE
  - View everything as packets, work on 4 at a time
- Load balancing
  - Have each SPE render every sixth line of the screen
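The static load-balancing scheme (each SPE renders every sixth line) amounts to a strided assignment of scanlines. A minimal sketch, assuming SPEs are numbered 0 to 5 and using the hypothetical helper `rows_for_spe`:

```python
NUM_SPES = 6  # the six SPEs available for rendering on the PS3

def rows_for_spe(spe_id, height, num_spes=NUM_SPES):
    """Scanlines assigned to one SPE: spe_id, spe_id + 6, spe_id + 12, ..."""
    return list(range(spe_id, height, num_spes))

# Each SPE can compute its own rows from its id alone, so no communication
# with the PPE is needed once rendering starts.
for spe in range(NUM_SPES):
    print(spe, rows_for_spe(spe, 16))
```

Interleaving rows, rather than giving each SPE one contiguous band, tends to spread expensive image regions across all cores, which is one plausible reason a static split can still balance well.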

Issues During Implementation
- Heterogeneous architecture
  - SPU and PPU have different instruction sets
  - Two versions of many objects needed to be implemented: one optimized for the PPU and one for the SPU
- Lack of effective debugging tools
  - Many threads running on different cores: no convenient means of viewing everything

Issues During Implementation
- Physics engine
  - Third-party ODE used
  - Peculiarities in representation of object positions
  - Difficult to kill built-in OpenGL visualization
- Integration
  - Physics representation vs. rendering representation

Issues During Implementation: Time!
- 4 weeks dedicated to project
- 1 week for planning
  - Streaming computation or full computation on each SPE?
  - Scene fitting in local store
  - Software cache, or other means?

Issues During Implementation: Time!
- 4 weeks dedicated to project
- 3 weeks for coding
  - Many options could not be explored in depth
  - Simple algorithms chosen over more complex, yet faster ones
- Dropping parts of initial plan to meet deadline
  - Static, rather than dynamic load balancing
  - Spatial index structure
  - Full-scale game with real-time physics done on PPU
  - Other primitives: cylinder, box
  - Larger packets to reduce data dependency stalls

Performance Analysis
- Exact linear speed increase in number of SPEs
- Test scenes
  - Textured crystal ball: stresses bump mapping / global effects
  - Spotlight: stresses scene/shading complexity, scene visibility
[Figure: "Time Scaling With # of SPEs". Speed (fps, axis 0 to 16) versus number of SPEs (1 to 6); series: Textured, Spotlight.]

Performance Analysis
- Scalability in object complexity
[Figure: "Time Scaling in Object Complexity". Render time (ms, axis 0 to 80) versus object complexity (axis ticks 5 to 105); series: No Shading, W/Shading.]

Performance Analysis
- Scalability in shader complexity
  - Small, constant performance hit for simple shading
  - ~20 ms, constant performance hit for procedural shaders
  - OpenGL-like graphics at ~50 fps
[Figure: "Time Scaling in Shader Complexity". Render time (ms, axis 0 to 45) for shader levels: none, simple, shadows, 1 bounce, 2 bounces; series label: Pong.]

Performance Analysis: Optimizations
- Hand-tuning C code to eliminate dependencies
  - Despite compiler optimizations, hand-tuned triangle intersection routine saved ~20 ms on complex scenes

Performance Analysis: Optimizations
- AOS (array of structures) packing for storage, SOA (structure of arrays) for computation
  - Goal: fit as many objects in 16 KB (one DMA transfer) as possible
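To make the storage-versus-computation split concrete, the sketch below packs spheres tightly as an array of structures, counts how many fit in one 16 KB transfer, and then unpacks the buffer into per-component arrays. The record layout (four 32-bit floats per sphere) and all function names are assumptions chosen for illustration; the slides do not describe the actual Blue-Steel object layout.

```python
import struct

DMA_TRANSFER_BYTES = 16 * 1024  # 16 KB, the single-transfer size named on the slide

# Hypothetical record: center x, y, z and radius as four little-endian floats.
SPHERE_FORMAT = "<4f"

def objects_per_transfer(record_format=SPHERE_FORMAT):
    """How many packed records fit in one DMA transfer."""
    return DMA_TRANSFER_BYTES // struct.calcsize(record_format)

def pack_aos(spheres):
    """Array of structures: records stored back to back, compact for transfer."""
    return b"".join(struct.pack(SPHERE_FORMAT, *s) for s in spheres)

def unpack_to_soa(buffer):
    """Structure of arrays: one list per component, convenient for SIMD-style math."""
    xs, ys, zs, rs = [], [], [], []
    for x, y, z, r in struct.iter_unpack(SPHERE_FORMAT, buffer):
        xs.append(x)
        ys.append(y)
        zs.append(z)
        rs.append(r)
    return xs, ys, zs, rs

print(objects_per_transfer())
print(unpack_to_soa(pack_aos([(1.0, 2.0, 3.0, 0.5), (4.0, 5.0, 6.0, 1.5)])))
```

Under this assumed 16-byte record, 1024 spheres fit in one transfer; a larger record would reduce that count proportionally.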

Performance Analysis: Optimizations
- SOA for packets
  - Utilizes full space of four-element vector register
  - Perform 3 operations on data, rather than 4
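One reading of this slide is that, with four rays per packet stored component-wise, a 3-D operation such as a dot product needs three lane-wise multiplies (x, y, z), each filling all four lanes, instead of carrying an unused fourth component per vector. The sketch below emulates four-wide registers with Python lists; `vmul`, `vadd`, and `packet_dot` are hypothetical names, and this interpretation is an assumption rather than something the slides state.

```python
def vmul(a, b):
    """Lane-wise multiply of two four-element 'registers'."""
    return [x * y for x, y in zip(a, b)]

def vadd(a, b):
    """Lane-wise add of two four-element 'registers'."""
    return [x + y for x, y in zip(a, b)]

def packet_dot(ax, ay, az, bx, by, bz):
    """Dot products for four vector pairs at once, stored as structure of arrays.

    Each argument holds one component for all four rays, so every lane of
    every multiply does useful work: three multiplies and two adds in total.
    """
    return vadd(vadd(vmul(ax, bx), vmul(ay, by)), vmul(az, bz))

# Four directions (x axis, y axis, z axis, diagonal) dotted with (2, 3, 4).
ax, ay, az = [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]
bx, by, bz = [2] * 4, [3] * 4, [4] * 4
print(packet_dot(ax, ay, az, bx, by, bz))
```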

Performance Analysis: Optimizations
- Approximations
  - No recursion if past threshold depth
  - Assume a shadow if light contribution is less than threshold
- Dummy functions to assure shaders aren't run twice for the same ray
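The two approximations can be illustrated with a toy model. The threshold values, the uniform-surface assumption in `radiance`, and the callable `cast_shadow_ray` are all hypothetical; the slides give no concrete numbers.

```python
MAX_DEPTH = 2     # assumed recursion cutoff
MIN_LIGHT = 0.01  # assumed minimum light contribution worth a shadow ray

def radiance(depth, local, reflectivity):
    """Toy recursion: every surface has the same local color and reflectivity."""
    if depth >= MAX_DEPTH:
        return local  # past the threshold depth: cast no further secondary rays
    return local + reflectivity * radiance(depth + 1, local, reflectivity)

def lit_amount(contribution, cast_shadow_ray):
    """Return the light reaching a point, skipping the shadow ray for dim lights.

    cast_shadow_ray is a zero-argument callable that returns True when the
    light is blocked; it is only invoked when the contribution matters.
    """
    if contribution < MIN_LIGHT:
        return 0.0  # assume shadow without tracing anything
    return 0.0 if cast_shadow_ray() else contribution

print(radiance(0, 1.0, 0.5))
print(lit_amount(0.005, lambda: False), lit_amount(0.5, lambda: False))
```

Both checks trade a small amount of accuracy for fewer rays, which matters most in scenes with many lights or many reflective surfaces.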

Questions?