Physics Parallelization. Erwin Coumans SCEA

Similar documents
Anatomy of a Physics Engine. Erwin Coumans

A Different Approach for Continuous Physics. Vincent ROBERT Physics Programmer at Ubisoft

Continuous Collision Detection and Physics

Multi Agent Navigation on GPU. Avi Bleiweiss

What is a Rigid Body?

Lecture 16. Introduction to Game Development IAP 2007 MIT

GAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES TAKAHIRO HARADA, AMD DESTRUCTION FOR GAMES ERWIN COUMANS, AMD

Spatial Data Structures

Real-Time Physics Simulation. Alan Hazelden.

Spatial Data Structures

Intersection Acceleration

Scalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009

Announcements. Written Assignment2 is out, due March 8 Graded Programming Assignment2 next Tuesday

Developing Technology for Ratchet and Clank Future: Tools of Destruction

Collision and Proximity Queries

SPU Shaders. Mike Acton Engine Director Insomniac Games

SPU Render. Arseny Zeux Kapoulkine CREAT Studios

Insomniac Physics. Eric Christensen GDC 2009

Spatial Data Structures

Rigid Body Simulation. Jeremy Ulrich Advised by David Mount Fall 2013

Spatial Data Structures

Collision Detection based on Spatial Partitioning

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

CPSC / Sonny Chan - University of Calgary. Collision Detection II

Web Physics: A Hardware Accelerated Physics Engine for Web- Based Applications

Spatial Data Structures

Comparing Memory Systems for Chip Multiprocessors

Introduction to Parallel Programming Models

Dynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle

Parallel Physically Based Path-tracing and Shading Part 3 of 2. CIS565 Fall 2012 University of Pennsylvania by Yining Karl Li

Simulation in Computer Graphics Space Subdivision. Matthias Teschner

Dynamic Bounding Volume Hierarchies. Erin Catto, Blizzard Entertainment

Ray Intersection Acceleration

Acceleration Data Structures

About Phoenix FD PLUGIN FOR 3DS MAX AND MAYA. SIMULATING AND RENDERING BOTH LIQUIDS AND FIRE/SMOKE. USED IN MOVIES, GAMES AND COMMERCIALS.

Computer Graphics. - Spatial Index Structures - Philipp Slusallek

Accelerating Ray Tracing

Scene Management. Video Game Technologies 11498: MSc in Computer Science and Engineering 11156: MSc in Game Design and Development

Collision Detection. These slides are mainly from Ming Lin s course notes at UNC Chapel Hill

high performance medical reconstruction using stream programming paradigms

Collision Detection with Bounding Volume Hierarchies

igmobybspheres jonathan garrett 3/7/08

On the data structures and algorithms for contact detection in granular media (DEM) V. Ogarko, May 2010, P2C course

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

Wed, October 12, 2011

Stream Processing with CUDA TM A Case Study Using Gamebryo's Floodgate Technology

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA

Chapter 11 Global Illumination. Part 1 Ray Tracing. Reading: Angel s Interactive Computer Graphics (6 th ed.) Sections 11.1, 11.2, 11.

Collision detection. Piotr Fulma«ski. 1 grudnia

Real Time Ray Tracing

Speeding Up Ray Tracing. Optimisations. Ray Tracing Acceleration

Using Bounding Volume Hierarchies Efficient Collision Detection for Several Hundreds of Objects

Collision detection. Piotr Fulma«ski. 19 pa¹dziernika Wydziaª Matematyki i Informatyki, Uniwersytet Šódzki, Polska

Collision Detection. Motivation - Dynamic Simulation ETH Zurich. Motivation - Path Planning ETH Zurich. Motivation - Biomedical Simulation ETH Zurich

Cell Based Servers for Next-Gen Games

improving raytracing speed

INFOGR Computer Graphics. J. Bikker - April-July Lecture 11: Acceleration. Welcome!

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA

Production. Visual Effects. Fluids, RBD, Cloth. 2. Dynamics Simulation. 4. Compositing

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Parallel Programming for Graphics

Ray Tracing Acceleration. CS 4620 Lecture 20

CS 311 Data Structures and Algorithms, Spring 2009 Midterm Exam Solutions. The Midterm Exam was given in class on Wednesday, March 18, 2009.

MSBVH: An Efficient Acceleration Data Structure for Ray Traced Motion Blur

Chap. 5 Part 2. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1

Massive Model Visualization using Real-time Ray Tracing

Patterns of Parallel Programming with.net 4. Ade Miller Microsoft patterns & practices

Parallel Programming on Larrabee. Tim Foley Intel Corp

Geometry proxies (in 2D): a Convex Polygon

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo

B. Tech. Project Second Stage Report on

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation

Lecture 6: Input Compaction and Further Studies

Accelerating Geometric Queries. Computer Graphics CMU /15-662, Fall 2016

Comparison of hierarchies for occlusion culling based on occlusion queries

Acceleration Data Structures. Michael Doggett Department of Computer Science Lund university

Anti-aliased and accelerated ray tracing. University of Texas at Austin CS384G - Computer Graphics

Anti-aliased and accelerated ray tracing. University of Texas at Austin CS384G - Computer Graphics Fall 2010 Don Fussell

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Ray Tracing Performance

Overview. Collision detection. Collision detection. Brute force collision detection. Brute force collision detection. Motivation

Convex Hull Generation with Quick Hull. Randy Gaul Special thanks to Stan Melax and Dirk Gregorius for contributions on this topic

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Accelerated Raytracing

Ray Tracing: Intersection

Bullet Cloth Simulation. Presented by: Justin Hensley Implemented by: Lee Howes

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

CHAPTER 1 Graphics Systems and Models 3

Parallel Processing. Parallel Processing. 4 Optimization Techniques WS 2018/19

Warped parallel nearest neighbor searches using kd-trees

On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing

Sparse Fluid Simulation in DirectX. Alex Dunn Dev. Tech. NVIDIA

SAH guided spatial split partitioning for fast BVH construction. Per Ganestam and Michael Doggett Lund University

Overview. Collision Detection. A Simple Collision Detection Algorithm. Collision Detection in a Dynamic Environment. Query Types.

Data Organization and Processing

Sign up for crits! Announcments

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Database Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question:

Transcription:

Physics Parallelization A B C Erwin Coumans SCEA

Takeaway Multi-threading the physics pipeline collision detection and constraint solver Refactoring algorithms and data for SPUs Maintain unified multi-platform codebase SPU task maps well to multi-core thread

Parallelism and SPUs Both data and code needs to fit in 256kb DMA transfer to upload code and data Similar to GPU shader and texture upload Hide latency using double buffering Upload next batch while processing current Patterns for Parallel Programming [TM2005]

Broadphase N-Body Find potential overlapping pairs O (n2) number of pairs in theory But O (n) in practice

Broadphase Methods Uniform Grid Octree, BSP tree Spatial Hashing Multi-resolution Sweep and Prune Incremental Full sort (radix/quicksort)

Uniform Grid Can be parallelized on GPU What cell size to use?

Voxelize objects Performance versus Precision

Sweep and Prune (SAP) Incremental version has data dependencies Sometimes, the best serial algorithm has poor parallel scalability [Intel07]

Parallel sorted SAP Array-based sorting [CE2005] Split each axis array into segments Sort segments in parallel and merge Traverse array to find all overlapping pairs

Parallel sorted SAP Need method to check overlap on other axis Perform full AABB check Compare array-indices instead of begin/endpoint values

Pair management Broadphase either: 1) Rebuild all pairs from scratch 2) Incrementally add/remove pairs

Adding/removing pairs 1) Use hash map to find existing pairs 2) Index into uniform grid (GPU) 3) Avoid random access find

Sort and cull pairs sort check overlap, remove duplicates 0 0 0 0 0 next frame: sort 0 Perform AABB overlap check during traversal Mark non-overlapping pairs zero Look forward for duplicates 0 0 0 0 0 0

Pairwise Collision Dispatch Collision pairs are independent: embarrassingly parallel More culling needed for complex objects: AABB trees Output: contact points

Load balancing collision detection Broadphase is producer, mid/narrowphase multiple consumers Evenly distribute work, avoid synchronization Consume pairs while they are produced

Exceptions Deal with user callbacks Collision filter callback (needs collision yes/no?) Serial fallback for unhandled cases (PPU)

Concave Shapes: AABB trees Make big tree fit cache/spu Avoid random access due to recursion Output can go to narrow phase collision detector

Tree to array mapping Streaming large trees in parts Stackless traversal [EG2007]

Recursive versus Stackless void recursetree(node* node, Aabb* aabb) { if (testoverlap(node,aabb)) { if (isleafnode(node)) donarrowphase(node); else { recursetree (node->leftchildnode); recursetree (node->rightchildnode); } } } void iteratetree(node* node, Aabb* aabb) { int curindex=0; while (curindex < endnodeindex) { aabboverlap = testoverlap(node,aabb); if (aabboverlap ) { if (isleafnode(node)) donarrowphase(node); } if (aabboverlap isleafnode(node)) node++; curindex++; else { node += node->escapeindex; curindex += node->escapeindex; } } } }

Narrowphase collision algorithms Code size can become an issue for SPUs Can load/unload code using overlays or SPU Shaders [MA2008] Or use compact general purpose algorithms

Convex: GJK and EPA

GJK and EPA on SPUs See Gino van den Bergen s book [Ber2003] Virtual methods like getsupportingvertex Refactor into switch statement EPA polytope expansion can exceed memory Use PPU fallback or other solution like SAT or Minkowski sampling

Contact Point Allocation Incremental or from scratch each frame Memory pre/de/re/allocation can be a big deal We preallocate fixed-size cache (4 points) for each pair

SPU & Contact Point Caching Contact cache is input and output Find/update existing points Perform contact reduction (to 4 points) on SPU

Solver parallelization Constraint solving methods using large matrices won t fit Use iterative methods [Catto2005] Sequental Impulse or Projected Gauss Seidel Constraints share rigidbodies: not embarrassingly parallel

Each island in parallel Works well, islands are independent Doesn t parallelize large islands, load imbalance Overhead for small, single body islands

Grouping based on spatial hash 1D Array of Cells (N=NUM_BUCKETS) getcellindex (x,y,z) 012345 N1 Calculate cell dependency table Process cells in parallel, using table to avoid conflicts

Spatial hash function //compute hash bucket index in [0,NUM_BUCKETS-1] int32 getcellindex (int32 cellpos_x,int32 cellpos_y, int32 cellpos_z) { const int32 h1 = 0x8da6b343; //Large primes const int32 h2 = 0xd8163841; const int32 h3 = 0xcb1ab31f; int32 n = h1*cellpos_x + h2*cellpos_y + h3*cellpos_z; n = n%num_buckets; if (n<0) n+=num_buckets; return n; } //Example from Christer Ericson s book Real-time collision detection.

Reordering constraints [Intel07] C c3 c2 A B c1 c1 c2 C c2 c3 D B c1 D c4 c3 c4 c4 A c1 c1 c3 c3 c4 c2 c2 c4

Insomniac vs Bullet SPU Physics SPU physics pipeline comparison presentation is available online from Insomniac Tech website [IN2008]

References [Intel07] High-Performance Physical Simulations on Next-Generation Architecture with Many Cores, August 2007, Intel Technology Journal, http://www.intel. com/technology/itj/2007/v11i3/8-simulations/1-abstract.htm [EG2007] Stackless KD-Tree Traversal for High Performance GPU Ray Tracing, Proceeding of Eurographics, http://www.mpi-inf.mpg.de/%7eguenther/stacklessgpurt/index.html [MA2008] SPU Shaders, Mike Acton, Insomniac Games, Game Developers Conference 2008, http://www.insomniacgames. com/tech [Catto2005] Erin Catto, http://gphysics.com/downloads [IN2008] Insomniac and SCEA talk SPU Physics, http://www. insomniacgames.com/tech/articles/0108/insom_and_scea_talk_spu_physics.php

Books [TM2005] Patterns for Parallel Programming, Timothy Mattson et.al. Addison Wesley [CE2005] Real-time Collision Detection, Christer Ericson, Morgan Kaufmann, page 336 [Ber2003] Collision detection in interactive 3D environments, Gino van den Bergen, Morgan Kaufmann [PBA2005] Physics Based Animation, Kenny Erleben et. al. Charles River Media, page 613

Bullet physics library An open source 3D physics engine http://bulletphysics.com Written in C++ Parallelized for SPU and other architectures Check out SCEA booth #5905 at GDC Expo SPU optimized Game Physics Licensed PlayStation 3 developers can request access to SPURS version