Designing a Domain-specific Language to Simulate Particles. Dan Bailey

1 Designing a Domain-specific Language to Simulate Particles. Dan Bailey

2 Double Negative
- Largest visual effects studio in Europe
- Offices in London and Singapore
- Large and growing R&D team

3 Squirt Fluid Solver
- Introduced in 2006
- Grid / FLIP, 3D Navier-Stokes
- Re-engineering for parallelisation

4 Dynamo Simulation Engine
- Based on data-flow programming model
- Split logic into data and algorithms
- Initial solver configuration performs setup
- Jet compiler provides parallelisation
[Diagram labels: Solver Configuration, Data Model, Operator Network, Parameter Tree, Jet Compiler]

5 Dynamo Data Model
- Represents base data resolutions and tiling information
- Stores layout, indexing and access to data buffers
- Provides world space transformation data
[Diagram labels: Stencil, Voxels, Grids, with multiplicity markers 1 and *]

6 Dynamo Operator Network
- Operators pull data from Data Model
- Operator network executed recursively
- Operators pull parameters from Parameter Tree
[Diagram labels: Operator (several), Data Model, Parameter Tree]

7 Jet Language and Compiler
- Compiler can compile for CPU or GPU using LLVM 3.2
- Uses X86 backend or NVIDIA PTX backend
- Launch code also provided by the compiler
[Diagram labels: Operator, Jet, Jet Compiler, parallel-aware LLVM IR. CPU path: X86 LLVM IR, x86 backend, X86 Machine Code, Thread-pool CPU (C++, clang). GPU path: NVVM IR, nvptx backend, PTX Virtual Assembly, CUDA Runtime, Host GPU Launch (C++, clang), X86 Machine Code]

8 Grid Simulation
- Advection, Emission, Forces, Projection
Particle-in-Cell Simulation
- Interpolation, Seeding, Advection, Splatting, Emission, Forces, Projection
- FLIP solve means quantities stored on grids are deltas
[Diagram: stages joined by += operators]

9 Particle Operators
- Interpolation, Advection: independent particle operations mean parallelisation is trivial
- Seeding, Splatting: presence of race conditions means care must be taken when parallelising

10 Particle Storage
- Each grid tile maps to static contiguous memory
- Regular layout means predictable indexing
- Particle set maps to dynamic contiguous memory
- Irregular layout means frequent re-allocation

11 Particle Bucketing
- Common technique for handling particles in parallel
- Often used purely as a data acceleration structure (but we also want voxel data)
- Store start, end and length values per voxel
- Can now iterate over particles or over particles per voxel
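The per-voxel start, end and length values described on this slide can be sketched with a counting sort. This is a minimal illustration in Python rather than Dynamo's actual implementation; the function name `bucket_particles` and the flat voxel indexing are assumptions made for the example.

```python
def bucket_particles(voxel_ids, num_voxels):
    """Group particle indices by voxel (hypothetical helper, not Dynamo's API).

    voxel_ids[p] is the flat voxel index of particle p.
    Returns (order, start, end, length), where order[start[v]:end[v]]
    lists the particles that live in voxel v.
    """
    # Count how many particles fall into each voxel.
    length = [0] * num_voxels
    for v in voxel_ids:
        length[v] += 1

    # An exclusive prefix sum of the counts gives each voxel's start offset.
    start = [0] * num_voxels
    for v in range(1, num_voxels):
        start[v] = start[v - 1] + length[v - 1]
    end = [start[v] + length[v] for v in range(num_voxels)]

    # Place each particle into the next free slot of its voxel's range.
    cursor = list(start)
    order = [0] * len(voxel_ids)
    for p, v in enumerate(voxel_ids):
        order[cursor[v]] = p
        cursor[v] += 1
    return order, start, end, length


voxel_ids = [2, 0, 2, 1, 0, 2]
order, start, end, length = bucket_particles(voxel_ids, 4)

# Iterate over the particles of one voxel only.
particles_in_voxel_2 = order[start[2]:end[2]]
```

With this layout a kernel can either walk `order` to visit every particle, or slice it per voxel to visit only the particles inside that voxel.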

12 Dynamo Data Model
- Mandatory attributes are id and position
[Diagram labels: Particles, Stencil, Attribute, Voxels, Grids, with multiplicity markers 1 and *]

13 Goals of Jet Language
- Modularity
- Scalability
- Domain-specific
- Simplicity

14 Weighted Blur
- Example Jet kernel for performing a weighted blur on voxel data
- Type inference helps with simplicity
- Colon-bracket operator for accessing relative voxels
- Aspects of functional programming used to prevent data dependencies

    Weighted_Blur : Voxel<Box> (output, input)
    {
        limit(1, 1, 1, 1, 1, 1);

        value = input:(0, 0, 0) * 4 +
                input:(-1, 0, 0) + input:(1, 0, 0) +
                input:(0, -1, 0) + input:(0, 1, 0) +
                input:(0, 0, -1) + input:(0, 0, 1);

        return value / 10;
    }
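For readers without access to Jet, the same seven-point stencil can be sketched in plain Python. The weights (4 for the centre, 1 for each face neighbour, divided by 10) come from the slide; the clamped boundary handling and the name `weighted_blur` are assumptions, since the slides do not say how Jet treats voxels at the edge of the grid.

```python
def weighted_blur(grid):
    """Seven-point weighted blur over a 3D nested list grid[i][j][k].

    Reads only from `grid` and writes only to a fresh output, mirroring the
    read-from-input, return-a-value style of the Jet kernel.
    Boundary voxels clamp their neighbour lookups (an assumption).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    def at(i, j, k):
        # Clamp indices so lookups outside the grid reuse the edge voxel.
        i = min(max(i, 0), nx - 1)
        j = min(max(j, 0), ny - 1)
        k = min(max(k, 0), nz - 1)
        return grid[i][j][k]

    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                value = (at(i, j, k) * 4
                         + at(i - 1, j, k) + at(i + 1, j, k)
                         + at(i, j - 1, k) + at(i, j + 1, k)
                         + at(i, j, k - 1) + at(i, j, k + 1))
                out[i][j][k] = value / 10
    return out


# A single spike in the centre of a 3 x 3 x 3 grid.
spike = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
spike[1][1][1] = 10.0
blurred = weighted_blur(spike)
```

Because every output voxel depends only on the input grid, the three loops can run in any order, which is the property that lets a compiler parallelise such a kernel safely.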

15 Interpolation
- Iterator means modify particle data with reference to parent voxel
- Retrieve position values for current particle from flip particle set
- Coupled model means particle positions local and in range

    Interpolate : Particle<Voxel> (output, quantity, flip)
    {
        // retrieve particle positions
        x = flip:x(0);
        y = flip:y(0);
        z = flip:z(0);

        // calculate voxel indices
        i = x < 0.5f ? global_i() : global_i() - 1;
        j = y < 0.5f ? global_j() : global_j() - 1;
        k = z < 0.5f ? global_k() : global_k() - 1;

        // adjust offsets
        x = x < 0.5f ? x + 0.5f : x - 0.5f;
        y = y < 0.5f ? y + 0.5f : y - 0.5f;
        z = z < 0.5f ? z + 0.5f : z - 0.5f;

        return trilerp(quantity, i, j, k, x, y, z);
    }
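The slides do not show how Jet's built-in `trilerp` is defined. The sketch below assumes it is standard trilinear interpolation between the eight samples at `(i, j, k)` through `(i + 1, j + 1, k + 1)`, with fractional offsets in the range 0 to 1; the Python signature is illustrative and may not match Jet's conventions.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def trilerp(grid, i, j, k, tx, ty, tz):
    """Trilinear interpolation inside the cell whose lowest corner is (i, j, k).

    grid is a nested list indexed grid[i][j][k]; tx, ty, tz are the
    fractional offsets from that corner (assumed convention).
    """
    # Interpolate along x on the four edges of the cell.
    c00 = lerp(grid[i][j][k], grid[i + 1][j][k], tx)
    c10 = lerp(grid[i][j + 1][k], grid[i + 1][j + 1][k], tx)
    c01 = lerp(grid[i][j][k + 1], grid[i + 1][j][k + 1], tx)
    c11 = lerp(grid[i][j + 1][k + 1], grid[i + 1][j + 1][k + 1], tx)
    # Then along y, then along z.
    c0 = lerp(c00, c10, ty)
    c1 = lerp(c01, c11, ty)
    return lerp(c0, c1, tz)


# A linear field f(i, j, k) = i + 2j + 4k is reproduced exactly.
field = [[[float(i + 2 * j + 4 * k) for k in range(2)]
          for j in range(2)] for i in range(2)]
centre_value = trilerp(field, 0, 0, 0, 0.5, 0.5, 0.5)
```

Each particle only reads grid data and writes its own result, so this operation has no race conditions when run over particles in parallel.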

16 Splatting
- Iterator means modify voxel data with reference to a child particle
- Read from many, write to one to avoid race conditions
- Neighbour modifier adjusts region in which to iterate over particles

    Splat : Voxel<Particle> (output, flip, attribute)
    {
        limit(1, 1, 1, 1, 1, 1);
        neighbour(-1, -1, -1, 1, 1, 1);

        x = flip:x(0);

        // early exit if particle out of range (2 x 2 x 2)
        if (local_i() == -1 && x < 0.5f) return;

        // calculate particle weighting
        if (local_i() == -1)      tx = x - 0.5f;
        else if (local_i() == 1)  tx = 0.5f - x;
        else if (x < 0.5f)        tx = x + 0.5f;
        else                      tx = 1.5f - x;

        // (missing code for calculating ty and tz)

        // sum the weighted results
        return output:(0, 0, 0) + attribute:(0) * tx * ty * tz;
    }
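The "read from many, write to one" idea can be sketched in one dimension. Each voxel gathers contributions from particles in itself and its two neighbours and writes only its own output, so no two iterations write to the same location. The tent weighting below reproduces the four `tx` cases of the kernel above; the function name `splat_gather`, the tuple layout and the 1D simplification are assumptions for illustration.

```python
def splat_gather(num_voxels, particles):
    """Gather-style splat in 1D (illustrative sketch, not Jet output).

    particles is a list of (voxel, local_x, attribute) with local_x in [0, 1).
    Returns one accumulated value per voxel, sampled at the voxel centre.
    """
    # Bucket particles by parent voxel so each voxel can find its children.
    buckets = [[] for _ in range(num_voxels)]
    for voxel, x, attr in particles:
        buckets[voxel].append((x, attr))

    output = [0.0] * num_voxels
    for v in range(num_voxels):  # each iteration could be a separate thread
        total = 0.0
        for offset in (-1, 0, 1):  # neighbour region around voxel v
            n = v + offset
            if n < 0 or n >= num_voxels:
                continue
            for x, attr in buckets[n]:
                # Distance from the particle to the centre of voxel v,
                # measured in voxel units.
                distance = abs((offset + x) - 0.5)
                weight = 1.0 - distance
                if weight > 0.0:  # skip particles out of range
                    total += attr * weight
        output[v] = total  # the only write, and only to voxel v
    return output


result = splat_gather(3, [(1, 0.25, 4.0)])
```

A scatter formulation, where each particle adds into its surrounding voxels, would need atomic operations or locking; the gather form trades some redundant reads for race-free writes.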

17 Results
Interpolation
- CPU (unthreaded): 506ms
- Jet CPU (8 threads): 79.0ms (6.4x)
- Jet GPU (Q4000): 70.2ms (7.2x)
- Jet GPU (Tesla K20): 34.1ms (14.8x)
Splatting
- CPU (unthreaded): 6190ms
- Jet CPU (8 threads): 790ms (7.8x)
- Jet GPU (Q4000): 663ms (9.3x)
- Jet GPU (Tesla K20): 283ms (22.2x)

18 Particle Sorting
Advection
- Particles must be sorted after being moved
- Rely on subsampling to maintain particle density
Seeding
- Voxels with too many particles get starved
- Voxels with too few particles get spawned

19 Inter-tile Sorting
- Identify escaped particles
- Perform stream compaction to select them
- Deduce desired destination tile
- Move particles into spare tile scratch area
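The first three steps on this slide can be sketched as a flag, scan and scatter sequence, which is the usual shape of stream compaction. The example is one-dimensional and serial for brevity; the names `exclusive_scan` and `compact_escaped`, and the convention that positions are local to the tile, are assumptions rather than details from the talk.

```python
import math


def exclusive_scan(flags):
    """Exclusive prefix sum; returns (offsets, total)."""
    offsets, total = [], 0
    for f in flags:
        offsets.append(total)
        total += f
    return offsets, total


def compact_escaped(positions, tile_size):
    """Select particles that left the tile (hypothetical helper).

    positions are tile-local coordinates; a particle has escaped when it
    lies outside [0, tile_size). Returns a dense list of
    (particle_index, relative_destination_tile).
    """
    # Step 1: flag escaped particles.
    flags = [0 if 0.0 <= x < tile_size else 1 for x in positions]
    # Step 2: the scan gives each escaped particle its slot in the output.
    offsets, count = exclusive_scan(flags)
    # Step 3: scatter into the compacted array, recording the destination.
    escaped = [None] * count
    for idx, (flag, x) in enumerate(zip(flags, positions)):
        if flag:
            destination = math.floor(x / tile_size)  # e.g. -1 is the tile to the left
            escaped[offsets[idx]] = (idx, destination)
    return escaped


escaped = compact_escaped([1.0, -0.5, 7.9, 8.0, 3.0, 17.0], tile_size=8.0)
```

On a GPU the flagging and scatter steps are independent per particle and the scan is a standard parallel primitive, which is why compaction is a common choice for this kind of selection.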

20 Intra-tile Sorting
- Sort each tile independently
- Include any particles in tile scratch area
- Deduce desired destination voxel
- Perform radix sort on voxel indices
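A least-significant-digit radix sort keyed on voxel index can be sketched as follows. It returns a permutation of particle indices rather than moving particle data, and it is stable, so particles that share a voxel keep their relative order. The function name, the 4-bit digit size and the example keys are assumptions; the slides do not describe the variant used in Dynamo.

```python
def radix_sort_by_voxel(voxel_ids, bits_per_pass=4):
    """Return particle indices ordered by voxel index (LSD radix sort sketch)."""
    order = list(range(len(voxel_ids)))
    if not voxel_ids:
        return order

    radix = 1 << bits_per_pass
    mask = radix - 1
    max_key = max(voxel_ids)
    shift = 0
    # One counting-sort pass per digit, least significant digit first.
    while (max_key >> shift) > 0:
        counts = [0] * radix
        for p in order:
            counts[(voxel_ids[p] >> shift) & mask] += 1

        # Exclusive prefix sum turns digit counts into output offsets.
        starts = [0] * radix
        for d in range(1, radix):
            starts[d] = starts[d - 1] + counts[d - 1]

        # Stable scatter: earlier particles with the same digit stay first.
        next_order = [0] * len(order)
        for p in order:
            d = (voxel_ids[p] >> shift) & mask
            next_order[starts[d]] = p
            starts[d] += 1

        order = next_order
        shift += bits_per_pass
    return order


voxel_ids = [37, 2, 511, 2, 0, 37]
sorted_particles = radix_sort_by_voxel(voxel_ids)
```

Because voxel indices within a tile are small bounded integers, only a few passes are needed, and each pass is built from counting and prefix sums that parallelise well.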

21 Conclusion
- Complex data model complements atomic algorithms
- Jet DSL provides efficient grid-particle interactions
- Particle sorting currently provided implicitly
