GPU-based Distributed Behavior Models with CUDA

Bradly Alicea
Courtesy: YouTube, ISIS Lab, Università degli Studi di Salerno

Introduction
Flocking: Reynolds' boids algorithm.
* models simple local behaviors (large # of elements).
* need to manage aggregate motion of elements.
Related work:
* Navier-Stokes equations (simulation of physical flows).
* robot motion planning.
* ray tracing (computer graphics).
Implementation:
* information about neighbor elements = stream data.
* implemented as fragment programs.
* discrete simulation (behavioral and graphics).
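A minimal sketch of one boids update step, assuming the three classic Reynolds steering rules (separation, alignment, cohesion) in 2-D. The function name `boids_step`, the weights, the neighbor radius, and the speed clamp are illustrative choices, not values from the slides, and the loop runs serially on the CPU rather than as a fragment program.

```python
import math

def boids_step(pos, vel, radius=2.0, dt=0.1,
               w_sep=1.5, w_ali=1.0, w_coh=1.0, max_speed=1.0):
    """Advance every boid one time step; all parameters are illustrative."""
    new_vel = []
    for i, (p, v) in enumerate(zip(pos, vel)):
        # Local behavior: each boid only looks at neighbors within `radius`.
        nbrs = [j for j, q in enumerate(pos)
                if j != i and math.dist(p, q) < radius]
        ax = ay = 0.0
        if nbrs:
            n = len(nbrs)
            # Cohesion: steer toward the neighbors' center of mass.
            ax += w_coh * (sum(pos[j][0] for j in nbrs) / n - p[0])
            ay += w_coh * (sum(pos[j][1] for j in nbrs) / n - p[1])
            # Alignment: steer toward the neighbors' mean velocity.
            ax += w_ali * (sum(vel[j][0] for j in nbrs) / n - v[0])
            ay += w_ali * (sum(vel[j][1] for j in nbrs) / n - v[1])
            # Separation: steer away from each neighbor, stronger when closer.
            for j in nbrs:
                dx, dy = p[0] - pos[j][0], p[1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                ax += w_sep * dx / d2
                ay += w_sep * dy / d2
        vx, vy = v[0] + ax * dt, v[1] + ay * dt
        speed = math.hypot(vx, vy)
        if speed > max_speed:  # clamp to the global maximum velocity
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_vel.append((vx, vy))
    # All boids move together, using velocities computed from the old state.
    new_pos = [(p[0] + v[0] * dt, p[1] + v[1] * dt) for p, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

Every boid reads only the previous frame's state, so the per-boid loop body is independent work, which is what makes the model a candidate for stream processing.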

Larger Context
Why do we care about flocking?
Models of complex, collective behavior (e.g. traffic modeling)
* local interactions, global order parameters.
Analogue for emergent processes (e.g. order from chaos, self-organized behavior)
* physical processes (B-Z reactions, morphogenesis).
Models of autonomous, interactive systems (e.g. social, communication networks)
* local interactions (connectivity) contribute to global properties (stability).
Future work:
* extend to other implementations of the k-nearest neighbors search algorithm.
* use a scattering matrix, which allows position updates without centralized coordination.

Cellular Automata Models
Cellular Automata: discrete dynamical simulation. Cells have properties and interaction rules, and behave in parallel.
[Figure: 2-D von Neumann neighborhood, order 1]
* Properties: internal state.
* Interaction rules: if n > 2 neighbors are red, turn red.
* Parallelism: all cells use the same set of rules and have the same properties.
Example: Wolfram's Rule 30 (1-D lattice)

case                         1    2    3    4    5    6    7    8
current pattern              111  110  101  100  011  010  001  000
new state for center cell    0    0    0    1    1    1    1    0

[Figures: Rule 30 - model; Rule 30 - nature]
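The Rule 30 table can be applied directly as a lookup. The sketch below is one way to do that, assuming cells beyond the ends of the lattice are 0; the names `RULE30` and `rule30_step` are illustrative.

```python
# Lookup table copied from the slide: (left, center, right) pattern -> new center state.
RULE30 = {"111": 0, "110": 0, "101": 0, "100": 1,
          "011": 1, "010": 1, "001": 1, "000": 0}

def rule30_step(cells):
    """Apply Rule 30 once to a list of 0/1 cells (cells beyond the ends count as 0)."""
    padded = [0] + cells + [0]
    # Every cell reads the same table from the old row: the "parallel" update.
    return [RULE30["".join(str(b) for b in padded[i - 1:i + 2])]
            for i in range(1, len(padded) - 1)]

row = [0, 0, 0, 1, 0, 0, 0]
for _ in range(4):
    print("".join("#" if c else "." for c in row))
    row = rule30_step(row)
```

Because each new cell depends only on three cells of the previous row, all cells of a row can be computed at the same time.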

Defining Nearest-Neighbor in Parallel
Agent-based, metric distance, and topological distance models.
[Figure: Generative (agent-based) Model. Courtesy: http://en.wikipedia.org/wiki/swarm_behaviour]
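To make the metric/topological distinction concrete, here is a small sketch. It assumes the usual definitions from the flocking literature (metric = all agents within a fixed radius; topological = a fixed number of closest agents regardless of distance); the slide itself does not spell these out, and the function names are illustrative.

```python
import math

def metric_neighbors(points, i, radius):
    """Metric model: every other agent closer than `radius` is a neighbor."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) < radius]

def topological_neighbors(points, i, k):
    """Topological model: the k closest agents are neighbors, however far away."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

agents = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
print(metric_neighbors(agents, 0, 2.0))    # depends on the radius
print(topological_neighbors(agents, 0, 2)) # always k neighbors
```

The metric neighborhood can be empty or arbitrarily large as density changes, while the topological neighborhood always has the same size.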

Local parameters, data handling
Cellular Automata-like:
* coupled-map lattice.
Space partitioning data structure (CPU):
1) gather data on neighbors.
2) update at every frame (movement).
Synchronized aggregated motion of flocks:
1) fix one or more spatial goals (containment, obstacle avoidance).
2) leader following (whoever the leader is determines where the flock goes).
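The slides do not say which space partitioning structure runs on the CPU. As one possibility, the sketch below assumes a uniform grid in 2-D whose cell size equals the neighbor radius; `build_grid` and `gather_neighbors` are hypothetical names.

```python
from collections import defaultdict

def build_grid(points, cell):
    """Bin each 2-D point index into the grid cell that contains it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def gather_neighbors(points, grid, i, cell):
    """Indices of points within `cell` of point i, found by scanning the 3x3 block of cells around it."""
    x, y = points[i]
    cx, cy = int(x // cell), int(y // cell)
    found = []
    for gx in range(cx - 1, cx + 2):
        for gy in range(cy - 1, cy + 2):
            for j in grid.get((gx, gy), []):
                dx, dy = points[j][0] - x, points[j][1] - y
                if j != i and dx * dx + dy * dy <= cell * cell:
                    found.append(j)
    return sorted(found)

points = [(0.5, 0.5), (1.2, 0.5), (3.5, 3.5), (0.9, 1.4)]
grid = build_grid(points, 1.0)   # rebuilt every frame, after the boids move
print(gather_neighbors(points, grid, 0, 1.0))
```

Rebuilding the grid each frame matches the "update at every frame" step; only the nine surrounding cells are scanned, so the gather avoids comparing every pair of boids.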

Data architecture
Left: architecture of behavior. The personality (profile of each boid) is blended and determines the global parameters.
Below: scattering matrix. Extracts global parameters from a uniform shape.

Program Structure
GPU: runs fragment programs, can use stream processing
* stream data arranged as textures.
* handles scene rendering, updates.
CPU: used for spatial sorting only.
Four textures:
* T_c = constant texture (global scalars: max. velocity, max. acceleration).
* T_o = orientation (3 scalar variables: forward, side, up).
* T_p = position (3 scalar values: x, y, z).
* T_n = nearest-neighbors (von Neumann neighborhood).
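A rough sketch of how the four textures might be laid out, assuming one texel per boid in a height x width grid and borders that are clipped rather than wrapped. The dictionary layout, the placeholder values, and the helper names are assumptions for illustration, not the authors' data format.

```python
def make_textures(width, height):
    """Allocate the per-boid textures as height x width grids (one texel per boid)."""
    def blank(value):
        return [[value for _ in range(width)] for _ in range(height)]
    return {
        "T_c": {"max_velocity": 1.0, "max_acceleration": 0.1},  # global scalars (placeholder values)
        "T_o": blank((1.0, 0.0, 0.0)),   # orientation: three scalars per boid (placeholder)
        "T_p": blank((0.0, 0.0, 0.0)),   # position: x, y, z
        "T_n": blank(()),                # texel coordinates of the nearest neighbors
    }

def von_neumann(row, col, width, height):
    """Order-1 von Neumann neighbors (up, down, left, right), clipped at the border."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return tuple((r, c) for r, c in candidates if 0 <= r < height and 0 <= c < width)

tex = make_textures(4, 4)
for r in range(4):
    for c in range(4):
        tex["T_n"][r][c] = von_neumann(r, c, 4, 4)
```

With this layout, the CPU's spatial sort decides which boid lands in which texel, so that texel neighbors approximate spatial neighbors.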

Example: N-body problem for CUDA (GPU Gems 3, Chapter 31)

What is the n-body problem?
N-body = every body in a set of N bodies interacts (potentially) with all other bodies.
Basic computational features:
1) all-pairs approach (unlike nearest-neighbor or structured topology).
2) O(N^2) time.
3) encoded as a kernel that determines forces in close-range interactions.
Previous approaches:
* fast multipole methods (FMMs).
* particle mesh methods.

CUDA Implementation
* calculate each entry f_ij in an N x N grid of all pair-wise forces.
* total force F_i (acceleration a_i) on body i = sum of the entries in row i.
* each entry is computed independently: O(N^2) available parallelism.
Code example #1: Listing 31-1, updating the acceleration of one body as a result of its interaction with another body.
Code example #2: Listing 31-2, evaluating interactions in a p x p tile.
Code example #3: Listing 31-3, the CUDA kernel executed by a thread block with p threads to compute the gravitational acceleration for p bodies as a result of all N interactions.
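The grid-and-row-sum idea can be sketched on the CPU as follows. This is not the chapter's CUDA listing: it is a Python illustration that assumes G = 1, bodies stored as (x, y, z, mass) tuples, and a small softening term `eps2` that keeps the i = j entry finite (it evaluates to zero).

```python
import math

def body_body(bi, bj, eps2=1e-4):
    """Acceleration on body i due to body j, with G = 1.

    eps2 is a softening term (an assumption here) that avoids division
    by zero when the two bodies coincide, including the case i == j.
    """
    rx, ry, rz = bj[0] - bi[0], bj[1] - bi[1], bj[2] - bi[2]
    dist2 = rx * rx + ry * ry + rz * rz + eps2
    s = bj[3] / (dist2 * math.sqrt(dist2))   # m_j / (r^2 + eps^2)^(3/2)
    return (rx * s, ry * s, rz * s)

def all_pairs(bodies):
    """Fill the N x N grid of entries f_ij, then sum row i to get a_i."""
    n = len(bodies)
    grid = [[body_body(bodies[i], bodies[j]) for j in range(n)] for i in range(n)]
    return [tuple(sum(f[k] for f in row) for k in range(3)) for row in grid]

bodies = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 1.0)]
acc = all_pairs(bodies)
```

Each grid entry depends only on two body descriptions, which is where the O(N^2) available parallelism comes from; storing the whole grid, as done here for clarity, is exactly what a tiled implementation avoids.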

Special features
Serialize some computations:
* achieve data reuse (bandwidth-performance tradeoff).
* computational tile (square region, p x p in size, subset of pairwise forces).
* 2p body descriptions, p^2 interactions within a tile, total effect = update to p acceleration vectors.
Body-body force:
* compute force on body i from interaction with body j; updates acceleration a_i of body i resulting from the interaction.
* float4 data type (CUDA-specific), stored in GPU device memory; body mass = w field (allows coalesced memory access to arrays of data, i.e. efficient memory request transfers).

Other details
* myposition: input parameter that holds the position of the body for the executing thread.
* shposition: array of body descriptions in shared memory.
* thread: iterates over the same p bodies.
* p threads: execute the function body in parallel.
In summary:
* acceleration is computed on an individual body as a result of interaction with p other bodies.
* a tile is evaluated by p threads: same sequence of operations on different data.
* each thread updates the acceleration of one body as a result of interactions with p other bodies.
* load p body descriptions from GPU device memory into the shared memory provided to each thread block in the CUDA model.
* each thread in a block evaluates p successive interactions, so a tile produces p updated accelerations.
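The tiling scheme summarized above can be emulated serially to check the bookkeeping. The sketch below is a CPU stand-in, not CUDA: the "threads" run one after another, `sh_position` and `my_position` are Python renderings of the names on the slide, and G = 1, the softening term, and the requirement that N be a multiple of p are assumptions.

```python
import math

def interact(my_pos, other, acc, eps2=1e-4):
    """Return `acc` plus the acceleration on the body at my_pos caused by `other` (G = 1)."""
    rx, ry, rz = other[0] - my_pos[0], other[1] - my_pos[1], other[2] - my_pos[2]
    d2 = rx * rx + ry * ry + rz * rz + eps2   # eps2: softening, an assumption
    s = other[3] / (d2 * math.sqrt(d2))       # other[3] is the mass (the w field)
    return (acc[0] + rx * s, acc[1] + ry * s, acc[2] + rz * s)

def tiled_accelerations(bodies, p):
    """Emulate thread blocks of p threads; len(bodies) must be a multiple of p."""
    n = len(bodies)
    assert n % p == 0, "sketch assumes N is a multiple of p"
    acc = [(0.0, 0.0, 0.0)] * n
    for block_start in range(0, n, p):        # one thread block per p bodies
        for tile_start in range(0, n, p):     # walk the tiles along this block's rows
            # Load p body descriptions into the block's "shared memory".
            sh_position = bodies[tile_start:tile_start + p]
            for t in range(p):                # p threads, run serially here
                i = block_start + t
                my_position = bodies[i]
                for other in sh_position:     # p successive interactions
                    acc[i] = interact(my_position, other, acc[i])
    return acc

bodies = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 2.0),
          (0.0, 1.0, 0.0, 3.0), (1.0, 1.0, 1.0, 4.0)]
print(tiled_accelerations(bodies, 2))
```

Each tile touches 2p body descriptions (p held by the threads, p loaded into shared memory) and performs p^2 interactions, so every loaded description is reused p times.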

Nearest-Neighbor Searches (instance of n-body problem) using kd-trees
(selected slides from Brown and Snoeyink, GPU Nearest-Neighbor searches using a minimal kd-tree. University of North Carolina at Chapel Hill).
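For readers unfamiliar with kd-trees, here is a generic textbook-style sketch of building a tree and running a nearest-neighbor query. It is a plain recursive CPU version with dictionary nodes, not the minimal GPU layout from the Brown and Snoeyink slides; `build_kdtree` and `nearest` are illustrative names.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda pt: pt[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """Return the stored point closest to `query`."""
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Only cross the splitting plane if a closer point could lie beyond it.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(nearest(tree, (9, 2)))
```

The pruning test is what separates this from the all-pairs approach: most queries visit only a few branches instead of every point.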

Courtesy: Brown and Snoeyink, GPU Nearest-Neighbor searches using a minimal kd-tree. Slides on Internet.

[Slide labels: simplify, compress, break apart]