UberFlow: A GPU-Based Particle Engine

Similar documents
GPU Particles. Jason Lubken

Sorting and Searching. Tim Purcell NVIDIA

Youngho Kim CIS665: GPU Programming

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research

Graphics Hardware. Instructor Stephen J. Guy

2.11 Particle Systems

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Volume Graphics Introduction

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Data-Parallel Algorithms on GPUs. Mark Harris NVIDIA Developer Technology

Could you make the XNA functions yourself?

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

Applications of Explicit Early-Z Culling

Graphics Processing Unit Architecture (GPU Arch)

The Application Stage. The Game Loop, Resource Management and Renderer Design

Deferred Rendering Due: Wednesday November 15 at 10pm

Cloth Simulation on the GPU. Cyril Zeller NVIDIA Corporation

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Shaders (some slides taken from David M. course)

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

CS 354R: Computer Game Technology

Performance OpenGL Programming (for whatever reason)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

General Algorithm Primitives

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

Efficient and Scalable Shading for Many Lights

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Programming Graphics Hardware

GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

Lecture 13: Reyes Architecture and Implementation. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Why Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

WebGL: Hands On. DevCon5 NYC Kenneth Russell Software Engineer, Google, Inc. Chair, WebGL Working Group

GPU Memory Model Overview

Scheduling the Graphics Pipeline on a GPU

GPU Memory Model. Adapted from:

Portland State University ECE 588/688. Graphics Processors

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

EE 4702 GPU Programming

3D Rendering Pipeline

Adaptive Point Cloud Rendering

Attention to Detail! Creating Next Generation Content For Radeon X1800 and beyond

What s New with GPGPU?

General Purpose computation on GPUs. Liangjun Zhang 2/23/2005

Real-Time Volumetric Smoke using D3D10. Sarah Tariq and Ignacio Llamas NVIDIA Developer Technology

TDA362/DIT223 Computer Graphics EXAM (Same exam for both CTH- and GU students)

A Trip Down The (2011) Rasterization Pipeline

Spring 2009 Prof. Hyesoon Kim

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

A Hardware F-Buffer Implementation

Rendering Grass with Instancing in DirectX* 10

Windowing System on a 3D Pipeline. February 2005

Image Processing Tricks in OpenGL. Simon Green NVIDIA Corporation

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

First Steps in Hardware Two-Level Volume Rendering

DEFERRED RENDERING STEFAN MÜLLER ARISONA, ETH ZURICH SMA/

Scanline Rendering 2 1/42

Computer Graphics (CS 543) Lecture 10: Soft Shadows (Maps and Volumes), Normal and Bump Mapping

CS451Real-time Rendering Pipeline

Spring 2009 Prof. Hyesoon Kim

Advanced Lighting Techniques Due: Monday November 2 at 10pm

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside

GPU-AWARE HYBRID TERRAIN RENDERING

The Bifrost GPU architecture and the ARM Mali-G71 GPU

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

DiFi: Distance Fields - Fast Computation Using Graphics Hardware

Spring 2011 Prof. Hyesoon Kim

A GPU Accelerated Spring Mass System for Surgical Simulation

Animation Essentially a question of flipping between many still images, fast enough

Evolution of GPUs Chris Seitz

Mali Demos: Behind the Pixels. Stacy Smith

Soft Particles. Tristan Lorach

GPU-based Distributed Behavior Models with CUDA

Particle Systems. Sample Particle System. What is a particle system? Types of Particle Systems. Stateless Particle System

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

GPU Computing: Particle Simulation

Computer graphics 2: Graduate seminar in computational aesthetics

Blink: 3D Display Multiplexing for Virtualized Applications

Realtime Water Simulation on GPU. Nuttapong Chentanez NVIDIA Research

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Rasterization Overview

APPROVAL SHEET. Jonathan Willard Decker Master of science, 2007

Hardware Accelerated Volume Visualization. Leonid I. Dimitrov & Milos Sramek GMI Austrian Academy of Sciences

The Rasterization Pipeline

CS GAME PROGRAMMING Question bank

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

The Problem: Difficult To Use. Motivation: The Potential of GPGPU CGC & FXC. GPGPU Languages

Direct3D 11 Performance Tips & Tricks

Module Contact: Dr Stephen Laycock, CMP Copyright of the University of East Anglia Version 1

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Point-Based rendering on GPU hardware. Advanced Computer Graphics 2008

Graphics and Interaction Rendering pipeline & object modelling

Transcription:

UberFlow: A GPU-Based Particle Engine Peter Kipfer Mark Segal Rüdiger Westermann Technische Universität München ATI Research Technische Universität München

Motivation Want to create, modify and render large geometric models Important example: Particle system

Motivation Major bottleneck - Transfer of geometry to graphics card Process on GPU if transfer is to be avoided - Need to avoid intermediate read-back also Requires dedicated GPU implementations Perform geometry handling for rendering on the GPU

Bus transfer - Send geometry for every frame - because simulation or visualization is time-dependent - the user changed some parameter - Render performance: 12.6 mega points/sec - Make the geometry reside on the GPU - need to create/manipulate/remove vertices without read-back - Render performance: 114.5 mega points/sec ATI Radeon 9800Pro, AGP 8x, GL_POINTS with individual color

Motivation Previous work - GPU used for large variety of applications - local / global illumination [Purcell2003] - volume rendering [Kniss2002] - image-based rendering [Li2003] - numerical simulation [Krüger2003] - GPU can outperform CPU for both computebound and memory-bound applications Geometry handling on GPU potentially faster

GPU Geometry Processing Simple copy-existing-code-to-shader solutions will not be efficient Need to re-invent algorithms, because - different processing model (stream) - different key features (memory bandwidth) - different instruction set (no binary ops)

GPU Geometry Processing Need shader access to vertex data - OpenGL SuperBuffer - Memory access in fragment shader - Directly attach to compliant OpenGL object - VertexShader 3.0 - Memory access in vertex shader - Use as displacement map - Both offer similar functionality

OpenGL SuperBuffer Separate semantic of data from it s storage - Allocate buffer with a specified size and data layout Create OpenGL objects - Colors: texture, color array, render target Vectors: vertex array, texcoord array If data layout is compatible with semantic, the buffer can be attached to / detached from the object - Zero-copy operation in GPU memory Render-to-vertex array possible by using floating-point textures and render targets

OpenGL SuperBuffer - Example: floating point array that can be read and written (not at the same time) OpenGL texture object OpenGL render target (offscreen) glgentextures() gldrawbuffer() OpenGL memory object change of attachment possible outside rendering activity RGBA_FLOAT32_ATI

GPU Particle Engine cool demo

Overview GPU particle engine features - Particle advection - Motion according to external forces and 3D force field - Sorting - Depth-test and transparent rendering - Spatial relations for collision detection - Rendering - Individually colored points - Point sprites

Particle Advection Simple two-pass method using two vertex arrays in double-buffer mode - Render quad covering entire buffer - Apply forces in fragment shader bind to texture buffer 0 pass 1: integrate bind to render target render target screen pass 2: render buffer 1 bind to vertex array

Sorting Required for correct transparency and collision detection - Bitonic merge sort (sorting network) [Batcher1968] - Sorting n items needs (log n) stages - Overall number of passes ½ (log²n + log n)

Sorting a 2D field - Merge rows to get a completely sorted field - Implement in fragment shader [Purcell2003] - A lot of arithmetic necessary - Binary operations not available in shader

Fast sorting Make use of all GPU resources - Calculate constant and linear varying values in vertex shader and let raster engine interpolate - Render quad size according to compare distance - Modify compare operation and distance by multiplying with interpolated value +1-1 < row sort +1 +1-1 -1 < +1 column sort -1

Fast sorting - Perform mass operations (texture fetches) in fragment shader t0 = fragment position t1 = parameters from vertex shader (interpolated) OP1 = TEX[t0] sign = (t1.x < 0)? -1 : 1 OP2 = TEX[t0.x + sign*dx, t0.y] return (OP1 * t1.y < OP2 * t1.y)? OP1 : OP2

Fast sorting - Final optimization: sort [index, key] pairs - pack 2 pairs into one fragment - lowest sorting pass runs internal in fragment shader - Generate keys according to distance to viewer or use cell identifier of space partitioning scheme initial pass collapse into single pass third pass collapse into single pass

Fast sorting - Same approach for column sort, just rotate the quads - Benefits for full sort of n items - 2*log(n) less passes (because of collapse and packing) - n/2 fragments processed each pass (because of packing) - workload balanced between vertex and fragment units (because of rendering quads and interpolation) Speedup factor of 10 compared to previous solutions

Fast sorting - Performance: full sort n sorts/sec mega items/sec mega frag/sec 128² 175.0 2.8 130 256² 43.6 2.8 171 512² 9.3 2.4 186 1024² 1.94 2.0 193 128² 238.0 3.9 177 256² 110.0 7.2 433 512² 24.4 6.4 489 1024² 4.85 5.1 483 ATI Radeon 9800Pro ATI Radeon X800 XT

Particle Scene Collision Additional buffers for state-full particles - Store velocity per particle (Euler integration) - Keep last two positions (Verlet integration) - Simple: Collision with height-field stored as 2D texture - RGB = [x,y,z] surface normal - A = [w] height - Compute reflection vector - Force particle to field height

Particle Particle Collision Essential for natural behavior - Full search is O(n²), not practicable - Approximate solution by considering only neighbors - Sort particles into single grid spatial structure - Staggered grid misses only few combinations

Particle Particle Collision Essential for natural behavior - Full search is O(n²), not practicable - Approximate solution by considering only neighbors - Sort particles into staggered grid spatial structure - Staggered grid misses only few combinations

Particle Particle Collision - Check m neighbors to the left/right - Collision resolution with first collider (time sequential) - Only if velocity is not excessively larger than integration step size solve quadratic equation on GPU

GPU Particle Engine more cool demos

GPU Particle Engine Acknowledgements - ATI Research for providing hardware - Jens Krüger for insight on shader programming http://wwwcg.in.tum.de/gpu