Parallel Compact Roadmap Construction of 3D Virtual Environments on the GPU

1 Parallel Compact Roadmap Construction of 3D Virtual Environments on the GPU. Avi Bleiweiss, NVIDIA Corporation

2 Programmability; GPU Computing; CUDA C++; Parallel Debug; Heterogeneous Computing; Productivity; Efficiency; CPU: Fast Scalar; GPU: Scalable Parallel

3 Fermi Architecture. 512 cores; 1024 threads/; L1/L2; 1.5 Tflops. [Block diagram: Host Interface, GigaThread Engine, four GPCs each with a Raster Engine and Polymorph Engines, shared L2 Cache, six Memory Controllers]

4 Planning System. [Diagram labels: C space, Roadmap Construction, Goals, Path Searching, State, Multi Agent Simulation, Position, Velocity]

5 Roadmap. Rapidly-exploring Random Tree (RRT): simple, fast; unexplored space bias; in-lined build, query. Probabilistic Roadmap (PRM): uniform sampling; completion unbound; often large footprint; highly scalable. Reachability Roadmap (RRM): resolution complete; 2D/3D environments; compact graph.

6 Construction (IROS 2010). [Diagram labels: Connectors, Flood Fill, Guards, C space, Grid, MAT, MST, Graph, DT] [Geraerts and Overmars 2005]

7 Challenges. Large, dependent 3D data structures; divergent, irregular threads; coverage inherently serial; limited Flood Fill stack.

8 Implementation. Global memory resources: 2D/3D pitched linear. Kernel per RRM stage: implied synchronization; free intermediate data. Dynamic parallelism. [Diagram: DAG of parallel kernels A, B, C0 to C4, D; Task Manager]
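The kernel-per-stage layout with a task manager can be pictured as a dependency graph drained in waves, where every stage in a wave is free to launch concurrently. The sketch below is a host-side Python illustration only, not the author's CUDA task manager. The edges between A, B, C0 to C4 and D are assumed for the example (the slide names the nodes, but no dependencies are recoverable from it), and `schedule_waves` is a hypothetical helper built on the standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# ASSUMED dependencies (stage -> stages it waits for); illustrative only.
ASSUMED_DEPS = {
    "B": {"A"},
    "C0": {"B"}, "C1": {"B"}, "C2": {"B"}, "C3": {"B"}, "C4": {"B"},
    "D": {"C0", "C1", "C2", "C3", "C4"},
}


def schedule_waves(deps):
    """Group stages into waves; stages in one wave have no mutual dependency."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything launchable right now
        waves.append(ready)
        sorter.done(*ready)                 # stands in for "kernels finished"
    return waves


if __name__ == "__main__":
    for wave in schedule_waves(ASSUMED_DEPS):
        print(wave)
```

Under these assumed edges the five C stages land in a single wave, which is the situation where concurrent kernel launches would pay off.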

9 Medial Axis Transform. Serial running time O(kn^3). n^3 GPU threads per pass: O(k) time for CDT, O(1) for qualifier, O(1) for resolve. Stages: Binary Grid, Chess Distance Transform, Qualify, Resolve, MAT. [Lee and Horng 1996]
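A chessboard distance transform can be relaxed one pass at a time, with every cell updated independently from the previous pass. That is one way to obtain n^3 independent threads per pass and a pass count that grows with the largest distance k. The sketch below is a serial Python stand-in under that assumption. It is not taken from [Lee and Horng 1996] or from the author's kernels, it covers only the distance-transform stage (not qualify or resolve), and `chess_distance_transform` and the `free` grid layout are hypothetical names.

```python
from itertools import product

# The 26 neighbour offsets of a cell in a 3D grid.
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def chess_distance_transform(free):
    """Chessboard distance from every free cell to the nearest obstacle.

    free[z][y][x] is True for a free cell, False for an obstacle
    (hypothetical layout). Returns (dist, passes).
    """
    nz, ny, nx = len(free), len(free[0]), len(free[0][0])
    big = max(nz, ny, nx)  # upper bound on any chessboard distance in the grid
    dist = [[[big if free[z][y][x] else 0 for x in range(nx)]
             for y in range(ny)] for z in range(nz)]
    passes = 0
    changed = True
    while changed:
        changed = False
        nxt = [[row[:] for row in plane] for plane in dist]
        # Each cell reads only the previous pass, so on a GPU every
        # iteration of this loop could be its own thread.
        for z, y, x in product(range(nz), range(ny), range(nx)):
            if not free[z][y][x]:
                continue
            best = dist[z][y][x]
            for dz, dy, dx in OFFSETS:
                z2, y2, x2 = z + dz, y + dy, x + dx
                if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
                    best = min(best, dist[z2][y2][x2] + 1)
            if best != dist[z][y][x]:
                nxt[z][y][x] = best
                changed = True
        dist = nxt
        passes += 1
    return dist, passes
```

In this sketch the loop stops after the first pass that changes nothing, so the pass count is the largest distance value plus one.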

10 Distance Transform. Squared Euclidean distance. Serial running time O(n^3); parallel linear time O(n). Slice, column, row passes; n^2 GPU threads per pass. [Felzenszwalb and Huttenlocher 1996]
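The pass structure can be illustrated with a separable squared Euclidean distance transform: one O(n) scan per grid line, repeated along each of the three axes. The sketch below assumes the lower-envelope-of-parabolas formulation of the 1D scan, which is how the cited method is usually described; it is a serial Python illustration, not the author's CUDA code, and `dt_1d`, `dt_3d`, `BIG` and the `obstacle` layout are hypothetical names. Each call to `dt_1d` inside a pass is independent of the others, which is where n^2 threads per pass would come from.

```python
import math

BIG = 1e20  # finite stand-in for "no obstacle seen yet"


def dt_1d(f):
    """Squared-distance transform of one grid line in O(n)."""
    n = len(f)
    v = [0] * n                  # parabola centres on the lower envelope
    z = [0.0] * (n + 1)          # boundaries between neighbouring parabolas
    z[0], z[1] = -math.inf, math.inf
    k = 0
    for q in range(1, n):
        while True:
            p = v[k]
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * q - 2 * p)
            if s > z[k]:
                break
            k -= 1               # parabola p is hidden by q, drop it
        k += 1
        v[k], z[k], z[k + 1] = q, s, math.inf
    out = [0.0] * n
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        out[q] = (q - v[k]) ** 2 + f[v[k]]
    return out


def dt_3d(obstacle):
    """obstacle[z][y][x] is True for occupied cells (hypothetical layout)."""
    nz, ny, nx = len(obstacle), len(obstacle[0]), len(obstacle[0][0])
    d = [[[0.0 if obstacle[z][y][x] else BIG for x in range(nx)]
          for y in range(ny)] for z in range(nz)]
    for z in range(nz):          # pass along x: nz * ny independent lines
        for y in range(ny):
            d[z][y] = dt_1d(d[z][y])
    for z in range(nz):          # pass along y: nz * nx independent lines
        for x in range(nx):
            line = dt_1d([d[z][y][x] for y in range(ny)])
            for y in range(ny):
                d[z][y][x] = line[y]
    for y in range(ny):          # pass along z: ny * nx independent lines
        for x in range(nx):
            line = dt_1d([d[z][y][x] for z in range(nz)])
            for z in range(nz):
                d[z][y][x] = line[z]
    return d
```

The order of the three passes does not change the result; only the axis each pass scans along differs.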

11 Flood Fill. Obstacle aware; 3D line drawing; parallel initial guards. Single cell: large stack. Scan line: slower! Smaller stack.
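The stack trade-off between the two fill strategies can be seen in a small CPU experiment. The sketch below is a 2D, 4-connected simplification written for illustration; it is not the author's obstacle-aware 3D fill, it says nothing about GPU running time, and `fill_single_cell`, `fill_scan_line` and the grid encoding (1 for obstacle, 0 for free) are hypothetical. Both functions return the filled cells together with the peak stack size they reached.

```python
def fill_single_cell(grid, seed):
    """Push every neighbour of every filled cell; simple but stack hungry."""
    h, w = len(grid), len(grid[0])
    seen, stack, peak = set(), [seed], 1
    while stack:
        y, x = stack.pop()
        if not (0 <= y < h and 0 <= x < w) or grid[y][x] or (y, x) in seen:
            continue
        seen.add((y, x))
        stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
        peak = max(peak, len(stack))
    return seen, peak


def fill_scan_line(grid, seed):
    """Fill whole horizontal spans; push one seed per open run above/below."""
    h, w = len(grid), len(grid[0])
    seen, stack, peak = set(), [seed], 1

    def is_open(y, x):
        return grid[y][x] == 0 and (y, x) not in seen

    while stack:
        y, x = stack.pop()
        if not is_open(y, x):
            continue
        while x > 0 and is_open(y, x - 1):   # walk to the left end of the span
            x -= 1
        run_above = run_below = False
        while x < w and is_open(y, x):       # fill the span left to right
            seen.add((y, x))
            if y > 0:
                now = is_open(y - 1, x)
                if now and not run_above:    # first cell of a new open run
                    stack.append((y - 1, x))
                run_above = now
            if y < h - 1:
                now = is_open(y + 1, x)
                if now and not run_below:
                    stack.append((y + 1, x))
                run_below = now
            x += 1
        peak = max(peak, len(stack))
    return seen, peak
```

Both variants reach the same cells; the scan-line version does more work per stack entry in exchange for far fewer entries.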

12 Connectors. Read-only produced data: 3D grid, MAT, DT, coverage. Parallel surviving guards: n(n-1)/2 threads; a connector per thread.
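With one thread per unordered pair of surviving guards, each thread has to turn its linear index into a distinct pair. The sketch below shows one common closed-form mapping, using exact integer arithmetic; it is an assumption about how such an index could be decoded, not the author's kernel, and `pair_from_thread` is a hypothetical name.

```python
from math import isqrt


def pair_from_thread(t):
    """Map thread index t >= 0 to a guard pair (i, j) with i < j.

    Pairs are numbered so that t = j * (j - 1) // 2 + i.
    """
    j = (1 + isqrt(1 + 8 * t)) // 2   # largest j with j * (j - 1) // 2 <= t
    i = t - j * (j - 1) // 2
    return i, j


def all_pairs(n):
    """Enumerate the n * (n - 1) // 2 pairs, one per simulated thread."""
    return [pair_from_thread(t) for t in range(n * (n - 1) // 2)]


if __name__ == "__main__":
    print(all_pairs(4))
```

Because the mapping needs no shared state, every thread can decode its pair on its own and read the grid, MAT, DT and coverage data without synchronization.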

13 Experiments. [Table; column labels: Vertices, C space, Faces, Closure, Resolution, GPU Threads, MAT, DT]

14 Statistics. [Table; column labels: Guards, Coverage, Connectors, Graph, Nodes, Edges, Weight; three columns in %]

15 Process. [Chart; labels: Binary Grid, MAT, DT, Flood Fill, Connectors; one surviving value: 77.5]

16 Results. [Images: C space, MAT, DT, Graph]

17 Processors. Table columns: GPU; s; Warps/; Clocks (MHz); L1/Shared (KB). Rows: GTX /1446/ /16; GTX /1476/1242 NA. Fermi Scale: compute 0.98, memory 1.08.

18 Running Time. [Plot: running time (sec, lower is good) and speedup (higher is good) versus configuration space; series: GTX480, GTX]

19 Throughput. [Plot: MAT throughput (higher is good) versus thread count; series: GTX]

20 Limitations. Stack space bounds: less concurrency. <100% coverage: MAT samples insufficient. MST single threaded. [Chart labels: Guards, Flood Fill Launches]

21 Future Work. Medial axis retraction; bucket DT cells; shorter path extractions; add useful cycles; high clearance paths [Geraerts and Overmars 2006]. [Chart: DP GFlops/Watt; labels: Maxwell, Kepler, Fermi, Tesla]

22 Summary. Parallel RRM: more work remains. GPU roadmap: dynamic environments. Programming tools: constantly improving.

23 Thank You! Questions?

24 Info. Base: GPU AI Technology Preview. Toolkit: CUDA Zone. Debugger: Parallel Nsight.

25 Backup

26 Appendix. Compute scale = (Clk_GTX480 / Clk_GTX285) x (((Warps ) s)_GTX480 / ((Warps ) s)_GTX285). Memory scale = (MemClk_GTX480 / MemClk_GTX285) x (MemBusWidth_GTX480 / MemBusWidth_GTX285). GTX480 L1/Shared (KB) config: up to 1.35X faster in 48/16 vs. 16/48.

27 Running Time (1). [Plot: MAT running time (sec, lower is good) and speedup (higher is good) versus configuration space; series: GTX480, i7 8 Threads]
