On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing


Bernhard Kerbl, Wolfgang Tatzgern, Elena Ivanchenko, Dieter Schmalstieg, Markus Steinberger (2018)

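[Figure: example mesh with vertices 0 to 7 and seven triangles, index buffer 0,1,7, 7,1,2, 3,4,2, 2,4,7, 7,4,5, 7,5,6, 7,6,0.] The buffer holds 21 vertex references but only 8 unique vertices; vertex 7 alone is referenced by six of the seven triangles.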

[Figure: the same mesh passing through Vertex Processing and then Primitive Processing.]

Vertex Cache. Classic approach [Hoppe 1999]: cache the most recently shaded vertices in a fixed-size cache. During primitive processing, whenever a vertex is needed, check the cache; on a cache miss, rerun vertex processing for that vertex.
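The effect of such a cache can be simulated in a few lines. This is a minimal sketch, assuming a FIFO replacement policy; the function name `fifo_cache_invocations` and the cache sizes tried are hypothetical choices for illustration, not the authors' code. It counts vertex shader invocations for the seven-triangle index buffer 0,1,7, 7,1,2, 3,4,2, 2,4,7, 7,4,5, 7,5,6, 7,6,0.

```python
from collections import deque


def fifo_cache_invocations(indices, cache_size):
    """Count vertex shader invocations when primitive processing looks up
    every index in a FIFO cache holding the last `cache_size` shaded vertices."""
    cache = deque()   # cached vertex indices, oldest on the left
    resident = set()  # same contents as `cache`, for O(1) lookups
    invocations = 0
    for idx in indices:
        if idx in resident:
            continue              # cache hit: reuse the shaded vertex
        invocations += 1          # cache miss: rerun vertex processing
        if cache_size == 0:
            continue              # no cache at all
        if len(cache) == cache_size:
            resident.discard(cache.popleft())  # evict the oldest entry
        cache.append(idx)
        resident.add(idx)
    return invocations


# Index buffer of the seven-triangle example mesh (vertices 0 to 7).
indices = [0, 1, 7, 7, 1, 2, 3, 4, 2, 2, 4, 7, 7, 4, 5, 7, 5, 6, 7, 6, 0]

for size in (0, 3, 32):
    print(size, fifo_cache_invocations(indices, size))
```

Without a cache every reference is shaded (21 invocations); a three-entry cache cuts this to 10, and a cache large enough for the whole mesh reaches the minimum of 8. The loop is inherently sequential, since each lookup depends on all earlier ones, which hints at why this scheme does not map directly onto thousands of GPU threads.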

Why? The rise of compute-mode rendering, including our ongoing research on GPU software rendering pipelines. [Figure: software pipeline from vertex attributes and indices through input assembly, vertex shading, geometry processing, primitive assembly, clipping/culling, triangle setup, rasterization, fragment shading, and raster operations to the framebuffer, grouped into vertex processing, primitive processing, and fragment processing.] See "A High-Performance Software Graphics Pipeline Architecture for the GPU" [Kenzel et al. 2018], presented Thursday, August 16 at SIGGRAPH 2018.

Motivation: More and more work is done in compute mode, which cannot leverage the hardware's vertex reuse. Why not just implement vertex caching in software?
- Scalability
- Not efficient to do in software

Aspects of Mesh Optimization: scheduling of vertex processing to exploit the locality of vertex references (this work), and reordering of the index stream to maximize the locality of vertex references (most previous work).

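Parallel Vertex Reuse: the input stream must be divided among processors. Basic tradeoff: reuse potential vs. parallelism. Example index stream of eight triangles: 0,1,7, 7,1,2, 3,4,2, 2,4,7, 7,4,5, 7,5,6, 7,6,0, 6,0,8. Split into four batches of two triangles, the batches reference the vertex sets {0,1,2,7}, {2,3,4,7}, {4,5,6,7}, and {0,6,7,8}: 16 vertex shader invocations spread over four processors. Processed as a single batch, the stream references only {0,1,2,3,4,5,6,7,8}: 9 invocations, but no parallelism.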
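The arithmetic of this example can be reproduced with a short sketch. It assumes each batch deduplicates only its own indices and nothing is shared across batches; the function name `static_batches` is hypothetical.

```python
def static_batches(indices, triangles_per_batch):
    """Split the index stream into equally sized batches of whole triangles and
    deduplicate within each batch only. A vertex referenced by several batches
    is therefore shaded once per batch."""
    step = 3 * triangles_per_batch
    return [sorted(set(indices[i:i + step])) for i in range(0, len(indices), step)]


indices = [0, 1, 7, 7, 1, 2, 3, 4, 2, 2, 4, 7, 7, 4, 5, 7, 5, 6, 7, 6, 0, 6, 0, 8]

for tris in (2, 8):
    batches = static_batches(indices, tris)
    print(tris, batches, sum(len(b) for b in batches))
```

Smaller batches expose more parallelism but re-shade shared vertices such as vertex 7, which appears in all four two-triangle batches.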

Building Blocks: two components go into enabling vertex reuse:
- Batching: how do we divide the input stream?
- Deduplication: how do we identify duplicates?

Batching:
- Static batching: equally sized batches with a fixed number of indices per batch. (+) Batches are independent. (-) Some underutilization.
- Dynamic batching: variably sized batches with a fixed maximum number of unique indices per batch. (-) Each batch depends on the previous one. (+) Full utilization guaranteed.
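Dynamic batching can be sketched sequentially, assuming batches are built greedily from whole triangles until the next triangle would exceed the unique-vertex limit (the function name `dynamic_batches` and the limit of 6 are hypothetical). The loop makes the dependency visible: where a batch starts depends on where the previous one ended, so a GPU version must determine batch boundaries on the fly rather than computing them up front.

```python
def dynamic_batches(indices, max_unique):
    """Greedily grow each batch triangle by triangle until adding the next
    triangle would exceed `max_unique` unique vertices.
    Returns a list of (triangles, sorted unique vertices) pairs."""
    if max_unique < 3:
        raise ValueError("a batch must be able to hold at least one triangle")
    batches = []
    tris, verts = [], set()
    for t in range(0, len(indices) - 2, 3):
        tri = tuple(indices[t:t + 3])
        if tris and len(verts | set(tri)) > max_unique:
            batches.append((tris, sorted(verts)))  # close the current batch
            tris, verts = [], set()
        tris.append(tri)
        verts |= set(tri)
    if tris:
        batches.append((tris, sorted(verts)))
    return batches


indices = [0, 1, 7, 7, 1, 2, 3, 4, 2, 2, 4, 7, 7, 4, 5, 7, 5, 6, 7, 6, 0, 6, 0, 8]

for tris, verts in dynamic_batches(indices, max_unique=6):
    print(len(tris), verts)
```

With a limit of six unique vertices, the eight triangles form two batches of four triangles each, and every batch fills all six vertex slots.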

Deduplication:
- Warp voting: compare each index with every other index.
- Sorting: sort the indices and skip identical values.
- Hashing: use a hash table to map indices to threads.

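Deduplication: Warp Voting. [Figure: index stream 2,3,9, 3,4,9, 4,5,6, 4,6,9, 9,6,7, 9,7,8, 9,8,2; each thread holds one index of the batch (2,3,9,3,4,9,4,5,6) in a register, and the unique indices 2,3,9,4,5,6 are entered into a vertex map in shared memory.]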
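A sequential simulation of the idea, under stated assumptions: lane i of a warp holds one index, the lowest lane holding a given value acts as that value's leader, and leaders receive compacted vertex-map slots via a ballot followed by a popcount. On CUDA hardware the comparison and ballot steps would map to warp intrinsics such as `__match_any_sync` and `__ballot_sync`; this sketch (the function name `warp_vote_dedup` is hypothetical) only mirrors the data flow and is not the authors' kernel.

```python
def warp_vote_dedup(lane_indices):
    """Simulate warp-voting deduplication: every lane compares its index with
    every other lane's index, the lowest lane holding a value becomes its
    leader, and each leader's slot is the number of leaders in lower lanes."""
    n = len(lane_indices)
    # Pairwise comparison: leader = lowest lane with an identical index.
    leader = [next(j for j in range(n) if lane_indices[j] == lane_indices[i])
              for i in range(n)]
    # Ballot: one bit set per leader lane.
    ballot = sum(1 << i for i in range(n) if leader[i] == i)

    def slot(lane):
        # Popcount of leader bits below `lane` gives a compacted slot.
        return bin(ballot & ((1 << lane) - 1)).count("1")

    vertex_map = [slot(leader[i]) for i in range(n)]
    unique = [lane_indices[i] for i in range(n) if leader[i] == i]
    return vertex_map, unique


vertex_map, unique = warp_vote_dedup([2, 3, 9, 3, 4, 9, 4, 5, 6])
print(vertex_map)  # [0, 1, 2, 1, 3, 2, 3, 4, 5]
print(unique)      # [2, 3, 9, 4, 5, 6]
```

The all-pairs comparison costs O(n^2) work per batch, which is cheap when n is a warp's width and the comparison is done in hardware.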

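Deduplication: Sorting. [Figure: the batch 4,5,6, 4,6,9, 9,6,7, held one index per thread in registers, is sorted to 4,4,5,6,6,6,7,9,9; flags marking the first occurrence of each value are 1,0,1,1,0,0,1,1,0; a scan over the flags yields the vertex-map slots 0,0,1,2,2,2,3,4,4 in shared memory, so only five transformed vertices are produced.]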
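The sort, flag, and scan steps in the figure can be checked with a small sketch. It assumes the scan is an inclusive prefix sum minus one, which reproduces the slot values shown; on a GPU each step would be a parallel primitive within a thread block, whereas here the steps run sequentially. `sort_dedup` is a hypothetical name.

```python
from itertools import accumulate


def sort_dedup(batch):
    """Sort-based deduplication: sort the indices, flag the first occurrence
    of each value, and turn the flags into vertex-map slots with a scan."""
    order = sorted(range(len(batch)), key=lambda i: batch[i])  # stable sort
    keys = [batch[i] for i in order]
    flags = [1 if k == 0 or keys[k] != keys[k - 1] else 0 for k in range(len(keys))]
    slots = [s - 1 for s in accumulate(flags)]  # inclusive scan minus one
    vertex_map = [0] * len(batch)
    for pos, i in enumerate(order):
        vertex_map[i] = slots[pos]  # scatter each slot back to its original thread
    unique = [k for k, f in zip(keys, flags) if f]
    return keys, flags, slots, vertex_map, unique


keys, flags, slots, vertex_map, unique = sort_dedup([4, 5, 6, 4, 6, 9, 9, 6, 7])
print(keys)        # [4, 4, 5, 6, 6, 6, 7, 9, 9]
print(flags)       # [1, 0, 1, 1, 0, 0, 1, 1, 0]
print(slots)       # [0, 0, 1, 2, 2, 2, 3, 4, 4]
print(vertex_map)  # [0, 1, 2, 0, 2, 4, 4, 2, 3]
print(unique)      # [4, 5, 6, 7, 9]
```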

Deduplication: Hashing. [Figure: each thread inserts its index from the batch 4,5,6, 4,6,9, 9,6,7 into a hash table in shared memory; the table ends up with one entry per unique index (5, 7, 4, 6, 9), each mapped to a transformed vertex.]
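A sequential sketch of hash-based deduplication, assuming an open-addressing table with linear probing and a trivial modulo hash; both are hypothetical choices, since the slides do not specify the table layout, and `hash_dedup` is a hypothetical name. Comments mark where a GPU version would need atomics. On hardware, slot numbers would follow the order in which the atomics succeed, so they need not match the sequential order shown here.

```python
def hash_dedup(batch, table_size=16):
    """Open-addressing hash table with linear probing: the first thread to
    insert an index claims the entry and a new vertex slot; later threads
    with the same index find the entry and reuse its slot."""
    EMPTY = -1
    keys = [EMPTY] * table_size
    slot_of = [EMPTY] * table_size
    next_slot = 0
    vertex_map = []
    for idx in batch:
        h = idx % table_size  # hypothetical hash function
        for _ in range(table_size):
            if keys[h] == EMPTY:        # GPU: atomicCAS(&keys[h], EMPTY, idx)
                keys[h] = idx
                slot_of[h] = next_slot  # GPU: atomicAdd on a slot counter
                next_slot += 1
                break
            if keys[h] == idx:          # duplicate: reuse the existing slot
                break
            h = (h + 1) % table_size    # linear probing
        else:
            raise RuntimeError("hash table is full")
        vertex_map.append(slot_of[h])
    return vertex_map, next_slot


vertex_map, num_unique = hash_dedup([4, 5, 6, 4, 6, 9, 9, 6, 7])
print(vertex_map, num_unique)  # [0, 1, 2, 0, 2, 3, 3, 2, 4] 5
```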

Methods Overview:
- Batching: static batching, dynamic batching
- Deduplication: warp voting, sorting, hashing, collaborative hashing

Evaluation: perform vertex shading on a test set of ~100 scenes captured from video games such as Total War: Shogun 2, Rise of the Tomb Raider, and The Witcher 3: Wild Hunt, preprocessed with the methods of Hoppe [1999] and Forsyth [2006]. Shader complexity varies from a simple matrix-vector multiplication to a simulated load of a variable number of fused multiply-add (FMA) instructions.

Results: Processing Time. [Figure: relative processing time with a complex shader and with a simple shader on GTX 780 Ti, GTX 980 Ti, and GTX 1080 Ti.]

Results: Shading Rate, meshes preprocessed with Hoppe's method vs. unprocessed. [Figure: shading rate of our warp voting and our sorting on NVIDIA GTX 1080 Ti, Intel HD Graphics 630, and AMD RX Vega 64.]

Results: Shading Rate, meshes preprocessed with Forsyth's method vs. unprocessed. [Figure: shading rate of our warp voting and our sorting on NVIDIA GTX 1080 Ti, Intel HD Graphics 630, and AMD RX Vega 64.]

Results: Average Shading Rate (ASR). [Figure: ASR of our warp voting and our sorting on NVIDIA GTX 1080 Ti, Intel HD 630, and AMD Vega 64.] The ASR is defined as

$$\text{ASR} = \frac{\#\,\text{vertex shader invocations}}{\#\,\text{triangles}}$$
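As a worked example of the metric, the sketch below computes the ASR for the eight-triangle stream 0,1,7, 7,1,2, 3,4,2, 2,4,7, 7,4,5, 7,5,6, 7,6,0, 6,0,8. The invocation counts 24, 16, and 9 correspond to shading every reference, to four static batches of two triangles, and to perfect reuse; the function name is hypothetical. Without any reuse, a triangle list always has an ASR of 3; lower is better.

```python
def average_shading_rate(shader_invocations, num_triangles):
    """ASR = number of vertex shader invocations / number of triangles."""
    return shader_invocations / num_triangles


indices = [0, 1, 7, 7, 1, 2, 3, 4, 2, 2, 4, 7, 7, 4, 5, 7, 5, 6, 7, 6, 0, 6, 0, 8]
num_triangles = len(indices) // 3

no_reuse = average_shading_rate(len(indices), num_triangles)      # shade every reference
four_batches = average_shading_rate(16, num_triangles)            # static batches of two triangles
perfect = average_shading_rate(len(set(indices)), num_triangles)  # every vertex shaded once
print(no_reuse, four_batches, perfect)  # 3.0 2.0 1.125
```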

Conclusion:
- Very simple shaders: no benefit from vertex reuse.
- Low to medium complexity: warp voting is the best all-rounder.
- Very high complexity: sorting or hashing.

Future Directions: vertex reuse generalizes to a more abstract problem, and the same solutions may be useful beyond graphics and rendering pipelines.

https://github.com/gpupeople/vertex_reuse
