Fast Tessellated Rendering on Fermi GF100. HPG 2010 Tim Purcell

Fast Tessellated Rendering on Fermi GF100
Hot3D @ HPG 2010
Tim Purcell

Outline
- DX10 state of the art
- DX11 tessellation
- Fermi GF100 architecture
- Results
- Demo

State of the Art: DX10 pipeline
- Pipeline stages: Input Assembler, Vertex Shader, Geometry Shader, Rasterizer/Interpolator, Pixel Shader, Output Merger
- Pixels are meticulously shaded
(Image from Far Cry 2, courtesy of Ubisoft)

State of the Art: DX10 pipeline
- Pixels are meticulously shaded, but geometric detail is modest
(Image from Far Cry 2, courtesy of Ubisoft)

Tessellation in DirectX 11
Hull shader
- Computes the tessellation factor
- Runs pre-expansion
- Explicitly parallel across control points
Diagram (from input assembly): Vertex, Patch Assembly, Hull, Tessellator, Domain, Primitive Assembly, Geometry. Label: Control points.

Tessellation in DirectX 11
Fixed function tessellation stage
- Configured by LOD output from Hull Shader
- Produces triangles and lines

Tessellation in DirectX 11
Domain shader
- Runs post-expansion
- Maps (u,v) to (x,y,z,w)
- Implicitly parallel
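
The three stages on the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Direct3D 11 API and not how GF100 implements it: the function names, the distance-based LOD rule, the single uniform tessellation factor (Direct3D 11 uses separate edge and inside factors), and the bilinear quad patch are all hypothetical choices made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    # Four corners of a bilinear quad patch: p00, p10, p01, p11 (assumed layout).
    control_points: list
    tess_factor: int = 1


def hull_shader(control_points, distance):
    """Pre-expansion: choose one tessellation factor for the whole patch.

    The LOD rule (64 / distance, clamped to 1..64) is a made-up example.
    """
    factor = max(1, min(64, round(64 / max(distance, 1.0))))
    return Patch(control_points, factor)


def tessellator(factor):
    """Fixed function: emit (u, v) domain points and triangle indices."""
    n = factor
    uvs = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]
    tris = []
    for j in range(n):
        for i in range(n):
            a = j * (n + 1) + i      # lower-left corner of the grid cell
            b = a + 1                # lower-right
            c = a + (n + 1)          # upper-left
            d = c + 1                # upper-right
            tris.append((a, b, c))
            tris.append((b, d, c))
    return uvs, tris


def domain_shader(patch, uv):
    """Post-expansion: map (u, v) to (x, y, z, w) by bilinear interpolation."""
    u, v = uv
    p00, p10, p01, p11 = patch.control_points

    def lerp(p, q, t):
        return tuple(a + (b - a) * t for a, b in zip(p, q))

    x, y, z = lerp(lerp(p00, p10, u), lerp(p01, p11, u), v)
    return (x, y, z, 1.0)


if __name__ == "__main__":
    corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    patch = hull_shader(corners, distance=16.0)
    uvs, tris = tessellator(patch.tess_factor)
    vertices = [domain_shader(patch, uv) for uv in uvs]
    print(patch.tess_factor, len(vertices), len(tris))
```

The hull shader runs once per patch, while the domain shader runs once per generated vertex, so the amount of data grows between the two stages. That growth is the data expansion the later slides are concerned with.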

NVIDIA DX10 Logical Pipeline
Diagram labels: Primitive Distributor, Vertex Shader, Geometry Shader, Pixel Shader, Crossbar, Dataflow.

DX10 Logical Pipeline + Tessellation
- TS: Tessellation Shader (includes Hull Shader, Tessellator, and Domain Shader)
Diagram labels: Primitive Distributor, eight TS units, Crossbar, Dataflow.

Problems With DX10 + Tess. (1)
- Data expansion in TS can lead to excessive buffering requirements after [...]
- Have to drain [...] 0 before [...] 1 to maintain API ordering
Diagram labels: TS 0 to TS n, [...] 0 to [...] n.

Problems With DX10 + Tess. (2)
- [...] limited prim rate
- Already a problem for CPU-generated geometry
- Tessellation generates even more primitives
Diagram labels: TS 0, TS 1, to TS n; [...] 0, 1, to n; [...] 0 to m.

Fermi GF100 Logical Pipeline
- Raster (includes Edge Setup, Rasterizer, and Zcull)
Diagram labels: Primitive Distributor, Crossbar, Hull Shader, Task Distributor, Work Distribution Crossbar, Domain Shader, Dataflow.

Fermi GF100 Logical Pipeline
- Polymorph (includes [...], [...], and [...])
Diagram labels: Primitive Distributor, Crossbar, Task Distributor, Work Distribution Crossbar, Dataflow.

Fermi GF100 Tessellation Features 1
Task Distributor
- Task ~= Hull Shader output
  - Control points plus tessellation factor
  - Pre-expansion
- Distributes tasks for subsequent geometry processing
  - Expand patch into primitives
- Optional
- Reduces buffering requirements

Fermi GF100 Tessellation Features 2
Parallel [...]
- Includes clipping and culling
- Eliminate serialization point

Fermi GF100 Tessellation Features 3
Parallel Rasterization
- Including Edge Setup, Rasterization, and Zcull
- Improved primitive throughput
  - Multiple primitives per clock
- Screen mapped for load balancing

Screen Mapped Rasterization
[Figure: the screen divided into a grid of pixel tiles, each labelled with the number of its owning rasterizer (Rasterizer 0 to Rasterizer 3); two primitives, A and B, are drawn over the grid.]
- Each block is a tile of pixels
- Rasterizers uniquely own pixel tiles
- Note: small primitives can overlap multiple tiles
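
A minimal sketch of screen-mapped tile ownership follows. The slide gives neither the tile size nor the exact interleave pattern, so `TILE_SIZE` and the `(tile_x + tile_y) % NUM_RASTERIZERS` rule are assumptions chosen for illustration. Only the count of four rasterizers comes from the figure.

```python
TILE_SIZE = 16        # assumed tile edge in pixels; not stated on the slide
NUM_RASTERIZERS = 4   # the figure shows rasterizers 0 to 3


def tile_owner(tile_x, tile_y):
    """Return the rasterizer that owns a tile.

    Hypothetical interleave: each row is shifted by one relative to the
    row above it, so horizontally and vertically adjacent tiles always
    belong to different rasterizers.
    """
    return (tile_x + tile_y) % NUM_RASTERIZERS


def pixel_owner(x, y):
    """Return the rasterizer that owns the pixel at screen position (x, y)."""
    return tile_owner(x // TILE_SIZE, y // TILE_SIZE)


if __name__ == "__main__":
    # Print the ownership pattern for a small 8 x 4 tile region.
    for ty in range(4):
        print(" ".join(str(tile_owner(tx, ty)) for tx in range(8)))
```

Because every pixel maps to exactly one rasterizer, two rasterizers never write the same pixel, which is what "uniquely own" means on the slide.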

Challenge: Maintaining API Order
- Parallelization is easy
- Maintaining API ordering is a challenge
- Work Distribution Crossbar (WDX)
  - Similar to Pomegranate [Eldridge et al. 2000]

Work Distribution Crossbar
- Uses primitive bounding box to determine which rasterizers get which primitives
- Reconstructs API order
- Rasterizers uniquely own pixels
  - No subsequent sorting for ordering required
Diagram labels: OWDX, SWDX.
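
The two jobs listed on this slide, routing by bounding box and restoring API order, can be illustrated as follows. The slides do not describe the hardware mechanism, so everything here is an assumption: the tile size and interleave rule, the idea of tagging each primitive with an API sequence number, and the use of a sorted merge to restore order are one plausible software analogue, not a description of the WDX itself.

```python
import heapq

TILE_SIZE = 16        # assumed tile edge in pixels
NUM_RASTERIZERS = 4


def tile_owner(tile_x, tile_y):
    """Hypothetical interleave of tiles across rasterizers."""
    return (tile_x + tile_y) % NUM_RASTERIZERS


def rasterizers_for_bbox(xmin, ymin, xmax, ymax):
    """Return the set of rasterizers owning any tile the bounding box touches.

    Coordinates are inclusive pixel positions.
    """
    owners = set()
    for ty in range(ymin // TILE_SIZE, ymax // TILE_SIZE + 1):
        for tx in range(xmin // TILE_SIZE, xmax // TILE_SIZE + 1):
            owners.add(tile_owner(tx, ty))
    return owners


def merge_in_api_order(streams):
    """Merge per-source primitive streams back into API order.

    Each stream is a list of (sequence_number, primitive) pairs that is
    already sorted, because each geometry unit emits its own work in order.
    """
    return [prim for _, prim in heapq.merge(*streams)]


if __name__ == "__main__":
    # A small primitive that straddles a tile boundary goes to two rasterizers.
    print(sorted(rasterizers_for_bbox(10, 10, 20, 12)))
    # Two geometry units processed interleaved primitives in parallel.
    unit0 = [(0, "tri_a"), (3, "tri_d")]
    unit1 = [(1, "tri_b"), (2, "tri_c")]
    print(merge_in_api_order([unit0, unit1]))
```

Since each rasterizer sees its primitives in API order and owns its pixels exclusively, no later stage has to sort fragments again, which matches the last bullet on the slide.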

Handling Load Imbalance
- Load imbalance is a concern for all region-based architectures because they are non-adaptive
- Static imbalance
  - Different screen regions get differing amounts of work
  - Mitigated by relatively small pixel tiles
- Dynamic imbalance
  - Lots of tiny primitives in same screen region at the same time
  - Mitigated by buffering
  - We can only buffer so much
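
The claim that small tiles mitigate static imbalance can be checked with a toy model. The sketch below assumes all the work falls inside one rectangular screen region and that tiles are interleaved with a hypothetical `(tile_x + tile_y) % 4` rule. The region, the tile sizes, and the imbalance metric (busiest rasterizer divided by the mean) are illustrative choices, not figures from the talk.

```python
def load_per_rasterizer(region, tile_size, num_rasterizers=4):
    """Count how many pixels of a busy region each rasterizer owns.

    region is a half-open pixel rectangle (x0, y0, x1, y1).
    """
    x0, y0, x1, y1 = region
    load = [0] * num_rasterizers
    for y in range(y0, y1):
        for x in range(x0, x1):
            owner = (x // tile_size + y // tile_size) % num_rasterizers
            load[owner] += 1
    return load


def imbalance(load):
    """Ratio of the busiest rasterizer's load to the mean load (1.0 is ideal)."""
    return max(load) / (sum(load) / len(load))


if __name__ == "__main__":
    busy_region = (0, 0, 256, 256)   # all work lands in this 256 x 256 area
    for tile_size in (256, 128, 16):
        load = load_per_rasterizer(busy_region, tile_size)
        print(tile_size, load, imbalance(load))
```

In this model a 256-pixel tile puts the whole region on one rasterizer (imbalance 4.0), a 128-pixel tile gives 2.0, and a 16-pixel tile spreads the region evenly (1.0). Dynamic imbalance is a different problem: many tiny primitives arriving in one tile at the same moment still queue on a single rasterizer, which is why the slide points to buffering rather than tile size.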

Results

Historical Triangle Rate
[Chart: triangle rate in M tris/sec (axis from 0 to 2000) against chip release date (Jan-00 to Jan-10); GF100 is the labelled data point.]

Results: Synthetic Performance Test
[Chart: triangles per clock (axis from 0 to 3) against triangle size in pixels (axis from 0 to 16); two series, "Z + per-vertex color" and "Z-only".]

Results: Island Demo
- Frame average of 1.94 triangles per clock

Results: Unigine Heaven
- 2.34 triangles per clock in highly tessellated areas

Demo Time

Summary
- Tessellation
  - Increases geometric realism without burdening the CPU
- Fermi GF100 is designed for tessellation
  - Eliminates serialization points
  - Dramatically improved primitive throughput
- Next-generation visual fidelity

Acknowledgements
- Steve Molnar
- Ziyad Hakura
- Andreas Dietrich (demo help)
- Entire Fermi GF100 team

Questions?