Hardware-driven Visibility Culling Jeong Hyun Kim

Hardware-driven Visibility Culling Jeong Hyun Kim KAIST (Korea Advanced Institute of Science and Technology)

Contents
Introduction
Background: Clipping, Culling, Z-max (Z-min), Filter, Programmable culling unit
Proposed Idea
Conclusion

Introduction
The goal of 3D accelerator hardware is real-time rendering of photorealistic scenes.
This needs more processing time and more memory bandwidth (BW).
To reduce processing time: the GPU.
To reduce the memory BW problem:
R&D of faster memory expands the available BW.
Culling schemes reduce the requested BW.

Introduction
Software-driven culling uses the CPU.
Con: the CPU is slow.
Pro: it reduces the data before GPU processing.
Hardware-driven culling means culling on the GPU.
It cannot reduce the burden of geometry processing; it only reduces the burden of fragment processing.
But fragment processing is the bottleneck, because of memory access.
Example: the R580 (ATI Radeon X1900) has 8 vertex shaders (VS) and 48 pixel shaders (PS).

Background_clipping
Clipping
What does clipping mean, literally?
Strictly speaking, clipping is not culling.
In hardware implementations, however, the clipping unit also performs view-frustum culling.

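Background_clipping
Clipping
Pipeline: Vertex Shader -> Primitive Assembly -> Clipping -> Rasterizer
The clipping unit gives each primitive one of three outcomes: reject, pass, or clip & pass.
Many triangles are rejected at this stage.
This is a very simple primitive culling method.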
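The reject / pass / clip & pass decision can be sketched with per-vertex outcodes. This is a minimal sketch, not the hardware's actual logic: it assumes an OpenGL-style clip volume (-w <= x, y, z <= w), and the names `outcode` and `classify_triangle` are hypothetical.

```python
from typing import Sequence, Tuple

Vertex = Tuple[float, float, float, float]  # clip-space (x, y, z, w)


def outcode(v: Vertex) -> int:
    """Return a 6-bit mask with one bit set per frustum plane the vertex is outside of."""
    x, y, z, w = v
    code = 0
    if x < -w:
        code |= 1   # left
    if x > w:
        code |= 2   # right
    if y < -w:
        code |= 4   # bottom
    if y > w:
        code |= 8   # top
    if z < -w:
        code |= 16  # near (assumed OpenGL-style convention)
    if z > w:
        code |= 32  # far
    return code


def classify_triangle(tri: Sequence[Vertex]) -> str:
    """Classify a triangle as 'reject', 'pass', or 'clip&pass'."""
    codes = [outcode(v) for v in tri]
    if codes[0] & codes[1] & codes[2]:
        return "reject"      # all vertices are outside the same plane
    if not (codes[0] | codes[1] | codes[2]):
        return "pass"        # all vertices are inside every plane
    return "clip&pass"       # the triangle may straddle a plane


inside = [(0.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
print(classify_triangle(inside))
```

The "reject" branch is the part that acts as view-frustum culling: a triangle whose three vertices all lie outside one plane cannot touch the view volume.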

Background_fragment culling
Fragment culling
If a pixel is occluded by something, we don't have to process that pixel.
Can we use the depth information in the frame buffer (FB)? There is a consistency problem:
Writing to the FB happens at the end of processing.
Many pixels are in flight at the same time.
Pipeline: Rasterizer -> early depth test (reads Z from the FB) -> processing -> real depth test & write to FB (writes Z)

Background_fragment culling
We have to use another source of information.
Hierarchical Z-buffer (Z-max algorithm):
It keeps another Z cache, the HZ-cache.
It needs a lot of memory and adds memory BW of its own.
It has a high rejection ratio.
Depth Filter (Z-max algorithm):
Rough culling.
It needs a much smaller cache than the HZ-buffer.

Background_HZ-buffer
Hierarchical Z-buffer
Each value at level N is the maximum of the corresponding four pixels at level N-1.
There is a cache structure issue.
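The max-of-four rule can be sketched as a small pyramid builder. This is an illustrative sketch only: it assumes a square depth buffer with a power-of-two side, that smaller depth means closer, and the function names and sample values are hypothetical, not taken from the slide's figure.

```python
def build_zmax_pyramid(depth):
    """Build the Z-max hierarchy; level 0 is the full-resolution depth buffer."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        # Each entry of the next level is the max of a 2x2 block of the previous level.
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def is_occluded(levels, level, tx, ty, z_near):
    """A primitive whose nearest depth is behind the block's max depth is hidden in that block."""
    return z_near > levels[level][ty][tx]


depth = [
    [0.5, 0.4, 0.2, 0.1],
    [0.5, 0.4, 0.6, 0.4],
    [0.8, 0.8, 0.3, 0.2],
    [0.7, 0.8, 0.1, 0.9],
]
levels = build_zmax_pyramid(depth)
print(levels[1])  # [[0.5, 0.6], [0.8, 0.9]]
print(levels[2])  # [[0.9]]
```

A single comparison against a coarse level can reject a primitive for a whole block of pixels, which is where the high rejection ratio comes from; the cost is the extra storage for every level.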

Background_depth filter
Depth Filter
The HZB needs another large cache: depth is 24 bits per pixel, and the hierarchical map needs a separate map for each level.
In contrast, the depth filter needs only 1 or 2 bits per pixel, the filter mask.
It performs rough culling.

Background_depth filter
1 bit per pixel
The depth filter (DF) is a plane between the near plane and the far plane that separates the view volume.
Pixels in front of the DF (blue area) are passed to the next stage and set the filter mask high.
Pixels behind the DF (red area) are checked against the filter mask:
If the mask is high, the pixel is rejected.
If the mask is low, the pixel is accepted. The filter mask is not changed; it stays low.
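The 1-bit rule can be traced with a small simulation. This is a sketch under assumptions: smaller z means closer, fragments are opaque, and `depth_filter_1bit` and its parameters are hypothetical names.

```python
def depth_filter_1bit(fragments, df_depth, num_pixels):
    """Apply a 1-bit-per-pixel depth filter.

    fragments: iterable of (pixel_index, z) in submission order.
    Returns the fragments that survive and go on to the real depth test.
    """
    mask = [False] * num_pixels  # one bit per pixel, initially low
    survivors = []
    for pixel, z in fragments:
        if z < df_depth:
            # In front of the DF: always passed on, and the mask goes high.
            mask[pixel] = True
            survivors.append((pixel, z))
        elif not mask[pixel]:
            # Behind the DF with the mask low: accepted, mask unchanged.
            survivors.append((pixel, z))
        # Behind the DF with the mask high: rejected.
    return survivors


frags = [(0, 0.9), (0, 0.2), (0, 0.8), (1, 0.7)]
print(depth_filter_1bit(frags, df_depth=0.5, num_pixels=2))
# [(0, 0.9), (0, 0.2), (1, 0.7)]
```

The filter is conservative: it never rejects a fragment in front of the DF, so the real depth test still resolves visibility among the survivors.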

Background_depth filter
2 bits per pixel
We can separate the view volume into 4 spaces, which gives a higher rejection ratio.
Plane order: Near Plane, Depth Filter0, Depth Filter1, Depth Filter2, Far Plane
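The slide does not spell out how the 2-bit mask is updated. One plausible generalization of the 1-bit rule, shown here purely as an assumption, is to store per pixel the index of the nearest region in which a fragment has been seen and to reject fragments that fall in a farther region. All names are hypothetical.

```python
import bisect


def depth_filter_2bit(fragments, filter_depths, num_pixels):
    """Apply a 2-bit-per-pixel depth filter (assumed update rule).

    filter_depths: three increasing depths that split the view volume
    into regions 0 (nearest) to 3 (farthest).
    """
    nearest_region = [3] * num_pixels  # 2 bits per pixel, start at the farthest region
    survivors = []
    for pixel, z in fragments:
        region = bisect.bisect_right(filter_depths, z)
        if region > nearest_region[pixel]:
            continue  # a fragment in a nearer region was already seen: reject
        nearest_region[pixel] = region
        survivors.append((pixel, z))
    return survivors


frags = [(0, 0.9), (0, 0.6), (0, 0.8), (0, 0.3), (0, 0.6), (0, 0.35)]
print(depth_filter_2bit(frags, [0.25, 0.5, 0.75], num_pixels=1))
# [(0, 0.9), (0, 0.6), (0, 0.3), (0, 0.35)]
```

With a single filter depth and a mask that starts at 1, the same rule reduces to the 1-bit behavior; fragments that land in the same region still have to be resolved by the real depth test.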

Background_depth filter
The location of the DF can be modified adaptively to maximize the rejection ratio.
The rejection ratio is maximized when the number of pixels in front of the DF equals the number of surviving pixels (pixels behind the DF that were passed).
The DF decreases memory BW in proportion to the depth complexity.
The DF method can also be applied to primitive culling.

Background_PCU
Programmable culling unit (PCU)
The PCU does tile-based culling.
The fundamental difference compared to Hierarchical Depth Culling (HDC):
HDC does fixed-function computations.
The PCU bases its decision on the output of a shader program execution.

Background_PCU
The cull program executes the KIL instruction quickly.
The PCU executes per-tile computation.
A tile can be killed only if all of the fragments in the tile can be killed.
Interval computation is used to decide this.
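Interval computation lets a single evaluation bound the shader result for every fragment in a tile. The sketch below is only an illustration: the `Interval` class, the kill condition `alpha * fade - threshold < 0`, and all names are hypothetical and are not taken from the PCU design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


def tile_can_be_killed(alpha: Interval, fade: Interval, threshold: float) -> bool:
    """Assumed shader: KIL the fragment if alpha * fade - threshold < 0.

    alpha and fade hold the min and max of each input over the whole tile.
    The tile is killed only if the upper bound of the result is negative,
    which means every fragment in the tile would be killed.
    """
    value = alpha * fade - Interval(threshold, threshold)
    return value.hi < 0.0


print(tile_can_be_killed(Interval(0.0, 0.2), Interval(0.5, 1.0), 0.3))  # True
print(tile_can_be_killed(Interval(0.0, 0.8), Interval(0.5, 1.0), 0.3))  # False
```

The bounds are conservative: a tile that could be killed may survive when the intervals are loose, but a tile with any live fragment is never killed.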

Background_PCU
Program Compilation and Separation
The PCU must be easy to use.
The programmer writes one combined program.
It is up to the driver or compiler to separate it into the cull program and the fragment program.
The separation uses dead code elimination [Cytron 1991]:
For the fragment program, mark the color outputs, the depth outputs, and all KIL statements.
For the cull program, mark all CUL and KIL statements.
Then remove all code that does not contribute to the marked results.
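The mark-and-remove step can be sketched on a straight-line program. This is a simplified backward mark-and-sweep, not the SSA-based algorithm of the cited work; the instruction format, the register names, and `eliminate_dead_code` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def eliminate_dead_code(program, root_ops, root_dests=()):
    """Keep only instructions that are marked roots or feed a marked root."""
    live = set()
    kept = []
    for instr in reversed(program):
        is_root = instr.op in root_ops or instr.dest in root_dests
        if is_root or instr.dest in live:
            live.discard(instr.dest)   # this definition satisfies the later uses
            live.update(instr.srcs)    # its inputs become live in turn
            kept.append(instr)
    kept.reverse()
    return kept


combined = [
    Instr("MUL", "t0", ("tex", "alpha")),
    Instr("SUB", "t1", ("t0", "thresh")),
    Instr("KIL", None, ("t1",)),
    Instr("MUL", "t2", ("tex", "light")),
    Instr("ADD", "unused", ("tex", "tex")),
    Instr("ADD", "color", ("t2", "ambient")),
]

cull_program = eliminate_dead_code(combined, root_ops={"CUL", "KIL"})
fragment_program = eliminate_dead_code(combined, root_ops={"KIL"},
                                       root_dests={"color", "depth"})
print([i.op for i in cull_program])      # ['MUL', 'SUB', 'KIL']
print([i.dest for i in fragment_program])
```

In this example the cull program keeps only the three instructions that feed the KIL, so it is much shorter than the fragment program derived from the same source.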

Background_PCU
Higher Level Culling
The PCU does culling on a per-tile, per-triangle basis.
If the triangles are very small (even smaller than a pixel), then the per-tile check may be meaningless overhead.
Solution: a unit similar to the delay stream [Aila 2003].
This unit receives triangles in their original order.
It groups triangles until the group becomes larger than a tile.
Once a group has grown enough, the cull program is executed for it.
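The grouping step might look like the sketch below. The size criterion is an assumption made for illustration: a group counts as "larger than a tile" once the union of its screen-space bounding boxes is at least one tile wide and one tile tall. `group_small_triangles` is a hypothetical name.

```python
def group_small_triangles(boxes, tile_size):
    """Group consecutive triangles until their combined bounds cover a tile.

    boxes: screen-space bounding boxes (xmin, ymin, xmax, ymax), one per
    triangle, in submission order. Order is preserved within and across groups.
    """
    groups = []
    current = []
    bounds = None
    for box in boxes:
        current.append(box)
        if bounds is None:
            bounds = box
        else:
            bounds = (min(bounds[0], box[0]), min(bounds[1], box[1]),
                      max(bounds[2], box[2]), max(bounds[3], box[3]))
        wide_enough = bounds[2] - bounds[0] >= tile_size
        tall_enough = bounds[3] - bounds[1] >= tile_size
        if wide_enough and tall_enough:
            groups.append(current)  # the cull program would run for this group
            current, bounds = [], None
    if current:
        groups.append(current)      # flush the remaining triangles
    return groups


tiny = [(i, i, i + 1, i + 1) for i in range(6)]
print([len(g) for g in group_small_triangles(tiny, tile_size=4)])  # [4, 2]
```

Running the cull program once per group instead of once per tiny triangle spreads the per-tile overhead over several triangles.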

Background_PCU
Implementation of the PCU
The PCU is combined with the shader unit, so it reuses existing hardware.
The PCU mainly consists of extra control logic before and after the ALU.
Role of the control logic: value rerouting, and detecting and handling special cases.
It also needs an additional texture unit.
Min and max units assemble the result.

Proposed Idea
The clipping unit is a fixed-function culling unit.
Every piece of graphics hardware implements a clipping unit.
Idea: apply the existing clipping unit to more culling.
If we use the clipping unit, this can be the simplest culling scheme.

Proposed Idea
A closer look at the clipping unit.
With a normalized coordinate system, the VS outputs vertices normalized to the view volume.
If the depth of an input vertex is greater than 1, the vertex is beyond the far plane.
We are using homogeneous coordinates (x, y, z, w).

Proposed Idea
In the clipping check, Z == w means the vertex is on the far plane.
We can move the far plane simply by multiplying w by a value.
A lower w gives a closer far plane.
How about moving the far plane closer, to the Z-min value of the last frame?
We can obtain an additional culling effect.
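The modified far-plane test can be sketched as a comparison of z against a scaled w. This is a minimal sketch under assumptions: w > 0, the usual far test is z > w, and `far_scale` is a hypothetical parameter standing for the multiplier applied to w (for example, a normalized depth value taken from the last frame).

```python
def beyond_far_plane(vertex, far_scale=1.0):
    """True if the clip-space vertex lies behind the (scaled) far plane.

    far_scale = 1.0 reproduces the usual test z > w.
    far_scale < 1.0 moves the far plane closer to the viewer.
    """
    x, y, z, w = vertex
    return z > far_scale * w


def culled_by_far_plane(triangle, far_scale=1.0):
    """A triangle is rejected only if all of its vertices are behind the plane."""
    return all(beyond_far_plane(v, far_scale) for v in triangle)


tri = [(0.0, 0.0, 1.8, 2.0), (0.5, 0.0, 1.8, 2.0), (0.0, 0.5, 1.8, 2.0)]
print(culled_by_far_plane(tri))                 # False: z/w = 0.9 is inside the default far plane
print(culled_by_far_plane(tri, far_scale=0.8))  # True: the far plane now sits at z/w = 0.8
```

Because the test is z > far_scale * w, it is equivalent to z / w > far_scale for w > 0, so only the comparison value changes and the rest of the clipping logic stays as it is.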

Proposed Idea
Result

Proposed Idea
At the clipping unit.
Figure labels: near plane, Z-min point of last frame, new far plane, far plane.

Proposed Idea
There are many issues to solve.
It will improve performance only in specific situations.
That is enough.

Conclusion
Because the clipping unit is implemented in graphics hardware and handles all triangles, I tried to make more use of this hardware.
It can provide an additional primitive culling effect.
We need to keep trying to cull more primitives or fragments.
Programmable, fixed-function, or mixed? Is programmable always good?
The PCU has not been implemented.