OPENGL BLUEPRINT RENDERING

Similar documents
E.Order of Operations

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

The Application Stage. The Game Loop, Resource Management and Renderer Design

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

Pipeline Operations. CS 4620 Lecture 14

Rendering Objects. Need to transform all geometry then

Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis

PowerVR Series5. Architecture Guide for Developers

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Mali & OpenGL ES 3.0. Dave Shreiner Jon Kirkham ARM. Game Developers Conference 27 March 2013

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

The Graphics Pipeline

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

OpenGL on Android. Lecture 7. Android and Low-level Optimizations Summer School. 27 July 2015

CS452/552; EE465/505. Clipping & Scan Conversion

The NVIDIA GeForce 8800 GPU

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Optimisation. CS7GV3 Real-time Rendering

CS4620/5620: Lecture 14 Pipeline

Evolution of GPUs Chris Seitz

Portland State University ECE 588/688. Graphics Processors

LSU EE Homework 5 Solution Due: 27 November 2017

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Starting out with OpenGL ES 3.0. Jon Kirkham, Senior Software Engineer, ARM

Deferred Rendering Due: Wednesday November 15 at 10pm

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

Rasterization Overview

1.2.3 The Graphics Hardware Pipeline

White Paper. Solid Wireframe. February 2007 WP _v01

Shaders (some slides taken from David M. course)

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside

Introduction to the OpenGL Shading Language

Programming Graphics Hardware

Introduction to Shaders for Visualization. The Basic Computer Graphics Pipeline

Adaptive Point Cloud Rendering

Real-Time Hair Rendering on the GPU NVIDIA

Efficient and Scalable Shading for Many Lights

Computergrafik. Matthias Zwicker. Herbst 2010

OpenGL SUPERBIBLE. Fifth Edition. Comprehensive Tutorial and Reference. Richard S. Wright, Jr. Nicholas Haemel Graham Sellers Benjamin Lipchak

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

Computer Graphics (CS 543) Lecture 10: Soft Shadows (Maps and Volumes), Normal and Bump Mapping

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS452/552; EE465/505. Image Processing Frame Buffer Objects

Lecture 4: Geometry Processing. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

OPENGL AND GLSL. Computer Graphics

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Programming Guide. Aaftab Munshi Dan Ginsburg Dave Shreiner. TT r^addison-wesley

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

3D Rendering Pipeline

Pixels and Buffers. CS 537 Interactive Computer Graphics Prof. David E. Breen Department of Computer Science

CS770/870 Spring 2017 Open GL Shader Language GLSL

CS770/870 Spring 2017 Open GL Shader Language GLSL

CS427 Multicore Architecture and Parallel Computing

GeForce4. John Montrym Henry Moreton

POWERVR MBX. Technology Overview

Pipeline Operations. CS 4620 Lecture 10

Performance OpenGL Programming (for whatever reason)

Point-Based rendering on GPU hardware. Advanced Computer Graphics 2008

Shaders. Slide credit to Prof. Zwicker

Shadow Techniques. Sim Dietrich NVIDIA Corporation

Hardware-driven visibility culling

Beyond Programmable Shading Course ACM SIGGRAPH 2010 Bending the Graphics Pipeline

LOD and Occlusion Christian Miller CS Fall 2011

Volume Shadows Tutorial Nuclear / the Lab

Lecture 13: OpenGL Shading Language (GLSL)

CSE 167: Introduction to Computer Graphics Lecture #9: Visibility. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2018

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into

Lecture 2. Shaders, GLSL and GPGPU

Programmable Graphics Hardware

CSE 167: Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

Terrain rendering (part 1) Due: Monday, March 10, 10pm

Lecture 25: Board Notes: Threads and GPUs

Octree-Based Sparse Voxelization for Real-Time Global Illumination. Cyril Crassin NVIDIA Research

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

High-Quality Surface Splatting on Today s GPUs

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Terrain Rendering (Part 1) Due: Thursday November 30 at 10pm

Last Time. Why are Shadows Important? Today. Graphics Pipeline. Clipping. Rasterization. Why are Shadows Important?

Rendering Grass with Instancing in DirectX* 10

CS130 : Computer Graphics Lecture 2: Graphics Pipeline. Tamar Shinar Computer Science & Engineering UC Riverside

GPU Programming EE Final Examination

Real-Time Shadows. Last Time? Today. Why are Shadows Important? Shadows as a Depth Cue. For Intuition about Scene Lighting

EECE 478. Learning Objectives. Learning Objectives. Rasterization & Scenes. Rasterization. Compositing

Computer Graphics. - Rasterization - Philipp Slusallek

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

PowerVR Hardware. Architecture Overview for Developers

Robust Stencil Shadow Volumes. CEDEC 2001 Tokyo, Japan

Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions.

Programmable GPUs. Real Time Graphics 11/13/2013. Nalu 2004 (NVIDIA Corporation) GeForce 6. Virtua Fighter 1995 (SEGA Corporation) NV1

Computer Graphics Shadow Algorithms

The Traditional Graphics Pipeline

CSE 167: Introduction to Computer Graphics Lecture #5: Visibility, OpenGL

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

CS451Real-time Rendering Pipeline

Computer Graphics. Shadows

Flowmap Generator Reference

Shadow Rendering EDA101 Advanced Shading and Rendering

Real-Time Shadows. MIT EECS 6.837, Durand and Cutler

Transcription:

April 47, 2016 Silicon Valley OPENGL BLUEPRINT RENDERING Christoph Kubisch, 4/7/2016

MOTIVATION Blueprints / drawings in CAD/graph viewer applications Documents can contain many LINES and LINE_STRIPS Various line styles can be used (worldspace widths, stippling, joints, caps...) Potential CPU bottlenecks Generating geometry for complex styles Collecting and rendering geometry Model courtesy of PTC 2

MOTIVATION Not targeting full vector graphics NV_path_rendering covers high fidelity vector graphics rendering Perpixel quadratic Bézier evaluation Stencil & Cover pass to allow sophisticated blending Focus of this talk is rendering lines defined by traditional vertices Rendering data from OpenGL buffer objects Singlepass, but does mean not safe for blending (does selfoverlap) 3

DEMO: BASIC DEMONSTRATION 4

LINE RASTERIZATION Representation Standard: skewed rectangle pixel snapped lines Multisampling: aligned rectangle smooth lines Both suffer from visible gaps and overlaps on increasing line width 5

LINE RASTERIZATION Stippling Stippling only in screenspace Patterns must be expressable with 16 bits LINES restart pattern every segment LINE_STRIPS have continous distance 6

SHADERDRIVEN LINES TECH Appearance on screen Create TRIANGLES/QUADS for line segments Project extruded vertices to keep line width consistent Clip and color in fragment shader based on UV coordinates and line distance Geometry in world coordinates Shapes via fragment shader discard 7

SHADERDRIVEN LINES TECH FLEXIBILITY Create TRIANGLES for line segments, project extrusion to world/screen, discard fragments Arbitrary stippling patterns and line widths Joint and capstyles Different distance metrics New coloring/animation possibilities via shaders Thin center line as effect 8

SHADERDRIVEN LINES TECH FLEXIBILITY CAVEATS Create TRIANGLES for line segments, project extrusion to world/screen, discard fragments Arbitrary stippling patterns and line widths Joint and capstyles Different distance metrics New coloring/animation possibilities via shaders Cannot be as fast as basic line rasterization Not all data local at rendering time (line strip distances need extra calculation) Geometry still selfoverlaps 9

SHADERDRIVEN LINES Sample implementation/library C interface library to render different line primitives (LINES, LINE_STRIPS, ARCS) provided as flexible framework rather than blackbox Two different rendermodes: render as extruded triangles, or one pixel wide lines Uses NVIDIA and ARB OpenGL extensions if available 10

SHADERDRIVEN LINES Sample implementation/library Global style and stipple definitions Stipple from arbitrary bitpattern, or float values Style 0 Style 1... Style Definitions Stipple Patterns Pattern texture A Pattern texture B... typedef struct NVLStyleInfo_s { NVLSpaceType projectionspace; NVLJoinType join; NVLCapsType capsbegin; NVLCapsType capsend; float thickness; NVLStippleID stipplepattern; float stipplelength; float stippleoffsetbegin; float stippleoffsetend; NVLAnchorType stippleanchor; NVLboolean stippleclamp; } NVLStyleInfo; typedef enum NVLSpaceType_e { NVL_SPACE_SCREEN, NVL_SPACE_SCREENDIST3D, NVL_SPACE_CUSTOM, NVL_SPACE_CUSTOMDIST3D, NVL_NUM_SPACES, }NVLSpaceType; typedef enum NVLAnchorType_e { NVL_ANCHOR_BEGIN, NVL_ANCHOR_END, NVL_ANCHOR_BOTH, NVL_NUM_ANCHORS, }NVLAnchorType; typedef enum NVLCapsType_e { NVL_CAPS_NONE, NVL_CAPS_ROUND, NVL_CAPS_BOX, NVL_NUM_CAPS, }NVLCapsType; typedef enum NVLJoinType_e { NVL_JOIN_NONE, NVL_JOIN_ROUND, NVL_JOIN_MITER, NVL_NUM_JOINS, }NVLJoinType; 11

SHADERDRIVEN LINES Sample implementation/library Uses GPU friendly collection mechanism: Record many primitives then render Optionally render subsections Raw Primitives pass vertex data directly Geometry Primitives reference existing Vertex Buffers Collections have usagestyle flags: Geometry/Raw Recording Geometry Primitives VBO reference Raw Primitives Vertex values Matrix Color Style reference filled new perframe recorded once, reused many frames 12

SHADERDRIVEN LINES Quad extrusion Faster geometry creation by just using Vertex Shader, avoiding extra GeometryShader stage VS GS Render GL_QUADS (4 vertices each segment) Use gl_vertexid to fetch line points VertexBuffer texelfetch(...gl_vertexid/4 + 0 or 1) Use it for the offsets as well Using custom vertexfetch generally not recommended, but useful for special situations gl_vertexid % 4 + 0 gl_vertexid % 4 + 1 13

SHADERDRIVEN LINES Minimize Overdraw No naive rectangles but adjacency in LINE_STRIP is used to tighten the geometry Reduces overdraw and minimizes potential artifacts resulting from that 14

SHADERDRIVEN LINES Depth clamping Joints and caps exceed original line definition Can cause depthbuffer artifacts Prevent depth overshooting by passing closest depth to fragment shader and clamp there Can use ARB_conservative_depth or just min/max to keep hardware zcull active #extension GL_ARB_conservative_depth : require layout (depth_greater) out float gl_fragdepth; in flat float closestpointdepth;... gl_fragdepth = max(gl_fragcoord.z, closestpointdepth); 15

DISTANCE COMPUTATION LINE_STRIPS need dedicated calculation phase V 0 V 1 V 2 Read vertices and calculate distances along the strip VertexBuffer V V 3 4 Strip Length 0 1 2 3 Sections drawn indepedently Fetch vertices & distances D 0 D 1 D 1 D 2 DistanceBuffer D 0 [0,1] [0,1]+[1,2] [0,1]+[1,2]+[2,3] Distances are fetched at rendertime D 2 D 3 16

DISTANCE COMPUTATION Shader Tips One LINE_STRIP per thread can lead to under utilization and non ideal memory access due to divergence Strip Length Thread: 0... 3 3 2 4 8 SIMT hardware processes threads together in lockstep, common instruction pointer (masks out inactive threads). NVIDIA: 1 warp = 32 threads VertexBuffer Distance Accumulation Loop 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

DISTANCE COMPUTATION Shader Tips Compute one LINE_STRIP at a time across warp, gives nice memory fetch NV_shader_thread_shuffle to access neighbors and do prefixsum calculation vec3 posa = getposition ( gl_threadinwarpnv + ) vec3 posb = shuffleupnv (posa, 1, gl_warpsizenv);... Handle first thread point differently float dist = distance(posa, posb); Short strips may still underutilize warp, but are taking only one iteration Strip Length VertexBuffer Distance Accumulation Loop Access neighbor point via shuffleupnv and compute distance Thread: 0... 3 9 9 9 9 0 1 2 3 [0,0] [0,1] [1,2] [2,3]... Prefixsum over distances... 4 5 6 7 8 18

DISTANCE COMPUTATION Batching & Latency hiding Memory intensive operations prefer many threads to hide latency of fetch Would not compute distance for a single strip, but need many strips to work on Use one warp per strip if total amount of threads is low Warp 0 Warp 1 Warp 2 Warp 3 Fetch Wait For Memory Compute Effective Utilization Hardware switches activity between entire warps 19

DISTANCE COMPUTATION Batching & Latency hiding Launch overhead of compute dispatch not negligable for < 10 000 threads Use glenable(gl_rasterizer_discard); and VertexShader to do compute work No shared memory but warp data sharing as seen before (ARB_shader_ballot or NV_shader_thread_shuffle)... Compute alternative for few threads if (numthreads < FEW_THREADS){ gluseprogram( vs ); glenable ( GL_RASTERIZER_DISCARD ); gldrawarrays( GL_POINTS, 0, numthreads ); gldisable ( GL_RASTERIZER_DISCARD ); } else { gluseprogram( cs ); numgroups = (numthreads+groupsize1)/groupsize; gluniformi1 (0, numthreads); gldispatchcompute ( numgroups, 1, 1 ); }... Shader #if USE_COMPUTE layout (local_size_x=group_size) in; layout (location=0) uniform int numthreads; int threadid = int( gl_globalinvocationid.x ); #else int threadid = int( gl_vertexid ); #endif 20

SMOOTH TRANSITIONS Antialiasing edges within shader Fragment shader effects cause outlines of visible shapes to be within geometry No geometric edges No MSAA benefit MSAA will not add quality within triangle Need to compute coverage accurately (sampleshading) or approximate Use of gl_sampleid (e.g. with interpolateatsample) automatically makes shader run persample, discard will affect coverage mask properly Cheaper: GL_SAMPLE_ALPHA_TO_COVERAGE or clear bits in gl_samplemask in float stipplecoord;... sc = interpolateatsample (stipplecoord, gl_sampleid); stippleresult = computestippling( sc ); if (stippleresult < 0) discard; 21

SMOOTH TRANSITIONS Using Pixel Derivatives 1 1 Simple trick to get smooth transitions, also works well on surface contour lines 0 0 Use a signed distance field, instead of step function Find if sample is close to transition (zero crossing) via fwidth 0 1 fwidth ( signal ) smoothing zone around zero Compute smooth weight if required float weight = signal < 0? 1 : 1; float zone = fwidth ( signal ) * 0.5; if (abs (signal) < zone){ weight = signal / zone; } 1 signal within smoothing zone 22

RECORDING RAW DATA Using persistent mapped buffers When primitives & vertices are not reused, but regenerated by CPU, we want a fast way to get them to GPU CPU filling Buffers A Timeline GPU rendering Use ARB_buffer_storage/OpenGL 4.3 to have buffers in CPU memory for fast copying Need fences to avoid overwriting data still used by GPU, 3 frames typically enough to avoid synchronization Cycle sets of buffers each frame Buffers B Buffers C Wait Fence Buffers A Buffers A Signal Frame Fence Buffers B Buffers C CPU memory access okayish if data only read rarely (once for stipplecompute, once for render) Buffers B Buffers A Buffers B 23

RENDERING ARCS Not trivial to compute distance along an arbitrary projected arc/circle Approximate circle as line strip Allocate maximum subdivision Compute adaptively based on screenspace size (or frustum cull) Rendering only needs to fetch distance values, can still compute position on the fly 24

OUTLOOK & CONCLUSION Preserving all primitive order not optimal for performance, ideally application can operate in layers. Code your own special primitives for annotations (arrows...) Use of shaders can increase visual quality beyond fancy surface shading Do not need actual geometry for everything (distance fields are great) GPU programmable enough to move more effects from CPU to GPU 25

April 47, 2016 Silicon Valley THANK YOU JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join ckubisch@nvidia.com @pixeljetstream