Best practices for effective OpenGL programming. Dan Omachi OpenGL Development Engineer


What Is OpenGL?

"OpenGL is a software interface to graphics hardware" (OpenGL Specification)

- The GPU accelerates rendering
- Faster rendering = better image quality
- Drawing efficiently allows drawing more
  - More models, more vertices, more pixels, longer shaders

Using OpenGL efficiently
- OpenGL under the hood
  - State validation: translates GL calls into GPU commands
  - A CPU-intensive operation
- Fundamental OpenGL techniques
  - Avoiding CPU bottlenecks
  - Efficient GPU access
- Apply equally to iOS 4 and Mac OS X

A low-level API
- Direct access to GPU capability
- High-level enough to drive many GPUs
- Lots of work to translate to the GPU

Designed to be a low-level API
- A state machine
- Maps to the GPU pipeline

[Diagram: context state and the GPU pipeline]

The OpenGL software stack
- When a state change call is made, the context state is updated
- API calls are translated into GPU commands
  - Work is deferred until a draw call
  - The translation stage is a CPU-intensive operation
- GPU commands are inserted into a command buffer
- If the buffer fills up or glFlush is called, the command buffer is flushed to the GPU
- Textures and VBOs needed are loaded onto the GPU
  - Flushing is also a CPU-intensive process
- The GPU processes the command buffer
  - Starts the pipeline by fetching vertices

The GPU pipeline execution
- Sends the data down the GPU pipeline
- Many potential bottlenecks on the GPU
- A common bottleneck is the CPU

CPU/GPU parallelism
- Key point: the GPU is another processor, running in parallel with the CPU
- Don't let the GPU wait for the CPU
  - Don't waste CPU time on graphics
  - Offload the graphics work to the GPU
- Don't let the CPU wait for the GPU
  - Any time your app needs data from the GPU, it can stall the CPU
  - glReadPixels, queries, fences

GPU command translation (state validation)
- For each draw call, the context state and the draw call are translated to GPU commands
- This is called state validation
- A CPU-intensive operation

State validation cost
- Draw calls look CPU-intensive in a Shark profile
- Most of that cost is the result of earlier state setting
  - The cost of setting state won't show up in the state-setting call
  - It shows up in the subsequent draw call

Summary of techniques
- Use OpenGL's objects
- Manage rendering state
  - Sort state
  - Batch state
- Reduce state setting by combining objects

Objects and state validation
- Pre-validated state is cached in objects
  - Not validated at draw time
  - When bound, easily translated to GPU commands
- Set up objects when the app/level/document loads
  - Don't wait until the middle of your real-time run loop

Fixed function: shader in disguise
- The vertex and fragment pipes are entirely programmable
- There is no fixed-function vertex or fragment pipe on the GPU
- Shader objects are created internally to emulate fixed function

Fixed-function shader generation:

    void main (void)
    {
        vec4 eye = gl_ModelViewMatrix * gl_Vertex;
        gl_Position = gl_ProjectionMatrix * eye;
        // Sphere-map texture coordinates from the reflection vector
        vec3 u = normalize(vec3(eye));
        vec3 r = reflect(u, gl_Normal);
        float m = 2.0 * sqrt(r.x * r.x + r.y * r.y + (r.z + 1.0) * (r.z + 1.0));
        gl_TexCoord[0] = vec4(r.x / m + 0.5, r.y / m + 0.5, 0.0, 1.0);
        gl_FogFragCoord = eye.z;
        // Simple diffuse lighting from light 0
        float ndotvp = max(0.0, dot(gl_Normal, vec3(gl_LightSource[0].position)));
        gl_FrontColor = gl_Color * ndotvp;
    }

Program objects and pipeline setup
- Program objects are the most efficient way to set up the pipeline
- Specify shader code, compile, and link into a program object

Vertex shader
- Shader executed for each vertex
- Inputs are per-vertex attributes
  - Specified outside the shader
- Two types of outputs
  - One position in clip space: gl_Position
  - One or many varyings
    - Color, normals, texture coordinates
    - Values interpolated across rasterized triangles

Vertex shader example

    varying vec2 vartexcoord;

    void main (void)
    {
        gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
        vartexcoord = gl_MultiTexCoord0.st;
    }

Fixed function built-in shader variables
- Avoid fixed-function built-in uniforms, attributes, or varyings in shaders
  - OpenGL needs to perform mapping
  - Not forward compatible
  - Don't exist on OpenGL ES 2.0
- Coding without them is the future of OpenGL

Built-ins to avoid:

    gl_ModelViewProjectionMatrix           gl_Vertex
    gl_ModelViewMatrix                     gl_Color
    gl_ProjectionMatrix                    gl_SecondaryColor
    gl_ModelViewProjectionMatrixInverse    gl_Normal
    gl_Point                               gl_FogCoord
    gl_LightSource[]                       gl_MultiTexCoord0-7
    gl_LightModelParameters
    gl_Fog

Coding shaders with generics

    attribute vec4 inposition;
    attribute vec2 intexcoord;
    uniform mat4 modelviewprojectionmatrix;
    varying vec2 vartexcoord;

    void main (void)
    {
        gl_Position = modelviewprojectionmatrix * inposition;
        vartexcoord = intexcoord;
    }

Fragment shader
- Runs once per pixel produced by each polygon
- Can render effects not possible with the fixed-function pipeline

    varying vec3 normal;

    void main (void)
    {
        float edgeness = dot(vec3(0, 0, 1), normal);
        vec4 color = vec4(0, 0, 1, 1);      // Blue
        if (abs(edgeness) < 0.45)
            color = vec4(0, 0, 0, 1);       // Black
        gl_FragColor = color;
    }

Client-side vertex arrays
- glVertexAttribPointer points to the data that feeds the vertex shader

    GLfloat* positiondata = (GLfloat*) malloc(positiondatasize);
    // Load vertex position data into positiondata memory...
    glEnableVertexAttribArray(positionattributeindex);
    glVertexAttribPointer(positionattributeindex, 3, GL_FLOAT, GL_FALSE, 12, positiondata);

Overhead of client-side vertex arrays
- CPU cycles are required to copy vertex data
  - OpenGL may copy the data to the command buffer
- The command buffer fills more quickly
  - Expensive flushes become more frequent

Vertex buffer objects
- Store vertex data in VBOs
- OpenGL caches this in the GPU's VRAM
- The draw command references data on the GPU

Vertex buffer objects
- Allocate the VBO store with glBufferData

    glGenBuffers(1, &vboname);
    glBindBuffer(GL_ARRAY_BUFFER, vboname);
    glBufferData(GL_ARRAY_BUFFER, positiondatasize, positiondata, GL_STATIC_DRAW);
    glEnableVertexAttribArray(positionattributeindex);
    glVertexAttribPointer(positionattributeindex, 3, GL_FLOAT, GL_FALSE, 12, 0);  // <- data offset in VBO

Dynamic vertex data
- You may want to modify vertex data for animations
- If the data is small enough, use multiple VBOs
- If the data is generated on the fly, use glBufferSubData or glMapBuffer

    glBufferData(GL_ARRAY_BUFFER, positiondatasize, positiondata, GL_DYNAMIC_DRAW);
    ...
    // Modify vertex data to update...
    glBufferSubData(GL_ARRAY_BUFFER, updateoffset, updatesize, updatedata);

Dynamic vertex data and syncing
- Changing the vertex data of a VBO can force the GPU to sync with the CPU
  - The CPU waits for the draw to finish before the update
  - Can happen with glBufferSubData and glMapBuffer
- Don't load a vertex buffer while it is being read by the GPU
  - Use a double-buffering technique

Double buffering VBOs

    // On odd frames...
    glBindBuffer(GL_ARRAY_BUFFER, oddbuffer);
    glBufferSubData(GL_ARRAY_BUFFER, offset, size, data);
    glDrawArrays(GL_TRIANGLES, 0, numberofverticies);
    ...
    // On even frames...
    glBindBuffer(GL_ARRAY_BUFFER, evenbuffer);
    glBufferSubData(GL_ARRAY_BUFFER, offset, size, data);
    glDrawArrays(GL_TRIANGLES, 0, numberofverticies);
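The odd/even alternation can be seen as indexing a small ring of buffer names by frame number. The sketch below is one possible way to express that in plain C; plain unsigned integers stand in for the VBO names that glGenBuffers would return, so it runs without an OpenGL context. The names `vbo_ring` and `vbo_for_frame` are hypothetical and do not come from the session.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 2  /* two buffers: one being updated, one possibly still read by the GPU */

/* Stand-in for VBO names (assumed to have been filled in by glGenBuffers). */
typedef struct {
    unsigned int names[RING_SIZE];
} vbo_ring;

/* Pick the buffer the CPU may update and draw from on this frame. */
unsigned int vbo_for_frame(const vbo_ring *ring, unsigned long frame)
{
    assert(ring != NULL);
    return ring->names[frame % RING_SIZE];
}
```

With this helper, consecutive frames never touch the same buffer, which is the property the double-buffering slide relies on. Raising RING_SIZE would give triple buffering under the same scheme.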

Vertex array layout
- glVertexAttribPointer not only indicates where vertices live
- It also specifies the vertex layout

    Attrib0: type = FLOAT, size = 3, stride = 16, offset = 0
    Attrib1: type = BYTE,  size = 4, stride = 16, offset = 12
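The stride and offset values in this layout match an interleaved C struct with three floats followed by four bytes. The following sketch derives them with sizeof and offsetof instead of hard-coding them. The struct name `vertex` and the enum names are hypothetical, and the numbers assume a 4-byte float with no unusual padding, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

/* One interleaved vertex: 3 floats of position, then 4 bytes of color. */
typedef struct {
    float       position[3];  /* Attrib0: type FLOAT, size 3 */
    signed char color[4];     /* Attrib1: type BYTE,  size 4 */
} vertex;

/* Values that would be passed as the stride and offset arguments. */
enum {
    VERTEX_STRIDE   = sizeof(vertex),
    POSITION_OFFSET = offsetof(vertex, position),
    COLOR_OFFSET    = offsetof(vertex, color)
};

/* Check at startup that the compiler laid the struct out as the layout above assumes. */
void check_vertex_layout(void)
{
    assert(VERTEX_STRIDE == 16);
    assert(POSITION_OFFSET == 0);
    assert(COLOR_OFFSET == 12);
}
```

Deriving the values from the struct keeps the attribute setup in step with the vertex data if a field is later added or reordered.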

Vertex array objects
- Allow OpenGL to cache the vertex layout

    Attrib0: type = FLOAT, size = 3, stride = 16, offset = 0
    Attrib1: type = BYTE,  size = 4, stride = 16, offset = 12

Coding with vertex array objects

    glGenVertexArrays(1, &vaoname);
    glBindVertexArray(vaoname);
    glEnableVertexAttribArray(positionattributeindex);
    glVertexAttribPointer(positionattributeindex, 3, GL_FLOAT, GL_FALSE, 16, 0);
    glEnableVertexAttribArray(colorattributeindex);
    // The last argument is a byte offset into the bound VBO, passed as a pointer
    glVertexAttribPointer(colorattributeindex, 4, GL_BYTE, GL_TRUE, 16, (const GLvoid*) 12);
    ...
    glBindVertexArray(vaoname);
    glDrawArrays(GL_TRIANGLES, 0, numberofverticies);

Framebuffer objects and renderable textures
- With EAGL and OpenGL ES you must always use an FBO
- Attaching textures to FBOs enables some interesting effects
  - Reflections, refractions, and shadows

    glGenTextures(1, &texname);
    glBindTexture(GL_TEXTURE_2D, texname);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, imagedata);
    glGenFramebuffers(1, &fboname);
    glBindFramebuffer(GL_FRAMEBUFFER, fboname);
    // Later, when texture is no longer bound...
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, texname, 0);
    glDrawArrays(GL_TRIANGLES, 0, numverticies);

Object mutability
- An object's state can be modified after its initial setup
- Avoid this
  - It forces OpenGL to re-validate the object the next time it is used

    glGenTextures(1, &texname);
    glBindTexture(GL_TEXTURE_2D, texname);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, imagedata);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    ...
    // After the run-loop has begun
    glBindTexture(GL_TEXTURE_2D, texname);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);

Lazy validation
- Objects aren't actually pre-validated when created
  - Objects are validated when first used to draw
- Shader objects can't be compiled until they are used to draw
  - The compiler needs to know other context state before it can compile
  - May need the FBO, VAO, and textures bound, and the blend state used
- Textures and VBOs won't get cached in VRAM until they are first drawn

Pre-warming objects
- Lazy validation may cause hiccups during the run-loop
- Avoid validation during the run-loop by pre-warming objects
  - Bind the object and draw with it
  - Use the state and other objects it is used with
  - Do this before the app's run-loop begins
- Only consider using this if your app experiences hiccups

    for every program in our scene
        bind the program
        for every VAO used with that program
            bind the VAO
            for every texture used with that VAO and program
                bind the texture
                for every blend state used with this program
                    set the blend state
                    draw

Object size
- Know how much memory your objects take
  - All current graphics resources need to fit in memory
  - Some devices have limited VRAM
- Use compressed textures
- Fit the texture to the size of the model rendered
- Fit the entire frame's resources into VRAM
- If possible, fit the entire level's/scene's textures into VRAM
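To get a feel for how much memory a texture takes, a back-of-the-envelope estimate for uncompressed textures can be sketched as follows. For the 512 x 512 RGBA, GL_UNSIGNED_BYTE texture used in the earlier examples, one level is 512 * 512 * 4 bytes, or 1 MiB, and a full mipmap chain adds roughly another third. The function names are hypothetical, and real drivers may add padding or alignment that this estimate ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one uncompressed image level. */
size_t level_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Bytes for a texture with a full mipmap chain down to 1x1. */
size_t mipmapped_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    size_t total = 0;

    assert(bytes_per_pixel > 0);
    if (width == 0 || height == 0)
        return 0;

    for (;;) {
        total += level_bytes(width, height, bytes_per_pixel);
        if (width == 1 && height == 1)
            break;
        /* Each mip level halves both dimensions, clamped at 1. */
        if (width > 1)  width  /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

Summing such estimates over every texture and VBO used in a frame gives a rough figure to compare against the VRAM available on the target device.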

Cost of object binding
- Some objects are more expensive to bind than others
- Determine the cost through profiling
- Batch draw calls to reduce binding of the more expensive objects

Object culling
- OpenGL processes everything sent to it
  - Even if it is ultimately not visible
- Cull non-visible objects and don't send them to OpenGL

Visibility lists
- Insert objects into a visibility list
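The slides do not say how visibility is determined. One common approach, used here purely as an assumption, is to test each object's bounding sphere against the planes of the view frustum and append the survivors to the visibility list. The types `bounding_sphere` and `plane` and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, radius; } bounding_sphere;

/* Plane a*x + b*y + c*z + d = 0 with a unit-length normal pointing inward. */
typedef struct { float a, b, c, d; } plane;

/* Returns 1 if the sphere is at least partly on the inner side of every plane. */
int sphere_visible(const bounding_sphere *s, const plane *planes, size_t plane_count)
{
    for (size_t i = 0; i < plane_count; ++i) {
        float dist = planes[i].a * s->x + planes[i].b * s->y
                   + planes[i].c * s->z + planes[i].d;
        if (dist < -s->radius)
            return 0;  /* entirely outside this plane */
    }
    return 1;
}

/* Writes the indices of visible objects into out_list; returns how many were written.
   out_list must have room for object_count entries. */
size_t build_visibility_list(const bounding_sphere *objects, size_t object_count,
                             const plane *planes, size_t plane_count,
                             size_t *out_list)
{
    size_t visible = 0;

    assert(out_list != NULL);
    for (size_t i = 0; i < object_count; ++i)
        if (sphere_visible(&objects[i], planes, plane_count))
            out_list[visible++] = i;
    return visible;
}
```

Only the objects whose indices end up in the list would then be submitted to OpenGL.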

Render state and visibility
- Don't draw objects in the same order they were determined visible
- Sort by render state
  - Textures, programs, vertex arrays

[Diagram: visibility list with entries 0, 1, 2, 3, ..., n]
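Sorting the visibility list by render state can be sketched with qsort. In the example below each list entry carries the names of the program, texture, and vertex array it needs, and a helper counts how many binds a renderer would issue if it only rebinds when a value changes. The struct `draw_item`, the function names, and the sort priority (program, then texture, then vertex array) are assumptions for illustration; the actual priority should follow whichever binds profile as most expensive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>  /* qsort */

/* One entry of the visibility list; the fields stand in for GL object names. */
typedef struct {
    unsigned int program;
    unsigned int texture;
    unsigned int vao;
} draw_item;

/* Order by program first, then texture, then vertex array. */
static int compare_by_state(const void *lhs, const void *rhs)
{
    const draw_item *a = lhs, *b = rhs;
    if (a->program != b->program) return a->program < b->program ? -1 : 1;
    if (a->texture != b->texture) return a->texture < b->texture ? -1 : 1;
    if (a->vao != b->vao)         return a->vao < b->vao ? -1 : 1;
    return 0;
}

void sort_by_state(draw_item *items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof items[0], compare_by_state);
}

/* Counts the binds issued when drawing in list order, rebinding only on change. */
size_t count_binds(const draw_item *items, size_t count)
{
    size_t binds = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || items[i].program != items[i - 1].program) ++binds;
        if (i == 0 || items[i].texture != items[i - 1].texture) ++binds;
        if (i == 0 || items[i].vao     != items[i - 1].vao)     ++binds;
    }
    return binds;
}
```

Comparing count_binds before and after sort_by_state gives a quick measure of how much state setting the sort removes for a given scene.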

State trees

[Figure: state tree; the nodes shown include two shader programs]

Combined draw calls
- Reduce the CPU overhead of draw calls by making fewer draw calls
- Combine many renderable objects into a single draw call
  - Texture atlas: combine multiple textures into a single texture image
  - Instancing
  - Combine vertex array data into a single VBO

Bind overhead

Texture atlases

Quest texture atlas
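When several textures are combined into one atlas, each model's texture coordinates have to be remapped into the sub-rectangle its image occupies. The sketch below assumes, for simplicity, an atlas laid out as a uniform grid of equally sized cells; real atlases often pack images of different sizes. The type `atlas_region` and both function names are hypothetical.

```c
#include <assert.h>

/* Sub-rectangle of the atlas, in normalized [0,1] texture coordinates. */
typedef struct {
    float u0, v0;  /* corner of the sub-image with the smallest coordinates */
    float u1, v1;  /* opposite corner */
} atlas_region;

/* Region for cell (col, row) of an atlas laid out as a cols x rows grid. */
atlas_region atlas_cell(int col, int row, int cols, int rows)
{
    atlas_region r;

    assert(cols > 0 && rows > 0);
    r.u0 = (float)col / (float)cols;
    r.v0 = (float)row / (float)rows;
    r.u1 = (float)(col + 1) / (float)cols;
    r.v1 = (float)(row + 1) / (float)rows;
    return r;
}

/* Map a model's original [0,1] texture coordinate into its atlas region. */
void atlas_remap(const atlas_region *r, float u, float v, float *out_u, float *out_v)
{
    *out_u = r->u0 + u * (r->u1 - r->u0);
    *out_v = r->v0 + v * (r->v1 - r->v0);
}
```

The remapping would typically be applied once, when vertex data is built, so that all models sharing the atlas can be drawn without rebinding textures.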

Reasons to multithread
- Makes sense to move some of the work to another thread
  - A second thread amortizes CPU cost across other cores
  - Makes sense on iOS 4 devices too
- CPU-intensive calls can block
  - The app won't be able to handle UI events, audio, etc.
  - The watchdog kills apps that block the main thread too long

Object creation on secondary threads
- A simple multithreading technique
- The second thread loads vertex and texture data and compiles shaders
- Kill the second thread and its context when loading is done
  - OpenGL will take locks if a shared context is live
  - This increases CPU overhead
  - It can block threads

Rendering on secondary thread
- Main thread produces data
  - Produces data dependent on app logic
  - Position, animation frame, visibility
  - Has no OpenGL context
- When the producer is done with the data
  - Signal the render thread to begin
- Render thread only consumes data
  - Holds the only OpenGL context
  - Calls OpenGL with the produced data
- Main thread can process audio, input, and other app logic in parallel
- To balance the load, some app logic can move to the render thread

The GPU and the display
- Synchronizing with the display
  - You can only render as fast as the display can refresh
  - Running as fast as possible wastes power

The GPU and the display: CADisplayLink with iOS 4
- On iOS 4, use CADisplayLink to initiate per-frame rendering
- An NSTimer fires at an arbitrary time with respect to the display
  - Causes more latency in rendering
  - Can reduce frame rate

    [CADisplayLink displayLinkWithTarget:self selector:@selector(renderframe:)];

The GPU and the display: CVDisplayLink on Mac OS X
- On Mac, use CVDisplayLink
  - Can control the app loop
  - No need to render faster than the display can refresh
  - Looping more than needed wastes power

    CVDisplayLinkSetOutputCallback(displayLink, &MyRenderFrameCallback, self);

Mac and iPhone OS portability
- Coding for both platforms
  - OpenGL ES is a subset of OpenGL
  - Coding with OpenGL ES 2.0 makes porting to Mac easier
- Things to be aware of
  - Memory and performance constraints on iOS devices
  - Different compressed formats
  - Slightly different function names
- Sample code cross-compiles for both

- Minimize OpenGL's CPU overhead
  - Efficiently access the GPU
- Validation during draw calls is where CPU overhead occurs
  - OpenGL objects cache validation
  - Minimize state changes and draw calls to reduce validation

Allan Schaffer, Graphics and Game Technologies Evangelist, aschaffer@apple.com
Documentation: OpenGL Dev Center, http://developer.apple.com/opengl
Apple Developer Forums: http://devforums.apple.com
Session-specific information: http://developer.apple.com/wwdc/sessions/details/?id=414

OpenGL ES Overview for the iPhone            | Presidio, Wednesday 2:00PM
OpenGL ES Shading and Advanced Rendering     | Presidio, Wednesday 3:15PM
OpenGL ES Tuning and Optimization            | Presidio, Wednesday 4:30PM
OpenGL for Mac OS X                          | Nob Hill, Thursday 9:00AM
Taking advantage of Multiple GPUs            | Nob Hill, Thursday 10:15AM

OpenGL ES Lab              | Graphics and Media Lab A, Thursday 9:00AM
OpenGL for Mac OS X Lab    | Graphics and Media Lab C, Thursday 2:00PM
