Michał Radziszewski
Why modern versions of OpenGL should be used
Some useful API commands and extensions
- Timer Query
- EXT Direct State Access (DSA)
Geometry Programs
- Position in pipeline
- Rendering wireframe over solid in one pass
Tessellation Programs
- Tessellation control and tessellation evaluation
- Position in pipeline
- PN-triangles
Atomic Counters, Image Load/Store
- Order-Independent Transparency (OIT)
Subroutine Uniforms
Developing a game is likely to take a few years, so consider using the newest technology available when development starts: the required GPUs will probably be widespread by the time the game is ready to ship. Today, the only real choice is either OpenGL or DirectX, in versions 4.2 and 11, respectively.
The supported features and performance of the newest versions of these libraries are almost identical. The choice of a particular library can be based on its availability on the target platforms and on coding style. Many game engines support both APIs.
OpenGL: a platform-independent open standard. Available on PC (Windows/Linux), Mac and PS3; OpenGL ES runs on mobile devices, e.g. with Android. Developed by SGI in the early 1990s, currently maintained by the Khronos Group. Hardware vendors can provide extensions. Procedural programming style; most commonly used with C or C++.
Only Direct3D (part of DirectX) competes with OpenGL. It is a Microsoft proprietary API, officially available only on the Windows operating system and Xbox consoles. The newest versions (from 10.0 onward) are unavailable on Windows XP (!). Extensions are (theoretically) unavailable. Object-oriented programming style.
Timer query: a very useful, yet rarely used feature, available since OpenGL 3.3. It is the way to measure the time taken by rendering commands on the GPU. Measuring time in the application is unreliable: application timers measure the time of sending commands to the GPU, not of executing them! Adding glFinish() calls disturbs the cooperation between CPU and GPU.
Start using timer queries from the very beginning of the development process, and check the cost of every new feature added. The time taken by all rendering commands should not exceed, say, 1/60 s (about 16.7 ms). If the rendering time is checked frequently, there is little risk that the game won't achieve the desired frame rate due to GPU limits.
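Once a query result is available, the budget check above is simple arithmetic. In OpenGL the measurement itself is bracketed with glBeginQuery(GL_TIME_ELAPSED, id) and glEndQuery(GL_TIME_ELAPSED), and the result, in nanoseconds, is read with glGetQueryObjectui64v(id, GL_QUERY_RESULT, &ns); reading a query issued in an earlier frame (after checking GL_QUERY_RESULT_AVAILABLE) avoids stalling the CPU. The sketch below covers only the CPU-side arithmetic, and its helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Frame budget in milliseconds for a target frame rate (60 fps -> ~16.67 ms). */
static double frame_budget_ms(double target_fps)
{
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

/* GL_TIME_ELAPSED results are nanoseconds stored in a 64-bit integer. */
static double ns_to_ms(uint64_t elapsed_ns)
{
    return (double)elapsed_ns / 1.0e6;
}

/* True if the GPU time measured for one frame exceeds the frame budget. */
static bool gpu_over_budget(uint64_t elapsed_ns, double target_fps)
{
    return ns_to_ms(elapsed_ns) > frame_budget_ms(target_fps);
}
```

For example, a frame measured at 17,000,000 ns (17 ms) is over the 60 fps budget, while one at 16,000,000 ns is not.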
Direct State Access (DSA): allows manipulation of object state (textures, programs, etc.) without binding the objects. Bindings are necessary only for draw calls. Far fewer API commands are executed, and the code is much cleaner. One of the most useful extensions ever. A purely software (driver) feature; no new hardware is necessary.
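To make the difference concrete, here is a CPU-only mock of the two styles. It does not call OpenGL: MockContext and the mock_* functions are hypothetical stand-ins for glBindTexture plus glTexParameteri (bind-to-edit) and for glTextureParameteriEXT from EXT_direct_state_access. It illustrates that bind-to-edit disturbs the current binding and costs extra calls to restore it, while the DSA-style call names the object directly.

```c
#include <assert.h>

/* Hypothetical mock of driver state (NOT real OpenGL): per-object texture
   state plus one global binding point, and a counter of API calls. */
enum { MAX_TEXTURES = 8 };

typedef struct {
    int min_filter[MAX_TEXTURES]; /* state stored in each texture object */
    unsigned bound;               /* the single "currently bound" slot */
    int api_calls;                /* number of simulated API commands */
} MockContext;

/* Bind-to-edit style, step 1: change the global binding. */
static void mock_bind_texture(MockContext *ctx, unsigned tex)
{
    assert(tex < MAX_TEXTURES);
    ctx->bound = tex;
    ctx->api_calls++;
}

/* Bind-to-edit style, step 2: edit whatever object is currently bound. */
static void mock_tex_parameter(MockContext *ctx, int filter)
{
    ctx->min_filter[ctx->bound] = filter;
    ctx->api_calls++;
}

/* DSA style: name the object explicitly; the binding is left untouched. */
static void mock_texture_parameter(MockContext *ctx, unsigned tex, int filter)
{
    assert(tex < MAX_TEXTURES);
    ctx->min_filter[tex] = filter;
    ctx->api_calls++;
}

/* Careful bind-to-edit code must restore the previous binding afterwards.
   (Real code would also have to query it first, costing one more call.) */
static void edit_with_bind(MockContext *ctx, unsigned tex, int filter)
{
    unsigned previous = ctx->bound;
    mock_bind_texture(ctx, tex);
    mock_tex_parameter(ctx, filter);
    mock_bind_texture(ctx, previous);
}
```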
Geometry programs operate on primitives (triangles, triangle fans/strips, lines, points, ...). A geometry program can read the data of all vertices of the processed primitive at once, which is impossible using a vertex shader alone. Geometry programs can add and/or remove vertices, and they can also change the primitive type.
They can direct rendering output to several texture layers at once; for example, cube shadow maps can be rendered with just one draw call. Availability: as an OpenGL extension since November 2006; in DirectX since version 10.0 (January 2007); in OpenGL core since version 3.2 (August 2009).
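The one-draw-call cube shadow map works because a geometry program can emit every input triangle once per cube face and route each copy to a different layer through the gl_Layer output (in GLSL, a triangle_strip output with max_vertices = 18). The C function below models that loop on the CPU under simple assumptions (row-major 4x4 matrices, one precomputed view-projection matrix per face); the types and names are illustrative, not part of any API.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;             /* row-major */
typedef struct { Vec4 v[3]; int layer; } OutTriangle;

enum { CUBE_FACES = 6 };

static Mat4 mat4_identity(void)
{
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.m[i][j] = (i == j) ? 1.0f : 0.0f;
    return r;
}

static Vec4 mat4_mul_vec4(const Mat4 *a, Vec4 p)
{
    Vec4 r;
    r.x = a->m[0][0] * p.x + a->m[0][1] * p.y + a->m[0][2] * p.z + a->m[0][3] * p.w;
    r.y = a->m[1][0] * p.x + a->m[1][1] * p.y + a->m[1][2] * p.z + a->m[1][3] * p.w;
    r.z = a->m[2][0] * p.x + a->m[2][1] * p.y + a->m[2][2] * p.z + a->m[2][3] * p.w;
    r.w = a->m[3][0] * p.x + a->m[3][1] * p.y + a->m[3][2] * p.z + a->m[3][3] * p.w;
    return r;
}

/* Models one geometry-program invocation: the input triangle is emitted once
   per cube face, transformed by that face's view-projection matrix and tagged
   with the layer (gl_Layer) it should be rasterized into. */
static int emit_cube_layers(const Vec4 in[3], const Mat4 face_vp[CUBE_FACES],
                            OutTriangle out[CUBE_FACES])
{
    assert(in != 0 && face_vp != 0 && out != 0);
    for (int face = 0; face < CUBE_FACES; ++face) {
        out[face].layer = face;
        for (int i = 0; i < 3; ++i)
            out[face].v[i] = mat4_mul_vec4(&face_vp[face], in[i]);
    }
    return CUBE_FACES;   /* number of triangles emitted */
}
```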
Geometry programs are executed between vertex and fragment programs. Vertex program outputs are passed unchanged to geometry program inputs. Between geometry and fragment programs there is a fixed-function processing step: primitive rasterization and data interpolation. This step is identical to the one performed without a geometry program.
The fragment program receives input data from the geometry program in the same way as it would receive data from the vertex program, so a fragment program can be written the same way regardless of whether it cooperates with a geometry program or not. The fragment program receives its input data interpolated between vertices; the interpolation depends only on the interpolation mode (the GLSL qualifiers smooth, flat or noperspective).
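A small model of what "depends only on the interpolation mode" means, written in plain C for illustration. Given the fragment's screen-space barycentric weights, smooth (the GLSL default) performs perspective-correct interpolation using each vertex's clip-space w, noperspective interpolates linearly in screen space, and flat takes the value from the provoking vertex, which in OpenGL is the last vertex by default (see glProvokingVertex). The function and enum names are hypothetical.

```c
#include <assert.h>

typedef enum { INTERP_SMOOTH, INTERP_NOPERSPECTIVE, INTERP_FLAT } InterpMode;

/* b: screen-space barycentric weights of the fragment (they sum to 1)
   w: clip-space w of each vertex, a: the attribute value at each vertex */
static float interpolate(InterpMode mode, const float b[3], const float w[3],
                         const float a[3])
{
    if (mode == INTERP_FLAT)
        return a[2];                 /* last vertex = default provoking vertex */
    if (mode == INTERP_NOPERSPECTIVE)
        return b[0] * a[0] + b[1] * a[1] + b[2] * a[2];  /* linear on screen */
    /* INTERP_SMOOTH: interpolate a/w and 1/w linearly, then divide. */
    assert(w[0] != 0.0f && w[1] != 0.0f && w[2] != 0.0f);
    float num = b[0] * a[0] / w[0] + b[1] * a[1] / w[1] + b[2] * a[2] / w[2];
    float den = b[0] / w[0] + b[1] / w[1] + b[2] / w[2];
    return num / den;
}
```

When all three w values are equal (no perspective), smooth and noperspective give the same result.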
Rendering wireframe over solid. Without geometry programs this takes two passes: the time required for vertex processing doubles, and a depth bias is necessary. With a geometry program it takes just one pass: the geometry program emits triangles with one extra vec3 output attribute per vertex, writing (0, 0, 1), (0, 1, 0) and (1, 0, 0) for the three vertices (the order is unimportant).
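Before the fragment program runs, the fixed-function step interpolates this attribute. Each component (x, y and z) is 1.0 at exactly one vertex and 0.0 at the other two, and each component has its 1.0 at a different vertex. The two vertices holding 0.0 for a given component span an edge along which that interpolated component stays zero, so a small component value means the fragment is close to that edge. The edge width can be kept constant in screen space by scaling with derivatives. Merging the three edges, one per component, gives the finished wireframe.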
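The per-fragment part of this trick can be sketched as follows, assuming the fragment program has the interpolated vec3 and its fwidth() per component. Dividing each component by its screen-space rate of change gives an approximate distance to the corresponding edge in pixels (approximate because fwidth is |dFdx| + |dFdy|, not the true gradient length); the smallest of the three is the distance to the nearest edge. This C model uses a hypothetical line width and a simple linear falloff; a shader would more typically use smoothstep.

```c
#include <assert.h>

/* bary: the interpolated edge attribute; fw: its fwidth() per component.
   Returns the approximate distance, in pixels, to the nearest edge. */
static float edge_distance_px(const float bary[3], const float fw[3])
{
    float d = bary[0] / fw[0];
    for (int i = 1; i < 3; ++i) {
        float di = bary[i] / fw[i];
        if (di < d)
            d = di;
    }
    return d;
}

/* Wireframe coverage: 1 on the edge, falling linearly to 0 at line_width_px. */
static float wire_coverage(const float bary[3], const float fw[3], float line_width_px)
{
    assert(fw[0] > 0.0f && fw[1] > 0.0f && fw[2] > 0.0f && line_width_px > 0.0f);
    float t = edge_distance_px(bary, fw) / line_width_px;
    if (t <= 0.0f) return 1.0f;
    if (t >= 1.0f) return 0.0f;
    return 1.0f - t;
}
```

The fragment program would then blend the solid and wire colors with this coverage as the weight (GLSL mix).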
Other uses of geometry programs: transform feedback, rendering cube shadow maps in one pass, rendering object silhouettes; volumetric shadows are also much faster with geometry programs. DO NOT use these programs to substantially increase the amount of rendered geometry: it works, but is extremely inefficient. Leave this to tessellation programs.
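For silhouettes, the geometry program is typically fed triangles with adjacency (GL_TRIANGLES_ADJACENCY, declared as triangles_adjacency in GLSL), so each triangle also sees the third vertex of each neighbour. An edge lies on the silhouette when exactly one of the two triangles sharing it faces the viewer. The C sketch below shows that test for a single edge; it assumes counter-clockwise front faces and positions in one common space, and its names are made up for the example.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 sub(Vec3 a, Vec3 b) { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* Does triangle (a, b, c), counter-clockwise when seen from the front, face the eye? */
static int faces_eye(Vec3 a, Vec3 b, Vec3 c, Vec3 eye)
{
    Vec3 n = cross(sub(b, a), sub(c, a));
    return dot(n, sub(eye, a)) > 0.0f;
}

/* Edge (a, b) is shared by triangle (a, b, c) and its neighbour (b, a, d).
   It is a silhouette edge when exactly one of the two faces the eye. */
static int is_silhouette_edge(Vec3 a, Vec3 b, Vec3 c, Vec3 d, Vec3 eye)
{
    return faces_eye(a, b, c, eye) != faces_eye(b, a, d, eye);
}
```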
Tessellation programs operate on patch primitives (GL_PATCHES). They control the conversion of patches into triangles or lines, which are consumed by subsequent pipeline stages. Tessellation allows efficient creation of large amounts of geometry on the GPU; it has been designed for that purpose and is much more efficient than geometry programs.
Availability: in DirectX since version 11.0 (October 2009); in OpenGL core since version 4.0 (March 2010).
Tessellation is executed after the vertex program and before the geometry program (if present) or otherwise the fragment program. If tessellation is active, the vertex program actually operates on patch control points, not on triangle vertices. There are two tessellation programs, separated by a partially controllable fixed-function processing step.
The geometry program (or fragment program) receives input data from the tessellation step in the same way as it would if the application rendered triangles without tessellation. Geometry and fragment programs can be written the same way regardless of whether tessellation is active or not.
The tessellation control program is executed immediately after the vertex program. It operates on patches and specifies how many triangles should be generated for a given patch (outer, per-edge tessellation levels and inner, centre tessellation levels). The fixed-function step then performs the tessellation, generating triangles whose vertices carry only their position within the patch (gl_TessCoord); the remaining vertex attributes are undefined and are computed later.
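One common way to choose the per-edge levels in the control program is from the projected length of each edge. The rule below (one segment per target_px pixels, clamped to [1, max_level]) is a hypothetical heuristic, not part of OpenGL; max_level would normally come from GL_MAX_TESS_GEN_LEVEL, which is at least 64. Because the level depends only on the two endpoints and is symmetric in them, two patches sharing an edge compute the same value, which prevents cracks. The square root uses Newton iteration only to keep this C sketch free of the math library.

```c
#include <assert.h>

typedef struct { float x, y; } Vec2;   /* screen-space position in pixels */

/* Newton's method for sqrt(x), x >= 0; starts at or above the root. */
static float sqrt_newton(float x)
{
    if (x <= 0.0f)
        return 0.0f;
    float r = x > 1.0f ? x : 1.0f;
    for (int i = 0; i < 40; ++i)
        r = 0.5f * (r + x / r);
    return r;
}

/* Hypothetical LOD rule: projected edge length / target_px, clamped. */
static float edge_tess_level(Vec2 p0, Vec2 p1, float target_px, float max_level)
{
    assert(target_px > 0.0f && max_level >= 1.0f);
    float dx = p1.x - p0.x, dy = p1.y - p0.y;
    float level = sqrt_newton(dx * dx + dy * dy) / target_px;
    if (level < 1.0f) level = 1.0f;
    if (level > max_level) level = max_level;
    return level;
}
```

In GLSL the results would be written to gl_TessLevelOuter[]; the inner level (gl_TessLevelInner) is a separate choice, for example the maximum of the outer levels.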
Finally, the tessellation evaluation program is executed. It evaluates the attributes of the generated triangle vertices and must write gl_Position if no geometry program is present. The tessellation evaluation program has access to all attributes of the patch control points; these attributes are inaccessible in later processing steps.
PN triangles: an algorithm for smoothing a mesh using only vertex positions and normals, so no new assets are necessary (A. Vlachos et al., Curved PN Triangles, Symposium on Interactive 3D Graphics 2001). The new surface is continuous, but not perfectly smooth across the original edges. This is not a problem in practice: carefully evaluated normals hide this flaw.
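The construction from the Vlachos et al. paper can be sketched in C as below. The three corners are kept; each of the six edge control points is placed one third along an edge and projected onto the tangent plane of the nearer corner; the centre point is pushed out by half the offset between the average of the edge points and the flat centroid. Only the position patch is shown (the paper's quadratic normal patch is omitted), normals are assumed to be unit length, and pn_eval plays the role of the tessellation evaluation program, with the barycentric weights (w, u, v) standing in for the components of gl_TessCoord. In a shader, the control points would usually be computed once per patch in the tessellation control program.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 v_add(Vec3 a, Vec3 b) { Vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z }; return r; }
static Vec3 v_sub(Vec3 a, Vec3 b) { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static Vec3 v_scale(Vec3 a, float s) { Vec3 r = { a.x * s, a.y * s, a.z * s }; return r; }
static float v_dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Cubic Bezier triangle. Control point bIJK is weighted by
   (3! / (I! J! K!)) * w^I * u^J * v^K, where w, u, v are the barycentric
   weights of corners P1, P2, P3. */
typedef struct {
    Vec3 b300, b030, b003;                   /* corners */
    Vec3 b210, b120, b021, b012, b102, b201; /* edge points */
    Vec3 b111;                               /* centre point */
} PNPatch;

/* Point one third of the way from Pi towards Pj, projected onto the tangent
   plane at Pi: (2*Pi + Pj - ((Pj - Pi) . Ni) * Ni) / 3. Ni must be unit length. */
static Vec3 pn_edge_point(Vec3 pi, Vec3 pj, Vec3 ni)
{
    float wij = v_dot(v_sub(pj, pi), ni);
    Vec3 p = v_sub(v_add(v_scale(pi, 2.0f), pj), v_scale(ni, wij));
    return v_scale(p, 1.0f / 3.0f);
}

static PNPatch pn_build(Vec3 p1, Vec3 p2, Vec3 p3, Vec3 n1, Vec3 n2, Vec3 n3)
{
    PNPatch c;
    c.b300 = p1;
    c.b030 = p2;
    c.b003 = p3;
    c.b210 = pn_edge_point(p1, p2, n1);
    c.b120 = pn_edge_point(p2, p1, n2);
    c.b021 = pn_edge_point(p2, p3, n2);
    c.b012 = pn_edge_point(p3, p2, n3);
    c.b102 = pn_edge_point(p3, p1, n3);
    c.b201 = pn_edge_point(p1, p3, n1);
    /* Centre: average E of the six edge points, moved away from the
       centroid V by half of (E - V). */
    Vec3 e = v_add(v_add(c.b210, c.b120), v_add(c.b021, c.b012));
    e = v_scale(v_add(e, v_add(c.b102, c.b201)), 1.0f / 6.0f);
    Vec3 v = v_scale(v_add(v_add(p1, p2), p3), 1.0f / 3.0f);
    c.b111 = v_add(e, v_scale(v_sub(e, v), 0.5f));
    return c;
}

/* Evaluates the patch at barycentric (w, u, v) with w + u + v = 1. */
static Vec3 pn_eval(const PNPatch *c, float w, float u, float v)
{
    assert(w >= 0.0f && u >= 0.0f && v >= 0.0f);
    Vec3 r = v_scale(c->b300, w * w * w);
    r = v_add(r, v_scale(c->b030, u * u * u));
    r = v_add(r, v_scale(c->b003, v * v * v));
    r = v_add(r, v_scale(c->b210, 3.0f * w * w * u));
    r = v_add(r, v_scale(c->b120, 3.0f * w * u * u));
    r = v_add(r, v_scale(c->b201, 3.0f * w * w * v));
    r = v_add(r, v_scale(c->b102, 3.0f * w * v * v));
    r = v_add(r, v_scale(c->b021, 3.0f * u * u * v));
    r = v_add(r, v_scale(c->b012, 3.0f * u * v * v));
    r = v_add(r, v_scale(c->b111, 6.0f * w * u * v));
    return r;
}
```

With identical normals on a flat triangle the patch reproduces the flat triangle exactly; normals that tilt outwards make the interior bulge.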
Common tessellation applications: smoothing meshes, then adding detail with displacement maps; dynamic level-of-detail control; tessellating object silhouettes, i.e. adding triangles where they are most useful; rendering terrain with very few triangles, generating more geometry only where the height varies substantially.
Atomic counters: atomic counter buffers behave just like any other buffers.
Variable in a GPU program: layout(binding = 0) uniform atomic_uint foo;
Functions for accessing atomic counters:
  uint atomicCounterIncrement(atomic_uint c);
  uint atomicCounterDecrement(atomic_uint c);
  uint atomicCounter(atomic_uint c);
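The return values matter when counters are used to hand out unique slots: per the GLSL specification, atomicCounterIncrement returns the value before the increment, atomicCounterDecrement returns the value after the decrement, and atomicCounter simply reads the counter. The C11 <stdatomic.h> calls below mirror those rules on the CPU; the wrapper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Like atomicCounterIncrement: returns the value BEFORE incrementing. */
static unsigned counter_increment(atomic_uint *c)
{
    return atomic_fetch_add(c, 1u);
}

/* Like atomicCounterDecrement: returns the value AFTER decrementing. */
static unsigned counter_decrement(atomic_uint *c)
{
    return atomic_fetch_sub(c, 1u) - 1u;
}

/* Like atomicCounter: a plain atomic read. */
static unsigned counter_read(atomic_uint *c)
{
    return atomic_load(c);
}
```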
Image load/store: images are similar to textures, except that GPU programs can read from and write to them with special built-in functions (imageLoad, imageStore). To avoid potential conflicts resulting from concurrency, there is a set of atomic operations on images (the imageAtomic* functions); atomic counters can also be used.
Order-independent transparency (OIT): render into a two-texture (image) buffer. The first, screen-sized texture holds one uint per texel: the index of the first node of that pixel's list in the second texture. The second (1D) texture holds the linked lists, one per pixel, with color, opacity and depth information. It must be large enough for all fragments (there are likely to be more fragments than pixels!). All instances of the fragment shader write to the list texture, likely at the same time, so an atomic counter is necessary to resolve conflicts. A final pass then sorts each pixel's list by depth and blends it.
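The whole scheme can be modelled on the CPU to check its logic. In this hypothetical C sketch, head[] stands for the screen-sized uint image, nodes[] for the 1D list texture and node_counter for the atomic counter; atomic_fetch_add and atomic_exchange play the roles of atomicCounterIncrement and imageAtomicExchange. The resolve step sorts each pixel's list far to near (assuming a larger depth means farther) and blends with the standard "over" operator. The tiny fixed buffer sizes are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>

enum { WIDTH = 4, HEIGHT = 4, MAX_NODES = 64 };
#define LIST_END 0xFFFFFFFFu   /* "null" index terminating a list */

typedef struct { float r, g, b, a, depth; unsigned next; } FragNode;

typedef struct {
    atomic_uint head[WIDTH * HEIGHT];  /* screen-sized image of list heads */
    FragNode nodes[MAX_NODES];         /* 1D buffer holding every list node */
    atomic_uint node_counter;          /* next free node (the atomic counter) */
} OitBuffer;

static void oit_clear(OitBuffer *buf)
{
    for (int i = 0; i < WIDTH * HEIGHT; ++i)
        atomic_init(&buf->head[i], LIST_END);
    atomic_init(&buf->node_counter, 0u);
}

/* Per-fragment step: grab a unique node (atomicCounterIncrement on the GPU),
   then make it the new head of this pixel's list (imageAtomicExchange). */
static int oit_insert(OitBuffer *buf, int x, int y,
                      float r, float g, float b, float a, float depth)
{
    assert(x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT);
    unsigned idx = atomic_fetch_add(&buf->node_counter, 1u);
    if (idx >= MAX_NODES)
        return 0;                        /* out of space: fragment dropped */
    FragNode *n = &buf->nodes[idx];
    n->r = r; n->g = g; n->b = b; n->a = a; n->depth = depth;
    n->next = atomic_exchange(&buf->head[y * WIDTH + x], idx);
    return 1;
}

/* Resolve step for one pixel: gather the list, sort it far to near, then
   blend each fragment over the accumulated color. */
static void oit_resolve(OitBuffer *buf, int x, int y, const float bg[3], float out[3])
{
    FragNode frags[MAX_NODES];
    int count = 0;
    unsigned i = atomic_load(&buf->head[y * WIDTH + x]);
    while (i != LIST_END) {
        frags[count++] = buf->nodes[i];
        i = buf->nodes[i].next;
    }
    for (int k = 1; k < count; ++k) {    /* insertion sort, farthest first */
        FragNode key = frags[k];
        int j = k - 1;
        while (j >= 0 && frags[j].depth < key.depth) {
            frags[j + 1] = frags[j];
            --j;
        }
        frags[j + 1] = key;
    }
    out[0] = bg[0]; out[1] = bg[1]; out[2] = bg[2];
    for (int k = 0; k < count; ++k) {
        float a = frags[k].a;
        out[0] = frags[k].r * a + out[0] * (1.0f - a);
        out[1] = frags[k].g * a + out[1] * (1.0f - a);
        out[2] = frags[k].b * a + out[2] * (1.0f - a);
    }
}
```

Because the resolve pass sorts by depth, the final color does not depend on the order in which fragments were inserted, which is exactly the property the technique is named after.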
Subroutine uniforms: allow the application to choose the particular algorithm used by a GPU program. No recompilation is necessary; assigning a value to the subroutine uniform is enough. An alternative to so-called uber-shaders or to a huge number of shaders, one for each combination of algorithms. Similar to function pointers known from C/C++.
What is necessary in the GPU program: define a subroutine type (like a function prototype); define more than one function with the matching return type and arguments; define a subroutine uniform; call through the subroutine uniform name, passing the required arguments. The application decides which implementation is actually called.
What is necessary in the application: query the GPU program for the indices of the subroutine implementations, then set the subroutine uniform to the index of the desired implementation. Subroutine uniform values are not stored as GPU program state; they must be set again each time a program is bound.
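In the real API the indices come from glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, "name"), and implementations are selected with glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, count, indices), which must supply a value for every active subroutine uniform of that stage at once. The CPU-only mock below imitates this workflow with C function pointers (the analogy the slides draw), including the fact that the selection is lost when the program is bound again. Everything in it (the shading functions, MockProgramState, use_program) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define INVALID_INDEX 0xFFFFFFFFu   /* same value as GL_INVALID_INDEX */

typedef float (*ShadeFn)(float n_dot_l);

/* Two interchangeable "implementations" with the same signature. */
static float shade_lambert(float n_dot_l) { return n_dot_l > 0.0f ? n_dot_l : 0.0f; }
static float shade_toon(float n_dot_l) { return n_dot_l > 0.5f ? 1.0f : 0.2f; }

typedef struct { const char *name; ShadeFn fn; } SubroutineImpl;

static const SubroutineImpl impls[] = {
    { "lambert", shade_lambert },
    { "toon", shade_toon },
};
#define IMPL_COUNT (sizeof impls / sizeof impls[0])

/* The subroutine uniform's current value; it does not survive a rebind. */
typedef struct { unsigned selected; } MockProgramState;

/* Like glGetSubroutineIndex: look an implementation up by name. */
static unsigned get_subroutine_index(const char *name)
{
    for (unsigned i = 0; i < IMPL_COUNT; ++i)
        if (strcmp(impls[i].name, name) == 0)
            return i;
    return INVALID_INDEX;
}

/* Like glUseProgram: binding the program discards the previous selection. */
static void use_program(MockProgramState *s) { s->selected = INVALID_INDEX; }

/* Like glUniformSubroutinesuiv with a single subroutine uniform. */
static void set_subroutine(MockProgramState *s, unsigned index)
{
    assert(index < IMPL_COUNT);
    s->selected = index;
}

/* "Run the shader": call through the subroutine uniform. */
static float run_shader(const MockProgramState *s, float n_dot_l)
{
    assert(s->selected < IMPL_COUNT && "subroutine not set since last bind");
    return impls[s->selected].fn(n_dot_l);
}
```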
Any questions?