GDC 2014 Barthold Lichtenbelt OpenGL ARB chair
Agenda
- OpenGL 4.4, news and updates - Barthold Lichtenbelt, NVIDIA
- Low Overhead Rendering with OpenGL - Cass Everitt, NVIDIA
Copyright Khronos Group, 2010 - Page 2
OpenGL Ecosystem News
- Valve's VOGL OpenGL capture / playback debugger
  - OpenGL 3.3, OpenGL 4 in progress
  - Now on GitHub!
- Valve's ToGL
  - Subset of Direct3D 9.0c -> OpenGL
  - API and DX bytecode
  - On GitHub
- SIGGRAPH course: Introduction to OpenGL programming
  - Free on YouTube
- OpenTK updated to OpenGL 4.4
  - Low-level C# library that wraps OpenGL and more
OpenGL Ecosystem News
- Visualization of OpenGL blend functions
  - By Anders Riggelsen
- What version of OpenGL was this in?
  - glisdeprecated
- Books
  - Red Book (OpenGL Programming Guide)
  - OpenGL SuperBible
- Siggraph 2013
- G-Truc Creation
  - Continues to keep us honest
  - www.g-truc.net
OpenGL version support
- www.g-truc.net
OpenGL 4.4 reference pages
- Huge thanks to Graham Sellers!!!
OpenGL Conformance Test Suite released!
- Conformance submissions are
  - required for GL 4.4 implementations
  - encouraged for earlier driver versions
- Shared codebase with OpenGL ES 3.0 CTS
  - Additional desktop-specific tests
  - Core profile functionality
- Enhancements underway to add more coverage
Announcing OpenGL ES 3.1 compatibility
- ARB_ES3_1_compatibility specification
  - In the making
- Adds features missing in OpenGL
  - New function MemoryBarrierByRegion()
  - Raise minimum SSBO size to 128 MB
  - Support for GLSL ES version 310
  - imageAtomicExchange()
  - Extend mix() to int, uint and bool
  - gl_HelperInvocation
  - gl_MaxSamples
  - Adds several gl_Max*ImageUniforms builtins
  - gl_MaxCombinedShaderOutputResources
- Only the OpenGL 4.4 compatibility profile is a true superset
Rapid OpenGL Innovation
- Bringing state-of-the-art functionality to cross-platform graphics
[Timeline figure, 2004 to 2013: OpenGL 2.0, 2.1, 3.0, 3.1, 3.2, 3.3/4.0, 4.1, 4.2, 4.3 and 4.4 plotted against DirectX 9.0c, 10.0, 10.1, 11 and 11.1]
What is new in OpenGL 4.4?
- ARB_buffer_storage
  - Immutable storage for buffer objects
  - Explicit control over buffer placement (vidmem vs sysmem) and cache behavior
  - Allows a mapped buffer to be used by the GPU
- ARB_enhanced_layouts (GLSL)
  - Allows compile-time constants in qualifiers
  - More control for placing shader interface variables
  - Pack vectors more efficiently with scalar types
  - More control of variable layout inside uniform blocks and shader storage blocks
  - In-shader control of transform feedback variables
- ARB_query_buffer_object
  - Allows a buffer object to be the target of a query
  - Avoids CPU getting involved, no pipeline stall
What is new in OpenGL 4.4?
- ARB_clear_texture
  - Clear texture values to a specific value
- ARB_texture_mirror_clamp_to_edge
  - Allows the texture to be mirrored in the negative s, t, and r directions
- ARB_texture_stencil8
  - Create and sample stencil-only textures
- ARB_vertex_type_10f_11f_11f_rev
  - Packs 3 components into a 32-bit value
- ARB_multi_bind
  - One call to perform multiple bindings
  - Reduces driver CPU overhead
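To make the 10f_11f_11f packing concrete, here is a minimal CPU-side sketch in C. It assumes IEEE 754 32-bit floats and the usual "rev" bit order (first component in bits 0 to 10, second in bits 11 to 21, third in bits 22 to 31). The helper names `to_unsigned_small_float` and `pack_10f_11f_11f_rev` are hypothetical, and the conversion is simplified: it truncates instead of rounding and flushes values too small for a normalized result to zero, so a real driver or library may produce slightly different bits.

```c
#include <stdint.h>
#include <string.h>

/* Convert a 32-bit float to an unsigned small float with a 5-bit exponent
 * (bias 15) and `mantissa_bits` mantissa bits (6 for 11-bit, 5 for 10-bit).
 * Simplified: truncates the mantissa and flushes denormal results to zero. */
static uint32_t to_unsigned_small_float(float f, int mantissa_bits)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* assumes IEEE 754 binary32 */

    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x7FFFFFu;
    uint32_t negative = bits >> 31;

    if (exponent == 0xFFu) {
        if (mantissa != 0u)
            return (31u << mantissa_bits) | 1u;          /* NaN stays NaN */
        return negative ? 0u : (31u << mantissa_bits);   /* -Inf -> 0, +Inf */
    }
    if (negative)
        return 0u;                           /* no sign bit: negatives clamp to 0 */

    int e = (int)exponent - 127 + 15;        /* rebias the exponent */
    if (e >= 31)                             /* too large: clamp to max finite */
        return (30u << mantissa_bits) | ((1u << mantissa_bits) - 1u);
    if (e <= 0)
        return 0u;                           /* simplification: flush to zero */

    return ((uint32_t)e << mantissa_bits) | (mantissa >> (23 - mantissa_bits));
}

/* Pack three floats into one 32-bit word: x and y as 11-bit floats,
 * z as a 10-bit float in the top bits. */
uint32_t pack_10f_11f_11f_rev(float x, float y, float z)
{
    return to_unsigned_small_float(x, 6)
         | (to_unsigned_small_float(y, 6) << 11)
         | (to_unsigned_small_float(z, 5) << 22);
}
```

The packed word would then be supplied as a single vertex attribute element of the packed type, which cuts a three-float attribute from 12 bytes to 4 at reduced precision.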
New ARB-only extensions
- ARB_bindless_texture
  - Allow referencing textures by handle in a shader
- ARB_sparse_texture
  - Support texture sizes beyond physical memory
  - Choose which parts of a texture are resident
- ARB_seamless_cubemap_per_texture
  - Control the seamless switch for cubemaps per texture
- ARB_indirect_parameters
  - The count parameter of a multi-draw-indirect call can now come from a buffer object
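As a rough illustration of what ARB_indirect_parameters changes, the C sketch below mirrors the per-draw record that an indexed multi-draw-indirect call reads from a buffer, following the five-field layout given in the OpenGL specification. The struct name and the helper `effective_draw_count` are hypothetical; the helper only models the rule that the number of draws executed is the smaller of the count stored in the parameter buffer and the maximum passed by the application.

```c
#include <stdint.h>

/* One indexed indirect draw record as laid out in the indirect buffer.
 * Field order follows the OpenGL specification; the type name is ours. */
struct draw_elements_indirect_command {
    uint32_t count;           /* number of indices to draw */
    uint32_t instance_count;  /* number of instances */
    uint32_t first_index;     /* offset into the index buffer, in indices */
    int32_t  base_vertex;     /* value added to each index */
    uint32_t base_instance;   /* first instance for instanced attributes */
};

/* Hypothetical model of the draw-count rule: the GPU reads the count from
 * a buffer, and the application supplies an upper bound. */
uint32_t effective_draw_count(uint32_t count_in_buffer, uint32_t max_draw_count)
{
    return count_in_buffer < max_draw_count ? count_in_buffer : max_draw_count;
}
```

Because the count lives in a buffer, a GPU pass such as culling can decide how many records to draw without the CPU reading the result back.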
New ARB-only extensions
- ARB_compute_variable_group_size
  - Allow compute shader dispatch to set the size of the workgroup
- ARB_shader_draw_parameters
  - gl_BaseInstance, gl_BaseVertex and gl_DrawID as new GLSL builtins
- ARB_shader_group_vote
  - Compute the composite of a set of boolean conditions across a group of shader invocations
ARB_buffer_storage
- Immutable storage for buffer objects
    void BufferStorage(enum target, sizeiptr size, const void * data, bitfield flags);
- DYNAMIC_STORAGE_BIT
  - If not set, allocation will be GPU accessible
- MAP_READ_BIT / MAP_WRITE_BIT
  - Controls CPU caching policies
- MAP_COHERENT_BIT
  - Shared access by client and server will be coherent (*)
- MAP_PERSISTENT_BIT
  - Can use buffer while mapped
- CLIENT_STORAGE_BIT
  - This is a hint. Memory location will favor client access
- If you access a buffer without the right bit set, Bad Things will happen.
(*) but read spec carefully!
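The flags on this slide cannot be combined freely. The C sketch below checks the combination rules as I read them in the ARB_buffer_storage specification: a persistent mapping needs at least one of the read or write map bits, and a coherent mapping needs the persistent bit. The constants are redefined locally with the values published in the extension so the example builds without OpenGL headers, and `storage_flags_valid` is a hypothetical helper, not an OpenGL function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as published in the ARB_buffer_storage specification,
 * redefined here (without the GL_ prefix) to keep the sketch standalone. */
#define MAP_READ_BIT        0x0001u
#define MAP_WRITE_BIT       0x0002u
#define MAP_PERSISTENT_BIT  0x0040u
#define MAP_COHERENT_BIT    0x0080u
#define DYNAMIC_STORAGE_BIT 0x0100u
#define CLIENT_STORAGE_BIT  0x0200u

/* Hypothetical helper: returns true if `flags` is a combination that
 * BufferStorage would accept under the rules described above. */
bool storage_flags_valid(uint32_t flags)
{
    const uint32_t known = MAP_READ_BIT | MAP_WRITE_BIT | MAP_PERSISTENT_BIT |
                           MAP_COHERENT_BIT | DYNAMIC_STORAGE_BIT |
                           CLIENT_STORAGE_BIT;

    if (flags & ~known)
        return false;   /* unknown bits are rejected */
    if ((flags & MAP_PERSISTENT_BIT) &&
        !(flags & (MAP_READ_BIT | MAP_WRITE_BIT)))
        return false;   /* persistent needs a read or write mapping */
    if ((flags & MAP_COHERENT_BIT) && !(flags & MAP_PERSISTENT_BIT))
        return false;   /* coherent needs persistent */
    return true;
}
```

A typical persistently mapped streaming buffer would pass `MAP_WRITE_BIT | MAP_PERSISTENT_BIT | MAP_COHERENT_BIT`, which this check accepts.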
Enhanced Layouts in GLSL
- Shader-based transform feedback layout
  - Specify buffers, strides, offsets
  - No TransformFeedbackVaryings() command needed

    layout (xfb_buffer = 0, xfb_stride = 32) out b {
        layout (xfb_offset = 0) vec2 a;    // a goes to byte offset 0 of buffer 0
        vec4 b;                            // b is not captured, no xfb_offset
        layout (xfb_offset = 16) vec4 c;   // c goes to offset 16 of buffer 0
    };                                     // there is a hole at bytes 8 through 15

- Compile-time constants in any integer layout qualifier

    const int start = 6;
    layout(location = start + 2) in vec4 v;   // Sets location to 8
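To show what the transform feedback layout above means on the application side, here is a small C sketch of a struct that mirrors one captured vertex: `a` at byte 0, an 8-byte hole, `c` at byte 16, and a 32-byte stride. The struct and the accessor `captured_c` are hypothetical names, and the sketch assumes 4-byte floats and that the buffer has already been mapped or read back by some other means.

```c
#include <stddef.h>

/* Hypothetical CPU-side mirror of one vertex captured into buffer 0
 * with xfb_stride = 32. */
struct captured_vertex {
    float a[2];     /* vec2 a, xfb_offset = 0 */
    float hole[2];  /* bytes 8 through 15, never written by the capture */
    float c[4];     /* vec4 c, xfb_offset = 16 */
};

/* Compile-time checks that the struct really matches the shader layout. */
_Static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
_Static_assert(offsetof(struct captured_vertex, c) == 16, "c must sit at byte 16");
_Static_assert(sizeof(struct captured_vertex) == 32, "stride must be 32 bytes");

/* Returns a pointer to member c of the i-th captured vertex in a buffer
 * that the caller has mapped or copied back. */
const float *captured_c(const void *mapped, size_t i)
{
    const struct captured_vertex *vertices = mapped;
    return vertices[i].c;
}
```

The static assertions fail the build if the struct drifts from the offsets declared in the shader, which is the main point of keeping such a mirror.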
Enhanced Layouts in GLSL
- Explicit byte-offset layout of uniform blocks

    uniform layout(std140) Block {
        layout(offset = 0) vec4 batman;    // gets byte offset 0
        layout(offset = 64) vec4 robin;    // gets byte offset 64
    };

- Locations on input and output blocks

    layout(location = 4) in block {
        vec4 batman;                       // gets location 4
        vec4 robin;                        // gets location 5
        layout(location = 7) vec4 joker;   // gets location 7
        vec4 riddler;                      // gets location 8
    };
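The location numbers in the block example follow a simple rule: members take consecutive locations starting at the block's location, and a member with its own location qualifier restarts the count from there. The C sketch below models that rule for illustration only; the `member` struct, the `NO_LOCATION` marker and `assign_locations` are hypothetical, and each vec4 is assumed to consume one location.

```c
#include <stddef.h>

#define NO_LOCATION (-1)   /* hypothetical marker: member has no qualifier */

struct member {
    const char *name;
    int explicit_location;  /* NO_LOCATION if the member has none */
    int slots;              /* locations consumed, 1 for a vec4 */
};

/* Writes the location assigned to each member into out[0..n-1]. */
void assign_locations(int block_location, const struct member *members,
                      size_t n, int *out)
{
    int next = block_location;
    for (size_t i = 0; i < n; ++i) {
        if (members[i].explicit_location != NO_LOCATION)
            next = members[i].explicit_location;  /* qualifier restarts the count */
        out[i] = next;
        next += members[i].slots;                 /* following member comes after */
    }
}
```

Run on the four members from the slide with a block location of 4, this yields 4, 5, 7 and 8, matching the comments in the GLSL.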
Enhanced Layouts in GLSL
- Component-level slot utilization
- Old way

    // consume 5 slots
    in vec3 batman[4];
    in float robin;

- New way

    // consume X/Y/Z components of 4 slots
    layout(location = 0, component = 0) in vec3 batman[4];
    // consumes W component of first slot
    layout(location = 0, component = 3) in float robin;
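The slot arithmetic on this slide can be checked with a small occupancy model. The C sketch below marks which of the four components each declaration uses at each location, rejects overlaps, and counts the locations touched. Everything here is hypothetical illustration (`input_decl`, `count_slots`, the 16-location limit); it is not how a GLSL compiler is required to work.

```c
#include <stdint.h>

enum { MAX_LOCATIONS = 16 };   /* arbitrary limit chosen for the sketch */

struct input_decl {
    int location;    /* first location used */
    int component;   /* first component used, 0 = X ... 3 = W */
    int components;  /* components per element, 3 for a vec3 */
    int array_len;   /* number of array elements, 1 if not an array */
};

/* Returns the number of locations used by the declarations, or -1 if two
 * declarations overlap or a declaration does not fit. */
int count_slots(const struct input_decl *decls, int n)
{
    uint8_t used[MAX_LOCATIONS] = {0};

    for (int i = 0; i < n; ++i) {
        if (decls[i].component < 0 || decls[i].components < 1 ||
            decls[i].component + decls[i].components > 4)
            return -1;                       /* does not fit in one slot */
        uint8_t mask = (uint8_t)(((1u << decls[i].components) - 1u)
                                 << decls[i].component);
        for (int e = 0; e < decls[i].array_len; ++e) {
            int loc = decls[i].location + e; /* one location per element */
            if (loc < 0 || loc >= MAX_LOCATIONS || (used[loc] & mask))
                return -1;                   /* out of range or overlapping */
            used[loc] |= mask;
        }
    }

    int slots = 0;
    for (int loc = 0; loc < MAX_LOCATIONS; ++loc)
        if (used[loc])
            ++slots;
    return slots;
}
```

With `batman` at locations 0 to 3 and `robin` at location 4 the model reports 5 slots. Moving `robin` to component 3 of location 0 brings it down to 4.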
Bindless Textures
- Problem statement
  - Binding to different texture objects takes validation time in the driver
  - Applications are limited to a small palette of bound textures
- Traditional OpenGL
  - GPU memory reads are indirected through bindings
  - Limited number of texture units
- Solution: expose textures as handles
  - Let shaders access textures directly
Bindless Textures
- Increase number of unique textures available to shaders at run-time
- More different materials and richer texture detail in a scene
[Figure: existing texture binding model, with shader code reaching texture #0, #1, #2 ... #16, compared with bindless textures, with shader code reaching over 1 million unique textures]
Bindless Textures
- Apropos for ray-tracing and advanced rendering where textures cannot be bound in advance
Bindless Textures
- Existing texture binding model
  - CPU: Load texture A, load texture B, load texture C
  - CPU: Bind texture A to slot I, bind texture B to slot J, Draw()
  - GPU: Read from texture at slot I, read from texture at slot J
  - CPU: Bind texture C to slot K, Draw()
  - GPU: Read from texture at slot K
- Bindless textures
  - CPU: Load textures A, B, C, Draw()
  - GPU: Read from texture A, read from texture B, read from texture C
- Bindless model reduces CPU overhead and improves GPU access efficiency
Statistics
- OpenGL.org (whole site) page views per month? 12M
- OpenGL.org forum page views per month? 2M+
- OpenGL wiki page views per month? 2M+
- OpenGL 4.4 reference card downloads per month? 7400
Summary
- OpenGL community is thriving
- Implementers hard at work to update drivers
- Conformance tests a reality
- Low overhead rendering and other key features added to OpenGL 4.4