Rationale for Non-Programmable Additions to OpenGL 2.0

Similar documents
Graphics Processing Unit Architecture (GPU Arch)

E.Order of Operations

Mention driver developers in the room. Because of time this will be fairly high level, feel free to come talk to us afterwards

The Rasterization Pipeline

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

PROFESSIONAL. WebGL Programming DEVELOPING 3D GRAPHICS FOR THE WEB. Andreas Anyuru WILEY. John Wiley & Sons, Ltd.

Programming Graphics Hardware

Chapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming

OpenGL Status - November 2013 G-Truc Creation

Spring 2011 Prof. Hyesoon Kim

CS 354R: Computer Game Technology

Texture mapping. Computer Graphics CSE 167 Lecture 9

Spring 2009 Prof. Hyesoon Kim

Shaders (some slides taken from David M. course)

Rendering Objects. Need to transform all geometry then

Grafica Computazionale: Lezione 30. Grafica Computazionale. Hiding complexity... ;) Introduction to OpenGL. lezione30 Introduction to OpenGL

Introduction to Shaders.

Dave Shreiner, ARM March 2009

Hardware Displacement Mapping

1.2.3 The Graphics Hardware Pipeline

Could you make the XNA functions yourself?

Mattan Erez. The University of Texas at Austin

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

CSE 167: Introduction to Computer Graphics Lecture #8: Textures. Jürgen P. Schulze, Ph.D. University of California, San Diego Spring Quarter 2016

GeForce3 OpenGL Performance. John Spitzer

Computer Graphics. Bing-Yu Chen National Taiwan University

2.11 Particle Systems

The Application Stage. The Game Loop, Resource Management and Renderer Design

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

GPU Memory Model Overview

Scanline Rendering 2 1/42

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

A Proposal for OpenGL 1.4 Matt Craighead NVIDIA Corporation

Buffers. Angel and Shreiner: Interactive Computer Graphics 7E Addison-Wesley 2015

Course Recap + 3D Graphics on Mobile GPUs

Textures. Texture coordinates. Introduce one more component to geometry

Graphics Pipeline 2D Geometric Transformations

Working with Metal Overview

Programming Guide. Aaftab Munshi Dan Ginsburg Dave Shreiner. TT r^addison-wesley

Rasterization. CS 4620 Lecture Kavita Bala w/ prior instructor Steve Marschner. Cornell CS4620 Fall 2015 Lecture 16

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

GPGPU. Peter Laurens 1st-year PhD Student, NSC

A Trip Down The (2011) Rasterization Pipeline

CS427 Multicore Architecture and Parallel Computing

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University

2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into

3D Programming. 3D Programming Concepts. Outline. 3D Concepts. 3D Concepts -- Coordinate Systems. 3D Concepts Displaying 3D Models

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

VGP352 Week March-2008

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express

General Purpose computation on GPUs. Liangjun Zhang 2/23/2005

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions.

PowerVR Performance Recommendations. The Golden Rules

Deferred Rendering Due: Wednesday November 15 at 10pm

CS452/552; EE465/505. Clipping & Scan Conversion

CS4620/5620: Lecture 14 Pipeline

Introduction. What s New in This Edition

Soft Particles. Tristan Lorach

CS 179 GPU Programming

Rasterization. CS4620/5620: Lecture 12. Announcements. Turn in HW 1. PPA 1 out. Friday lecture. History of graphics PPA 1 in 4621.

CS 432 Interactive Computer Graphics

OpenGL SUPERBIBLE. Fifth Edition. Comprehensive Tutorial and Reference. Richard S. Wright, Jr. Nicholas Haemel Graham Sellers Benjamin Lipchak

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Order Independent Transparency with Dual Depth Peeling. Louis Bavoil, Kevin Myers

Graphics Hardware. Instructor Stephen J. Guy

Rendering Grass with Instancing in DirectX* 10

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

CSE 167: Introduction to Computer Graphics Lecture #7: Textures. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2018

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

Beginning Direct3D Game Programming: 1. The History of Direct3D Graphics

PowerVR Hardware. Architecture Overview for Developers

INF3320 Computer Graphics and Discrete Geometry

Rasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?

Lecture 2. Shaders, GLSL and GPGPU

Image Processing Tricks in OpenGL. Simon Green NVIDIA Corporation

CS770/870 Fall 2015 Advanced GLSL

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

Rasterization. CS4620 Lecture 13

Pipeline and Rasterization. COMP770 Fall 2011

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Pipeline Operations. CS 4620 Lecture 14

CS 130 Final. Fall 2015

High-Quality Surface Splatting on Today s GPUs

Fast Third-Order Texture Filtering

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

CSE 167: Introduction to Computer Graphics Lecture #6: Lights. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2014

Project 1, 467. (Note: This is not a graphics class. It is ok if your rendering has some flaws, like those gaps in the teapot image above ;-)

Comparing Reyes and OpenGL on a Stream Architecture

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

GPU Memory Model. Adapted from:

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Line Drawing. Introduction to Computer Graphics Torsten Möller / Mike Phillips. Machiraju/Zhang/Möller

COMP Preliminaries Jan. 6, 2015

The graphics pipeline. Pipeline and Rasterization. Primitives. Pipeline


Why modern versions of OpenGL should be used Some useful API commands and extensions

CISC 3620 Lecture 7 Lighting and shading. Topics: Exam results Buffers Texture mapping intro Texture mapping basics WebGL texture mapping

NVIDIA nfinitefx Engine: Programmable Pixel Shaders

WebGL (Web Graphics Library) is the new standard for 3D graphics on the Web, designed for rendering 2D graphics and interactive 3D graphics.

Transcription:

Rationale for Non-Programmable Additions to OpenGL 2.0 NVIDIA Corporation March 23, 2004 This white paper provides a rationale for a set of functional additions to the 2.0 revision of the OpenGL graphics system specification. This white paper limits itself to nonprogrammable functionality. While quite important, programmable low-level assembly and shading language functionality is beyond the scope of this white paper. The functionality is listed in alphabetical order rather than by any subjective measure of their relative merit. ARB_point_sprite The ARB_point_sprite extension provides a means to render point primitives that include the assignment of varying texture coordinates over the point s rasterized pixels. This functionality has proven useful for particle systems, particularly in 3D games, and point-based rendering (splatting). Without the point sprite functionality, core OpenGL only provides a single uniform texture coordinate for all the fragments of a rasterized point. This makes it impossible to apply texture images to points. Instead, developers must simulate textured point sprites with textured quads that require 4 times the vertex transfer and transformation effort compared to point sprites. The point sprite functional is uncontroversial except the origin of the 2D s and t coordinates assigned to the point. The ARB_point_sprite extension provides an upperleft origin, consistent with Direct3D s point sprite functionality rather than a lower-left origin consistent with most other OpenGL coordinate systems. ARB_point_sprite s coordinate system convention has not proven an issue for OpenGL developers using the existing ARB_point_sprite functionality, but is inconsistent with other OpenGL lower-left origin conventions. For OpenGL 2.0, the origin discrepancy could be addressed by the addition of a global two-valued state variable (GL_LOWER_LEFT or GL_UPPER_LEFT) where the default (GL_UPPER_LEFT) would be made consistent with the existing ARB_point_sprite convention. Suggested usage: glpointparameteri(gl_point_sprite_coord_origin, originconvention); This is consistent with Intel s proposal. 1

ARB_texture_non_power_of_two The ARB_texture_non_power_of_two extension relaxes the power-of-two requirements for the sizes (width, height, and depth) of 1D, 2D, 3D, and cube map textures. Conventional texturing functionality such as mipmap filtering, the various wrap modes, normalized texture coordinate system ranges, compression, etc. just works with nonpower-of-two textures. This functionality allows developers to store textures in less space than would otherwise be required if a texture s base image was rounded up to the next power-of-two dimensions (say making a 640x480 texture into a 1024x512 texture). Developers can thereby reduce their texture storage requirements without large changes in texture image resolutions. Non-power-of-two textures are also important for imaging applications where resampling texture data is undesirable. Volume rendering is a key application domain that benefits from non-power-of-two textures because 3D volume data sets 1) do not have to be subjected to resampling, and 2) 3D volumes take up an unmanageable amount of space if rounded up to the next power-of-two dimensions. The ARB_texture_non_power_of_two extension uses a round down convention for mipmap levels. So a 13x11 mipmap level is followed by a 6x5 mipmap level. This is consistent with DirectX s convention (but adopting ARB_texture_non_power_of_two does not preclude other conventions in future extensions). While older hardware does not support non-power-of-two textures, future GPU designs certainly will. DirectX already provides for non-power-of-two textures and the hardware for non-power-of-two textures is sure to be common in the next year and ubiquitous by the following year. Making non-power-of-two textures part of the core of OpenGL will expedite this important hardware transition. ARB_texture_rectangle While not an approved extension, this specification is an update of the NV_texture_rectangle and EXT_texture_rectangle extension specification (shipping with current OpenGL implementations) to conform to the OpenGL 1.5 language. Though sometimes (inappropriately) thought of as a limited form of non-power-of-two texturing, the texture rectangle functionality is designed to be a texture image analogue to the pixel rectangle primitive (generated by gldrawpixels and glcopypixels) and the framebuffer. The texture rectangle matches the fundamental structure of the framebuffer and pixel rectangles: 1) multi-component data is stored on a 2D grid of arbitrary (nonpower-of-two) size, 2) addressing is by coordinates over a [0..width]x[0..height] range, and 3) lack mipmaps. Framebuffer data can be copied into a texture rectangle, be retained, and later resampled as a texture. Likewise, gldrawpixels image data can be retained and later resampled via a texture rectangle. The texture rectangle target supports only the zero mipmap level and provides no support for mipmap filtering. The repeat and mirror wrap modes are also not allowed. Most 2

importantly, the texture coordinate range is a [0..width]x[0..height] range rather than the normalized [0..1]x[0..1] range for conventional 2D textures. The texture rectangle has proven apropos and efficient in several increasingly important but unconventional OpenGL usage domains: 1) compositing window system support (e.g. Apple s OS X window system), 2) scientific data processing where the texture rectangle s [0..width]x[0..height] texture coordinate range maps well to array processing problems, and 3) image and video processing where image data must be loaded efficiently, lacks mipmaps, commonly of non-power-of-two size, and addressed with conventionally addressed with integer coordinates. Texture rectangles end up being more natural and more efficient than using conventional textures, even if non-power-of-two. The integer addressing is more natural for scientific computation and image processing tasks. Scaling texture coordinates by 1/17 and 1/13 when accessing a 17x13 texture is undesirable because of the additional multiply but also because 1/17 and 1/13 are not exactly representable values in floating-point while 17 and 13 are exactly representable. Additionally, to avoid sampling razor edges when nearest filtering an additional half-texel bias may be required if integer coordinates are to be downscaled to a normalized range. Applying these scale & bias operations are both unnatural and non-obvious to developers, require additional math operations unnecessary with texture rectangle, and mean shaders must know the size of images which is otherwise unnecessary with the texture rectangle functionality. Moreover, texture rectangles offer efficiency within the OpenGL driver too because functional limitations such as no mipmaps means mipmap consistency checks can be skipped. Additionally, because texture rectangles never have mipmaps, they can be arranged more intelligently in texture memory. Supporting ARB_texture_rectangle in hardware is straightforward and actually simplifies the texture coordinate addressing used for conventional textures (as long as no mipmaps or mirror/wrap modes are supported). Section 3.8.8 of the OpenGL 1.5 specification provides formulas for mapping s, t, and r into integer coordinates u, v, and w for texture sampling. These formulas are u(x,y)=2 n s(x,y), v(x,y)=2 m s(x,y), and w(x,y)=2 l s(x,y) where n, m, and l are log2 of the width, height, and depth of the given texture image. The texture rectangle functionality skips the effective multiply by the texture image s size dimensions in the above equations. Texture coordinates are simply directly mapped to texel locations. This simplicity is why hardware from 3 years ago can support the texture rectangle functionality. This simplicity is predicated on the lack of support for mipmap addressing. Work is on-going to specify how the OpenGL Shading Language (GLSL) should access texture rectangle targets. GLSL already reserved the sampler2drect keyword. And lowlevel fragment program extensions have provide the RECT texture target. ATI_texture_float and WGL_ATI_pixel_format_float The WGL_ATI_pixel_format_float and ATI_texture_float extensions add support for single-precision and half-precision floating-point frame buffer and texture formats. 3

Programmable shading now operates in floating-point values (rather than fixed-point values in earlier hardware) so it is crucial to provide floating-point texture formats. The floating-point frame buffer format allows full-precision values to be written into the frame buffer for use in subsequent passes, for applications like high dynamic range rendering and deferred shading. The floating-point texture formats allow applications to pass large data sets to programmable shaders, and use the contents of floating-point frame buffers via render texture. Two additional capabilities should be added: To facilitate the efficient loading of half-precision (fp16) textures, a half-precision data type for image loading (from the NV_half_float extension) should be added. The cases where [0,1] clamps in the existing non-programmable OpenGL pipeline are relaxed should be clearly documented. Clamping support should be provided where the new floating-point and pre-opengl 2.0 fixed point pipelines can coexist without breaking compatibility. As of DirectX 9, Direct3D provides floating-point texture formats. EXT_blend_equation_separate The EXT_blend_equation_separate extension completes the generalization of the blending state to be separately programmable for RGB and alpha components. The EXT_blend_func_separate functionality, now part of core OpenGL as of version 1.4, provided separate RGB and alpha blend function state, but provided only a single blend equation to apply both the RGB and alpha. As of DirectX 9, Direct3D provides support for a separate RGB and alpha blend equation. EXT_pixel_buffer_object The EXT_pixel_buffer_object extension provides efficient pixel data transfers in a manner entirely consistent with the ARB_vertex_buffer_object extension made part of the OpenGL core as of OpenGL 1.5. Pixel processing operations such as gldrawpixels, glreadpixels, and glteximage2d can be made more efficient by accessing memory from within a pixel buffer object, similar to a vertex buffer object. Pixel buffer objects, like vertex buffer objects, allows the driver to exploit high-bandwidth memory mapping such as provided by AGP memory or memory resident on the graphics card. Additionally, transfers can be made asynchronous to the OpenGL command stream. This is important and straightforward functionality for improving the bandwidth of OpenGL s pixel operations. Numerous graphics researchers and other developers have voiced complaints about the relative slowness of OpenGL pixel operations, particularly framebuffer pixel reads. 4

EXT_texture_mirror_clamp The EXT_texture_mirror_clamp extension completes the matrix of wrap modes started by the various clamp modes (GL_CLAMP, GL_CLAMP_TO_EDGE, and GL_CLAMP_TO_BORDER) and the mirrored mode (GL_MIRRORED_REPEAT). The ATI_texture_mirror_once initiated the combining of the clamp and mirrored modes by combining the mirrored mode with the clamp and clamp to edge modes. These modes are useful for symmetric textures such as spotlight patterns or a windowed sinc function where it is desirable to exploit the symmetry for reduced texture storage and improved texture access cache coherency. The EXT_texture_mirror_clamp extension adds one additional wrap mode combining mirrored and clamp to border. This is actually the most interesting of the mirror once modes because outside the texture coordinate [-1,+1] domain, texture accesses are clamped to 100% of the border color rather than the edge color or a combination of the border color and edge color. This is important when using mirror once mode with mipmapped textures where you don t want edge bleeding far outside the [-1,+1] range in extreme minification circumstances. SUN_slice_accum As it exists today in unextended OpenGL 1.5, the accumulation buffer, despite its extra precision, is not capable of alpha compositing operations. The SUN_slice_accum extension adds a new accumulation buffer mode (GL_SLICE_ACCUM_SUN) consistent with high-precision alpha compositing. Further discussion with Jack Middleton (the SUN_slice_accum author) and Brian Paul has led to a more general proposal called ARB_accum_composite that adds a few more compositing accumulation buffer operations. These additional operations provide over and under compositing operations assuming either pre-multiplied alpha or non-pre-multiplied alpha framebuffer images. The five ARB_accum_composite operations are GL_SLICE_ACCUM_ARB, GL_OVER_ARB, GL_UNDER_ARB, GL_PREMULT_OVER_ARB, and GL_PREMULT_UNDER_ARB. The functionality addresses the need of render preview applications (or any other application that requires compositing such as video editing) to easily merge multiple images based on alpha information with improved fidelity due to the high precision of the accumulation buffer and benefit from hardware acceleration. 5