CS179: GPU Programming
Lecture 7: Render to Texture
Lecture originally by Luke Durant, Russell McClellan, Tamas Szalay
Today: Render to Texture
Render to texture in OpenGL; framebuffers and renderbuffers; multi-output rendering; image processing via RTT; reduction.
How to in OpenGL
OpenGL has always had bad ways of doing it, built on very basic access to pixel data through glReadPixels(). You would need to read the pixels back and then set a texture using glTexImage2D(). This is horribly, horribly slow: glReadPixels() is completely synchronous. The same goes for glCopyTexImage2D().
The better way
We've thus far kept textures and buffers separate. Both are fundamentally the same thing: image data. Textures are for generic reads; buffers are for function-specific writes (front, back, depth, stencil). There should be a way to make more.
Framebuffer objects (FBOs)
Framebuffer objects are groups of renderable buffers. They can be associated with existing textures.
    GLuint fbo;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
But what do we draw into? FBOs are just groupings of render targets, so there is nothing to draw into yet.
Renderbuffers
Now we need to create renderbuffers to actually draw into (for example, a depth buffer):
    GLuint depthbuffer;
    glGenRenderbuffers(1, &depthbuffer);
    glBindRenderbuffer(GL_RENDERBUFFER, depthbuffer);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT, width, height);
And we need to attach it to the FBO (as a depth buffer):
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depthbuffer);
Adding textures
We could add color buffers the same way as the depth buffer above, but we want to bind existing textures:
    GLuint img; // a texture (with associated data!)
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, img, 0);
To check for errors (see docs for error codes):
    GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER);
And to render
One key step is changing the viewport. Recall that the viewport is the NDC -> pixel coordinate transform. The texture size will likely differ from the window size, so we need a different viewport transform. Do all this as follows:
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glPushAttrib(GL_VIEWPORT_BIT);
    glViewport(0, 0, width, height);
    // RENDER HERE
    // Output goes to the FBO's attachments
    glPopAttrib();
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
Rendered data will now be in the textures!
Generating more outputs
You might want to generate mipmaps after rendering (with the target texture bound):
    glGenerateMipmap(GL_TEXTURE_2D);
You may have noticed GL_COLOR_ATTACHMENT0. There can be multiple color buffers, up to 16 or so (the exact limit depends on the implementation). You can specify which ones to draw to using glDrawBuffers:
    GLenum buffers[] = {GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1};
    glDrawBuffers(2, buffers);
In GLSL, gl_FragData is an array: gl_FragData[i] draws to buffers[i] in the current FBO.
To summarize
You can think of all the buffers you have now as one FBO (it is, actually, with ID = 0). A new FBO is just another group of render buffers. Render buffers can either be created with new storage or taken from an existing texture. Those textures can then be rendered to.
Uses of RTT
Most things we have talked about (and some we haven't yet) hint at RTT: shadow maps, dynamic environment mapping, image processing, general computation, deferred rendering. A couple of examples:
Portal (environment mapping)
Reflection (environment mapping)
Deferred rendering (example from Bill Clark's 2010 presentation)
Distortion image filter
Heat shimmer
Image processing
Say we want to compute a blur: center pixel = average of surrounding pixels. But we want to save it instead of displaying it. Can we write the result back into the same texture? Basically, no: the source data needs to remain intact, or the result will be wrong, because some pixels will be blurred with already-blurred results instead of the source. Solution: use two textures, A and B. Read from A, write to B. If you want to iterate filters, go back and forth. This is called flipping or ping-ponging.
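To make the flipping pattern concrete, here is a minimal CPU-side sketch in C. It is only a model of what the GPU passes do, not OpenGL code: the two float arrays stand in for textures A and B, and the names `blur_pass`, `blur_ping_pong`, and `clamp_index` are hypothetical helpers. The 3x3 box filter and the clamp-to-edge border handling are assumptions chosen for illustration.

```c
#include <assert.h>

/* Clamp an index into [0, n-1]; assumed here to mimic clamp-to-edge addressing. */
static int clamp_index(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* One blur pass: each dst pixel is the mean of the 3x3 neighborhood in src.
   src and dst must be different buffers, just like textures A and B. */
void blur_pass(const float *src, float *dst, int width, int height) {
    assert(src != dst);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = clamp_index(x + dx, width);
                    int sy = clamp_index(y + dy, height);
                    sum += src[sy * width + sx];
                }
            }
            dst[y * width + x] = sum / 9.0f;
        }
    }
}

/* Run `passes` blur passes, flipping the roles of a and b after each one.
   Returns whichever buffer holds the final result. */
float *blur_ping_pong(float *a, float *b, int width, int height, int passes) {
    float *src = a;
    float *dst = b;
    for (int i = 0; i < passes; ++i) {
        blur_pass(src, dst, width, height);
        float *tmp = src; /* flip: the last output becomes the next input */
        src = dst;
        dst = tmp;
    }
    return src;
}
```

On the GPU, the pointer swap would presumably correspond to swapping which texture is bound for reading and which is attached to the FBO for writing.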
Image processing
Many common algorithms are iterated: they work one pixel at a time, but are parallelizable. Kernel-based algorithms, which read from a small region near each pixel, are very fast: blur, erode, dilate, edge finding. Most have been implemented on the GPU.
How do we find the minimum or maximum pixel in an image? In a fragment shader, we would like to just compare each fragment to some global value, except all globals are read-only. We could always do glReadPixels() and find it on the CPU. An example problem took about 1 second on my computer. That is not fast enough if you need to do this many times!
Ping-ponging reduction
The best way is a reduction shader. Go from texture A -> B, where B is half the size of A in each dimension. For each destination pixel (x, y) in B, set the color to the min/max of source pixels (2x, 2y), (2x+1, 2y), (2x, 2y+1), and (2x+1, 2y+1). Repeat until the result is a 1x1 texture.
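The reduction can be modeled on the CPU to check the indexing. The following is a sketch in C, not shader code: the float arrays stand in for textures A and B, the functions `reduce_min_pass` and `reduce_min` are hypothetical names, and the image is assumed to be square with a power-of-two side so that it halves cleanly down to 1x1.

```c
#include <assert.h>

static float min2(float a, float b) { return a < b ? a : b; }

/* One reduction pass: dst is (src_w / 2) x (src_h / 2), and each dst pixel
   (x, y) becomes the minimum of the 2x2 block starting at (2x, 2y) in src. */
void reduce_min_pass(const float *src, int src_w, int src_h, float *dst) {
    assert(src != dst);
    assert(src_w % 2 == 0 && src_h % 2 == 0);
    int dst_w = src_w / 2;
    int dst_h = src_h / 2;
    for (int y = 0; y < dst_h; ++y) {
        for (int x = 0; x < dst_w; ++x) {
            const float *row0 = src + (2 * y) * src_w + 2 * x;
            const float *row1 = row0 + src_w;
            dst[y * dst_w + x] =
                min2(min2(row0[0], row0[1]), min2(row1[0], row1[1]));
        }
    }
}

/* Reduce an n x n image (n a power of two) to its minimum by ping-ponging
   between a and b. a holds the input; b is scratch space with room for at
   least (n/2) * (n/2) floats. Both buffers are overwritten. */
float reduce_min(float *a, float *b, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    float *src = a;
    float *dst = b;
    while (n > 1) {
        reduce_min_pass(src, n, n, dst);
        float *tmp = src; /* flip source and destination */
        src = dst;
        dst = tmp;
        n /= 2;
    }
    return src[0];
}
```

For an n x n input this takes log2(n) passes, and every destination pixel within a pass is independent of the others, which is what makes each pass map onto fragments.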
Reduction shader
The final result can be read out using glReadPixels(). The same example took about 10 ms on my computer: 100x faster! You need to be extremely careful with texture coordinates, since gl_FragCoord values are in pixels but we read using s and t, which are normalized floats. gl_FragCoord values are also at the centers of pixels, so the bottom-left pixel is (0.5, 0.5). You can do larger steps than 2x -> 1x; I found ~10x -> 1x to be most efficient.
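The coordinate bookkeeping is easy to get wrong, so here is a small C sketch of the arithmetic along one axis. It assumes normalized texture coordinates in [0, 1] and the 2x -> 1x step; `texel_center` and `reduction_source_coords` are hypothetical helper names, not part of any API.

```c
#include <assert.h>

/* Normalized coordinate of the center of texel i in a texture that is
   `size` texels wide: texel 0 of 4 is at 0.125, not 0.0. */
float texel_center(int i, int size) {
    assert(size > 0 && i >= 0 && i < size);
    return ((float)i + 0.5f) / (float)size;
}

/* Given one component of the destination fragment coordinate (in pixels,
   at a pixel center such as 0.5 or 1.5) and the source texture size along
   that axis, compute the normalized coordinates of the two source texels
   2x and 2x+1 that this fragment should read. */
void reduction_source_coords(float frag_coord, int src_size,
                             float *s0, float *s1) {
    int dst_index = (int)frag_coord; /* 0.5 -> 0, 1.5 -> 1, ... */
    *s0 = texel_center(2 * dst_index, src_size);
    *s1 = texel_center(2 * dst_index + 1, src_size);
}
```

For example, the fragment at 1.5 writing into a 2-texel-wide destination reads source texels 2 and 3 of a 4-texel-wide source, at s = 0.625 and s = 0.875. Sampling at texel centers like this avoids accidentally landing on a boundary between texels.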
Buffer blending mode
Recall the heat shimmer example. So far we have used no blending: the current color in the buffer is replaced with the fragment color (possibly subject to the depth test). Many more options are available:
    glBlendFunc(src, dest);
Blending can use some combination of source or destination color or alpha to weight input and output pixels. But color clamping still occurs on output pixels! And they get converted to integer RGBA. We don't want to do all our processing in the range 0 to 1.
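To see why clamping hurts, consider a single color channel under the usual additive blend equation, result = src * src_factor + dst * dst_factor. The C sketch below is a CPU model under that assumption, with hypothetical function names; it compares an unclamped floating point result with what an 8-bit normalized buffer would end up storing.

```c
#include <assert.h>

/* Weighted blend of one color channel, with no clamping applied. */
float blend_float(float src, float dst, float src_factor, float dst_factor) {
    return src * src_factor + dst * dst_factor;
}

/* The same blend as stored in an 8-bit normalized channel: the result is
   clamped to [0, 1] and then rounded to an integer in [0, 255]. */
unsigned char blend_unorm8(float src, float dst,
                           float src_factor, float dst_factor) {
    /* Blend weights are assumed to lie in [0, 1] for this sketch. */
    assert(src_factor >= 0.0f && src_factor <= 1.0f);
    assert(dst_factor >= 0.0f && dst_factor <= 1.0f);
    float c = blend_float(src, dst, src_factor, dst_factor);
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (unsigned char)(c * 255.0f + 0.5f);
}
```

Adding 0.75 to 0.75 with both factors equal to 1 gives 1.5 in floating point, but the 8-bit channel stores 255, which reads back as 1.0. Anything accumulated beyond 1 is lost.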
Floating point buffers
Floating point buffers are now supported:
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, etc.
In documentation, you will often see this and other names with ARB or EXT appended. ARB means an extension approved by the OpenGL Architecture Review Board: not mandatory, but likely to become part of the spec eventually. EXT means an optional extension. There are also vendor-specific ones: NV_POINT_SPRITE, ATI_... glext.h resolves most of these issues, removes ARB if you support it, etc.
Color clamping
So now we have the destination color blended and written in floating point, but it is still clamped. Solution: glClampColor
    glClampColorARB(GL_CLAMP_VERTEX_COLOR_ARB, GL_FALSE);
    glClampColorARB(GL_CLAMP_FRAGMENT_COLOR_ARB, GL_FALSE);
    glClampColorARB(GL_CLAMP_READ_COLOR_ARB, GL_FALSE);
Now it behaves as expected. But wait! Are floating point texture accesses clamped? Thankfully, no.
Performance
Floating point textures are slower, and there's no way around that, really. New hardware will probably help with this. So avoid them when possible, that is, whenever you can store your numbers in an integer range.
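One way to store numbers in an integer range is to map a known interval [lo, hi] linearly onto the 256 levels of an 8-bit channel. The C sketch below assumes that mapping; `encode_unorm8` and `decode_unorm8` are hypothetical names, and the interval must be chosen by the application.

```c
#include <assert.h>

/* Map a value from [lo, hi] to an 8-bit level in [0, 255], rounding to the
   nearest level. Values outside the interval are clamped to its ends. */
unsigned char encode_unorm8(float value, float lo, float hi) {
    assert(hi > lo);
    float t = (value - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return (unsigned char)(t * 255.0f + 0.5f);
}

/* Inverse mapping: recover an approximate value in [lo, hi]. */
float decode_unorm8(unsigned char stored, float lo, float hi) {
    assert(hi > lo);
    return lo + ((float)stored / 255.0f) * (hi - lo);
}
```

The cost is precision: with 256 levels, the round trip can be off by about (hi - lo) / 510, so this only works when the data's range is known and that error is acceptable.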
Computation on the GPU
We now know enough to do general computation. Textures effectively represent large arrays. We can read from multiple arrays using multiple texture bindings, and write to multiple arrays using multiple render targets. The tricky part is breaking up the problem into small parallel pieces. Much of the course will be focused on this.