Point-Based rendering on GPU hardware Advanced Computer Graphics 2008
Outline
- Why use the GPU?
- Splat rasterization
  - Image-aligned squares
  - Perspective-correct rasterization
- Splat shading
  - Flat shading
  - Gouraud shading
  - Deferred shading
- Anti-aliasing

Why use the GPU?
- Much faster (billions of splats per second)
- High data-parallelism
- Hardware support for point primitives
- CPU can do something else in the meantime

Splat rasterization
- Determine which pixels are covered by the projected splat

Splat rasterization
- Surfel layout in memory
    struct Surfel {
        float pos[3];    /* POSITION  */
        float color[4];  /* TEXCOORD0 */
        float uvec[3];   /* TEXCOORD1 */
        float vvec[3];   /* TEXCOORD2 */
    };
- Pass point attributes to the shader in texture coordinates
- Draw the whole array at once using glDrawArrays()
  - Calling glVertex and glTexCoord billions of times would saturate the CPU
- Point data can be stored in GPU memory

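As a small sketch of how this interleaved layout maps onto the array pointers: the stride handed to the pointer calls is the size of one record, and each attribute starts at its byte offset inside the record. The helper names here are hypothetical, and the numbers assume 4-byte floats and no struct padding, which is typical but not guaranteed by C.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as on the slide: one interleaved record per point. */
typedef struct Surfel {
    float pos[3];    /* POSITION  */
    float color[4];  /* TEXCOORD0 */
    float uvec[3];   /* TEXCOORD1 */
    float vvec[3];   /* TEXCOORD2 */
} Surfel;

/* Hypothetical helper: byte stride between consecutive surfels, i.e. the
   value passed as the stride argument of the gl*Pointer calls. */
size_t surfel_stride(void)
{
    return sizeof(Surfel);
}

/* Hypothetical helper: number of floats stored per surfel. */
size_t surfel_float_count(void)
{
    assert(sizeof(Surfel) % sizeof(float) == 0);
    return sizeof(Surfel) / sizeof(float);
}
```

With this layout each surfel occupies 13 floats, so a million points take roughly 52 MB under the stated assumptions.
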
Vertex program
- Point size is determined by the vertex program
    glEnable(GL_VERTEX_PROGRAM_POINT_SIZE);
- Conservative approximation of the screen-space size of the splat
  - Project the ellipse radius onto the near plane
  - Then scale by the viewport height over the near-plane height (top - bottom) to get the size in pixels

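Vertex program
    void main(float4 pvec  : POSITION,
              float4 color : TEXCOORD0,
              float3 uvec  : TEXCOORD1,
              float3 vvec  : TEXCOORD2,
              uniform float4x4 ModelViewProj,
              uniform float4x4 ModelView,  // assumed: needed for peye, not declared on the slide
              uniform float2 wsize,
              uniform float near,
              uniform float top,
              uniform float bottom,
              out float4 pout  : POSITION,
              out float  psize : PSIZE,
              out float4 cout  : TEXCOORD0)
    {
        pout = mul(ModelViewProj, pvec);
        // peye is used but never defined on the slide; it is assumed here
        // to be the eye-space position of the splat center.
        float4 peye = mul(ModelView, pvec);
        // Longer of the two tangent vectors gives a conservative radius.
        float radius = sqrt(max(dot(uvec, uvec), dot(vvec, vvec)));
        // Project onto the near plane, then convert to pixels.
        psize = 2.0 * radius * (-near / peye.z) * (wsize.y / (top - bottom));
        cout = float4(color.xyz, 1.0);
    }
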
Image-aligned squares
- Simplest technique
- Just render GL_POINTS
    glEnable(GL_VERTEX_PROGRAM_POINT_SIZE);
    glClientActiveTexture(GL_TEXTURE0);
    glTexCoordPointer(4, GL_FLOAT, sizeof(Surfel), &pts[0].color);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    ...
    glVertexPointer(3, GL_FLOAT, sizeof(Surfel), &pts[0].pos);
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, numpoints);
    glClientActiveTexture(GL_TEXTURE0);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    ...
    glDisableClientState(GL_VERTEX_ARRAY);

Image-aligned squares
- Results: depth artifacts, no blending, blocky contours

Perspectively correct rasterization
- Render object-space elliptical splats
  - With perspectively accurate kernels
- In the fragment shader, raycast against the primitive
- Solve...

Raycasting
- Solve...
- Using Cramer's rule:
  - Only the ray through the current pixel depends on the pixel; the rest can be precalculated in the vertex shader
- Discard fragment if u*u + v*v > 1

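The equations on this slide did not survive, so the following is only a sketch of one formulation that is consistent with the names v1, v2, v3 and depthin used in the fragment shader. It assumes the splat is the parallelogram plane p + u*U + v*V in eye space (p the center, U and V the tangent vectors) and the viewing ray is lambda*q. Cramer's rule on p + u*U + v*V = lambda*q then gives u = q.(V x p) / q.(U x V), v = q.(p x U) / q.(U x V) and lambda = p.(U x V) / q.(U x V). All type and function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z; } Vec3;

float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

Vec3 cross3(Vec3 a, Vec3 b)
{
    Vec3 c = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    return c;
}

/* Per-splat terms that do not depend on the pixel (vertex-shader work). */
typedef struct { Vec3 v1, v2, v3; float depth_in; } SplatSetup;

SplatSetup splat_setup(Vec3 p, Vec3 U, Vec3 V)
{
    SplatSetup s;
    s.v1 = cross3(V, p);
    s.v2 = cross3(p, U);
    s.v3 = cross3(U, V);
    s.depth_in = dot3(p, s.v3);
    return s;
}

/* Per-pixel part (fragment-shader work). q is the ray through the pixel.
   Writes the splat coordinates and ray parameter; returns 1 if the hit
   lies inside the unit ellipse, 0 if the fragment would be discarded. */
int splat_hit(const SplatSetup *s, Vec3 q, float *u, float *v, float *lambda)
{
    assert(s && u && v && lambda);
    float dn = dot3(q, s->v3);
    if (fabsf(dn) < 1e-12f) return 0;   /* ray parallel to the splat plane */
    *u = dot3(q, s->v1) / dn;
    *v = dot3(q, s->v2) / dn;
    *lambda = s->depth_in / dn;
    return (*u)*(*u) + (*v)*(*v) <= 1.0f;
}
```

Only the three dot products with q run per pixel; the cross products are computed once per splat, which is the split between vertex and fragment work that the slide describes.
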
Z-Buffer
- Depth in the Z-buffer must be correct
  - Otherwise, intersecting splats will be rendered incorrectly
- By default, a GL_POINTS primitive has constant depth
- Compute the depth value per fragment as well, in the fragment shader
- DEPTH binding semantic is in clip-space

Fragment shader
    void main(float4 col     : TEXCOORD0,
              float3 v1      : TEXCOORD1,
              float3 v2      : TEXCOORD2,
              float3 v3      : TEXCOORD3,
              float  depthin : TEXCOORD4,
              float2 wpos    : WPOS,
              uniform float2 unproj_scale,
              uniform float2 unproj_offset,
              uniform float near,
              uniform float zb_scale,
              uniform float zb_offset,
              uniform float epsilon,
              out float  depthout : DEPTH,
              out float4 colorout : COLOR0)
    {
        // Ray through this pixel: window position mapped back to the near plane.
        float3 q = float3(wpos*unproj_scale - unproj_offset, -near);
        float dn = dot(q, v3);
        // Splat-local coordinates of the ray/plane intersection.
        float u = dot(q, v1)/dn;
        float v = dot(q, v2)/dn;
        // Squared distance from the splat center in splat coordinates.
        float radius = u*u + v*v;
        if (radius > 1.0) discard;
        // z of the intersection point, shifted by epsilon.
        float qz = q.z * depthin/dn + epsilon;
        // Convert to a depth-buffer value.
        depthout = zb_scale/qz + zb_offset;
        colorout = col;
    }

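The slides do not say how zb_scale and zb_offset are chosen. As a hedged sketch, assuming a standard OpenGL perspective projection with near plane n and far plane f and the default depth range [0, 1], the window depth of an eye-space z value z_e (negative in front of the camera) is d = f/(f-n) + f*n/((f-n)*z_e), which matches the shader's form with scale = f*n/(f-n) and offset = f/(f-n). The names below are hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef struct { float scale, offset; } DepthMap;

/* Assumed derivation: standard perspective projection, depth range [0, 1]. */
DepthMap depth_map_from_planes(float near_plane, float far_plane)
{
    assert(near_plane > 0.0f && far_plane > near_plane);
    DepthMap m;
    m.scale  = far_plane * near_plane / (far_plane - near_plane);
    m.offset = far_plane / (far_plane - near_plane);
    return m;
}

/* Same expression as depthout in the shader: scale / z + offset,
   with z_eye negative for points in front of the camera. */
float window_depth(DepthMap m, float z_eye)
{
    assert(z_eye < 0.0f);
    return m.scale / z_eye + m.offset;
}
```

Under these assumptions a point on the near plane maps to depth 0 and a point on the far plane to depth 1, with the usual non-linear distribution in between.
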
Splat shading
- Flat shading
  - Simplest; lighting is applied per splat, without interpolation

Gouraud shading
- Interpolate shaded color over primitives
  - By rendering splats as a Gaussian kernel and accumulating the result for each pixel
- Figure: colored point primitive c_i, modulated by alpha mask w(x, y) (2D Gaussian), gives splat primitive c_i · w(x, y)
- Fragment shader computes the Gaussian from u and v
  - Splat color is returned in rgb like before
  - Weight is returned in the alpha component

Gouraud shading
- Start with a visibility splatting pass
  - Makes sure that only splats within a surface band are blended
  - Render splats, applying an offset to the computed z value
  - Enable z-write, disable color write

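The effect of the visibility pass can be sketched for a single pixel in C. This is a simplified model, not the course's implementation: it assumes depths already in window space, a constant offset epsilon added there, and a second pass that tests against the stored depth with depth writes disabled. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pass 1 (depth only): the pixel keeps the nearest fragment depth,
   pushed back by epsilon so that a thin band of fragments survives later. */
float visibility_pass(const float *depths, size_t n, float epsilon)
{
    assert(depths != NULL || n == 0);
    float zbuf = 1.0f;                       /* cleared to the far plane */
    for (size_t i = 0; i < n; ++i) {
        float shifted = depths[i] + epsilon;
        if (shifted < zbuf) zbuf = shifted;
    }
    return zbuf;
}

/* Pass 2 (assumed: depth test on, depth writes off): count the fragments
   that pass the test and would therefore be blended. */
size_t count_blended(const float *depths, size_t n, float zbuf)
{
    size_t passed = 0;
    for (size_t i = 0; i < n; ++i)
        if (depths[i] <= zbuf) ++passed;
    return passed;
}
```

With fragment depths 0.30, 0.305, 0.31 and 0.60 and an offset of 0.02, the first three fall inside the band and are blended, while the fragment at 0.60 belongs to a surface further back and is rejected.
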
Gouraud shading
- Before rendering, set up alpha blending
    glEnable(GL_BLEND);
    glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE);
- An intermediate render target with floating-point precision is needed
  - 8 bits per component would quickly suffer from clipping and quantization artifacts

Frame buffer objects
- Frame Buffer Objects expose off-screen rendering functionality
    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, color_tex, 0);
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                                 GL_RENDERBUFFER_EXT, depthbuffer);
    assert(glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT)
           == GL_FRAMEBUFFER_COMPLETE_EXT);

Normalization pass
- Render the intermediate texture as a full-screen quad
- Perform normalization of the accumulated values, so that the weights sum to 1
    float4 main(float2 coord : TEXCOORD0,
                uniform samplerRECT input) : COLOR
    {
        float4 d = texRECT(input, coord);
        return float4(d.rgb / d.a, 1.0);
    }
- Results in a division by zero if no splats overlap a pixel; clear alpha to a small nonzero value to avoid it
    glClearColor(0, 0, 0, 1e-6);

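The accumulation and normalization can be traced for one pixel with a small CPU-side sketch. It mirrors the blend function glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE, GL_ONE, GL_ONE) and the clear color with alpha 1e-6 from the slides; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float r, g, b, a; } Rgba;

/* Additive blending as configured on the slides:
   dst.rgb += src.rgb * src.a, dst.a += src.a. */
void blend_accumulate(Rgba *dst, Rgba src)
{
    dst->r += src.r * src.a;
    dst->g += src.g * src.a;
    dst->b += src.b * src.a;
    dst->a += src.a;
}

/* Normalization pass: divide the weighted color sum by the weight sum. */
Rgba normalize_pixel(Rgba acc)
{
    assert(acc.a > 0.0f);
    Rgba out = { acc.r / acc.a, acc.g / acc.a, acc.b / acc.a, 1.0f };
    return out;
}

/* One pixel, start to finish. The tiny initial alpha plays the role of
   glClearColor(0, 0, 0, 1e-6) and keeps the division well defined. */
Rgba resolve_pixel(const Rgba *frags, size_t n)
{
    Rgba acc = { 0.0f, 0.0f, 0.0f, 1e-6f };
    for (size_t i = 0; i < n; ++i)
        blend_accumulate(&acc, frags[i]);
    return normalize_pixel(acc);
}
```

A red fragment with weight 0.75 and a blue fragment with weight 0.25 resolve to roughly (0.75, 0, 0.25), and a pixel with no fragments resolves to black instead of dividing by zero.
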
Gouraud shading
- Gouraud shading is certainly an improvement
- As lighting is calculated per vertex, the result can be blurry
- Quality of shading depends on point density
- Figure panels: only ambient lighting; specular highlight

Deferred shading
- Phong shading
- First, do the visibility splatting pass
- Splat normals and other surfel attributes, interpolating over the surface (in a similar way as with color in Gouraud shading)
- For this, we need to render to multiple render targets
  - For example: color (float3), normal (float3), depth (float)
  - Shader outputs multiple colors: COLOR0, COLOR1, ...
  - Use a frame buffer with multiple color attachments
- Final pass does normalization and deferred shading

Deferred shading
- Multiple render targets
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    GLenum bufs[2] = {GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT};
    glDrawBuffers(2, bufs);
    ... (render)

Anti-aliasing
- Built-in FSAA of the GPU can be used
  - Basically redirects rendering to a higher-resolution frame buffer, then down-samples
  - Even this can suffer from aliasing, as it only moves the problem
- Approximate EWA filter
  - Footprint is computed as the maximum, instead of the convolution, of the projected reconstruction filter and the screen-space prefilter
  - A fragment is accepted if it lies in the union of the screen-space prefilter and the projected splat
  - New splatting kernel:
