TSBK03 Screen-Space Ambient Occlusion

Joakim Gebart, Jimmy Liikala

December 15, 2013

Contents

1 Abstract
2 History
  2.1 Crysis method
3 Chosen method
  3.1 Algorithm outline
  3.2 Blur
  3.3 Range check
  3.4 Noise
  3.5 Depth buffer and per-fragment normal vectors
4 SSAO into the depth
  4.1 Retrieving a per fragment depth normal-map
  4.2 Reconstructing the depth
  4.3 Generate random points in hemisphere
  4.4 Computing the occlusion
5 OpenGL 4
  5.1 Tessellation
  5.2 Wireframe using geometry shaders (GS)
6 Result

1 Abstract

Ambient occlusion methods approximate the shading that arises when ambient light is occluded by nearby geometry, simulating the soft, mid-frequency shadows found near creases and corners. The ambient occlusion factor is used to scale the ambient light at each point: points that are less occluded become brighter, while occluded points become darker, since less ambient light is likely to reach them. Screen-Space Ambient Occlusion (SSAO) is a family of methods where the calculations are done in screen space, i.e. per fragment. Calculating the ambient occlusion for every point on every surface would not be feasible in real time, but reducing the set of evaluated points to only the rendered pixels makes these methods practical on today's GPU hardware.
2 History

Offline ambient occlusion has been computed for quite some time by tracing rays of light, but the first real-time SSAO implementation in a larger game project was achieved by Crytek [X] in their engine CryEngine2, used for games such as Crysis (2007). Today every state-of-the-art game engine includes some form of ambient occlusion implementation.

2.1 Crysis method

The algorithm used in Crysis samples a number of random points in a sphere centered around each rendered point. The occlusion factor is calculated from the number of samples that lie behind geometry, i.e. where the depth value at the sampled point is greater than the corresponding value in the depth buffer. Since modern game engines already perform post-processing steps that use the depth data, the depth buffer is available at no extra render cost. The occlusion factor is then used to darken the occluded parts of the image in a post-processing stage.

Because the random points are sampled from a full sphere in the Crysis method, there will in most cases be samples that fall inside geometry even on flat surfaces. This makes convex surfaces appear brighter than flat walls, in addition to the intended darkening of concave surfaces. The effect is not photorealistic; it can be mitigated by choosing samples in other ways, but it also gives the graphics a distinctive look that some people enjoy.

3 Chosen method

The method chosen for computing the ambient occlusion is a variant of the Crysis method, where the random samples are picked inside a normal-oriented hemisphere on the rendered surface (see figure 1), instead of the full sphere used in the original Crytek implementation. This sampling gives flat surfaces the same occlusion factor as convex surfaces while still darkening concave surfaces, resulting in a somewhat more realistic appearance.
Figure 1: Sample points from inside a hemisphere.
One drawback of this method is that the surface normal must be available per fragment, so it has to be computed if it is not already. However, since surface normals are usually needed for per-pixel lighting or other post-processing effects, they will most likely already be available at no extra cost in a real-world situation using a modern graphics engine.

3.1 Algorithm outline

The algorithm is implemented as follows. For each fragment:

1. Generate a number of random sample points in a hemisphere around the fragment.
2. Project the sample points into screen space to find the matching values in the depth buffer.
3. Compare each depth buffer value against the depth value of the sample point; if the sample point's depth value is greater than the depth buffer value, the sample point is occluded, so the occlusion factor is incremented.

The occlusion factor can then be used in a post-processing step to achieve ambient occlusion.

3.2 Blur

In order to keep frame rates interactive, the number of random samples has to be kept to a minimum. As a result, the occlusion factor is quantized into fewer levels than are needed for nice, smooth shading. To reduce this problem, a blur is applied to the occlusion factor buffer before combining the occlusion factor with the post-processing input image, which yields a smoother shading of the image.

3.3 Range check

If the blur were applied to the occlusion factor buffer as a whole, without any conditions, shadows would bleed and could darken parts of the scene that are far behind or in front of the corner actually being darkened. To eliminate this effect, a condition is added to the blur shader: it computes the z-distance between two points in the occlusion factor buffer and only blurs between points that are close to each other in the z-direction.

3.4 Noise

It is difficult to efficiently generate random numbers in a shader program.
Therefore, instead of performing expensive mathematical operations for each fragment, a texture containing random white noise is used as the source of random numbers. Using the fragment coordinate as an offset into the texture yields a random number that differs between fragments. The fragment coordinate can also be multiplied by different prime factors on each random draw in the shader code, to achieve a period longer than the size of the texture. The random numbers are used to build the vectors that form the points in the sampling hemisphere. The repeating pattern of the random texture would otherwise cause visible repetition in the result; to further increase the period of the pattern, the sampling vectors are also rotated around the normal axis by a value taken from the random texture.
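The per-fragment loop outlined in section 3.1 can be summarized in a short CPU-side sketch. This is plain Python, not the actual GLSL shader; the function and parameter names are illustrative, and the projection of sample points into screen space is assumed to have happened already.

```python
# CPU-side sketch of the occlusion-factor loop from section 3.1.
# sample_depths[i] is the view-space depth of the i-th random hemisphere
# sample; buffer_depths[i] is the depth-buffer value at that sample's
# projected screen position. Larger depth means farther from the camera.

def occlusion_factor(sample_depths, buffer_depths):
    """Fraction of hemisphere samples that lie behind rendered geometry."""
    occluded = sum(1 for s, b in zip(sample_depths, buffer_depths) if s > b)
    return occluded / len(sample_depths)

# A sample deeper than the depth buffer at its position counts as occluded:
print(occlusion_factor([0.4, 0.6], [0.5, 0.5]))  # one of two occluded -> 0.5
```

The result is the raw occlusion factor that the blur of section 3.2 then smooths before it is combined with the shaded image.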
3.5 Depth buffer and per-fragment normal vectors

The depth buffer must be rendered to a framebuffer object (FBO) before the occlusion factor can be computed, because the occlusion factor of a fragment depends on the depth values of nearby fragments. The normals, which have three components in the three-dimensional world, are saved in the red, green and blue channels of the FBO, while the depth values are stored in the alpha channel.

One important thing regarding depth is that the depth buffer (often referred to as the z-buffer) is not linear: it has higher resolution (or more samples, if you look at it that way) close to the camera, and the resolution decreases with distance from the camera. A linear depth is therefore computed explicitly: the per-vertex depth is found by applying the modelview matrix to each vertex in the vertex shader, and the depth is then saved into the alpha channel in the fragment shader, with depth values between vertex points interpolated automatically. The depth values are normalized by dividing by the distance between the near and far planes of the camera; this is done to simplify debugging. Figure 2 shows an example of a normalized, linear depth buffer.

Figure 2: Depth buffer with normalized values.

One important thing regarding the normals is to keep them in the correct space: no translation or perspective division should be applied to them, and they should stay perpendicular to the surface. This is achieved by transforming the normals with the inverse transpose of the upper-left part of the modelview matrix, also known as the normal matrix.
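The normal matrix construction can be sketched without any GL at all. The following plain-Python snippet (names are illustrative, not from the project's shader code) builds the inverse transpose of the upper-left 3x3 block of a modelview matrix; for a pure rotation this leaves the normals unchanged, while for a non-uniform scale it applies the reciprocal scale, keeping the normals perpendicular to the surface.

```python
# Sketch of building the normal matrix: the inverse transpose of the
# upper-left 3x3 of the modelview matrix, as described above.

def inverse_transpose_3x3(m):
    """Inverse transpose of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    # The inverse is adjugate/det, and the adjugate is the transposed
    # cofactor matrix, so the cofactor matrix divided by det IS the
    # inverse transpose.
    cof = [
        [e * i - f * h, f * g - d * i, d * h - e * g],
        [c * h - b * i, a * i - c * g, b * g - a * h],
        [b * f - c * e, c * d - a * f, a * e - b * d],
    ]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    return [[x / det for x in row] for row in cof]

def normal_matrix(modelview4x4):
    """Extract the upper-left 3x3 block and inverse-transpose it."""
    upper_left = [row[:3] for row in modelview4x4[:3]]
    return inverse_transpose_3x3(upper_left)

# A modelview that scales x by 2: the normal matrix scales x by 1/2,
# which is what keeps normals perpendicular to the scaled surface.
mv = [[2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(normal_matrix(mv)[0][0])  # -> 0.5
```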
4 SSAO into the depth

The screen-space ambient occlusion is in this case (as in most cases) computed on the graphics processing unit (GPU) using the OpenGL Shading Language (GLSL), which enables real-time calculations at high speed. In this chapter the chosen SSAO algorithm is described in more detail, divided into the following steps:

- Retrieving a per fragment depth normal-map
- Reconstructing the depth
- Generating random points in a hemisphere
- Computing the occlusion

4.1 Retrieving a per fragment depth normal-map

The first thing to be computed and stored for later use is the depth map from the camera's point of view, often referred to as camera space, view space or eye space. The depth is stored in a texture, and the normals are saved at the same time for later use. The texture used here is an RGBA texture containing four channels: red, green, blue and alpha. The normals, which have three components in the three-dimensional world, are saved in the red, green and blue channels of the texture, while the depth is stored in the alpha channel.

One important thing regarding depth is that the depth buffer (often referred to as the z-buffer) is not linear. It has higher resolution (or more samples, if you look at it that way) close to the camera, and the resolution decreases with distance from the camera. The resolution is higher close to the camera because correctness matters most there; put plainly, it is most important that z-buffer culling is correct close to the camera, to minimize the risk of visible culling artifacts.
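The non-linearity described above can be illustrated with the standard OpenGL perspective depth mapping (a textbook formula, not code from the project; the near and far plane values below are assumptions for the example):

```python
# Standard OpenGL perspective depth mapping: view-space z (negative, since
# OpenGL looks down the negative z-axis) to normalized device depth in
# [-1, 1]. n and f are the near- and far-plane distances.

def view_z_to_ndc(z_eye, n, f):
    return (f + n) / (f - n) + (2.0 * f * n) / ((f - n) * z_eye)

n, f = 1.0, 100.0
print(view_z_to_ndc(-n, n, f))    # near plane -> -1.0
print(view_z_to_ndc(-f, n, f))    # far plane  -> +1.0
# The midpoint of the depth range is already reached at roughly 2 units
# from the camera, i.e. about 2% of the way to the far plane -- the
# remaining 98% of the scene shares the upper half of the depth range.
print(view_z_to_ndc(-2.0, n, f))  # already slightly positive
```

This is why the linear depth used by the SSAO shader is computed separately, as described next.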
The linear depth is obtained by first computing the per-vertex depth, found by applying the modelview matrix to each vertex in the vertex shader; the depth is then saved into the alpha channel in the fragment shader, with depth values between vertex points interpolated automatically. Normalizing the depth values by dividing by the distance between the near and far planes of the camera yields values between zero and one, which makes it simple to output the depth directly to the screen to confirm that it is correct. In OpenGL the obtained depth values are also multiplied by minus one, because OpenGL looks down the negative z-direction. If everything is done correctly, the output should look something like figure 2 (depending on the scene).

One important thing regarding the normals is to keep them in the correct space: no translation or perspective division should be applied to them, and they should stay perpendicular to the surface. This can be achieved by transforming the normals with the inverse transpose of the upper-left part of the modelview matrix (figure 3).

4.2 Reconstructing the depth

To obtain the correct depth value for each pixel, a full-screen quad is used as input to the SSAO shader. The quad is built from four vertices (see figure 4) which in the vertex shader are stored as
Figure 3: Illustration of the upper-left part of the modelview matrix (marked as M1) that should be used to create the normal matrix.

Figure 4: The quad and the positions of its vertex points, used to access each pixel's corresponding depth value from the texture.

texture coordinates, used to fetch the depth value corresponding to each pixel. The screen coordinates in GLSL are always between -1.0 and 1.0 in the xy-plane, so the gl_Position stored in the vertex shader has to be transformed into that range (figure 5). The fragment shader is then able to interpolate the correct texture coordinate for each pixel on the screen. In the fragment shader stage, the depth is retrieved from the depth texture using the obtained coordinates, and the near-plane distance is then added to obtain the exact depth. Adding the distance to the near plane is not strictly necessary, since that distance is the same for every pixel.

4.3 Generate random points in hemisphere

Generating a random point in a hemisphere around each fragment can be achieved in many ways; in this case a noise texture (figure 6) is used to obtain randomized values. Each pixel of the texture contains three components (red, green and blue), which are used as the x-, y- and z-components of the random vector. Since the texture has values between 0 and 1,
Figure 5: The transformed vertex points of the quad, used as gl_Position in the vertex shader in order to get the correct interpolated coordinates in the fragment shader.

Figure 6: The noise texture used to obtain randomized points inside a hemisphere.

each value is multiplied by 2 and 1 is then subtracted, to obtain values between -1 and 1. The resulting random vector is a random point inside a sphere. To relate it to the normal-aligned hemisphere, the projection normal * (normal · random_vector) is subtracted from the random vector, leaving a vector perpendicular to the normal. A cross product between the normal and this vector then yields a third vector, here called v2, perpendicular to both. This gives three mutually perpendicular vectors: the normal itself, the projected random vector and v2. These three vectors are used as a new basis in the next step, to easily generate several random positions inside the hemisphere aligned with the normal.
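The basis construction of section 4.3 can be sketched in plain Python (this is an illustrative reimplementation of the steps described in the text, not the project's GLSL; the vector helpers are hand-rolled to keep the sketch self-contained):

```python
# Sketch of section 4.3: remap a noise-texture RGB value from [0, 1] to
# [-1, 1]^3, project out the normal component (Gram-Schmidt), and take a
# cross product to complete a mutually perpendicular basis.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def hemisphere_basis(noise_rgb, normal):
    """Return (tangent, v2, normal): a basis aligned with the normal."""
    # Remap texture values from [0, 1] to [-1, 1].
    rnd = tuple(2.0 * c - 1.0 for c in noise_rgb)
    # Subtract normal * (normal . rnd): the remainder is perpendicular
    # to the normal.
    k = dot(normal, rnd)
    tangent = tuple(r - k * n for r, n in zip(rnd, normal))
    # The cross product gives the third perpendicular vector (v2).
    v2 = cross(normal, tangent)
    return tangent, v2, normal
```

Random hemisphere sample points can then be generated as non-negative-normal-component combinations of these three basis vectors.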
4.4 Computing the occlusion

With the hemisphere basis in place, the occlusion factor is computed per fragment as outlined in section 3.1: each random point generated in the hemisphere is projected into screen space, its depth is compared against the stored depth buffer value, and the fraction of occluded samples gives the occlusion factor.

5 OpenGL 4

Another goal of the project was to utilize some of the new features of the OpenGL 4 pipeline. The most prominent of these are the tessellation shader stages, which can be used for on-the-fly geometry modification.

5.1 Tessellation

A new feature of OpenGL 4 and DirectX 11 is the tessellation pipeline. Tessellation in OpenGL 4 is designed to perform surface subdivision in hardware in order to obtain a mesh with higher resolution, which can then be transformed in various ways; one straightforward example is using a displacement map to generate surface features such as the grooves and ridges between the stones in a stone wall. An appreciated feature of the tessellation pipeline is that the level of subdivision can be controlled per primitive; it is therefore possible to have a smooth level-of-detail transition based on distance from the eye. A user-controlled tessellation level was implemented for the purpose of testing the new tessellation features of the pipeline; in the example implementation, the subdivision levels can be altered by pressing keys on the keyboard.

5.2 Wireframe using geometry shaders (GS)

The use of geometry shaders should be minimized for performance reasons, but in some cases a geometry shader is the right tool, as when building up a wireframe. In a geometry shader, access to all vertices of a primitive can be obtained, which is useful when constructing a wireframe. In order to evaluate the tessellation result, it is useful to be able to view the wireframe of the resulting mesh. A geometry shader was developed which allows the user to display both the original mesh and the subdivided mesh in separate colours, to distinguish them from each other. This is achieved by computing the distance from the fragment to the original triangle edge, and the distance from the fragment to the nearest subdivided triangle edge.
6 Result

Figure 7 shows an example of the final result of the SSAO shading.
Figure 7: Example of the SSAO shading.