For example, could you make the XNA functions yourself?

Transcription:

1

For example, could you make the XNA functions yourself? For the second assignment you need to know about the entire process of using the graphics hardware. You will use shaders, which play a vital role in that process and interact with many of the stages. 2

The goals of this lecture: we will look at the entire process of using graphics hardware. The most important part is how shaders interact in this process. What are shaders anyway? Shaders are small programs running on the GPU. They are your most important way of influencing what happens on the GPU, because they give you total control over their stage. We will also take a look at the basics of how to create a shader. 3

The CPU communicates with the GPU over the bus (you learned about that in Computerarchitectuur en Netwerken). (The newest CPUs don't use the normal bus anymore; it is too slow. Instead, a more direct connection is used.) Both have their own memory and cannot access each other's memory. If the GPU wants to read something from the CPU, it has to be copied to GPU memory first. 4

The CPU can run any type of program; it is designed to be flexible enough to deal with every application. The GPU can only run specific programs that conform to the GPU's specific data flow, and its instructions are specialized for vector calculations. This specialization is what makes the GPU massively parallel: the CPU has around 4 cores, whereas the GPU can have over 1000 cores. A consequence of this specialization is that the GPU is useless by itself; it needs a smarter CPU to give it commands. 5

We will first talk about what happens on the CPU side and on the GPU side. I will briefly show something about the actual GPU hardware. Then we will discuss the communication between the GPU and the CPU, and finally I will explain the basics of shader programming. 6

7

So, what happens on the CPU side? Your game tells XNA what to do, and XNA uses DirectX to accomplish what you want. XNA is similar to DirectX, but it removes a lot of the hassle required to deal with DirectX, and it is very nicely object oriented. A disadvantage is that a few obscure features of DirectX are missing in XNA, but they are barely used anyway. DirectX is very important here. It is called a graphics API, and it forces all drivers to behave similarly (if they want to support DirectX). If it weren't for graphics APIs, you would have to make a lot of exceptions for specific drivers in your code. (Actually, DirectX is more than just a graphics API; the graphics API part is called Direct3D.) OpenGL is another graphics API. The functionality is roughly the same as DirectX, but it is structured differently: DirectX is slightly object oriented, while OpenGL is not object oriented at all. OpenGL is cross-platform, while DirectX is Windows-specific. In the future you might interface with a graphics API directly, without the nice interface of XNA. And finally there is the graphics driver, which actually sends the data and instructions over to the GPU. 8

That was easy. Now let's get to the GPU side, because the GPU architecture determines how we have to use it. 9

The GPU is structured as a pipeline. Years ago (before 2000), games used the fixed-function pipeline to render. This is barely used anymore, because the programmable pipeline gives much more control. 10

The BasicEffect you used in the first practical assignment works similarly to a fixed-function pipeline: you just set some settings, and it all works! You had to set the View and Projection matrices, which transform the 3D model to 2D. And to add lighting, you just had to enable it and specify the light source. Here you see that the matrices are applied to the vertices first. Once the vertices are in 2D, the rasterizer determines which pixels are occupied by the triangle. For each pixel, the final color is determined. This can be done using the specified color of a vertex, or using textures, and the color can be influenced by a light source. Note that the lighting can be calculated either per vertex or per pixel. Per vertex is faster, but per pixel gives a better result. In BasicEffect per-pixel lighting is used, but in the days of the fixed-function pipeline, per-vertex lighting was used. Note that this entire pipeline is executed each frame, 30 to 60 times per second. 11
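
In code, a minimal BasicEffect setup looks roughly like this (a sketch; the matrices and light values are placeholders, and note that BasicEffect's standard lights are directional, so the example sets a direction rather than a position):

    // Minimal BasicEffect setup, roughly as in the first assignment.
    BasicEffect basicEffect = new BasicEffect(GraphicsDevice);
    basicEffect.World = Matrix.Identity;
    basicEffect.View = Matrix.CreateLookAt(new Vector3(0, 0, 10), Vector3.Zero, Vector3.Up);
    basicEffect.Projection = Matrix.CreatePerspectiveFieldOfView(
        MathHelper.PiOver4, GraphicsDevice.Viewport.AspectRatio, 0.1f, 100f);

    // Lighting: just enable it and describe the light source.
    basicEffect.LightingEnabled = true;
    basicEffect.DirectionalLight0.Enabled = true;
    basicEffect.DirectionalLight0.Direction = Vector3.Normalize(new Vector3(1, -1, 0));
    basicEffect.DirectionalLight0.DiffuseColor = Vector3.One;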

The vertex transformation stage is replaced by the vertex shader, and the pixel color determination stage is replaced by the pixel shader. Note that the pixel shader is sometimes called the fragment shader: fragment shader is the term used in OpenGL, whereas pixel shader is used by DirectX. There is one disadvantage: all the things the replaced stages used to do aren't done anymore! We have to program them manually in the shaders. The shaders have access to GPU memory, so they can also access matrices and lighting information. But you can also pass other things to your shaders, for example additional settings to use in your shader. 12

The vertex shader is executed for each vertex separately; it cannot access other vertices. The vertex shader is the same for all vertices in one draw call. Its most important task is to perform the vertex transform from the vertex transformation stage it replaced. Wolfgang will say something about this in a later lecture. A vertex does not consist of only a position; it can have many more attributes. These can be modified in the vertex shader, or new attributes can be added. For example, we can do lighting in the vertex shader: we can read the light source information from GPU memory and calculate the lighting. That lighting is then stored in a vertex attribute. Storing something in the output vertex is the only way a vertex shader can output data. 13
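
Sketched in HLSL, per-vertex lighting along these lines could look as follows (the variable names such as WorldViewProjection and LightDirection are placeholders, not from the assignment; the normal is assumed to already be in the light's space):

    // Hypothetical example: per-vertex diffuse lighting stored in an output attribute.
    float4x4 WorldViewProjection;   // set from C# via effect parameters
    float3   LightDirection;        // direction towards the light, normalized

    struct VSInput  { float4 Position : POSITION0; float3 Normal : NORMAL0; };
    struct VSOutput { float4 Position : POSITION0; float4 Color  : COLOR0;  };

    VSOutput VertexShaderFunction(VSInput input)
    {
        VSOutput output;
        output.Position = mul(input.Position, WorldViewProjection); // the transform the fixed stage used to do
        float diffuse   = saturate(dot(normalize(input.Normal), LightDirection));
        output.Color    = float4(diffuse, diffuse, diffuse, 1);     // lighting stored in a vertex attribute
        return output;
    }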

The rasterizer executes for each triangle, and its main function is to determine which pixels are occupied by that triangle. This stage is not programmable. First, the rasterizer checks whether the triangle is visible at all. If it is only partially visible, it is clipped, so only the visible part remains. Note that if you draw a large model (a city, for example) while only a small part of it is visible, a lot of triangles are removed at this point. This means the previous vertex shader calculations were useless! That is a waste of resources, so you should avoid drawing a lot of non-visible geometry. This is called culling. Then the rasterizer checks which pixels are occupied by the triangle. Wolfgang will say something about this in a later lecture. The reason a vertex has multiple attributes is that these values can be accessed in the pixel shader: the values are interpolated over the triangle. You have already seen in the first assignment what happens when color is interpolated over a triangle; that results in a nice gradient, which is just the interpolated color. 14

The pixel shader gives the most freedom of all the stages. It receives interpolated vertex data from the rasterizer and only needs to output a color. Using data from GPU memory, we are free to do whatever we want: using the texture coordinates we got from the vertices, we can look up a color in a texture; we can also calculate the lighting at this point; and a whole bunch of other techniques are possible. 15
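
A sketch of such a texture lookup in the pixel shader (the texture and sampler names are illustrative):

    // Hypothetical example: sample a texture at the interpolated texture coordinates.
    texture DiffuseTexture;
    sampler DiffuseSampler = sampler_state { Texture = <DiffuseTexture>; };

    struct PSInput { float2 TexCoord : TEXCOORD0; float4 Color : COLOR0; };

    float4 PixelShaderFunction(PSInput input) : COLOR0
    {
        // Texture lookup, tinted by the interpolated vertex color (e.g. per-vertex lighting).
        return tex2D(DiffuseSampler, input.TexCoord) * input.Color;
    }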

There are two additional stages which concern the input and output of the pipeline. The input assembler constructs the vertices and determines the corresponding triangles for rasterization. The index buffer, together with the topology, determines how the rasterizer works; the PrimitiveType indicates whether triangles, lines or points have to be drawn. The input assembler formats the vertex data in such a way that the vertex shader can access it. The output merger is the final stage in the pipeline and is the only stage which outputs to GPU memory. It does z-buffer testing, which makes sure that triangles closer to the camera overlap triangles further away. Wolfgang will say something about this in a later lecture. The output merger can also blend the resulting color on top of a previous image. This is used for transparency: for transparent objects, both the background and the transparent geometry are visible, and a transparent pixel can be blended on top of a previous image in this stage. Finally, the output merger writes to an output image, more specifically a render target. This render target can be the back buffer. As briefly explained in the first assignment, the back buffer is swapped with the front buffer at the end of a frame. The front buffer is shown on the display, while the back buffer is used to draw on; this is called double buffering. Instead of writing to the back buffer, the output merger can also write to a normal image. That image can be used in a later draw call as input for the pixel shader. It is even possible to write to multiple render targets. 16

DirectX 10 introduced new stages in the graphics pipeline. XNA uses DirectX 9, so you cannot use these. You don't need to remember this; it's just to show some of the cool recent developments. Remember that the vertex shader only works on isolated vertices, which cannot access any neighbors? That's where the geometry shader comes in, because it can access neighboring vertices. It can also create new vertices, and even store them in GPU memory for later use. This makes new graphics techniques possible in real time; an example is fur generation. 17

DirectX 11 also introduced new stages in the graphics pipeline. These new stages can subdivide existing triangles into more triangles, which is called tessellation. This adds a lot of dynamic detail to the world. With this technique it is hardly noticeable that the world actually consists of triangles. 18

Now, briefly, how the pipeline is reflected in the actual hardware. 19

This is an overview of the ATI Radeon HD 6900 series, currently one of the fastest graphics cards. In the red area, the smallest square (nearly a line) is one processing unit; there are approximately 1500 processing units in this GPU. You can clearly see the massive parallelism here. Each processing unit is called a unified shader and can execute either vertex shader or pixel shader code. In the past, there were separate units for vertex shaders and pixel shaders. These more general unified shaders are better, because if the vertex shader is very short and the pixel shader is very long, the processing power can be divided accordingly. The non-shader stages are not handled by these unified shaders but by dedicated hardware; for example, the rasterizer and vertex assembler are shown at the top. In the red area it says SIMD Engines. SIMD means Single Instruction, Multiple Data. That is exactly how the unified shaders are designed: to do exactly the same thing, but on different data. This is how the vertex/pixel shader works: the same shader is executed for each vertex/pixel. 20

Now back to focusing on the normal pipeline, without DirectX 10 or 11! What communication is there between the CPU and the GPU? 21

The things you can send to the GPU can be divided into three categories. The first is raw data describing the models. The second category is effects, which can be considered the actual shader code plus the dynamic variables it has access to (for example, the projection matrix or the lighting settings). The third category is settings for the non-shader stages (input assembler, rasterizer, output merger). 22

Creation and activation are separated in the GPU communication. As you might remember from one of the first slides, the CPU and GPU have separate memory. If data has to be accessed by the GPU, it has to be copied over the bus. However, the bus is very slow: if you copied all data every frame, the GPU would constantly have to wait for the bus until enough data had arrived. You did this in the first assignment by using the DrawUserPrimitives function; all 16000 vertices were copied over the bus each frame. It is much faster to store them once in a vertex buffer in GPU memory. When you want to draw the vertices, you just have to activate that vertex buffer. Did you notice any frame rate improvement when you used vertex buffers? If not, you will notice it when using larger models. In games, over one million vertices are drawn each frame; if you sent them over to GPU memory each frame, your game would run very slowly. It is very important to move as much as possible to GPU memory at load time, so you don't have to do it at run time. At run time, you just activate the things you want to use, which are already in GPU memory. The only things copied to GPU memory each frame are the things that can change each frame, for example the view matrix, which changes when the camera moves. 23

Vertices and indices are copied to GPU memory using the SetData functions. They are activated by setting the active VertexBuffer / IndexBuffer on the GraphicsDevice. You have already done this in the first assignment. Textures are loaded using the ContentManager; internally, this creates a Texture2D object and calls its SetData function. A texture is activated similarly to how shader variables are set, which is explained on the next slide. 24
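
Sketched in C#, the create-once, activate-per-frame pattern looks roughly like this (the vertices array of VertexPositionColor and the indices array of short are assumed to be filled in already, and an effect pass must be applied before drawing, as shown on the next slide):

    // At load time: copy vertex and index data to GPU memory once.
    VertexBuffer vertexBuffer = new VertexBuffer(GraphicsDevice,
        typeof(VertexPositionColor), vertices.Length, BufferUsage.WriteOnly);
    vertexBuffer.SetData(vertices);

    IndexBuffer indexBuffer = new IndexBuffer(GraphicsDevice,
        IndexElementSize.SixteenBits, indices.Length, BufferUsage.WriteOnly);
    indexBuffer.SetData(indices);

    // Each frame: just activate what is already in GPU memory, then draw.
    GraphicsDevice.SetVertexBuffer(vertexBuffer);
    GraphicsDevice.Indices = indexBuffer;
    GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList,
        0, 0, vertices.Length, 0, indices.Length / 3);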

This requires a bit more explanation. A shader is always stored in an effect. An effect can contain multiple shaders, but at least a vertex shader and a pixel shader. You cannot select a shader individually; instead you have to select a technique. A technique can have multiple passes, and each pass has one vertex shader and one pixel shader. Usually, you will use just one technique and one pass. A technique is a specific way you want to render something. For example, you might want to create a different technique for each material in the game; then metal surfaces can be rendered entirely differently from concrete surfaces. A pass is one iteration through the graphics pipeline; for some effects you have to go through the pipeline multiple times. When the Apply function is called, the shader is activated, and all shader variables are copied to GPU memory as well. These have to be set first, which is done using the Parameters property. In the effect file you can declare variables, and those variables can be filled here; the name has to be equal to the variable name in the effect file. The set values are copied to GPU memory, so the shader can use them. The exception is textures: these have to be present in GPU memory already and are only activated. Retrieving an effect parameter each frame is slow; it is faster to retrieve it once. 25
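
In C# this looks roughly as follows (the asset, technique and parameter names are hypothetical and must match the declarations in your own .fx file):

    // At load time: load the effect and look up its parameters once.
    Effect effect = Content.Load<Effect>("MyEffect");
    EffectParameter wvpParameter     = effect.Parameters["WorldViewProjection"];
    EffectParameter textureParameter = effect.Parameters["DiffuseTexture"];

    // Each frame: set the values that change, then apply the pass and draw.
    wvpParameter.SetValue(world * view * projection);
    textureParameter.SetValue(diffuseTexture);              // texture must already be in GPU memory
    effect.CurrentTechnique = effect.Techniques["MyTechnique"];
    effect.CurrentTechnique.Passes[0].Apply();               // copies the parameters and activates the shaders
    GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList, 0, primitiveCount);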

States are uploaded to GPU memory when they are updated, so you should not change a state every frame, because it would be sent over the bus to GPU memory multiple times. States are the settings for the stages without shaders: we cannot program those stages, but we can configure them. You have already seen a lot of these in the first assignment. For the input assembler stage, all states are set implicitly. The VertexDeclaration explains to the GPU what the contents of the VertexBuffer are; it is a member of a VertexBuffer and automatically gets activated when the vertex buffer is activated. The PrimitiveType tells what kind of shape has to be drawn: triangles, lines or points. The rasterizer can be configured using a RasterizerState object, which is activated on the GraphicsDevice. The rasterizer state has settings to draw a wireframe, or to apply backface culling. 26
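
For example, a rasterizer state for wireframe rendering, created once and activated when needed (a sketch):

    // Create the state object once (e.g. at load time), not every frame.
    RasterizerState wireframeState = new RasterizerState
    {
        FillMode = FillMode.WireFrame,                  // draw triangle edges only
        CullMode = CullMode.CullCounterClockwiseFace    // backface culling
    };

    // Activate it on the device before drawing.
    GraphicsDevice.RasterizerState = wireframeState;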

The DepthStencilState has settings for both the z-buffer and the stencil buffer; Wolfgang will say something about this in a later lecture. The BlendState configures how the currently drawn pixels should be combined with the render target. The RenderTarget2D is the image being drawn on: a RenderTarget2D object is set as the render target. RenderTarget2D is a subclass of Texture2D, so it can be used directly as shader input. If the render target is set to null, the back buffer is selected, so drawing goes to the screen. 27
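
A sketch of the render-to-texture pattern described here (the size and parameter names are placeholders):

    // At load time: create an off-screen render target.
    RenderTarget2D sceneTarget = new RenderTarget2D(GraphicsDevice, 1280, 720);

    // First pass: draw the scene into the render target.
    GraphicsDevice.SetRenderTarget(sceneTarget);
    GraphicsDevice.Clear(Color.Black);
    GraphicsDevice.BlendState = BlendState.AlphaBlend;  // blend transparent pixels over the background
    // ... draw calls ...

    // Second pass: switch back to the back buffer and use the result as a texture.
    GraphicsDevice.SetRenderTarget(null);               // null selects the back buffer
    effect.Parameters["SceneTexture"].SetValue(sceneTarget);  // hypothetical parameter name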

Data is normally transferred from the CPU to the GPU. Note that the resulting image from the GPU is not transferred back to the CPU; instead it goes directly to the display. It is possible to transfer data from the GPU to the CPU, but the only data for which this is useful is texture data, because texture data is the only thing the pipeline outputs. Don't use this every frame, because it is slow. 28
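
For completeness, reading a render target back to the CPU looks roughly like this (slow, so not something to do every frame; sceneTarget is the RenderTarget2D from the previous example):

    // Copy texture data from GPU memory back to the CPU. This stalls the pipeline.
    Color[] pixels = new Color[sceneTarget.Width * sceneTarget.Height];
    sceneTarget.GetData(pixels);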

We have seen what shaders do and the context they work in, but how do we create a shader? 29

Shaders for XNA are written in HLSL, the same language that is used for DirectX. HLSL is similar to C#, so you have the same syntax for assigning a value to a variable. However, a lot that is not relevant for shader programming is missing: there are no classes, just functions and structs. Some other things are added, for example easy-to-access functions for a lot of calculations such as sine, absolute value and rounding. Writing a shader is totally different from writing normal code. There is no autocomplete, and no notification if your code is incorrect, so make sure you type the variable names correctly! There is also no real debugging support. You get errors when there are problems compiling your code, but if the code compiles correctly yet produces the wrong results, it is not possible to add a breakpoint and check what the shader is doing. So make really sure your code is correct, or else you can spend a lot of time debugging! Basically, writing shaders is really hardcore programming. A good approach is to write a shader incrementally. Do not write the entire shader first and only test it when it's done; then you will have a hard time debugging. It is better to start out simple and test with every change you make. When a change does not give the expected result, the bug has something to do with the last change you made. 30

This is an example of the basic shader layout. You will get something similar when you create a new effect in XNA. As you can see, it is fairly similar to C#, but also somewhat different. Let's start at the end of the shader. A technique is defined. As already mentioned, shaders are always part of one or more techniques. A technique describes the way something has to be rendered; for example, you can have a technique to render metal surfaces with, and another technique used for concrete surfaces. A technique can have multiple passes, and a pass is one iteration through the graphics pipeline. In most cases you will just have one technique and one pass per effect. Inside the pass, the vertex shader and the pixel shader for this pass are selected. The vs_2_0 part is the HLSL shader version used. Render states can also be set in the pass. However, you can also set them from the C# code, as we've seen before (like the BlendState and RasterizerState). I do not recommend setting those states here, because it is not flexible. If you want an option to see the world in wireframe, for example, you can use a normal if statement in your C# code; if you set your states in the pass, you cannot use an if statement, so you would have to create two techniques. This becomes exponentially worse with the number of options you want to set. Now for the vertex shader. It has one vertex as input and one vertex as output. The layouts of these vertices are defined in the structs VSInput and VSOutput. There are two new things here. The types are different from what you're used to: float4 means a vector with 4 floats, and float2 means a vector with 2 floats. The other new thing is the POSITION and TEXCOORD0 names behind the variables. Those are called semantics. The semantics tell the compiler what the variable will be used for. You might remember the VertexDeclaration created in the first assignment: we had to specify the usage for each element in the VertexDeclaration, and those usages correspond to these semantics. The elements in the vertex buffer with usage Position will arrive in the position variable of the VSInput struct. The VSOutput struct is returned by the vertex shader. The position semantic in the VSOutput struct is used by the rasterizer; a variable with the position semantic is required as output of the vertex shader. The content of the VSOutput struct is interpolated in the rasterizer, and that interpolated VSOutput struct is used as input for the pixel shader. The pixel shader returns a float4, and at the end of the function declaration the semantic of that output is shown. That semantic is COLOR0, so this pixel shader outputs a color, which is the color for this pixel. It is also possible for the pixel shader to return multiple colors; that is done when multiple render targets are set, in which case you have to create a PSOutput struct. In this example the pixel shader always returns the same color. The float4 vector is used as a container for this color; the values in the float4 are red, green, blue and alpha, where alpha is the translucency. There is one more thing: the shader variables passed to the effect are defined like you would expect. At the start of this effect file, a shader variable is defined. 31
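
The slide's code is not reproduced in this transcript, so here is a sketch of what such a basic effect file looks like (the names mirror the description above; treat it as illustrative, not as the exact slide):

    // A shader variable, set from C# through effect.Parameters.
    float4x4 WorldViewProjection;

    struct VSInput
    {
        float4 Position : POSITION0;
        float2 TexCoord : TEXCOORD0;
    };

    struct VSOutput
    {
        float4 Position : POSITION0;   // required output, used by the rasterizer
        float2 TexCoord : TEXCOORD0;   // interpolated and passed to the pixel shader
    };

    VSOutput VertexShaderFunction(VSInput input)
    {
        VSOutput output;
        output.Position = mul(input.Position, WorldViewProjection);
        output.TexCoord = input.TexCoord;
        return output;
    }

    float4 PixelShaderFunction(VSOutput input) : COLOR0
    {
        // Always return the same color: red, green, blue, alpha.
        return float4(1, 0, 0, 1);
    }

    technique Technique1
    {
        pass Pass1
        {
            VertexShader = compile vs_2_0 VertexShaderFunction();
            PixelShader  = compile ps_2_0 PixelShaderFunction();
        }
    }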

The normal types you know are present, like bool, int and float. A half is a half-precision float, just like a normal float but with less precision; this can be faster, at the cost of precision. A vector type is created from the normal types by simply appending a number; only the numbers 1 to 4 are supported. A matrix type is created from the normal types by appending two numbers; again, only 1 to 4 are supported. In most cases, you will just use float4x4. A swizzle is a short way of rearranging the components of a vector: the x, y and z components can be selected in a different order. The statement behind the swizzle is the longer version that does the same thing. In a swizzle, one component can be selected multiple times. Instead of xyzw, rgba can also be used; rgba should be used when swizzling colors. The xyzw and rgba swizzle sets cannot be mixed. Note that not all shader versions support arbitrary swizzling; if you have problems, compile with a higher shader model. There are a lot of semantics; these are the most common. Not all semantics can be used everywhere. The normal cannot be present in the vertex shader output! So if you do want to pass the normal to the pixel shader, you have to pass it as a texcoord. Note that there can be a number behind the semantic; this means that multiple texcoords can be used at the same time, but they each need a unique number. 32
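
A few illustrative lines showing these types, swizzles and the texcoord trick (not taken from the slide):

    // Passing a normal to the pixel shader: use a TEXCOORD semantic in the output struct,
    // since NORMAL is not allowed in the vertex shader output.
    struct VSOutput
    {
        float4 Position : POSITION0;
        float3 Normal   : TEXCOORD1;
    };

    float4 SwizzleDemo() : COLOR0
    {
        // Vector and matrix types: base type plus one or two numbers from 1 to 4.
        float3 position = float3(1.0, 2.0, 3.0);

        // Swizzles: re-arrange or repeat components.
        float3 reversed = position.zyx;    // same as float3(position.z, position.y, position.x)
        float4 splatted = position.xxxx;   // one component can be selected multiple times

        // rgba is the swizzle set used for colors (it cannot be mixed with xyzw).
        float4 color = float4(1, 0, 0, 1);
        return float4(color.rgb * reversed.z, splatted.x);
    }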

The normal operators can be used, like you're used to. There are a lot of built-in functions; you should prefer these over your own functions, because it is easier for the compiler to optimize them. dot is the dot product, cross is the cross product, and mul is a matrix multiplication (that last one is important to remember). abs is the absolute value, normalize normalizes a vector, and min and max choose the minimum and maximum of two values. clamp is an important one: it clamps a value to stay within a range. So if the range is from 5 to 10, any value below 5 will become 5 and any value over 10 will become 10. saturate is the same as a clamp from 0 to 1. There are many more functions available, but these are the most important ones. You can also define your own functions; that is very similar to defining the vertex/pixel shader functions, except there are no semantics. 33
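
A couple of these intrinsics in use, together with a small user-defined helper function (illustrative only):

    // A user-defined helper: defined like a shader function, but without semantics.
    float Lambert(float3 normal, float3 lightDirection)
    {
        // dot product of normalized vectors, clamped to [0, 1] with saturate.
        return saturate(dot(normalize(normal), normalize(lightDirection)));
    }

    float4 IntrinsicsDemo(float3 normal : TEXCOORD0) : COLOR0
    {
        float diffuse = Lambert(normal, float3(0, 1, 0));
        float limited = clamp(diffuse * 12.0, 5.0, 10.0);  // below 5 becomes 5, above 10 becomes 10
        return float4(diffuse, diffuse, min(diffuse, 0.5), limited / 10.0);
    }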

You can also use the normal looping and if/else statements, like you're used to. However, there is one problem with this: shaders are designed so that they all do the same thing. If there are different execution paths because of an if/else statement, that is very inefficient and thus slow. When shaders can take different execution paths, that is called dynamic branching. For very simple cases it is also possible to get the same result while the shaders still all do the same thing; in that case, no dynamic branching is used. With a for statement, when the number of iterations is constant, the loop can be unrolled: the contents are then copied multiple times. It is also possible to drop a pixel in the pixel shader, which simply results in no output. This is done using the discard statement. 34
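
For illustration, a pixel shader sketch combining a constant loop, a short if statement and discard (the sampler name is hypothetical):

    sampler DiffuseSampler;   // assumed to be bound to a texture

    float4 PixelShaderFunction(float2 texCoord : TEXCOORD0) : COLOR0
    {
        float4 color = float4(0, 0, 0, 0);

        // Constant iteration count: the compiler can unroll this loop.
        for (int i = 0; i < 4; i++)
        {
            color += tex2D(DiffuseSampler, texCoord + i * 0.01);
        }
        color /= 4;

        // Drop nearly transparent pixels: no output is written for them.
        if (color.a < 0.1)
        {
            discard;
        }

        return color;
    }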

Remember not to use dynamic branching, or if you do, use only one or two branches in a shader. Very short if/else statements are no problem, because these can be executed without dynamic branching. Because debugging is very difficult, you should think before writing the code, and the shader should be written incrementally to detect bugs early. When you do have to debug, you can do it the old-fashioned way: add an if statement to the code to check for unexpected results, and if the result is unexpected, directly return a color (red, for example). When you run the shader, you can see the result. If you really want to debug, there are some options, but that is quite a hassle. The compiler in the graphics driver optimizes your code: if a calculated result is never used, the calculation is removed. This can optimize more than you would expect. It has happened to me a few times that I thought I had done a great optimization, but it turned out the compiler had already optimized it! 35
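
The return-a-color trick looks roughly like this (a sketch; the condition is just an example of something you might check):

    float4 PixelShaderFunction(float3 normal : TEXCOORD0) : COLOR0
    {
        // Old-fashioned debugging: if something unexpected happens, output a recognizable color.
        if (length(normal) < 0.001)      // e.g. check whether the normal was passed in correctly
        {
            return float4(1, 0, 0, 1);   // red marks the pixels where the check failed
        }

        // ... normal shading code ...
        return float4(normal * 0.5 + 0.5, 1);
    }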

The best way to start with shader programming is to get a bit of a feeling for it. If you create a new effect in XNA, it already gives you a very basic shader to start with. Just try adding some things to that shader to get a feeling for it, and have some fun playing around! 36