Graphics Processing Units (GPUs) V2.2 (January 2013) Jan Kautz, Anthony Steed, Tim Weyrich


Outline 1. Traditional OpenGL Pipeline 2. Modern OpenGL Pipeline 3. Vertex Shaders 4. Pixel Shaders 5. Example: Per-Pixel Lighting 6. Geometry & Tessellation Shaders 7. Bottlenecks

High-Level Pipeline Application Processing (CPU) → Geometry Processing → Rasterization (GPU). Important to remember that CPU to GPU implies a client/server-like relationship. Why do we even have GPUs? What are the implications of this high-level architecture?

High-Level Pipeline Application: handle input, simulation & AI, culling, LOD selection, prefetching. Geometry (a.k.a. vertex pipeline): transform, lighting, skinning, calculate texture coords. Rasterization (a.k.a. pixel pipeline or fragment pipeline): rasterize (fill pixels), interpolate vertex parameters, look up/filter textures, Z- and stencil tests, blending.

Traditional OpenGL Pipeline Geometry & State → ModelView Transformation, Lighting, Perspective Transformation, Viewport Transformation and Clipping (Geometry Processing) → Scan-Conversion, Texturing, Fog (Rasterization) → Scissor, Alpha, Stencil, Depth, Blending, Dither, LogicOp (Per Fragment Operations) → Frame-Buffer

Traditional OpenGL Pipeline Sometimes referred to as the fixed-function pipeline, as opposed to the modern programmable pipeline which uses shaders. The traditional OpenGL pipeline still exists: you don't need to use shaders in OpenGL. OpenGL is very state-heavy: certain properties are retained within the pipeline until they are changed. Thus there is slow-changing data (typically textures, lights, etc.) and fast-changing data (typically vertices, colours, etc.)

Traditional OpenGL Pipeline Geometry & State → ModelView Transformation, Lighting, Perspective Transformation, Viewport Transformation and Clipping (Geometry Processing) → Scan-Conversion, Texturing, Fog (Rasterization) → Scissor, Alpha, Stencil, Depth, Blending, Dither, LogicOp (Per Fragment Operations) → Frame-Buffer

Geometry Processing: Transform ModelView Transformation Transform geometry from world coordinates into eye coordinates Includes camera positioning and object rotation/translation Single model-view matrix in OpenGL

Geometry Processing: Transform Perspective Transformation Project eye coordinates into the screen plane (perspective or orthogonal). Simple specification: glFrustum(left, right, bottom, top, near, far) gluPerspective(fovy, aspect, near, far) glOrtho(left, right, bottom, top, near, far) Viewport Transformation glViewport(x, y, width, height) Clipping Done automatically

Geometry Processing: Lighting Lighting If turned off: the specified per-vertex color is interpolated Up to 8 light sources Lighting model: physically incorrect Blinn-Phong Ambient: k_a (independent of light source & viewer) Diffuse: k_d * (L.N) Specular: k_s * (H.N)^n, with H = (L+V)/|L+V|

Geometry Processing: Lighting Lighting Only per-vertex (for every light source, in eye coordinates): k_a + I/a(r) * (k_d*(L.N) + k_s*(H.N)^n) I: intensity of light source, r: distance to light, a(r) = 1, r, or r^2 Interpolate result between vertices No per-pixel shading coarse mesh: highlight artifacts Extension for per-pixel shading Types of lights: Point lights, spot lights, directional lights

OpenGL Lighting Example

float mat_ambient[] = {0.1, 0.1, 0.1, 1.0};
float mat_shininess[] = {20.0};
float mat_specular[] = {0.1, 0.1, 0.1, 0.0};
float mat_diffuse[] = {0.7, 0.7, 0.7, 0.0};
float light0_ambient[] = {1.0, 1.0, 1.0, 1.0};
float light0_diffuse[] = {1.0, 1.0, 1.0, 0.0};
float light0_position[] = {0, 6, 6, 1.0}; // [3] decides whether directional or positional
float light0_specular[] = {1.0, 1.0, 1.0, 0.0};

glLightModeli(GL_LIGHT_MODEL_TWO_SIDE, GL_FALSE);

// Light definition
glLightfv(GL_LIGHT0, GL_AMBIENT, light0_ambient);
glLightfv(GL_LIGHT0, GL_DIFFUSE, light0_diffuse);
glLightfv(GL_LIGHT0, GL_SPECULAR, light0_specular);
glLightfv(GL_LIGHT0, GL_POSITION, light0_position);

// Material definition (for subsequent polygons)
glMaterialfv(GL_FRONT, GL_AMBIENT, mat_ambient);
glMaterialfv(GL_FRONT, GL_SHININESS, mat_shininess);
glMaterialfv(GL_FRONT, GL_SPECULAR, mat_specular);
glMaterialfv(GL_FRONT, GL_DIFFUSE, mat_diffuse);

// Turn it on
glEnable(GL_LIGHTING);
glEnable(GL_LIGHT0);

Traditional OpenGL Pipeline Geometry & State → ModelView Transformation, Lighting, Perspective Transformation, Viewport Transformation and Clipping (Geometry Processing) → Scan-Conversion, Texturing, Fog (Rasterization) → Scissor, Alpha, Stencil, Depth, Blending, Dither, LogicOp (Per Fragment Operations) → Frame-Buffer

Rasterization: Texturing Texturing 1D, 2D, 3D textures Contains: Color (RGB) + Alpha (default: 8888 bits) Luminance (L) + Alpha (default: 88 bits) Latest GPUs: support 16/24/32bit floating point textures Filtering: nearest neighbor, linear, mip-mapping Latest GPUs: floating point filtering on NV, not ATI

Rasterization: Texturing Texture Coordinates By hand (for every vertex) Generated automatically (by OpenGL) Object coordinates: linear combination of object coords. Eye coordinates: linear combination of eye coords. E.g.: draw height contours: z-coord as texcoord of 1D texmap Texture matrix: 4x4 matrix applied to tex.-coords. before they are used E.g.: project texture onto geometry (slide projector)

Rasterization: Texturing Lighting and Texturing: GL_REPLACE: replace lighting with texture GL_MODULATE: multiply light with texture ok for diffuse lighting, but the specular part should be added later: L_{ambient,diffuse} * texture + L_specular Supported via extension GL_BLEND: Blend between fragment, texture, and environment color: C = C_f * (1 - C_t) + C_e * C_t

OpenGL Texturing Example

// bind texture
glGenTextures( 1, &texname );
glBindTexture( GL_TEXTURE_2D, texname );

// initialize parameters
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP );
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT );
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );

// how do we combine texture with lighting/color
glTexEnvi( GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE );

// specify texture
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_FLOAT, buffer );

// turn it on
glEnable( GL_TEXTURE_2D );

Rasterization: Fog Objects that are further away fade to fog Affects transformed, lit, and textured objects User controls: Fog density, Fog color C_f Fog increase (depending on z-value): linear, exp, exp^2 Fog start and end Hardware automatically computes the fog factor f Blended together with the actual color: C = f*C_i + (1-f)*C_f, where f is the fog factor Cannot do volumetric fog

Traditional OpenGL Pipeline Geometry & State → ModelView Transformation, Lighting, Perspective Transformation, Viewport Transformation and Clipping (Geometry Processing) → Scan-Conversion, Texturing, Fog (Rasterization) → Scissor, Alpha, Stencil, Depth, Blending, Dither, LogicOp (Per Fragment Operations) → Frame-Buffer

Per Fragment Operations Fragment: color + alpha + depth. Operation order: Scissor, Alpha, Stencil, Depth, Blending, Dither, LogicOp. Scissor-Test Specify a screen rectangle: fragments are only drawn inside it Alpha-Test Test source alpha (incoming fragment) against a reference alpha value Depending on result: discard / keep fragment Tests: <, >, <=, >=, ==, !=, always, never E.g.: textures for tree leaves

Per Fragment Operations Stencil-Test: stencil buffer with N bits of information Test stencil value against reference value S (same tests as above) Depending on result: discard / keep fragment Depending on the result and on the depth test result (zfail/zpass): increase / decrease / zero / keep / invert / set the stencil value Use: e.g. render to a certain area of the screen only

Per Fragment Operations Depth-Test Test source depth against destination depth Depending on result: discard / keep fragment Tests: <, >, <=, >=, ==, !=, always, never Newer hardware: early z-reject Before the fragment is shaded (i.e. textured, etc.) If the test fails, the fragment is killed GPUs keep a separate buffer for this! Helps tremendously for some applications

Per Fragment Operations Frame-buffer blending Combine source fragment with destination fragment Blending equation: C = C_s*S + C_d*D C_s = source color, C_d = destination color S = source factor, D = destination factor S/D = zero, one, dst_color, src_color, src_alpha, Extension: change + to - in the equation All standard imaging operators possible E.g.: see through a window with dark glass

Per Fragment Operations Dither Dithering is needed if the frame buffer has fewer bits than the incoming fragment By default turned on! Not sure if hardware nowadays even supports this any more Enable/disable

Per Fragment Operations Logic Operators Legacy feature Logic operations between incoming fragment and destination fragment, done for every color channel separately Operations: clear (=0), copy (=src), noop (=dst), set (=1), copy invert, invert, and, or, nand, nor, xor,

Frame Buffer Baseline Assumption Color Buffer: RGB + alpha (32bit) Depth Buffer: z-value (24bit) Stencil Buffer: stencil value (8bit) Standard depths: 32bit color 24bit depth 8bit stencil Only very few bits, especially if used for multi-pass rendering

Frame Buffer Latest GPUs Color Buffer: RGB + alpha (128bits, floating point) Depth Buffer: z-value (24bit) Stencil Buffer: stencil value (8bit) Buffer not only for screen buffer Usually only for storing intermediate results Use as textures in next pass

Outline 1. Traditional OpenGL Pipeline 2. Modern OpenGL Pipeline 3. Vertex Shaders 4. Pixel Shaders 5. Example: Per-Pixel Lighting 6. Geometry & Tessellation Shaders 7. Bottlenecks

Traditional GPU Pipeline (~OpenGL 1.0) Geometry & State → ModelView Transformation, Lighting, Perspective Transformation, Viewport Transformation and Clipping (Geometry Processing) → Scan-Conversion, Texturing, Fog (Rasterization) → Scissor, Alpha, Stencil, Depth, Blending, Dither, LogicOp (Per Fragment Operations) → Frame-Buffer

Towards Recent GPU Pipeline (~OpenGL 2.0) Geometry & State → ModelView Transformation, Lighting, Perspective Transformation, Viewport Transformation and Clipping (Vertex Processing) → Fragment Assembly → Scan-Conversion, Texturing, Fog (Pixel Processing) → Scissor, Alpha, Stencil, Depth, Blending, Dither, LogicOp (Per Fragment Operations) → Frame-Buffer

Recent GPU Pipeline (~OpenGL 2.0) Geometry & State → Vertex Shader → Fragment Assembly → Pixel Shader → Per Fragment Operations → Frame-Buffer Vertex Shader & Pixel Shader are programmable Fragment Assembly & Per Fragment Operations are fixed function

Modern GPU Pipeline (~OpenGL 3.3/4.0) Geometry & State → Vertex Shader → Tessellation Shaders (Control Point Shader, Tessellation Control Shader, Tessellation Evaluation Shader) → Geometry Shader → Fragment Assembly → Pixel Shader → Per Fragment Operations → Frame-Buffer

[Diagram: the full OpenGL 4 pipeline — command parser, buffer store, vertex assembly and processing, patch assembly/tessellation/evaluation, geometric primitive assembly and processing, transform feedback, texture mapping, fragment processing, raster operations (stenciling, depth testing, blending, accumulation), pixel pack/unpack, and the framebuffer; programmable and fixed-function stages distinguished. From "OpenGL 4 for 2010", Mark Kilgard, Barthold Lichtenbelt, SIGGRAPH 2010.]

Outline 1. Traditional OpenGL Pipeline 2. Modern OpenGL Pipeline 3. Vertex Shaders 4. Pixel Shaders 5. Example: Per-Pixel Lighting 6. Geometry & Tessellation Shaders 7. Bottlenecks

Pixel, Vertex Shaders Are full processors with their own machine code In older GPUs pixel & vertex shaders were slightly different, and were different silicon Now the differences are negligible and the units may be interchanged Programmers rarely program the machine code directly: HLSL (High-Level Shading Language, DirectX) Cg (NVIDIA, similar to HLSL, works in both DirectX and OpenGL) GLSL (GL Shading Language)

Vertex Shaders Geometry & State Vertex Shader Fragment Assembly Pixel Shader Per Fragment Operations Frame-Buffer Vertex Shader must apply (at least) Lighting (but see our later example) Model transforms However, can do whatever it wishes, and can pass data down the pipeline to the Pixel Shader

Vertex Shaders: What for? Custom transform, lighting, and skinning

Vertex Programs: What for? Custom cartoon-style lighting

Vertex Shaders: What for? Dynamic displacements of surfaces by objects

Vertex Shaders What is a vertex shader? A small program running on GPU for each vertex GPU instruction set to perform all vertex math Reads an untransformed, unlit vertex Creates a transformed vertex Optionally Lights a vertex Creates texture coordinates Creates fog coordinates Creates point sizes Latest GPUs: can read from texture

Vertex Shaders What does it not do? Does not create or delete vertices (1 in, 1 out) No topological information provided This has changed with the next generation: an additional geometry shader (after the vertex shader) can create/delete polygons at that stage.

Vertex Shaders Vertex Attributes e.g. normals, Vertex Program 256 instructions Vertex Output Mx4 registers Program Parameters (read only, global) Automatic tracking of matrices (model-view, ) Temporary Registers 12x4 registers Homogeneous clip space position Primary & secondary color Fog coord, point size, texture coordinates Numbers vary greatly with architecture! Take with grain of salt.

Vertex Shaders Assembly Language Powerful SIMD instruction set Four operations simultaneously Many instructions (even branch & loops now!) Operate on scalar or 4-vector input Result in a vector or replicated scalar output Instruction Format: Opcode dst, [-]s0 [,[-]s1 [,[-]s2]]; #comment Instruction name Destination Register Source0 Register Source1 Register Source2 Register

Vertex Shaders Simple Example: MOV R1, R2; Before: R1 = (0.0, 0.0, 0.0, 0.0), R2 = (7.0, 3.0, 6.0, 2.0). After: R1 = (7.0, 3.0, 6.0, 2.0), R2 = (7.0, 3.0, 6.0, 2.0) — the source register is unchanged.

Vertex Shaders Input Mapping: Negation, Swizzling, Smearing. Example: MOV R1, -R2.yzzx; Before: R1 = (0.0, 0.0, 0.0, 0.0), R2 = (7.0, 3.0, 6.0, 2.0). After: R1 = (-3.0, -6.0, -6.0, -7.0), R2 unchanged.

Vertex Shaders: Instruction Set ARL (address relative), MOV (move), MUL (multiply), ADD (add), MAD (multiply and add), RCP (reciprocal), RSQ (reciprocal square root), DP3 (3-component dot product), DP4 (4-component dot product), DST (distance), MIN (minimum), MAX (maximum), SLT (set on less than), SGE (set on greater than or equal), EXP (exponential, base 2), LOG (logarithm), LIT (do lighting), and many more

Vertex Shaders Simple Specular and Diffuse Lighting

!!VP1.0
#
# c[0-3]  = modelview projection (composite) matrix
# c[4-7]  = modelview inverse transpose
# c[32]   = eye-space light direction
# c[33]   = constant eye-space half-angle vector (infinite viewer)
# c[35].x = pre-multiplied monochromatic diffuse light color & diffuse mat.
# c[35].y = pre-multiplied monochromatic ambient light color & diffuse mat.
# c[36]   = specular color
# c[38].x = specular power
# outputs homogeneous position and color
#
DP4 o[HPOS].x, c[0], v[OPOS];      # Compute position.
DP4 o[HPOS].y, c[1], v[OPOS];
DP4 o[HPOS].z, c[2], v[OPOS];
DP4 o[HPOS].w, c[3], v[OPOS];
DP3 R0.x, c[4], v[NRML];           # Compute normal.
DP3 R0.y, c[5], v[NRML];
DP3 R0.z, c[6], v[NRML];           # R0 = N' = transformed normal
DP3 R1.x, c[32], R0;               # R1.x = Ldir DOT N'
DP3 R1.y, c[33], R0;               # R1.y = H DOT N'
MOV R1.w, c[38].x;                 # R1.w = specular power
LIT R2, R1;                        # Compute lighting values
MAD R3, c[35].x, R2.y, c[35].y;    # diffuse + ambient
MAD o[COL0].xyz, c[36], R2.z, R3;  # + specular
END

Vertex processor capabilities 4-vector FP32 operations True data-dependent control flow Conditional branch instruction Subroutine calls, up to 4 deep Jump table (for switch statements) Condition codes New arithmetic instructions (e.g. COS) User clip-plane support

Vertex Processor Resource Limits 256 instructions per program (effectively much higher w/branching) 16 temporary 4-vector registers 256 uniform parameter registers 2 address registers (4-vector) Changes with every generation though!

Outline 1. Traditional OpenGL Pipeline 2. Modern OpenGL Pipeline 3. Vertex Shaders 4. Pixel Shaders 5. Example: Per-Pixel Lighting 6. Geometry & Tessellation Shaders 7. Bottlenecks

Pixel Shaders Geometry & State Vertex Shader Fragment Assembly Pixel Shader Per Fragment Operations Frame-Buffer Pixel Shader must Calculate colour for pixel Does texture mapping (necessarily a per-pixel operation) Can do anything else it likes! BUT no access to any geometry information

Fragment Assembly Clipping and Rasterisation Vertices (and our calculated colors!) might not be inside the view volume They are clipped to the view volume and the resulting polygons rasterised Fragments are all the values necessary to calculate the colour of a pixel The gl_Color value the pixel shader receives has been interpolated by clipping and rasterisation

In: Static and Per Vertex State Out: Varying Per Vertex Geometry & State Vertex Shader Fragment Assembly Pixel Shader Per Fragment Operations In: Varying Per Pixel Out: Color Frame-Buffer

Pixel Shaders Little assembly program for every pixel What for? E.g. per-pixel lighting

Latest Pixel Shaders Support 16 textures Range: floating point [16/24/32bit] Texture lookup is just another command Result from texture lookup can be used to compute new tex coordinates dependent texture lookup Instruction set identical to vertex shader data-dependent loops dynamic branching (also from inside loops) Multiple rendering targets (e.g., 4 color buffers)

Fragment Processor / Tex-Mapping Texture reads are just another instruction (TEX, TXP, or TXD) Allows computed texture coordinates, nested to arbitrary depth Allows multiple uses of a single texture unit Optional LOD control specify filter extent Think of it as: A memory-read instruction, with optional user-controlled filtering

Additional Fragment Capabilities Read access to window-space position Read/write access to fragment Z Built-in derivative instructions Partial derivatives w.r.t. screen-space x or y Useful for anti-aliasing Conditional fragment-kill instruction FP32, FP24, FP16, and fixed-point data

Fragment Processor Limitations No indexed reads from registers Use texture reads instead No memory writes, except to frame-buffer Shaders have a cost!!! Especially reading randomly from textures is very expensive! Hardware relies on coherent memory accesses

Fragment Processor Resource Limits Example Limitations (changes rapidly): 1024 instructions 512 constants or uniform parameters Each constant counts as one instruction 16 texture units Reuse as many times as desired 8 FP32 x 4 perspective-correct inputs 128-bit framebuffer color output (use as 4 x FP32, 8 x FP16, etc )

32-bit IEEE Floating-Point 32bit throughout the pipeline: Framebuffer Textures Fragment processor Vertex processor Interpolants

Support for Multiple Data Types Supports 32-bit IEEE fp throughout pipeline Framebuffer, textures, computations, interpolants Fragment processor also supports: 16-bit half floating point, 12-bit fixed point These may be faster than 32-bit on some HW Framebuffer/textures also support: Large variety of fixed-point formats E.g., classical 8-bit per component RGBA, BGRA, etc. These formats use less memory bandwidth than FP32

Outline 1. Traditional OpenGL Pipeline 2. Modern OpenGL Pipeline 3. Vertex Shaders 4. Pixel Shaders 5. Example: Per-Pixel Lighting (useful for coursework) 6. Geometry & Tessellation Shaders (optional: self-study) 7. Bottlenecks

What gets put down the pipe State which exists and is available throughout the pipeline, changes slowly, if at all:

GLfloat diffuse0[] = {1.0f, 1.0f, 0.0f, 1.0f};
GLfloat adcolor[] = {0.3f, 0.3f, 0.3f, 1.0f};
glEnable(GL_LIGHT0);
glLightfv(GL_LIGHT0, GL_DIFFUSE, diffuse0);
glMaterialfv(GL_FRONT, GL_DIFFUSE, adcolor);

What gets put down the pipe Per polygon data, which changes rapidly:

float size = 1.0f;
float scale = 0.2f;
float delta = 0.1f;
float I[3] = { 1.0f, 0.0f, 0.0f };
float A[3] = { size, size, size * scale + delta };
float B[3] = { size, size, -size * scale + delta };
float C[3] = { size, -size, -size * scale };
float D[3] = { size, -size, size * scale };

glBegin(GL_QUADS);
glNormal3fv(I);
glVertex3fv(D);
glVertex3fv(C);
glVertex3fv(B);
glVertex3fv(A);
glNormal3fv(K);

Vertex Shader

vec4 Ambient;
vec4 Diffuse;
vec4 Specular;

void pointLight(in int i, in vec3 N, in vec3 E, in vec3 vertex)
{
    float NdotL;  // N . light direction
    float NdotH;  // N . light half vector
    float pf;     // power factor
    vec3 L;       // direction from surface to light position
    vec3 H;       // direction of maximum highlights

    // Compute vector from surface to light position
    L = vec3(gl_LightSource[i].position) - vertex;
    // normalize the vector from surface to light position
    L = normalize(L);
    H = normalize(L + E);
    NdotL = max(0.0, dot(N, L));
    NdotH = max(0.0, dot(N, H));
    if (NdotL == 0.0) {
        pf = 0.0;
    } else {
        pf = pow(NdotH, gl_FrontMaterial.shininess);
    }
    Ambient  += gl_LightSource[i].ambient;
    Diffuse  += gl_LightSource[i].diffuse * NdotL;
    Specular += gl_LightSource[i].specular * pf;
}

void flight(in vec3 N, in vec3 vertex)
{
    vec4 color;
    vec3 E;

    E = vec3(0.0, 0.0, 1.0);
    // Clear the light intensity accumulators
    Ambient  = vec4(0.0);
    Diffuse  = vec4(0.0);
    Specular = vec4(0.0);
    pointLight(0, N, E, vertex);
    pointLight(1, N, E, vertex);
    color = Ambient  * gl_FrontMaterial.ambient
          + Diffuse  * gl_FrontMaterial.diffuse
          + Specular * gl_FrontMaterial.specular;
    color = clamp(color, 0.0, 1.0);
    gl_FrontColor = color;
}

void main(void)
{
    vec3 N;
    // Eye-coordinate position of vertex, needed in various calculations
    vec3 vertex = vec3(gl_ModelViewMatrix * gl_Vertex);
    // Do fixed functionality vertex transform
    gl_Position = ftransform();
    N = gl_NormalMatrix * gl_Normal;
    N = normalize(N);
    flight(N, vertex);
}

Vertex Shader: Output of Shader In flight(), note we accumulate all light before applying it to the material:

color = Ambient  * gl_FrontMaterial.ambient
      + Diffuse  * gl_FrontMaterial.diffuse
      + Specular * gl_FrontMaterial.specular;
color = clamp(color, 0.0, 1.0);
gl_FrontColor = color;

Vertex Shader: Iterate over two lights In flight(), the light intensity accumulators are cleared and each light is processed in turn:

// Clear the light intensity accumulators
Ambient  = vec4(0.0);
Diffuse  = vec4(0.0);
Specular = vec4(0.0);
pointLight(0, N, E, vertex);
pointLight(1, N, E, vertex);

Vertex Shader The core of pointLight(): build the light and half-angle vectors, then the diffuse and specular terms:

L = vec3(gl_LightSource[i].position) - vertex;
L = normalize(L);
H = normalize(L + E);
NdotL = max(0.0, dot(N, L));
NdotH = max(0.0, dot(N, H));
if (NdotL == 0.0) {
    pf = 0.0;
} else {
    pf = pow(NdotH, gl_FrontMaterial.shininess);
}

Vertex Shader

vec4 Ambient;
vec4 Diffuse;
vec4 Specular;

void pointlight(in int i, in vec3 N, in vec3 E, in vec3 vertex)
{
    float NdotL;   // N . light direction
    float NdotH;   // N . light half vector
    float pf;      // power factor
    vec3 L;        // direction from surface to light position
    vec3 H;        // direction of maximum highlights

    // Compute and normalize vector from surface to light position
    L = vec3(gl_LightSource[i].position) - vertex;
    L = normalize(L);
    H = normalize(L + E);

    NdotL = max(0.0, dot(N, L));
    NdotH = max(0.0, dot(N, H));

    if (NdotL == 0.0) {
        pf = 0.0;
    } else {
        pf = pow(NdotH, gl_FrontMaterial.shininess);
    }

    Ambient  += gl_LightSource[i].ambient;
    Diffuse  += gl_LightSource[i].diffuse * NdotL;
    Specular += gl_LightSource[i].specular * pf;
}

void flight(in vec3 N, in vec3 vertex)
{
    vec4 color;
    vec3 E;

    E = vec3(0.0, 0.0, 1.0);

    // Clear the light intensity accumulators
    Ambient  = vec4(0.0);
    Diffuse  = vec4(0.0);
    Specular = vec4(0.0);

    pointlight(0, N, E, vertex);
    pointlight(1, N, E, vertex);

    color = Ambient  * gl_FrontMaterial.ambient
          + Diffuse  * gl_FrontMaterial.diffuse
          + Specular * gl_FrontMaterial.specular;
    color = clamp(color, 0.0, 1.0);
    gl_FrontColor = color;
}

void main(void)
{
    vec3 N;

    // Eye-coordinate position of vertex, needed in various calculations
    vec3 vertex = vec3(gl_ModelViewMatrix * gl_Vertex);

    // Do fixed functionality vertex transform
    gl_Position = ftransform();

    N = gl_NormalMatrix * gl_Normal;
    N = normalize(N);
    flight(N, vertex);
}

Together these implement the lighting model

    I_r = k_a I_a + Σ_i I_i ( k_d (n·l) + k_s (h·n)^m )
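The same lighting model can be checked outside GLSL. Below is a small Python sketch of the single-light Blinn-Phong term, mirroring the pointlight() logic; the function name blinn_phong and the sample coefficients are illustrative, not from the slides:

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def blinn_phong(n, l, e, k_a, k_d, k_s, i_a, i_i, shininess):
    """One-light version of I_r = k_a*I_a + I_i*(k_d*(n.l) + k_s*(h.n)^m).
    As in the shader, the specular power factor is forced to zero when
    the surface faces away from the light (NdotL == 0)."""
    h = normalize(tuple(lc + ec for lc, ec in zip(l, e)))  # half vector
    n_dot_l = max(0.0, dot(n, l))
    n_dot_h = max(0.0, dot(n, h))
    pf = 0.0 if n_dot_l == 0.0 else n_dot_h ** shininess   # power factor
    return k_a * i_a + i_i * (k_d * n_dot_l + k_s * pf)
```

For a surface facing the light head-on (n = l = e = (0, 0, 1)) every dot product is 1, so the result reduces to k_a*i_a + i_i*(k_d + k_s).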

Fragment Shader

void main(void)
{
    vec4 color;

    color = gl_Color;
    color = clamp(color, 0.0, 1.0);
    gl_FragColor = color;
}

Recall

Geometry & State → Vertex Shader → Fragment Assembly → Pixel Shader → Per-Fragment Operations → Frame-Buffer

Vertex Shader — In: static and per-vertex state; Out: varying per-vertex values.
Pixel Shader — In: varying per-pixel values; Out: color.

Fragment assembly in this example (implicitly) does Gouraud shading. If we want per-pixel lighting, we need to pass more varying parameters.

Per Pixel Lighting

We want to calculate lighting per pixel, so now we interpolate L, N and H across the polygon instead of just color.

(Figure: a triangle whose vertices carry vectors N1/L1/H1, N2/L2/H2 and N3/L3/H3, which are interpolated across the interior.)

Per Pixel Lighting

We want to calculate lighting per pixel, so now we interpolate L, N and H across the polygon instead of just color. Note that the centre of the triangle is brighter than any vertex.

(Figure: the shaded triangle, with a specular highlight in its interior.)
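Why the fragment shader must renormalize: linearly interpolating unit vectors across a triangle yields vectors shorter than unit length. A small Python sketch (the vertex normals are made-up values, purely for illustration):

```python
import math

def length(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))

def bary_mix(a, b, c, w):
    """Barycentric combination of three 3-vectors, as the rasterizer
    does when interpolating varying values across a triangle."""
    return tuple(w[0] * a[i] + w[1] * b[i] + w[2] * c[i] for i in range(3))

# Unit normals at the three vertices (illustrative values)
n1, n2, n3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

# At the triangle's centre the interpolated normal has shrunk:
# its length is 1/sqrt(3), roughly 0.577
centre = bary_mix(n1, n2, n3, (1 / 3, 1 / 3, 1 / 3))

# So the fragment shader renormalizes, as in N1 = normalize(N)
unit = tuple(c / length(centre) for c in centre)
```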

Vertex Shader 2

varying vec4 Ambient;
varying vec3 N;
varying vec3 L0;
varying vec3 H0;
varying vec3 L1;
varying vec3 H1;

void pointlight(in int i, in vec3 N, in vec3 E, in vec3 vertex)
{
    vec3 L;  // direction from surface to light position

    // Compute and normalize vector from surface to light position
    L = vec3(gl_LightSource[i].position) - vertex;
    L = normalize(L);

    if (i == 0) {
        L0 = L;
        H0 = normalize(L + E);
    } else {
        L1 = L;
        H1 = normalize(L + E);
    }

    // Calculate ambient lighting only
    Ambient += gl_LightSource[i].ambient;
}

void flight(in vec3 N, in vec3 vertex)
{
    vec3 E = vec3(0.0, 0.0, 1.0);

    // Clear the ambient intensity accumulator
    Ambient = vec4(0.0);

    pointlight(0, N, E, vertex);
    pointlight(1, N, E, vertex);
}

void main(void)
{
    // Eye-coordinate position of vertex, needed in various calculations
    vec3 vertex = vec3(gl_ModelViewMatrix * gl_Vertex);

    // Do fixed functionality vertex transform
    gl_Position = ftransform();

    N = gl_NormalMatrix * gl_Normal;
    flight(N, vertex);
}

Vertex Shader 2

varying vec4 Ambient;
varying vec3 N;
varying vec3 L0;
varying vec3 H0;
varying vec3 L1;
varying vec3 H1;

void pointlight(in int i, in vec3 N, in vec3 E, in vec3 vertex)
{
    vec3 L;  // direction from surface to light position

    // Compute and normalize vector from surface to light position
    L = vec3(gl_LightSource[i].position) - vertex;
    L = normalize(L);

    if (i == 0) {
        L0 = L;
        H0 = normalize(L + E);
    } else {
        L1 = L;
        H1 = normalize(L + E);
    }

    // Calculate ambient lighting only
    Ambient += gl_LightSource[i].ambient;
}

Vertex Shader 2

void pointlight(in int i, in vec3 N, in vec3 E, in vec3 vertex)
{
    vec3 L;  // direction from surface to light position

    // Compute and normalize vector from surface to light position
    L = vec3(gl_LightSource[i].position) - vertex;
    L = normalize(L);

    if (i == 0) {
        L0 = L;
        H0 = normalize(L + E);
    } else {
        L1 = L;
        H1 = normalize(L + E);
    }

    // Calculate ambient lighting only
    Ambient += gl_LightSource[i].ambient;
}

Calculate L and H for both lights, and send them down the pipeline as varying.

Fragment Shader 2

varying vec4 Ambient;
varying vec3 N;
varying vec3 L0;
varying vec3 H0;
varying vec3 L1;
varying vec3 H1;

vec4 Specular;
vec4 Diffuse;

void pointlightfrag(in int i, in vec3 N, in vec3 L, in vec3 H)
{
    float NdotL;   // N . light direction
    float NdotH;   // N . light half vector
    float pf;      // power factor

    NdotL = max(0.0, dot(N, L));
    NdotH = max(0.0, dot(N, H));

    if (NdotL == 0.0) {
        pf = 0.0;
    } else {
        pf = pow(NdotH, gl_FrontMaterial.shininess);
    }

    Diffuse  += gl_LightSource[i].diffuse * NdotL;
    Specular += gl_LightSource[i].specular * pf;
}

void main(void)
{
    vec4 color;
    vec3 N1;

    Diffuse  = vec4(0.0);
    Specular = vec4(0.0);

    // Renormalize the interpolated normal
    N1 = normalize(N);

    pointlightfrag(0, N1, L0, H0);
    pointlightfrag(1, N1, L1, H1);

    color = Ambient  * gl_FrontMaterial.ambient
          + Diffuse  * gl_FrontMaterial.diffuse
          + Specular * gl_FrontMaterial.specular;
    color = clamp(color, 0.0, 1.0);
    gl_FragColor = color;
}

Fragment Shader 2

void pointlightfrag(in int i, in vec3 N, in vec3 L, in vec3 H)
{
    float NdotL;   // N . light direction
    float NdotH;   // N . light half vector
    float pf;      // power factor

    NdotL = max(0.0, dot(N, L));
    NdotH = max(0.0, dot(N, H));

    if (NdotL == 0.0) {
        pf = 0.0;
    } else {
        pf = pow(NdotH, gl_FrontMaterial.shininess);
    }

    Diffuse  += gl_LightSource[i].diffuse * NdotL;
    Specular += gl_LightSource[i].specular * pf;
}

Outline 1. Traditional OpenGL Pipeline 2. Modern OpenGL Pipeline 3. Vertex Shaders 4. Pixel Shaders 5. Example: Per-Pixel Lighting (useful for coursework) 6. Geometry & Tessellation Shaders 7. Bottlenecks

Modern GPU Pipeline (~OpenGL 3.3/4.0)

Geometry & State → Vertex Shader → Tessellation Control Shader → Tessellation Evaluation Shader → Geometry Shader → Fragment Assembly → Pixel Shader → Per-Fragment Operations → Frame-Buffer

The Tessellation Control Shader (operating on control points) and the Tessellation Evaluation Shader together form the tessellation stage.

Geometry Shaders New in DirectX 10 / OpenGL 3.2 (or via OpenGL extensions). Take geometry primitives (vertices, primitive state) and produce more (or zero) geometry primitives. They overcome the limitation that Vertex Shaders are exactly one vertex in, one vertex out. Useful for tessellation, shadow volume extrusion, etc.

Tessellation Shaders New extension in OpenGL 3.3/4.1. Perform a similar function to Geometry Shaders — take geometry input and expand it — but with much more functionality. Targeted at generating very smooth shapes on screen from compact geometry representations.

Adaptive PN Triangle Tessellation

(Figures: original faceted monkey-head mesh; tessellated & Phong-shaded result; wireframe of the adaptive tessellation; visualization of the barycentric weights used by the tessellation evaluation shader.)

From OpenGL 4 for 2010, Mark Kilgard, Barthold Lichtenbelt, SIGGRAPH 2010


Outline 1. Traditional OpenGL Pipeline 2. Modern OpenGL Pipeline 3. Vertex Shaders 4. Pixel Shaders 5. Example: Per-Pixel Lighting (useful for coursework) 6. Geometry & Tessellation Shaders (optional: self-study) 7. Bottlenecks

Bottlenecks A GPU is a relatively complicated set of processors configured as a pipeline. The most obvious bottleneck is the number of polygons; graphics cards are often rated in polygons/s. This rating is becoming obsolete as polygons get smaller and shaders get more complicated. Ratings sometimes mention pixel bandwidth, or even FLOPS compared with normal CPUs on certain benchmarks.

Possible Bottlenecks

CPU → (transfer) → Geometry Storage → (transform) → Geometry Processor → (raster) → Rasterizer → (texture fetch from Texture Storage + Filtering) → Fragment Processor → (frame buffer) → Frame-Buffer

Depending on which stage saturates first, an application can be CPU/bus bound, vertex bound, or pixel bound.

Causes of Bottlenecks Saturating some compute or data resource. Compute: cycles on shaders. Data: CPU-GPU transfer, polygon assembly, texture fetch. Note that the Vertex Shader might be idle while the Pixel Shader is swamped. With large polygons, polygon assembly is not a problem, but the pixel shader may be unable to keep up with the number of fragments generated.
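A back-of-the-envelope way to see which shader stage saturates is to compare total vertex work against total fragment work per frame. The cycle counts below are hypothetical, chosen purely to illustrate the comparison:

```python
def bottleneck(n_vertices, vertex_cycles, n_fragments, fragment_cycles):
    """Return which shader stage dominates, given per-vertex and
    per-fragment shader costs in cycles (a crude model that ignores
    transfer, assembly, and texture-fetch costs)."""
    vertex_work = n_vertices * vertex_cycles
    fragment_work = n_fragments * fragment_cycles
    return "vertex-bound" if vertex_work > fragment_work else "pixel-bound"

# A few large polygons covering the whole screen: fragment work dominates
few_big = bottleneck(6, 100, 1_000_000, 50)

# A dense mesh of tiny polygons: vertex work dominates
many_small = bottleneck(2_000_000, 100, 500_000, 50)
```

The same comparison underlies the practical test of varying screen resolution: if the frame rate changes with resolution, fragment work dominates.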

Fixes and Strategies Balance effort between the shaders: how do you tell if your program is vertex or pixel limited? Don't render anything you don't need to, e.g. CPU clipping of objects. On modern GPUs, and especially OpenGL ES 2.0, CPU-GPU bandwidth is helped by (exclusive) use of stored vertex arrays, where vertex attributes are kept on the GPU. Two-pass, deferred shading: render once with no pixel shader just to learn which fragments are on screen, then render with complex pixel shaders.

Conclusions GPUs are fast, powerful, and hard to program. Many new features arrive with each generation, often vendor-specific, and it is hard to keep up with them. Lesson: high-level languages are needed for GPU programming. Solution: HLSL / Cg.