General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)


1 ME 290-R: General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing) Sara McMains Spring 2009 Lecture

2 Outline Last time Frame buffer operations GPU programming intro Linear algebra representations Flow control Today Reduce review Sorting Searching Cg

3 Parallel Reductions 1D parallel reduction: sum N columns or rows in parallel; add two halves of texture together. [Figure: NxN texture] Credit: Mark Harris

4 Parallel Reductions 1D parallel reduction: sum N columns or rows in parallel; add two halves of texture together, repeatedly... [Figure: Nx(N/2) texture] Credit: Mark Harris

5 Parallel Reductions 1D parallel reduction: sum N columns or rows in parallel; add two halves of texture together, repeatedly... [Figure: Nx(N/4) texture] Credit: Mark Harris

6 Parallel Reductions 1D parallel reduction: sum N columns or rows in parallel; add two halves of texture together, repeatedly... until we're left with a single row of texels [Figure: Nx1 texture]. Requires log2 N steps. Credit: Mark Harris
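The halving scheme above can be sketched on the CPU. This is only an illustrative Python emulation, not the lecture's GPU code: the function name `reduce_rows_by_halving` is hypothetical, the "texture" is a plain list of lists, and the width is assumed to be a power of two. Each loop iteration stands in for one rendering pass.

```python
def reduce_rows_by_halving(texture):
    """Sum every row by repeatedly adding the right half onto the left half.

    Assumes a rectangular grid whose width is a power of two.
    Returns (list of row sums, number of passes).
    """
    rows = [list(r) for r in texture]
    width = len(rows[0])
    steps = 0
    while width > 1:
        half = width // 2
        # One "pass": each output texel adds its partner from the other half.
        rows = [[r[i] + r[i + half] for i in range(half)] for r in rows]
        width = half
        steps += 1
    return [r[0] for r in rows], steps


sums, steps = reduce_rows_by_halving([[1, 2, 3, 4],
                                      [5, 6, 7, 8],
                                      [0, 0, 0, 1],
                                      [1, 1, 1, 1]])
print(sums, steps)
```

For a width of 4 the loop runs twice, matching the log2 N step count on the slide.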

7 Reduce Any operation that computes a single result from a data set sum min max average product...

8 Max Reduction: Reduce
float max(float2 texcoord : TEXCOORD0,
          uniform samplerRECT img) : COLOR
{
    float a, b, c, d;
    // fetch the 2x2 block of texels that this output texel covers
    a = f1texRECT(img, texcoord);
    b = f1texRECT(img, texcoord + float2(0, 1));
    c = f1texRECT(img, texcoord + float2(1, 0));
    d = f1texRECT(img, texcoord + float2(1, 1));
    return max(max(a, b), max(c, d));
}
Credit: Ian Buck

9 Max Reduction O(log n) passes to reduce n^2 elements; can increase number of reductions in fragment program to reduce number of passes. Credit: Mark Harris, Tim Purcell, Ian Buck
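A CPU sketch of the 2x2 max reduction may make the pass count concrete. It is a Python emulation under stated assumptions: the image is a square list of lists with power-of-two side length, output texel (x, y) is assumed to read input texels starting at (2x, 2y), and the names `max_reduce_pass` and `max_reduce` are hypothetical.

```python
def max_reduce_pass(img):
    """One pass: each output texel is the max of a 2x2 block of the input."""
    n = len(img) // 2
    return [[max(img[2 * y][2 * x], img[2 * y][2 * x + 1],
                 img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)]


def max_reduce(img):
    """Repeat passes until a single texel remains; return (max, passes)."""
    passes = 0
    while len(img) > 1:
        img = max_reduce_pass(img)
        passes += 1
    return img[0][0], passes


image = [[1, 5, 2, 0],
         [3, 4, 9, 1],
         [0, 0, 7, 2],
         [6, 1, 1, 8]]
print(max_reduce(image))
```

A 4x4 image (n = 4, so n^2 = 16 elements) needs log2 4 = 2 passes. Reading a larger block per pass, as the slide suggests, would lower the pass count further.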

10 Linear Algebra Representations Vector representation: 2D textures best we can do. High texture memory bandwidth. Read-write access, dependent fetches. [Figure: vector of length N stored as a 2D texture] Credit: Jens Krüger

11 The fragment pipeline Input: Fragment. Attributes (interpolated from vertex information): Color (R G B A), Position (X Y Z W), Texture coordinates (X Y [Z] -), Texture coordinates (X Y [Z] -). Input: Texture Image (X Y Z W). Each element of texture is a 4D vector; 32 bits = float, 16 bits = half. Credit: Suresh Venkatasubramanian

12 Outline Today Reduce review Sorting Searching Cg

13 Assumptions Data organized into 1D arrays. Rendering pass == screen aligned quad. Not using vertex shaders. PS 2.0 GPU. No data dependent branching at fragment level.

14 Sorting Given an unordered list of elements, produce list ordered by key value. Kernel: compare and swap. Standard sort algorithms not suited to GPUs; look at parallel sort algorithms: Bitonic merge sort [Batcher 68], Periodic balanced sorting networks [Dowd 89].

15 Bitonic Merge Sort Overview Repeatedly build bitonic lists and then sort them. Bitonic list is two monotonic lists concatenated together, one increasing and one decreasing. List A: (3, 4, 7, 8) monotonically increasing. List B: (6, 5, 2, 1) monotonically decreasing. List AB: (3, 4, 7, 8, 6, 5, 2, 1) bitonic.

16 Bitonic Merge Sort 8x monotonic lists: (3) (7) (4) (8) (6) (2) (1) (5); 4x bitonic lists: (3,7) (4,8) (6,2) (1,5)

17 Bitonic Merge Sort Sort the bitonic lists

18 Bitonic Merge Sort 4x monotonic lists: (3,7) (8,4) (2,6) (5,1); 2x bitonic lists: (3,7,8,4) (2,6,5,1)

19 Bitonic Merge Sort Sort the bitonic lists

20 Bitonic Merge Sort Sort the bitonic lists

21 Bitonic Merge Sort Sort the bitonic lists

22 Bitonic Merge Sort 2x monotonic lists: (3,4,7,8) (6,5,2,1); 1x bitonic list: (3,4,7,8,6,5,2,1)

23 Bitonic Merge Sort Sort the bitonic list

24 Bitonic Merge Sort Sort the bitonic list

25 Bitonic Merge Sort Sort the bitonic list

26 Bitonic Merge Sort Sort the bitonic list

27 Bitonic Merge Sort Sort the bitonic list

28 Bitonic Merge Sort Done!

29 Bitonic Merge Sort Summary Separate rendering pass for each set of swaps: O(log^2 n) passes. Each pass performs n compare/swaps. Total compare/swaps: O(n log^2 n). Limitations of GPU cost us factor of log n over best CPU-based sorting algorithms.

30 Bitonic Merge Sort Helper Function
float2 convert1dto2d(float coord1d, float width)
{
    float2 coord2d;
    coord2d.y = coord1d / width;                  // row, with fractional part
    coord2d.x = floor(frac(coord2d.y) * width);   // column from the fractional part
    coord2d.y = floor(coord2d.y);                 // integer row
    return coord2d;
}
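The address conversion can be checked on the CPU. Below is a Python transliteration of the helper (the name `convert_1d_to_2d` is hypothetical), following the same divide, take-fraction, floor sequence. Widths are assumed to be powers of two so that the floating-point arithmetic is exact; for other widths the fractional step can suffer rounding error.

```python
import math


def convert_1d_to_2d(coord1d, width):
    """Map a 1D array index to (x, y) in a row-major 2D texture."""
    y = coord1d / width                           # row plus fraction
    x = math.floor((y - math.floor(y)) * width)   # frac(y) * width
    return (x, math.floor(y))


print(convert_1d_to_2d(6, 4))   # index 6 in a width-4 texture
```

For power-of-two widths the result agrees with integer `divmod(coord1d, width)`, with the tuple order swapped to (column, row).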

31 Bitonic Merge Sort
float BitonicSort(float2 elem2d : WPOS,
                  uniform float offset,      // offset = 2^(stage - 1)
                  uniform float pbufwidth,
                  uniform float stageno,     // stageno = 2^stage
                  uniform float stepno,      // stepno = 2^step
                  uniform samplerRECT sortedlist) : COLOR
{
    elem2d = floor(elem2d);
    float elem1d = elem2d.y * pbufwidth + elem2d.x;
    // csign: +1 if the comparison partner lies to the right, -1 if to the left
    half csign = (fmod(elem1d, stageno) < offset) ? 1 : -1;
    // cdir: +1 if this sub-list is sorted ascending, -1 if descending
    half cdir = (fmod(floor(elem1d / stepno), 2) == 0) ? 1 : -1;
    float adr1d = csign * offset + elem1d;
    float2 adr2d = convert1dto2d(adr1d, pbufwidth);
    float val0 = f1texRECT(sortedlist, elem2d);
    float val1 = f1texRECT(sortedlist, adr2d);
    float cmin = (val0 < val1) ? val0 : val1;
    float cmax = (val0 > val1) ? val0 : val1;
    return (csign == cdir) ? cmin : cmax;
}
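To see how the kernel's parameters drive the whole sort, here is a CPU emulation in Python. It is a sketch, not the lecture's host code: `bitonic_pass` mirrors the fragment program for a 1D list (so the 2D address conversion is skipped), and the driver loop in `bitonic_sort`, which picks `offset`, `stageno` and `stepno` for each pass, is an assumption reconstructed from the comments in the kernel signature. The list length is assumed to be a power of two.

```python
def bitonic_pass(data, offset, stageno, stepno):
    """One rendering pass: every element does one compare against its partner."""
    out = []
    for i in range(len(data)):
        csign = 1 if (i % stageno) < offset else -1      # partner to the right or left
        cdir = 1 if (i // stepno) % 2 == 0 else -1       # ascending or descending block
        partner = i + csign * offset
        val0, val1 = data[i], data[partner]
        out.append(min(val0, val1) if csign == cdir else max(val0, val1))
    return out


def bitonic_sort(data):
    """Run all passes; returns (sorted list, number of passes)."""
    n = len(data)                      # assumed to be a power of two
    log_n = n.bit_length() - 1
    passes = 0
    for step in range(1, log_n + 1):           # build sorted runs of length 2^step
        for stage in range(step, 0, -1):       # merge with shrinking offsets
            data = bitonic_pass(data, 2 ** (stage - 1), 2 ** stage, 2 ** step)
            passes += 1
    return data, passes


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

For n = 8 the emulation uses 1 + 2 + 3 = 6 passes, which is the log n (log n + 1) / 2 count behind the O(log^2 n) figure in the summary slide.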

32 Binary Search
float BinarySearch(float2 elem2d : WPOS,
                   uniform float stride,
                   uniform float pbufwidth,
                   uniform float sortbufwidth,
                   uniform samplerRECT sortlist) : COLOR
{
    elem2d = floor(elem2d);
    float elem1d = elem2d.y * pbufwidth + elem2d.x;
    float curpos = stride;
    // loop over (LOGN - 1) search passes
    for (int i = 0; i < LOGN - 1; i++) {
        stride = floor(stride * 0.5);
        curpos = Search(curpos, elem1d, stride, sortlist, sortbufwidth);
    }
    // log nth pass
    curpos = Search(curpos, elem1d, 1.0, sortlist, sortbufwidth);
    // cleanup pass
    curpos = SearchFin(curpos, elem1d, 1.0, sortlist, sortbufwidth);
    return curpos;
}

33 Making GPU Sorting Faster Draw several quads with similar computation instead of single quad Reduce decision making in fragment program Push work into vertex processor and interpolator Reduce computation in fragment program More than one compare/swap per sort kernel invocation Reduce computational complexity

34 Grouping Computation

35 Implementation Details Specify interpolants for smaller quads: down or up compare and swap; distance to comparison partner. Kipfer & Westermann in GPU Gems 2

36 Outline Today Reduce review Sorting Searching Cg

37 Types of Search Search for specific element Binary search Search for nearest element(s) k-nearest neighbor search Both searches require ordered data

38 Binary Search Find a specific element in an ordered list. Implement just like CPU algorithm (assuming hardware supports long enough shaders). Finds the first element of a given value v; if v does not exist, finds the next smallest element > v. Search algorithm is sequential, but many searches can be executed in parallel: number of pixels drawn determines number of searches executed in parallel; 1 pixel == 1 search.

39 Binary Search Search for v0. Initialize: 4. Search starts at center of sorted array; v2 >= v0 so search left half of sub-array. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

40 Binary Search Search for v0. Initialize: 4; Step 1: 2. v0 >= v0 so search left half of sub-array. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

41 Binary Search Search for v0. Initialize: 4; Step 1: 2; Step 2: 1. v0 >= v0 so search left half of sub-array. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

42 Binary Search Search for v0. Initialize: 4; Step 1: 2; Step 2: 1; Step 3: 0. At this point, we either have found v0 or are 1 element too far left. One last step to resolve. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

43 Binary Search Search for v0. Initialize: 4; Step 1: 2; Step 2: 1; Step 3: 0; Step 4: 0. Done! Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

44 Binary Search Search for v0 and v2. Initialize: 4 (both). Search starts at center of sorted array. Both searches proceed to the left half of the array. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

45 Binary Search Search for v0 and v2. Initialize: 4; Step 1: 2 (both). The search for v0 continues as before. The search for v2 overshot, so go back to the right. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

46 Binary Search Search for v0 and v2. Initialize: 4; Step 1: 2; Step 2: 1 (v0), 3 (v2). We've found the proper v2, but are still looking for v0. Both searches continue. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

47 Binary Search Search for v0 and v2. Initialize: 4; Step 1: 2; Step 2: 1 (v0), 3 (v2); Step 3: 0 (v0), 2 (v2). Now, we've found the proper v0, but overshot v2. The cleanup step takes care of this. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

48 Binary Search Search for v0 and v2. Initialize: 4; Step 1: 2; Step 2: 1 (v0), 3 (v2); Step 3: 0 (v0), 2 (v2); Step 4: 0 (v0), 3 (v2). Done! Both v0 and v2 are located properly. Sorted List: v0 v0 v0 v2 v2 v2 v5 v5 (indices 0 1 2 3 4 5 6 7)

49 Binary Search Summary Single rendering pass. Fragment program: each pixel drawn performs independent search. Iterates log n + 1 times through list.

50 Binary Search
float BinarySearch(float2 elem2d : WPOS,
                   uniform float stride,
                   uniform float pbufwidth,
                   uniform float sortbufwidth,
                   uniform samplerRECT sortlist) : COLOR
{
    elem2d = floor(elem2d);
    float elem1d = elem2d.y * pbufwidth + elem2d.x;
    float curpos = stride;
    // loop over (LOGN - 1) search passes
    for (int i = 0; i < LOGN - 1; i++) {
        stride = floor(stride * 0.5);
        curpos = Search(curpos, elem1d, stride, sortlist, sortbufwidth);
    }
    // log nth pass
    curpos = Search(curpos, elem1d, 1.0, sortlist, sortbufwidth);
    // cleanup pass
    curpos = SearchFin(curpos, elem1d, 1.0, sortlist, sortbufwidth);
    return curpos;
}

51 Binary Search: Search Routines
float Search(float curpos, float elem, float stride,
             uniform samplerRECT data, float texw)
{
    float2 adr2d = convert1dto2d(curpos, texw);
    float val = f1texRECT(data, adr2d);
    // step left if the current value is already >= elem, otherwise step right
    float dir = (elem <= val) ? -1.0 : 1.0;
    return dir * stride + curpos;
}
or, for SearchFin:
    float dir = (elem <= val) ? 0.0 : 1.0;
instead.
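The branch-free search can be traced on the CPU. The following Python sketch mirrors `Search`, `SearchFin` and the driver above for a plain 1D list; the function names and the returned trace are illustrative additions. It assumes the list length is a power of two and that the searched value is no larger than the largest element, since otherwise the position would step past the end of the list (the GPU version relies on texture addressing there).

```python
def search(curpos, elem, stride, data):
    """Step left if data[curpos] >= elem, otherwise step right."""
    direction = -1 if elem <= data[curpos] else 1
    return curpos + direction * stride


def search_fin(curpos, elem, data):
    """Cleanup: stay if data[curpos] >= elem, otherwise move one to the right."""
    return curpos if elem <= data[curpos] else curpos + 1


def binary_search(elem, data):
    """Return (index of first element >= elem, list of visited positions)."""
    n = len(data)                      # assumed to be a power of two
    log_n = n.bit_length() - 1
    stride = n // 2
    curpos = stride
    trace = [curpos]
    for _ in range(log_n - 1):         # (LOGN - 1) search passes
        stride //= 2
        curpos = search(curpos, elem, stride, data)
        trace.append(curpos)
    curpos = search(curpos, elem, 1, data)   # log nth pass
    trace.append(curpos)
    curpos = search_fin(curpos, elem, data)  # cleanup pass
    trace.append(curpos)
    return curpos, trace


sorted_list = [0, 0, 0, 2, 2, 2, 5, 5]
print(binary_search(0, sorted_list))
print(binary_search(2, sorted_list))
```

Every search runs the same fixed number of steps regardless of the data, which is what lets each pixel execute it without data-dependent branching.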

52 Nearest Neighbor Search

53 Nearest Neighbor Search Given a sample point p, find the k points nearest p within a data set. On the CPU, this is easily done with a heap or priority queue: can add or reject neighbors as search progresses. Don't know how to build one efficiently on GPU. kNN-grid: can only add neighbors.

54 kNN-grid Algorithm [Figure: grid of voxels; legend: sample point, candidate neighbor, neighbors found, number of neighbors wanted]

55 kNN-grid Algorithm Candidate neighbors must be within max search radius. Visit voxels in order of distance to sample point.

56 kNN-grid Algorithm If current number of neighbors found is less than the number requested, grow search radius.

57 kNN-grid Algorithm If current number of neighbors found is less than the number requested, grow search radius.

58 kNN-grid Algorithm Don't add neighbors outside maximum search radius. Don't grow search radius when neighbor is outside maximum radius.

59 kNN-grid Algorithm Add neighbors within search radius.

60 kNN-grid Algorithm Add neighbors within search radius.

61 kNN-grid Algorithm Don't expand search radius if enough neighbors already found.

62 kNN-grid Algorithm Add neighbors within search radius.

63 kNN-grid Algorithm Visit all other voxels accessible within determined search radius. Add neighbors within search radius.

64 kNN-grid Summary Finds all neighbors within a sphere centered about sample point. May locate more than requested k-nearest neighbors. Photon Mapping on Programmable Graphics Hardware, Purcell et al.
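The add-only search described in the slides above can be sketched on the CPU. This Python version is an illustration under several assumptions, not the implementation from the cited paper: it works in 2D with square cells instead of 3D voxels, and the function name `knn_grid`, the `cell` size parameter, and the dictionary-based grid are all hypothetical.

```python
import math


def knn_grid(points, sample, k, max_radius, cell=1.0):
    """Add-only neighbor search; returns (neighbors found, final search radius)."""
    # Bucket the points into grid cells.
    grid = {}
    for p in points:
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        grid.setdefault(key, []).append(p)

    def cell_distance(key):
        # Distance from the sample point to the nearest edge of the cell.
        dx = max(key[0] * cell - sample[0], 0.0, sample[0] - (key[0] + 1) * cell)
        dy = max(key[1] * cell - sample[1], 0.0, sample[1] - (key[1] + 1) * cell)
        return math.hypot(dx, dy)

    # Visit cells in order of distance, ignoring cells beyond the max radius.
    order = sorted((c for c in grid if cell_distance(c) <= max_radius),
                   key=cell_distance)
    found, radius = [], 0.0
    for c in order:
        if len(found) >= k and cell_distance(c) > radius:
            break                              # no remaining cell is reachable
        for p in grid[c]:
            d = math.hypot(p[0] - sample[0], p[1] - sample[1])
            if d > max_radius:
                continue                       # never add outside the max radius
            if len(found) < k:
                radius = max(radius, d)        # too few neighbors: grow the radius
                found.append(p)
            elif d <= radius:
                found.append(p)                # radius is fixed: add, never reject
    return found, radius


pts = [(0.5, 0.5), (1.5, 0.5), (2.5, 2.5), (0.2, 0.9), (5.0, 5.0), (0.6, 0.4)]
print(knn_grid(pts, sample=(0.5, 0.5), k=2, max_radius=3.0))
```

With k = 2 the sketch returns three points in this example: because neighbors can only be added, a closer point encountered after the radius was fixed is appended rather than replacing an earlier one.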

65 Outline Today Reduce review Sorting Searching Cg

66 Constant Parameters Fixed inside program. Examples: 3.14159..., size of compute window. Example declarations: const float4 v = (.0,.0,.0,.0); const float pi = 3.14159. Illegal (modifying a constant): pi = <new value>; float a = pi++

67 Uniform parameters Can be passed to a fragment program like normal parameters; a uniform parameter gets its initial value from outside the program before the fragment program executes. Example: a counter that tracks which pass the algorithm is in. You are allowed to change uniform parameters within the program.

68 Math operators E.g. cos(x), log(x), pow(x,y), dot(a,b), mul(v, M), sqrt(x), cross(u, v). Using built-in ops is more efficient than writing your own.

69 Swizzling and friends
Swizzle:
    v1 = (p, q, r, t);   // Initialize
    v2 = v1.yx;          // v2 = (q, p)
    s = v1.w;            // s = t
Smear:
    v3 = s.rrr;          // v3 = (t, t, t)
can use xyzw or rgba, but not both at once
Write masking:
    v4 = (e, f, g, h);
    v4.ar = v2;          // v4 = (p, f, g, q)

70 Swizzling and friends
Swizzle:
    v1 = (p, q, r, t);
    v2 = v1.yx;
    s = v1.w;
Smear:
    v3 = s.rrr;
can use xyzw or rgba, but not both at once
Write masking:
    v4 = (e, f, g, h);
    v4.ar = v2;
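Swizzle, smear and write-mask semantics can be tried out with a small Python model. The helpers `swizzle` and `write_mask` are hypothetical stand-ins for the Cg syntax, vectors are plain tuples, and the numeric values are made up for illustration.

```python
XYZW = "xyzw"
RGBA = "rgba"


def _indices(pattern):
    """Translate a swizzle pattern into component indices."""
    for names in (XYZW, RGBA):
        if all(c in names for c in pattern):
            return [names.index(c) for c in pattern]
    raise ValueError("cannot mix xyzw and rgba in one swizzle")


def swizzle(v, pattern):
    """Read components of v in the order given by pattern."""
    return tuple(v[i] for i in _indices(pattern))


def write_mask(v, pattern, src):
    """Return a copy of v with the components named in pattern taken from src."""
    out = list(v)
    for i, value in zip(_indices(pattern), src):
        out[i] = value
    return tuple(out)


v1 = (4, -2, 5, 3)                            # illustrative values
v2 = swizzle(v1, "yx")                        # like v2 = v1.yx
s = swizzle(v1, "w")[0]                       # like s = v1.w
v3 = swizzle((s,), "rrr")                     # smear: like v3 = s.rrr
v4 = write_mask((1, 5, 3, 2), "ar", v2)       # like v4.ar = v2
print(v2, s, v3, v4)
```

In the write-mask call the first component of `v2` lands in the alpha slot and the second in the red slot, following the order of the mask letters.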

71 The fragment pipeline float4 v = tex2D(img, float2(x,y)) Texture access is like an array lookup. The value in v can be used to perform another lookup! This is called a dependent read. Texture reads (and dependent reads) are expensive, and are limited in different GPUs. Use them wisely! Credit: Suresh Venkatasubramanian
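A dependent read is simply a lookup whose address comes from an earlier lookup. A minimal Python sketch, with textures modeled as lists of rows and the hypothetical helpers `tex2d` and `dependent_read`:

```python
def tex2d(img, coord):
    """Array-style texture lookup at integer coordinates (x, y)."""
    x, y = coord
    return img[y][x]


def dependent_read(index_tex, data_tex, coord):
    """First fetch yields coordinates; second fetch depends on that result."""
    addr = tex2d(index_tex, coord)
    return tex2d(data_tex, addr)


# index_tex stores (x, y) addresses into data_tex; values are illustrative
index_tex = [[(1, 1), (0, 0)],
             [(0, 1), (1, 0)]]
data_tex = [[10, 20],
            [30, 40]]
print(dependent_read(index_tex, data_tex, (0, 0)))
```

The second fetch cannot be issued until the first completes, which is one reason dependent reads cost more than independent ones.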

72 The fragment pipeline Control flow: (<test>)?a:b operator. if-then-else conditional: [nv3x] Both branches are executed, and the condition code is used to decide which value is used to write the output register. [nv40] True conditionals. for-loops and do-while: [nv3x] limited to what can be unrolled (i.e. no variable loop limits). [nv40] True looping. WARNING: Even though nv40 has true flow control, performance will still suffer if there is no coherence. Credit: Suresh Venkatasubramanian
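The "both branches are executed" behavior can be modeled in a few lines of Python. This is only a sketch of the idea; `predicated_select` is a hypothetical helper, and the call log exists purely to show that neither branch is skipped.

```python
def predicated_select(cond, then_branch, else_branch):
    """Evaluate both branches, then let the condition pick the result."""
    a = then_branch()
    b = else_branch()
    return a if cond else b


calls = []
result = predicated_select(
    True,
    lambda: calls.append("then") or 1,   # append returns None, so the value is 1
    lambda: calls.append("else") or 2,
)
print(result, calls)
```

The cost of such a conditional is therefore the sum of both branches, which is why reducing decision making in the fragment program pays off on this class of hardware.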

73 The fragment pipeline out float4 result : COLOR // Do computation result = <final answer> Notes: Only output color can generally be modified (single float4 output on some GPUs). Setting different values in different channels of result can be useful for debugging. GPUs limit the number of instructions, both static (program length) and dynamic (number executed). Credit: Suresh Venkatasubramanian

74 Anatomy of a Cg Fragment Program Credit: Paul Kanyuk

75 The fragment pipeline What comes after fragment programs? Raster Operations Frame Buffer Depth/stencil happen after frag. program Blending and aggregation happen as usual Early z-culling: fragments that would have failed depth test are killed before executing fragment program. Optimization point: avoid work in the fragment program if possible. Credit: Suresh Venkatasubramanian

76 Getting data back I: Readbacks [Figure: GPU pipeline: 3D API (OpenGL or Direct3D), GPU Front End, Primitive Assembly, Vertex Processor, Rasterization and Interpolation, Fragment Processor, Raster Operations, Frame Buffer] Readbacks transfer data from the frame buffer to the CPU. Pros: They are very general (any buffer can be transferred); partial buffers can be transferred. Cons: They are slow: reverse data transfer across PCI/AGP bus is very, very slow; PCIe is better but still slow. Data mismatch: readbacks return image data, but the CPU expects vertex data (or has to load image into texture). Credit: Suresh Venkatasubramanian

77 Getting data back II: Render-to-texture [Figure: GPU pipeline: GPU Front End, Primitive Assembly, Vertex Processor, Rasterization and Interpolation, Fragment Processor, Raster Operations] Render-to-texture renders directly into a texture. Pros: Transfer does not cross GPU-CPU boundary; fastest way to transfer data to fragment processor. Cons: Only works with depth and color buffers (not stencil). Render-to-texture is the best method for reading data back after a computation. Credit: Suresh Venkatasubramanian

78 Using Render-to-texture Using the render-texture extension is tricky. You have to set up a pbuffer context, bind an appropriate texture to it, and then render to this context. Then you have to change context and read the bound texture. You cannot write to a texture and read it simultaneously. Mark Harris (NVIDIA) has written a RenderTexture class that wraps all of this. Credit: Suresh Venkatasubramanian
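Because a texture cannot be read and written in the same pass, multi-pass algorithms commonly alternate between two buffers, a pattern often called ping-pong buffering. The Python sketch below illustrates that pattern only; it is not the RenderTexture class API, and `run_passes` and the example kernel are hypothetical.

```python
def run_passes(kernel, initial, num_passes):
    """Apply kernel num_passes times, reading one buffer and writing the other."""
    src = list(initial)
    dst = [None] * len(src)
    for _ in range(num_passes):
        for i in range(len(src)):
            dst[i] = kernel(src, i)   # reads only src, writes only dst
        src, dst = dst, src           # swap roles for the next pass
    return src


def add_right_neighbor(tex, i):
    """Example kernel: each texel adds its right neighbor (wrapping around)."""
    return tex[i] + tex[(i + 1) % len(tex)]


print(run_passes(add_right_neighbor, [1, 2, 3, 4], 2))
```

Writing into `src` directly would let later texels see already-updated neighbors, which is the hazard the read/write restriction rules out.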

79 The vertex pipeline Input: vertices (position, color, texture coords). Input: uniform and constant parameters. Matrices can be passed to a vertex program. Lighting/material parameters can also be passed. Credit: Suresh Venkatasubramanian

80 The vertex pipeline Operations: Math/swizzle ops; Matrix operators; Flow control (as before). [nv3x] No access to textures. Output: Modified vertices (position, color). Vertex data transmitted to primitive assembly. Credit: Suresh Venkatasubramanian

81 Anatomy of a Cg Vertex Program Credit: Paul Kanyuk

82 Vertex programs are useful We can replace the entire geometry transformation portion of the fixed-function pipeline. Vertex programs used to change vertex coordinates (move objects around). Shifting operations to vertex programs improves overall pipeline performance. Much of shader processing happens at vertex level. We have access to original scene geometry. Credit: Suresh Venkatasubramanian

83 Vertex programs are not useful Fragment programs allow us to exploit full parallelism of GPU pipeline ("a processor at every pixel"). Vertex programs can't read input! [nv3x] Rule of thumb: If computation requires intensive calculation, it should probably be in the fragment processor. If it requires more geometric/graphic computing, it should be in the vertex processor. Credit: Suresh Venkatasubramanian

84 When might a VP need access to textures? n-body simulation: We have a force field in a texture. Each vertex moves according to this force field: v = a t, s = v t. In each pass, all vertex coordinates are updated. New locations create new force field. How do we update vertex coordinates? Credit: Suresh Venkatasubramanian

85 Sending data back to vertex program Solution: [Pass 1] Render all vertices to be stored in a texture. [Pass 2] Compute force field in fragment program. [Pass 3] Update texture containing vertex coordinates in a fragment program using the force field. [Pass 4] Retrieve vertex data from texture. How? Credit: Suresh Venkatasubramanian
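The update loop behind these passes can be sketched on the CPU. In the Python below, everything is illustrative: `force_field` is a made-up field that pulls each particle toward the origin, positions and velocities are lists of 2D tuples standing in for textures, and the time step `dt` is an assumed parameter.

```python
def force_field(positions):
    """Hypothetical field: acceleration points from each particle to the origin."""
    return [(-x, -y) for (x, y) in positions]


def update(positions, velocities, dt):
    """One simulation step: compute the field, then update velocity and position."""
    accel = force_field(positions)                        # compute force field
    velocities = [(vx + ax * dt, vy + ay * dt)            # v changes by a * t
                  for (vx, vy), (ax, ay) in zip(velocities, accel)]
    positions = [(x + vx * dt, y + vy * dt)               # s changes by v * t
                 for (x, y), (vx, vy) in zip(positions, velocities)]
    return positions, velocities


pos, vel = update([(1.0, 0.0)], [(0.0, 0.0)], dt=0.5)
print(pos, vel)
```

Each call builds new lists rather than modifying its inputs, mirroring the rule that a pass reads one texture and writes another. On the GPU the new positions would then have to reach the vertex processor, which is the question the slide ends on.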

86 Vertex/pixel Buffer Objects V/P buffer objects are ways to transfer data between framebuffer/vertex arrays and GPU memory. Conceptually, V/PBO are like CPU memory, but on the GPU. Can use glReadPixels to read to PBO. Can create vertex array from VBO. Credit: Suresh Venkatasubramanian

87 Solution! [Figure: GPU pipeline: GPU Front End, Primitive Assembly, Programmable Vertex Processor, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, texture, VBO/PBO] Credit: Suresh Venkatasubramanian

88 NV40: Vertex programs can read textures [Figure: GPU pipeline: GPU Front End, Primitive Assembly, Programmable Vertex Processor, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, texture] Credit: Suresh Venkatasubramanian

89 Summary of memory flow Readback: CPU -> Vertex program -> Fragment program -> Frame buffer. Copy-to-Texture: CPU -> Vertex program -> Fragment program -> Frame buffer. Render-to-Texture: CPU -> Vertex program -> Fragment program. Credit: Suresh Venkatasubramanian

90 Summary of memory flow VBO/PBO transfer: Vertex program -> Fragment program. nv40 texture ref in vertex program: Vertex program -> Fragment program. Credit: Suresh Venkatasubramanian

91 Acknowledgements Paul Kanyuk, Suresh Venkatasubramanian, Tim Purcell, Mark Harris, Jens Krüger, Ian Buck


More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

Programmable Graphics Hardware

Programmable Graphics Hardware CSCI 480 Computer Graphics Lecture 14 Programmable Graphics Hardware [Ch. 9] March 2, 2011 Jernej Barbic University of Southern California OpenGL Extensions Shading Languages Vertex Program Fragment Program

More information

Direct Rendering of Trimmed NURBS Surfaces

Direct Rendering of Trimmed NURBS Surfaces Direct Rendering of Trimmed NURBS Surfaces Hardware Graphics Pipeline 2/ 81 Hardware Graphics Pipeline GPU Video Memory CPU Vertex Processor Raster Unit Fragment Processor Render Target Screen Extended

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Introduction to Shaders.

Introduction to Shaders. Introduction to Shaders Marco Benvegnù hiforce@gmx.it www.benve.org Summer 2005 Overview Rendering pipeline Shaders concepts Shading Languages Shading Tools Effects showcase Setup of a Shader in OpenGL

More information

Photon Mapping on Programmable Graphics Hardware

Photon Mapping on Programmable Graphics Hardware Graphics Hardware () M. Doggett, W. Heidrich, W. Mark, A. Schilling (Editors) Photon Mapping on Programmable Graphics Hardware Timothy J. Purcell, Craig Donner, Mike Cammarano, Henrik Wann Jensen and Pat

More information

OpenGL Programmable Shaders

OpenGL Programmable Shaders h gpup 1 Topics Rendering Pipeline Shader Types OpenGL Programmable Shaders sh gpup 1 OpenGL Shader Language Basics h gpup 1 EE 4702-X Lecture Transparency. Formatted 9:03, 20 October 2014 from shaders2.

More information

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most

More information

Programmable Graphics Hardware

Programmable Graphics Hardware Programmable Graphics Hardware Outline 2/ 49 A brief Introduction into Programmable Graphics Hardware Hardware Graphics Pipeline Shading Languages Tools GPGPU Resources Hardware Graphics Pipeline 3/ 49

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

Matrix Operations on the GPU

Matrix Operations on the GPU Matrix Operations (thanks too ) Matrix Operations on the GPU CIS 665: GPU Programming and Architecture TA: Joseph Kider Slide information sources Suresh Venkatasubramanian CIS700 Matrix Operations Lectures

More information

Performance OpenGL Programming (for whatever reason)

Performance OpenGL Programming (for whatever reason) Performance OpenGL Programming (for whatever reason) Mike Bailey Oregon State University Performance Bottlenecks In general there are four places a graphics system can become bottlenecked: 1. The computer

More information

Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis

Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney and John D. Owens SIGGRAPH Asia

More information

Working with Metal Overview

Working with Metal Overview Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission

More information

ACCELERATING SIGNAL PROCESSING ALGORITHMS USING GRAPHICS PROCESSORS

ACCELERATING SIGNAL PROCESSING ALGORITHMS USING GRAPHICS PROCESSORS ACCELERATING SIGNAL PROCESSING ALGORITHMS USING GRAPHICS PROCESSORS Ashwin Prasad and Pramod Subramanyan RF and Communications R&D National Instruments, Bangalore 560095, India Email: {asprasad, psubramanyan}@ni.com

More information

Shader Programming 1. Examples. Vertex displacement mapping. Daniel Wesslén 1. Post-processing, animated procedural textures

Shader Programming 1. Examples. Vertex displacement mapping. Daniel Wesslén 1. Post-processing, animated procedural textures Shader Programming 1 Examples Daniel Wesslén, dwn@hig.se Per-pixel lighting Texture convolution filtering Post-processing, animated procedural textures Vertex displacement mapping Daniel Wesslén 1 Fragment

More information

Efficient and Scalable Shading for Many Lights

Efficient and Scalable Shading for Many Lights Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL

More information

Lecture 13: OpenGL Shading Language (GLSL)

Lecture 13: OpenGL Shading Language (GLSL) Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 18, 2018 1/56 Motivation } Last week, we discussed the many of the new tricks in Graphics require low-level access to the Graphics

More information

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies How to Work on Next Gen Effects Now: Bridging DX10 and DX9 Guennadi Riguer ATI Technologies Overview New pipeline and new cool things Simulating some DX10 features in DX9 Experimental techniques Why This

More information

GPGPU: Beyond Graphics. Mark Harris, NVIDIA

GPGPU: Beyond Graphics. Mark Harris, NVIDIA GPGPU: Beyond Graphics Mark Harris, NVIDIA What is GPGPU? General-Purpose Computation on GPUs GPU designed as a special-purpose coprocessor Useful as a general-purpose coprocessor The GPU is no longer

More information

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved Avi Shapira Graphic Remedy Copyright Khronos Group, 2009 - Page 1 2004 2009 Graphic Remedy. All Rights Reserved Debugging and profiling 3D applications are both hard and time consuming tasks Companies

More information

General-Purpose Computation on Graphics Hardware

General-Purpose Computation on Graphics Hardware General-Purpose Computation on Graphics Hardware Welcome & Overview David Luebke NVIDIA Introduction The GPU on commodity video cards has evolved into an extremely flexible and powerful processor Programmability

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics

More information

Programmable GPUs Outline

Programmable GPUs Outline papi 1 Outline References Programmable Units Languages Programmable GPUs Outline papi 1 OpenGL Shading Language papi 1 EE 7700-1 Lecture Transparency. Formatted 11:30, 25 March 2009 from set-prog-api.

More information

Mali Demos: Behind the Pixels. Stacy Smith

Mali Demos: Behind the Pixels. Stacy Smith Mali Demos: Behind the Pixels Stacy Smith Mali Graphics: Behind the demos Mali Demo Team: Doug Day Stacy Smith (Me) Sylwester Bala Roberto Lopez Mendez PHOTOGRAPH UNAVAILABLE These days I spend more time

More information

GeForce4. John Montrym Henry Moreton

GeForce4. John Montrym Henry Moreton GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Fall 2015 Lecture 9 October 1, 2015 What I was going to say last time How are the ideas we ve learned about implemented in hardware so they are fast. Important:

More information

A GPU Accelerated Spring Mass System for Surgical Simulation

A GPU Accelerated Spring Mass System for Surgical Simulation A GPU Accelerated Spring Mass System for Surgical Simulation Jesper MOSEGAARD #, Peder HERBORG, and Thomas Sangild SØRENSEN # Department of Computer Science, Centre for Advanced Visualization and Interaction,

More information

Rationale for Non-Programmable Additions to OpenGL 2.0

Rationale for Non-Programmable Additions to OpenGL 2.0 Rationale for Non-Programmable Additions to OpenGL 2.0 NVIDIA Corporation March 23, 2004 This white paper provides a rationale for a set of functional additions to the 2.0 revision of the OpenGL graphics

More information

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research Applications of Explicit Early-Z Z Culling Jason Mitchell ATI Research Outline Architecture Hardware depth culling Applications Volume Ray Casting Skin Shading Fluid Flow Deferred Shading Early-Z In past

More information

Deconstructing Hardware Usage for General Purpose Computation on GPUs

Deconstructing Hardware Usage for General Purpose Computation on GPUs Deconstructing Hardware Usage for General Purpose Computation on GPUs Budyanto Himawan Dept. of Computer Science University of Colorado Boulder, CO 80309 Manish Vachharajani Dept. of Electrical and Computer

More information

Deferred Rendering Due: Wednesday November 15 at 10pm

Deferred Rendering Due: Wednesday November 15 at 10pm CMSC 23700 Autumn 2017 Introduction to Computer Graphics Project 4 November 2, 2017 Deferred Rendering Due: Wednesday November 15 at 10pm 1 Summary This assignment uses the same application architecture

More information

Ray Casting on Programmable Graphics Hardware. Martin Kraus PURPL group, Purdue University

Ray Casting on Programmable Graphics Hardware. Martin Kraus PURPL group, Purdue University Ray Casting on Programmable Graphics Hardware Martin Kraus PURPL group, Purdue University Overview Parallel volume rendering with a single GPU Implementing ray casting for a GPU Basics Optimizations Published

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Analyzing a 3D Graphics Workload Where is most of the work done? Memory Vertex

More information

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games Bringing AAA graphics to mobile platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Platform team at Epic Games Unreal Engine 15 years in the industry 30 years of programming

More information

Supplement to Lecture 22

Supplement to Lecture 22 Supplement to Lecture 22 Programmable GPUs Programmable Pipelines Introduce programmable pipelines - Vertex shaders - Fragment shaders Introduce shading languages - Needed to describe shaders - RenderMan

More information

Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game

Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game GDC Europe 2005 Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game Lars M. Bishop NVIDIA Embedded Developer Technology 1 Agenda GoForce 3D capabilities Strengths and weaknesses

More information

Sign up for crits! Announcments

Sign up for crits! Announcments Sign up for crits! Announcments Reading for Next Week FvD 16.1-16.3 local lighting models GL 5 lighting GL 9 (skim) texture mapping Modern Game Techniques CS248 Lecture Nov 13 Andrew Adams Overview The

More information

Comparing Reyes and OpenGL on a Stream Architecture

Comparing Reyes and OpenGL on a Stream Architecture Comparing Reyes and OpenGL on a Stream Architecture John D. Owens Brucek Khailany Brian Towles William J. Dally Computer Systems Laboratory Stanford University Motivation Frame from Quake III Arena id

More information

CS452/552; EE465/505. Clipping & Scan Conversion

CS452/552; EE465/505. Clipping & Scan Conversion CS452/552; EE465/505 Clipping & Scan Conversion 3-31 15 Outline! From Geometry to Pixels: Overview Clipping (continued) Scan conversion Read: Angel, Chapter 8, 8.1-8.9 Project#1 due: this week Lab4 due:

More information

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Jonathan Zarge, Team Lead Performance Tools Richard Huddy, European Developer Relations Manager ATI

More information

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware Automatic Tuning Matrix Multiplication Performance on Graphics Hardware Changhao Jiang (cjiang@cs.uiuc.edu) Marc Snir (snir@cs.uiuc.edu) University of Illinois Urbana Champaign GPU becomes more powerful

More information

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology DX10, Batching, and Performance Considerations Bryan Dudash NVIDIA Developer Technology The Point of this talk The attempt to combine wisdom and power has only rarely been successful and then only for

More information

The Graphics Pipeline

The Graphics Pipeline The Graphics Pipeline Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel But you really want shadows, reflections, global illumination, antialiasing

More information

Applications of Explicit Early-Z Culling

Applications of Explicit Early-Z Culling Applications of Explicit Early-Z Culling Jason L. Mitchell ATI Research Pedro V. Sander ATI Research Introduction In past years, in the SIGGRAPH Real-Time Shading course, we have covered the details of

More information

For example, could you make the XNA func8ons yourself?

For example, could you make the XNA func8ons yourself? 1 For example, could you make the XNA func8ons yourself? For the second assignment you need to know about the en8re process of using the graphics hardware. You will use shaders which play a vital role

More information

Programming with OpenGL Shaders I. Adapted From: Ed Angel Professor of Emeritus of Computer Science University of New Mexico

Programming with OpenGL Shaders I. Adapted From: Ed Angel Professor of Emeritus of Computer Science University of New Mexico Programming with OpenGL Shaders I Adapted From: Ed Angel Professor of Emeritus of Computer Science University of New Mexico Objectives Shader Programming Basics Simple Shaders Vertex shader Fragment shaders

More information

Evolution of GPUs Chris Seitz

Evolution of GPUs Chris Seitz Evolution of GPUs Chris Seitz Overview Concepts: Real-time rendering Hardware graphics pipeline Evolution of the PC hardware graphics pipeline: 1995-1998: Texture mapping and z-buffer 1998: Multitexturing

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Fall 2016 Lectures 10 & 11 October 10th & 12th, 2016 1. Put a 3D primitive in the World Modeling 2. Figure out what color it should be 3. Position relative to the

More information

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015 CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015 Announcements Project 2 due tomorrow at 2pm Grading window

More information

Why Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends

Why Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends Imagine stream processor; Bill Dally, Stanford Connection Machine CM; Thinking Machines Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid Jeffrey Bolz Eitan Grinspun Caltech Ian Farmer

More information

Programming Graphics Hardware

Programming Graphics Hardware Tutorial 5 Programming Graphics Hardware Randy Fernando, Mark Harris, Matthias Wloka, Cyril Zeller Overview of the Tutorial: Morning 8:30 9:30 10:15 10:45 Introduction to the Hardware Graphics Pipeline

More information

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key

More information