General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)
- Cory Dixon
1 ME 290-R: General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing). Sara McMains, Spring 2009, Lecture
2 Outline. Last time: frame buffer operations; GPU programming intro; linear algebra representations; flow control. Today: reduce review; sorting; searching; Cg.
3 Parallel Reductions. 1D parallel reduction: sum N columns or rows in parallel by adding the two halves of the texture together (N×N input). Credit: Mark Harris
4 Parallel Reductions. 1D parallel reduction: sum N columns or rows in parallel; add the two halves of the texture together, repeatedly: N×N → N×(N/2). Credit: Mark Harris
5 Parallel Reductions. 1D parallel reduction: sum N columns or rows in parallel; add the two halves of the texture together, repeatedly: N×(N/2) → N×(N/4). Credit: Mark Harris
6 Parallel Reductions. 1D parallel reduction: sum N columns or rows in parallel; add the two halves of the texture together, repeatedly, until we're left with a single row of texels (N×1). Requires log₂ N steps. Credit: Mark Harris
7 Reduce. Any operation that computes a single result from a data set: sum, min, max, average, product, ...
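The halving scheme from the preceding slides can be emulated on the CPU to check the step count. The following is a minimal Python sketch, not GPU code. A list of rows stands in for the N×N texture, and each loop iteration plays the role of one rendering pass that adds the bottom half onto the top half. `op` can be any of the reduce operators above that is associative and commutative: sum, min, max, or product (an average is a sum followed by a division). The function name `reduce_columns` is hypothetical, and the height is assumed to be a power of two.

```python
import operator


def reduce_columns(tex, op=operator.add):
    """Reduce every column of an H x N 'texture' (a list of rows) to one value.

    Each while-iteration stands in for one rendering pass: output row y
    combines input rows y and y + H/2, so the height halves every pass.
    Assumes H is a power of two.
    """
    height = len(tex)
    assert height & (height - 1) == 0, "height must be a power of two"
    passes = 0
    while height > 1:
        half = height // 2
        # "Add two halves of the texture together": top half op bottom half.
        tex = [[op(a, b) for a, b in zip(tex[y], tex[y + half])]
               for y in range(half)]
        height = half
        passes += 1
    # After log2(H) passes a single row of texels (N x 1) is left.
    return tex[0], passes
```

With `op=max` or `op=min` the same loop computes per-column extrema. Only the combining operator changes.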
8 Max Reduction: Reduce
float4 max(float2 texcoord : TEXCOORD0,
           uniform samplerRECT img) : COLOR
{
    float4 a, b, c, d;
    // fetch a 2x2 block of texels
    a = f4texRECT(img, texcoord);
    b = f4texRECT(img, texcoord + float2(0, 1));
    c = f4texRECT(img, texcoord + float2(1, 0));
    d = f4texRECT(img, texcoord + float2(1, 1));
    // built-in component-wise max
    return max(max(a, b), max(c, d));
}
Credit: Ian Buck
9 Max Reduction. O(log n) passes to reduce n² elements (each pass shrinks the image by 2×2). You can increase the number of reductions per fragment program to reduce the number of passes. Credit: Mark Harris, Tim Purcell, Ian Buck
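A CPU emulation of the 2×2 max kernel makes the pass count concrete. In this hypothetical Python sketch, each output texel combines a k×k block of input texels. k = 2 matches the four fetches a, b, c, d in the fragment program, and a larger k models packing more reductions into one fragment program. The sketch assumes a square image whose side is a power of k.

```python
def max_reduce_pass(img, k=2):
    """One pass: every output texel takes the max of a k x k input block.

    With k = 2 this reads (x, y), (x, y+1), (x+1, y), (x+1, y+1),
    the same four texels as a, b, c, d in the fragment program.
    """
    h, w = len(img), len(img[0])
    return [[max(img[y + j][x + i] for j in range(k) for i in range(k))
             for x in range(0, w, k)]
            for y in range(0, h, k)]


def max_reduce(img, k=2):
    """Repeat passes until one texel remains; returns (max value, pass count)."""
    passes = 0
    while len(img) > 1:
        img = max_reduce_pass(img, k)
        passes += 1
    return img[0][0], passes
```

For a 16×16 image, k = 2 takes log₂ 16 = 4 passes, while k = 4 (sixteen fetches per fragment) takes only 2.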
10 Linear Algebra Representations. Vector representation: 2D textures are the best we can do. High texture memory bandwidth. Read-write access, dependent fetches. Credit: Jens Krüger
11 The fragment pipeline. Input: fragment attributes: color (R G B A), position (X Y Z W), and texture coordinates (X Y [Z]), interpolated from vertex information. Input: texture image: each texel is a 4D vector (X Y Z W); 32 bits = float, 16 bits = half. Credit: Suresh Venkatasubramanian
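To see what the float/half distinction costs in precision, Python's standard `struct` module can round a value through the IEEE 754 single-precision (`f`) and half-precision (`e`) formats. This CPU-side sketch assumes that the GPU's 16-bit half uses the IEEE binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits). The helper names are hypothetical.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def as_float32(value):
    """Value as seen by a 32-bit float texel channel."""
    return round_trip(value, "<f")


def as_half(value):
    """Value as seen by a 16-bit half texel channel (IEEE binary16 assumed)."""
    return round_trip(value, "<e")
```

For example, 0.1 becomes 0.10000000149011612 as a float but 0.0999755859375 as a half. Also, a half can represent integers exactly only up to 2048, so 2049 rounds to 2048.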
12 Outline. Today: reduce review; sorting; searching; Cg.
13 Assumptions. Data organized into 2D arrays. Rendering pass == screen-aligned quad. Not using vertex shaders. PS 2.0 GPU. No data-dependent branching at fragment level.
14 Sorting. Given an unordered list of elements, produce a list ordered by key value. Kernel: compare and swap. Standard sort algorithms are not suited to GPUs, so look at parallel sort algorithms: bitonic merge sort [Batcher 68]; periodic balanced sorting networks [Dowd 89].
15 Bitonic Merge Sort Overview. Repeatedly build bitonic lists and then sort them. A bitonic list is two monotonic lists concatenated together, one increasing and one decreasing. List A: monotonically increasing. List B: monotonically decreasing. List AB (A followed by B): bitonic.
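The definition above can be checked directly. This Python sketch (with hypothetical helper names) tests for the "increasing run followed by decreasing run" form used on the slide. Batcher's general definition also admits cyclic rotations of such a sequence, and this check deliberately does not accept those.

```python
def is_monotonic(seq, increasing=True):
    """True if seq never decreases (or, with increasing=False, never increases)."""
    pairs = list(zip(seq, seq[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


def is_bitonic(seq):
    """True if seq = (non-decreasing run) followed by (non-increasing run)."""
    return any(is_monotonic(seq[:split]) and
               is_monotonic(seq[split:], increasing=False)
               for split in range(len(seq) + 1))


def make_bitonic(a, b):
    """Concatenate list A sorted ascending with list B sorted descending."""
    return sorted(a) + sorted(b, reverse=True)
```

For example, `make_bitonic([4, 1], [2, 6])` gives `[1, 4, 6, 2]`, which `is_bitonic` accepts.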
16 Bitonic Merge Sort. 8× monotonic lists (one element each) → 4× bitonic lists (two elements each)
17 Bitonic Merge Sort Sort the bitonic lists
18 Bitonic Merge Sort. 4× monotonic lists (two elements each) → 2× bitonic lists (four elements each)
19 Bitonic Merge Sort Sort the bitonic lists
20 Bitonic Merge Sort Sort the bitonic lists
21 Bitonic Merge Sort Sort the bitonic lists
22 Bitonic Merge Sort. 2× monotonic lists (four elements each) → 1× bitonic list (eight elements)
23 Bitonic Merge Sort Sort the bitonic list
24 Bitonic Merge Sort Sort the bitonic list
25 Bitonic Merge Sort Sort the bitonic list
26 Bitonic Merge Sort Sort the bitonic list
27 Bitonic Merge Sort Sort the bitonic list
28 Bitonic Merge Sort Done!
29 Bitonic Merge Sort Summary. Separate rendering pass for each set of swaps. O(log² n) passes. Each pass performs n compare/swaps. Total compare/swaps: O(n log² n). Limitations of the GPU cost us a factor of log n over the best CPU-based sorting algorithms.
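The pass count follows from the loop structure, assuming n is a power of two. Building sorted blocks of size 2^s takes s passes (partner distances 2^(s-1), ..., 1), for s = 1, ..., log₂ n. Summing over s gives the number of passes P(n) and the number of compare/swaps C(n):

```latex
P(n) = \sum_{s=1}^{\log_2 n} s = \frac{\log_2 n \,(\log_2 n + 1)}{2} = O(\log^2 n),
\qquad
C(n) = n \cdot P(n) = O(n \log^2 n)
```

For example, n = 8 needs 3·4/2 = 6 passes, and n = 2^20 needs 210. Dividing C(n) by the n log₂ n cost of the best CPU sorts leaves (log₂ n + 1)/2, which is the factor of log n quoted above.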
30 Bitonic Merge Sort Helper Function
float2 convert1dto2d(float coord1d, float width)
{
    // map a linear index to an (x, y) texel address in a buffer of the given width
    float2 coord2d;
    coord2d.y = coord1d / width;
    coord2d.x = floor(frac(coord2d.y) * width);
    coord2d.y = floor(coord2d.y);
    return coord2d;
}
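The helper can be transcribed to Python to check that it inverts the `elem1d = y * width + x` computation used in the sort kernel. This is a sketch with hypothetical names. Python floats are 64-bit whereas the GPU used 32-bit floats. For widths that are not powers of two, `frac(y) * width` can land just below an integer in finite precision, so the integer-only `divmod` variant is included as a safer reference. The check below uses power-of-two widths, where the float arithmetic is exact.

```python
import math


def convert1dto2d(coord1d, width):
    """Transcription of the Cg helper: linear index -> (x, y) texel address."""
    y = coord1d / width
    x = math.floor((y - math.floor(y)) * width)   # floor(frac(y) * width)
    return x, math.floor(y)


def convert1dto2d_int(coord1d, width):
    """Integer-only variant using divmod; exact for any width."""
    y, x = divmod(coord1d, width)
    return x, y


def convert2dto1d(x, y, width):
    """Inverse mapping, as at the top of BitonicSort: elem1d = y * width + x."""
    return y * width + x
```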
31 Bitonic Merge Sort
float BitonicSort(float2 elem2d : WPOS,
                  uniform float offset,     // offset = 2^(stage - 1)
                  uniform float pbufwidth,
                  uniform float stageno,    // stageno = 2^stage
                  uniform float stepno,     // stepno = 2^step
                  uniform samplerRECT sortedlist) : COLOR
{
    // linear index of this fragment in the pbuffer
    elem2d = floor(elem2d);
    float elem1d = elem2d.y * pbufwidth + elem2d.x;
    // csign: +1 if the comparison partner is to the right, -1 if to the left
    half csign = (fmod(elem1d, stageno) < offset) ? 1 : -1;
    // cdir: +1 for an ascending block, -1 for a descending block
    half cdir = (fmod(floor(elem1d / stepno), 2) == 0) ? 1 : -1;
    // fetch this element and its comparison partner
    float adr1d = csign * offset + elem1d;
    float2 adr2d = convert1dto2d(adr1d, pbufwidth);
    float val0 = f1texRECT(sortedlist, elem2d);
    float val1 = f1texRECT(sortedlist, adr2d);
    // keep the min or the max depending on position and direction
    float cmin = (val0 < val1) ? val0 : val1;
    float cmax = (val0 > val1) ? val0 : val1;
    return (csign == cdir) ? cmin : cmax;
}
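To check the index arithmetic of BitonicSort without a GPU, the kernel can be run once per array index on the CPU. In this Python sketch the list index plays the role of elem1d, and each call to `bitonic_pass` is one rendering pass that writes to a fresh buffer. The driver loop is an assumption inferred from the parameter comments (offset = 2^(stage-1), stageno = 2^stage, stepno = 2^step): step runs from 1 to log₂ n, and for each step, stage counts down from step to 1. Function names are hypothetical.

```python
def bitonic_pass(data, offset, stepno):
    """One rendering pass of the BitonicSort kernel, evaluated at every index.

    offset = 2^(stage-1) is the distance to the comparison partner,
    stageno = 2^stage, and stepno = 2^step sets each block's sort direction.
    """
    stageno = 2 * offset
    out = [None] * len(data)
    for elem1d in range(len(data)):               # each index plays one fragment
        csign = 1 if elem1d % stageno < offset else -1
        cdir = 1 if (elem1d // stepno) % 2 == 0 else -1
        val0 = data[elem1d]
        val1 = data[elem1d + csign * offset]      # partner: right if csign = 1, else left
        out[elem1d] = min(val0, val1) if csign == cdir else max(val0, val1)
    return out                                    # a new buffer, as with render-to-texture


def gpu_bitonic_sort(values):
    """Sort with log2(n) * (log2(n) + 1) / 2 passes; n must be a power of two."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    log_n = n.bit_length() - 1
    data, passes = list(values), 0
    for step in range(1, log_n + 1):              # merge into sorted blocks of size 2^step
        for stage in range(step, 0, -1):          # partner distances 2^(step-1), ..., 1
            data = bitonic_pass(data, 2 ** (stage - 1), 2 ** step)
            passes += 1
    return data, passes
```

Within one pass, the two members of each pair always share the same cdir and have opposite csign. This means exactly one of them keeps the min and the other keeps the max, so every pass produces a permutation of its input.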
33 Making GPU Sorting Faster. Draw several quads with similar computation instead of a single quad: this reduces decision making in the fragment program, pushes work into the vertex processor and interpolator, and reduces computation in the fragment program. Perform more than one compare/swap per sort kernel invocation: this reduces computational complexity.
34 Grouping Computation
35 Implementation Details. Specify interpolants for the smaller quads: whether to compare and swap "down" or "up", and the distance to the comparison partner. See Kipfer & Westermann in GPU Gems 2.
36 Outline. Today: reduce review; sorting; searching; Cg.
37 Types of Search. Search for a specific element: binary search. Search for the nearest element(s): k-nearest neighbor search. Both searches require ordered data.
38 Binary Search. Find a specific element in an ordered list. Implement it just like the CPU algorithm, assuming the hardware supports long enough shaders. Finds the first element of a given value v; if v does not exist, finds the smallest element > v. The search algorithm is sequential, but many searches can be executed in parallel: the number of pixels drawn determines the number of searches executed in parallel (1 pixel == 1 search).
39 Binary Search. Search for v0. Initialize: the search starts at the center of the sorted array; the center value is ≥ v0, so search the left half of the sub-array.
40 Binary Search. Search for v0. Step 1: v0 ≥ v0, so search the left half of the sub-array.
41 Binary Search. Search for v0. Step 2: v0 ≥ v0, so search the left half of the sub-array.
42 Binary Search. Search for v0. Step 3: at this point, we have either found v0 or are one element too far left. One last step resolves this.
43 Binary Search. Search for v0. Step 4: done!
44 Binary Search. Search for v0 and a second value, v. Initialize: the search starts at the center of the sorted array; both searches proceed to the left half of the array.
45 Binary Search. Search for v0 and v. Step 1: the search for v0 continues as before; the search for v overshot, so go back to the right.
46 Binary Search. Search for v0 and v. Step 2: we've found the proper v, but are still looking for v0; both searches continue.
47 Binary Search. Search for v0 and v. Step 3: now we've found the proper v0, but overshot v. The cleanup step takes care of this.
48 Binary Search. Search for v0 and v. Step 4: done! Both v0 and v are located properly.
49 Binary Search Summary. Single rendering pass (fragment program). Each pixel drawn performs an independent search. Iterates log n + 1 times through the list.
50 Binary Search
float BinarySearch(float2 elem2d : WPOS,
                   uniform float stride,
                   uniform float pbufwidth,
                   uniform float sortbufwidth,
                   uniform samplerRECT sortlist) : COLOR
{
    elem2d = floor(elem2d);
    float elem1d = elem2d.y * pbufwidth + elem2d.x;   // value this pixel searches for
    float curpos = stride;                             // start at the center of the list
    // loop over (LOGN - 1) search passes (LOGN assumed to be a compile-time #define)
    for (int i = 0; i < LOGN - 1; i++) {
        stride = floor(stride * 0.5);
        curpos = Search(curpos, elem1d, stride, sortlist, sortbufwidth);
    }
    // log n-th pass
    curpos = Search(curpos, elem1d, 1.0, sortlist, sortbufwidth);
    // cleanup pass
    curpos = SearchFin(curpos, elem1d, 1.0, sortlist, sortbufwidth);
    return curpos;
}
51 Binary Search: Search Routines
float Search(float curpos, float elem, float stride,
             uniform samplerRECT data, float texw)
{
    float2 adr2d = convert1dto2d(curpos, texw);
    float val = f1texRECT(data, adr2d);
    // step left if elem <= val, otherwise step right
    float dir = (elem <= val) ? -1.0 : 1.0;
    return dir * stride + curpos;
}
For SearchFin, use this line instead:
    float dir = (elem <= val) ? 0.0 : 1.0;
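The two routines and the BinarySearch driver can be ported to Python to confirm that the log n + 1 iterations land on the first element ≥ the key, which is what `bisect.bisect_left` returns. This is a sketch under stated assumptions:

- The list length n is a power of two with n ≥ 2.
- The initial stride is n/2, so the search starts at the center.
- Out-of-range reads clamp to the last element, as a clamp-to-edge texture would.
- A result of n means the key is larger than every entry.

Function names are hypothetical. Each call to `gpu_binary_search` corresponds to one pixel's independent search.

```python
def _read(data, pos):
    """Clamp-to-edge read (an assumption about texture addressing)."""
    return data[min(pos, len(data) - 1)]


def search(curpos, elem, stride, data):
    """Port of Search(): step left if elem <= data[curpos], else step right."""
    direction = -1 if elem <= _read(data, curpos) else 1
    return curpos + direction * stride


def search_fin(curpos, elem, data):
    """Port of SearchFin(): stay if elem <= data[curpos], else move right by one."""
    return curpos if elem <= _read(data, curpos) else curpos + 1


def gpu_binary_search(sorted_list, elem):
    """Index of the first entry >= elem, found in log2(n) + 1 fixed steps."""
    n = len(sorted_list)
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two, n >= 2"
    log_n = n.bit_length() - 1
    stride = n // 2
    curpos = stride                                   # start at the center
    for _ in range(log_n - 1):                        # (LOGN - 1) halving passes
        stride //= 2
        curpos = search(curpos, elem, stride, sorted_list)
    curpos = search(curpos, elem, 1, sorted_list)     # log n-th pass
    curpos = search_fin(curpos, elem, sorted_list)    # cleanup pass
    return min(curpos, n)
```

After the halving passes the true answer is within one position of `curpos`, and the last two steps resolve that ambiguity. This is the "one element too far left" case in the walkthrough.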
52 Nearest Neighbor Search
53 Nearest Neighbor Search. Given a sample point p, find the k points nearest p within a data set. On the CPU, this is easily done with a heap or priority queue, which can add or reject neighbors as the search progresses. We don't know how to build one efficiently on the GPU. knn-grid: can only add neighbors.
54 knn-grid Algorithm. [Figure: sample point, candidate neighbors, and neighbors found]
55 knn-grid Algorithm. Candidate neighbors must be within the maximum search radius. Visit voxels in order of distance to the sample point.
56 knn-grid Algorithm. If the current number of neighbors found is less than the number requested, grow the search radius.
57 knn-grid Algorithm. If the current number of neighbors found is less than the number requested, grow the search radius.
58 knn-grid Algorithm. Don't add neighbors outside the maximum search radius. Don't grow the search radius when a neighbor is outside the maximum radius.
59 knn-grid Algorithm. Add neighbors within the search radius.
60 knn-grid Algorithm. Add neighbors within the search radius.
61 knn-grid Algorithm. Don't expand the search radius if enough neighbors have already been found.
62 knn-grid Algorithm. Add neighbors within the search radius.
63 knn-grid Algorithm. Visit all other voxels accessible within the determined search radius. Add neighbors within the search radius.
64 knn-grid Summary. Finds all neighbors within a sphere centered about the sample point. May locate more than the requested k nearest neighbors. See "Photon Mapping on Programmable Graphics Hardware", Purcell et al.
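The knn-grid rules can be sketched on the CPU to show why the result is "all points within a sphere" rather than exactly k points. This is a hypothetical Python illustration, not Purcell et al.'s GPU implementation:

- It works in 2D rather than 3D and uses a uniform grid of square voxels.
- It visits every voxel within `max_radius` of the sample point, in order of distance.
- It stops once enough neighbors are found and the next voxel lies outside the current radius.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket 2D points into square voxels of side `cell`."""
    grid = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    return grid


def voxel_distance(q, key, cell):
    """Distance from q to the nearest point of voxel `key` (0 if q is inside)."""
    dx = max(key[0] * cell - q[0], 0.0, q[0] - (key[0] + 1) * cell)
    dy = max(key[1] * cell - q[1], 0.0, q[1] - (key[1] + 1) * cell)
    return math.hypot(dx, dy)


def knn_grid(grid, cell, q, k, max_radius):
    """Return (neighbors, radius); may hold more than k points."""
    reach = math.ceil(max_radius / cell)
    cx, cy = math.floor(q[0] / cell), math.floor(q[1] / cell)
    voxels = [(cx + i, cy + j)
              for i in range(-reach, reach + 1) for j in range(-reach, reach + 1)]
    voxels.sort(key=lambda v: voxel_distance(q, v, cell))   # nearest voxels first
    radius, found = 0.0, []
    for v in voxels:
        d_voxel = voxel_distance(q, v, cell)
        if d_voxel > max_radius or (len(found) >= k and d_voxel > radius):
            break                      # later voxels cannot hold a qualifying point
        for p in grid.get(v, ()):
            d = math.dist(p, q)
            if d > max_radius:
                continue               # never add, or grow to, points past the maximum
            if d <= radius:
                found.append(p)        # inside the current sphere: always add
            elif len(found) < k:
                radius = d             # too few neighbors: grow the radius to reach p
                found.append(p)
    return found, radius
```

The radius only grows while fewer than k neighbors are known, and points are never removed. As a result, a close point discovered late is simply added on top, which is how the result can exceed k.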
65 Outline. Today: reduce review; sorting; searching; Cg.
66 Constant Parameters. Fixed inside the program. Examples: 3.14159..., size of compute window. Example declarations: const float4 v = float4(…); const float pi = 3.14159; Illegal: pi = …; float a = pi++;
67 Uniform parameters. Can be passed to a fragment program like normal parameters: a uniform gets its initial value from outside the program, before the fragment program executes. Example: a counter that tracks which pass the algorithm is in. You are allowed to change uniform parameters within the program; the change affects only that invocation and is not written back to the application.
68 Math operators, e.g. cos(x), log(x), pow(x, y), dot(a, b), mul(v, M), sqrt(x), cross(u, v). Using built-in ops is more efficient than writing your own.
69 Swizzling and friends. Swizzle: v2 = v.yx reads the y and x components of v, in that order; s = v.w reads a single component. Smear: v3 = s.rrr replicates s into three components. You can use xyzw or rgba, but not both at once. Write masking: v.ar = v2 writes only the a and r components of v.
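Swizzle, smear, and write-mask semantics can be mimicked on the CPU to make the rules concrete. This is a Python emulation over tuples, not Cg. The helper names and the example vector (4, -2, 5, 3) are illustrative assumptions.

```python
XYZW, RGBA = "xyzw", "rgba"


def _names(pattern):
    """Pick the component alphabet; mixing xyzw and rgba is rejected."""
    for names in (XYZW, RGBA):
        if set(pattern) <= set(names):
            return names
    raise ValueError(f"cannot mix xyzw and rgba in '{pattern}'")


def swizzle(v, pattern):
    """Read components in any order; repeats give a smear such as s.rrr."""
    names = _names(pattern)
    return tuple(v[names.index(c)] for c in pattern)


def write_mask(v, pattern, value):
    """Assign only the named components (e.g. v.ar = value); others keep old values."""
    names = _names(pattern)
    if len(set(pattern)) != len(pattern) or len(pattern) != len(value):
        raise ValueError("write mask needs distinct components, one per value")
    out = list(v)
    for c, x in zip(pattern, value):
        out[names.index(c)] = x
    return tuple(out)
```

With v = (4, -2, 5, 3): `swizzle(v, "yx")` gives (-2, 4), `swizzle((3,), "rrr")` smears a scalar to (3, 3, 3), and `write_mask((1, 2, 3, 4), "ar", (-2, 4))` gives (4, 2, 3, -2).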
71 The fragment pipeline. float4 v = tex2D(img, float2(x, y)); Texture access is like an array lookup. The value in v can be used to perform another lookup! This is called a dependent read. Texture reads (and dependent reads) are expensive, and their limits vary across GPUs. Use them wisely! Credit: Suresh Venkatasubramanian
72 The fragment pipeline. Control flow: the (<test>) ? a : b operator. if-then-else conditionals: [nv3x] both branches are executed, and the condition code is used to decide which value is written to the output register; [nv40] true conditionals. for-loops and do-while: [nv3x] limited to what can be unrolled (i.e., no variable loop limits); [nv40] true looping. WARNING: even though nv40 has true flow control, performance will still suffer if there is no coherence (neighboring fragments taking different branches). Credit: Suresh Venkatasubramanian
73 The fragment pipeline. out float4 result : COLOR; // do computation; result = <final answer>. Notes: generally only the output color can be modified (a single color output on some GPUs). Setting different values in different channels of result can be useful for debugging. The number of instructions is limited both statically (program length) and dynamically (number executed). Credit: Suresh Venkatasubramanian
74 Anatomy of a Cg Fragment Program Credit: Paul Kanyuk
75 The fragment pipeline. What comes after fragment programs? Raster operations, then the frame buffer. Depth/stencil tests happen after the fragment program; blending and aggregation happen as usual. Early z-culling: fragments that would have failed the depth test are killed before the fragment program executes. Optimization point: avoid work in the fragment program if possible. Credit: Suresh Venkatasubramanian
76 Getting data back I: Readbacks. [Figure: pipeline diagram, 3D API (OpenGL or Direct3D) → GPU front end → vertex processor → primitive assembly → rasterization and interpolation → fragment processor → raster operations → frame buffer, with a readback path to the CPU.] Readbacks transfer data from the frame buffer to the CPU. Pros: they are very general (any buffer can be transferred), and partial buffers can be transferred. Cons: they are slow (reverse data transfer across the PCI/AGP bus is very, very slow; PCIe is better but still slow), and there is a data mismatch: readbacks return image data, but the CPU expects vertex data (or has to load the image into a texture). Credit: Suresh Venkatasubramanian
77 Getting data back II: Render-to-texture. [Figure: pipeline diagram, GPU front end → vertex processor → primitive assembly → rasterization and interpolation → fragment processor → raster operations → texture.] Render-to-texture renders directly into a texture. Pros: the transfer does not cross the GPU-CPU boundary, and it is the fastest way to transfer data to the fragment processor. Cons: it only works with depth and color buffers (not stencil). Render-to-texture is the best method for reading data back after a computation. Credit: Suresh Venkatasubramanian
78 Using Render-to-texture. Using the render-texture extension is tricky: you have to set up a pbuffer context, bind an appropriate texture to it, and then render to this context. Then you have to change context and read the bound texture. You cannot write to a texture and read it simultaneously. Mark Harris (NVIDIA) has written a RenderTexture class that wraps all of this. Credit: Suresh Venkatasubramanian
79 The vertex pipeline. Input: vertices (position, color, texture coords). Input: uniform and constant parameters; matrices can be passed to a vertex program, and lighting/material parameters can also be passed. Credit: Suresh Venkatasubramanian
80 The vertex pipeline. Operations: math/swizzle ops, matrix operators, flow control (as before). [nv3x] No access to textures. Output: modified vertices (position, color); vertex data is transmitted to primitive assembly. Credit: Suresh Venkatasubramanian
81 Anatomy of a Cg Vertex Program Credit: Paul Kanyuk
82 Vertex programs are useful. We can replace the entire geometry transformation portion of the fixed-function pipeline. Vertex programs are used to change vertex coordinates (move objects around). Shifting operations to vertex programs improves overall pipeline performance. Much of shader processing happens at the vertex level. We have access to the original scene geometry. Credit: Suresh Venkatasubramanian
83 Vertex programs are not useful. Fragment programs allow us to exploit the full parallelism of the GPU pipeline ("a processor at every pixel"). Vertex programs can't read input (no texture access)! [nv3x] Rule of thumb: if a computation requires intensive calculation, it should probably be in the fragment processor; if it requires more geometric/graphic computing, it should be in the vertex processor. Credit: Suresh Venkatasubramanian
84 When might a VP need access to textures? n-body simulation: we have a force field in a texture, and each vertex moves according to this force field: v = a·t, s = v·t. In each pass, all vertex coordinates are updated, and the new locations create a new force field. How do we update vertex coordinates? Credit: Suresh Venkatasubramanian
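One update pass of the n-body example can be written out on the CPU. In this hypothetical Python sketch, the force field is a 2D grid of accelerations sampled with a clamped nearest-texel lookup. Each vertex takes a semi-implicit Euler step: first v += a·dt, then s += v·dt using the updated velocity. On the GPU this loop body would run in a fragment program over a texture of vertex positions; here it is a plain loop.

```python
def sample_field(field, cell, pos):
    """Nearest-texel lookup into a 2D force-field grid, clamped at the edges.

    field[j][i] holds the acceleration (ax, ay) for voxel (i, j).
    """
    h, w = len(field), len(field[0])
    i = min(max(int(pos[0] // cell), 0), w - 1)
    j = min(max(int(pos[1] // cell), 0), h - 1)
    return field[j][i]


def update_pass(positions, velocities, field, cell, dt):
    """One pass: every vertex reads the field and takes a semi-implicit Euler step."""
    new_pos, new_vel = [], []
    for (x, y), (vx, vy) in zip(positions, velocities):
        ax, ay = sample_field(field, cell, (x, y))
        vx, vy = vx + ax * dt, vy + ay * dt          # velocity update from the field
        new_vel.append((vx, vy))
        new_pos.append((x + vx * dt, y + vy * dt))   # position update, new velocity
    return new_pos, new_vel
```

With a uniform field of (0, -1), dt = 0.5, and a vertex at rest at (1.5, 1.5), one pass gives velocity (0, -0.5) and position (1.5, 1.25).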
85 Sending data back to the vertex program. Solution: [Pass 1] Render all vertices to be stored in a texture. [Pass 2] Compute the force field in a fragment program. [Pass 3] Update the texture containing vertex coordinates in a fragment program, using the force field. [Pass 4] Retrieve vertex data from the texture. How? Credit: Suresh Venkatasubramanian
86 Vertex/Pixel Buffer Objects. V/P buffer objects are ways to transfer data between the framebuffer/vertex arrays and GPU memory. Conceptually, V/PBOs are like CPU memory, but on the GPU. You can use glReadPixels to read into a PBO, and you can create a vertex array from a VBO. Credit: Suresh Venkatasubramanian
87 Solution! [Figure: pipeline diagram in which the texture produced by the programmable fragment processor is passed through a VBO/PBO back to the programmable vertex processor.] Credit: Suresh Venkatasubramanian
88 NV40: Vertex programs can read textures. [Figure: pipeline diagram in which the programmable vertex processor reads a texture directly.] Credit: Suresh Venkatasubramanian
89 Summary of memory flow. [Figure: three data paths among the CPU, vertex program, fragment program, and frame buffer: readback, copy-to-texture, and render-to-texture.] Credit: Suresh Venkatasubramanian
90 Summary of memory flow. [Figure: two data paths from the fragment program back to the vertex program: VBO/PBO transfer, and nv40 texture reference in the vertex program.] Credit: Suresh Venkatasubramanian
91 Acknowledgements: Paul Kanyuk, Suresh Venkatasubramanian, Tim Purcell, Mark Harris, Jens Krüger, Ian Buck
More informationMatrix Operations on the GPU
Matrix Operations (thanks too ) Matrix Operations on the GPU CIS 665: GPU Programming and Architecture TA: Joseph Kider Slide information sources Suresh Venkatasubramanian CIS700 Matrix Operations Lectures
More informationPerformance OpenGL Programming (for whatever reason)
Performance OpenGL Programming (for whatever reason) Mike Bailey Oregon State University Performance Bottlenecks In general there are four places a graphics system can become bottlenecked: 1. The computer
More informationReal-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis
Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney and John D. Owens SIGGRAPH Asia
More informationWorking with Metal Overview
Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission
More informationACCELERATING SIGNAL PROCESSING ALGORITHMS USING GRAPHICS PROCESSORS
ACCELERATING SIGNAL PROCESSING ALGORITHMS USING GRAPHICS PROCESSORS Ashwin Prasad and Pramod Subramanyan RF and Communications R&D National Instruments, Bangalore 560095, India Email: {asprasad, psubramanyan}@ni.com
More informationShader Programming 1. Examples. Vertex displacement mapping. Daniel Wesslén 1. Post-processing, animated procedural textures
Shader Programming 1 Examples Daniel Wesslén, dwn@hig.se Per-pixel lighting Texture convolution filtering Post-processing, animated procedural textures Vertex displacement mapping Daniel Wesslén 1 Fragment
More informationEfficient and Scalable Shading for Many Lights
Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL
More informationLecture 13: OpenGL Shading Language (GLSL)
Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 18, 2018 1/56 Motivation } Last week, we discussed the many of the new tricks in Graphics require low-level access to the Graphics
More informationHow to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies
How to Work on Next Gen Effects Now: Bridging DX10 and DX9 Guennadi Riguer ATI Technologies Overview New pipeline and new cool things Simulating some DX10 features in DX9 Experimental techniques Why This
More informationGPGPU: Beyond Graphics. Mark Harris, NVIDIA
GPGPU: Beyond Graphics Mark Harris, NVIDIA What is GPGPU? General-Purpose Computation on GPUs GPU designed as a special-purpose coprocessor Useful as a general-purpose coprocessor The GPU is no longer
More informationCopyright Khronos Group, Page Graphic Remedy. All Rights Reserved
Avi Shapira Graphic Remedy Copyright Khronos Group, 2009 - Page 1 2004 2009 Graphic Remedy. All Rights Reserved Debugging and profiling 3D applications are both hard and time consuming tasks Companies
More informationGeneral-Purpose Computation on Graphics Hardware
General-Purpose Computation on Graphics Hardware Welcome & Overview David Luebke NVIDIA Introduction The GPU on commodity video cards has evolved into an extremely flexible and powerful processor Programmability
More informationParallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics
More informationProgrammable GPUs Outline
papi 1 Outline References Programmable Units Languages Programmable GPUs Outline papi 1 OpenGL Shading Language papi 1 EE 7700-1 Lecture Transparency. Formatted 11:30, 25 March 2009 from set-prog-api.
More informationMali Demos: Behind the Pixels. Stacy Smith
Mali Demos: Behind the Pixels Stacy Smith Mali Graphics: Behind the demos Mali Demo Team: Doug Day Stacy Smith (Me) Sylwester Bala Roberto Lopez Mendez PHOTOGRAPH UNAVAILABLE These days I spend more time
More informationGeForce4. John Montrym Henry Moreton
GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,
More informationDrawing Fast The Graphics Pipeline
Drawing Fast The Graphics Pipeline CS559 Fall 2015 Lecture 9 October 1, 2015 What I was going to say last time How are the ideas we ve learned about implemented in hardware so they are fast. Important:
More informationA GPU Accelerated Spring Mass System for Surgical Simulation
A GPU Accelerated Spring Mass System for Surgical Simulation Jesper MOSEGAARD #, Peder HERBORG, and Thomas Sangild SØRENSEN # Department of Computer Science, Centre for Advanced Visualization and Interaction,
More informationRationale for Non-Programmable Additions to OpenGL 2.0
Rationale for Non-Programmable Additions to OpenGL 2.0 NVIDIA Corporation March 23, 2004 This white paper provides a rationale for a set of functional additions to the 2.0 revision of the OpenGL graphics
More informationApplications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research
Applications of Explicit Early-Z Z Culling Jason Mitchell ATI Research Outline Architecture Hardware depth culling Applications Volume Ray Casting Skin Shading Fluid Flow Deferred Shading Early-Z In past
More informationDeconstructing Hardware Usage for General Purpose Computation on GPUs
Deconstructing Hardware Usage for General Purpose Computation on GPUs Budyanto Himawan Dept. of Computer Science University of Colorado Boulder, CO 80309 Manish Vachharajani Dept. of Electrical and Computer
More informationDeferred Rendering Due: Wednesday November 15 at 10pm
CMSC 23700 Autumn 2017 Introduction to Computer Graphics Project 4 November 2, 2017 Deferred Rendering Due: Wednesday November 15 at 10pm 1 Summary This assignment uses the same application architecture
More informationRay Casting on Programmable Graphics Hardware. Martin Kraus PURPL group, Purdue University
Ray Casting on Programmable Graphics Hardware Martin Kraus PURPL group, Purdue University Overview Parallel volume rendering with a single GPU Implementing ray casting for a GPU Basics Optimizations Published
More informationParallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Analyzing a 3D Graphics Workload Where is most of the work done? Memory Vertex
More informationBringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games
Bringing AAA graphics to mobile platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Platform team at Epic Games Unreal Engine 15 years in the industry 30 years of programming
More informationSupplement to Lecture 22
Supplement to Lecture 22 Programmable GPUs Programmable Pipelines Introduce programmable pipelines - Vertex shaders - Fragment shaders Introduce shading languages - Needed to describe shaders - RenderMan
More informationFeeding the Beast: How to Satiate Your GoForce While Differentiating Your Game
GDC Europe 2005 Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game Lars M. Bishop NVIDIA Embedded Developer Technology 1 Agenda GoForce 3D capabilities Strengths and weaknesses
More informationSign up for crits! Announcments
Sign up for crits! Announcments Reading for Next Week FvD 16.1-16.3 local lighting models GL 5 lighting GL 9 (skim) texture mapping Modern Game Techniques CS248 Lecture Nov 13 Andrew Adams Overview The
More informationComparing Reyes and OpenGL on a Stream Architecture
Comparing Reyes and OpenGL on a Stream Architecture John D. Owens Brucek Khailany Brian Towles William J. Dally Computer Systems Laboratory Stanford University Motivation Frame from Quake III Arena id
More informationCS452/552; EE465/505. Clipping & Scan Conversion
CS452/552; EE465/505 Clipping & Scan Conversion 3-31 15 Outline! From Geometry to Pixels: Overview Clipping (continued) Scan conversion Read: Angel, Chapter 8, 8.1-8.9 Project#1 due: this week Lab4 due:
More informationSqueezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques
Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Jonathan Zarge, Team Lead Performance Tools Richard Huddy, European Developer Relations Manager ATI
More informationAutomatic Tuning Matrix Multiplication Performance on Graphics Hardware
Automatic Tuning Matrix Multiplication Performance on Graphics Hardware Changhao Jiang (cjiang@cs.uiuc.edu) Marc Snir (snir@cs.uiuc.edu) University of Illinois Urbana Champaign GPU becomes more powerful
More informationDX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology
DX10, Batching, and Performance Considerations Bryan Dudash NVIDIA Developer Technology The Point of this talk The attempt to combine wisdom and power has only rarely been successful and then only for
More informationThe Graphics Pipeline
The Graphics Pipeline Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel Ray Tracing: Why Slow? Basic ray tracing: 1 ray/pixel But you really want shadows, reflections, global illumination, antialiasing
More informationApplications of Explicit Early-Z Culling
Applications of Explicit Early-Z Culling Jason L. Mitchell ATI Research Pedro V. Sander ATI Research Introduction In past years, in the SIGGRAPH Real-Time Shading course, we have covered the details of
More informationFor example, could you make the XNA func8ons yourself?
1 For example, could you make the XNA func8ons yourself? For the second assignment you need to know about the en8re process of using the graphics hardware. You will use shaders which play a vital role
More informationProgramming with OpenGL Shaders I. Adapted From: Ed Angel Professor of Emeritus of Computer Science University of New Mexico
Programming with OpenGL Shaders I Adapted From: Ed Angel Professor of Emeritus of Computer Science University of New Mexico Objectives Shader Programming Basics Simple Shaders Vertex shader Fragment shaders
More informationEvolution of GPUs Chris Seitz
Evolution of GPUs Chris Seitz Overview Concepts: Real-time rendering Hardware graphics pipeline Evolution of the PC hardware graphics pipeline: 1995-1998: Texture mapping and z-buffer 1998: Multitexturing
More informationDrawing Fast The Graphics Pipeline
Drawing Fast The Graphics Pipeline CS559 Fall 2016 Lectures 10 & 11 October 10th & 12th, 2016 1. Put a 3D primitive in the World Modeling 2. Figure out what color it should be 3. Position relative to the
More informationCSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015
CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015 Announcements Project 2 due tomorrow at 2pm Grading window
More informationWhy Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends
Imagine stream processor; Bill Dally, Stanford Connection Machine CM; Thinking Machines Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid Jeffrey Bolz Eitan Grinspun Caltech Ian Farmer
More informationProgramming Graphics Hardware
Tutorial 5 Programming Graphics Hardware Randy Fernando, Mark Harris, Matthias Wloka, Cyril Zeller Overview of the Tutorial: Morning 8:30 9:30 10:15 10:45 Introduction to the Hardware Graphics Pipeline
More informationReal-Time Rendering (Echtzeitgraphik) Michael Wimmer
Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key
More information