SCIENTIFIC COMPUTING ON COMMODITY GRAPHICS HARDWARE

RUIGANG YANG
Department of Computer Science, University of Kentucky, One Quality Street, Suite 854, Lexington, KY 40507, USA

Driven by the need for interactive entertainment, modern PCs are equipped with specialized graphics processors (GPUs) for creation and display of images. These GPUs have become increasingly programmable, to the point that they are now capable of efficiently executing a significant number of computational kernels from non-graphical applications. In this introductory paper we first present a high-level overview of modern graphics hardware's architecture, then introduce several applications in scientific computing that can be efficiently accelerated by GPUs. Finally, we list programming tools available for application development on GPUs.

1. Introduction

As the mass-market emphasis in computing has shifted from word processing and spreadsheets to interactive entertainment, computer hardware has evolved to better support these new applications. Most of the performance-limiting processing today involves creation and display of images; thus, a new entity has appeared within most computer systems. Between the system's general-purpose processor and the video frame buffer, there is now a specialized Graphics Processing Unit (GPU). Early GPUs were not really processors, but hardwired pipelines for each of the most common rendering tasks. As more complex 3D transformations have become common in a wide range of applications, GPUs have become increasingly programmable, to the point that they are now capable of efficiently executing a significant number of computational kernels from non-graphical applications. A GPU is simpler and more efficient than a conventional PC processor (CPU) because it only needs to perform a relatively simple set of array-processing operations, albeit at a very high speed.
Many problems in scientific computing, such as physically-based simulation, information retrieval, and data mining, boil down to relatively simple matrix operations. This characteristic makes these problems ideal candidates for GPU acceleration. In this introductory paper we first present a high-level overview of modern graphics hardware's architecture and its phenomenal development in recent years. Then we introduce a large array of non-graphical computational tasks, in particular linear algebra operations, that have been successfully implemented on GPUs with significant performance improvements. Finally, we list programming tools available for application development on GPUs. Some of them are designed to allow programming GPUs with familiar C-like constructs and syntax, without worrying about the details of the hardware. They hold the promise of bringing the vast computational power of GPUs to the broad scientific computing community.

2. A Brief Overview of GPUs

In this section, we explain the basic architecture of GPUs and the potential advantages of using GPUs to solve scientific problems.

2.1. The Rendering Pipeline

GPUs are dedicated processors designed specifically to handle the intense computational requirements of display graphics, i.e., rendering text or images at over 30 frames per second. As depicted in Figure 1, a modern GPU can be abstracted as a rendering pipeline for 3D computer graphics (2D graphics is just a special case) [20].

Figure 1. Rendering pipeline: geometric primitives, vertex processing, rasterization, fragment processing, frame buffer.

The inputs to the pipeline are geometric primitives, i.e., points, lines, and polygons; the output is the framebuffer, a two-dimensional array of pixels that will be displayed on screen. The first stage operates on geometric primitives described by vertices. In this vertex-processing stage, vertices are transformed and lit, and primitives are clipped to a viewing volume in preparation for the next stage, rasterization. The rasterizer produces a series of framebuffer addresses and color values; each such pair is called a fragment and represents the portion of a primitive that corresponds to a pixel in the framebuffer. Each fragment is fed to the fragment-processing stage before it finally alters the framebuffer. Operations in this stage include texture mapping, depth testing, and alpha blending.
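
To make the data flow between the three stages concrete, the following is a minimal software sketch of the pipeline described above. It is an illustration under simplifying assumptions, not a model of any real GPU: the function names (`vertex_stage`, `rasterize`, `fragment_stage`, `render`), the 8x8 framebuffer, the scale-and-offset transform, and the flat per-triangle depth are all hypothetical, and lighting, clipping, texture mapping, and alpha blending are omitted.

```python
WIDTH, HEIGHT = 8, 8  # assumed framebuffer size for this toy example


def vertex_stage(vertices, scale=1.0, offset=(0.0, 0.0)):
    """Vertex processing: apply a simple 2D transform to every (x, y, z) vertex."""
    return [(x * scale + offset[0], y * scale + offset[1], z) for x, y, z in vertices]


def edge(a, b, p):
    """Signed-area test telling on which side of edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle, color):
    """Rasterization: emit one fragment per pixel centre covered by the triangle."""
    a, b, c = triangle
    depth = (a[2] + b[2] + c[2]) / 3.0  # one flat depth per triangle, for simplicity
    for py in range(HEIGHT):
        for px in range(WIDTH):
            p = (px + 0.5, py + 0.5)
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            # Accept either winding order: all three tests must agree in sign.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield (px, py, depth, color)  # a fragment: address, depth, color


def fragment_stage(fragments, framebuffer, depthbuffer):
    """Fragment processing: depth test, then write the color to the framebuffer."""
    for px, py, depth, color in fragments:
        if depth < depthbuffer[py][px]:  # keep only the nearest fragment
            depthbuffer[py][px] = depth
            framebuffer[py][px] = color


def render(triangles):
    """Run every (vertices, color) primitive through the three stages in order."""
    framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]
    depthbuffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    for vertices, color in triangles:
        transformed = vertex_stage(vertices)
        fragment_stage(rasterize(transformed, color), framebuffer, depthbuffer)
    return framebuffer
```

Each stage consumes only the output of the previous one, which is why the stages can be pipelined in hardware and why fragments, being independent of one another, can be processed in parallel.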

2.2. Recent Trend in GPUs

Until a few years ago, commercial GPUs, such as the RealityEngine from SGI [2], implemented in hardware a fixed rendering pipeline with configurable parameters. As a result, their applications were restricted to graphical computations. Driven by the market demand for better realism, the current generation of commercial GPUs, such as the NVIDIA GeForce FX [19] and the ATI Radeon, has added significant programmable functionality in both the vertex and the fragment processing stages (stages with double lines in Figure 1). They allow developers to write a sequence of instructions to modify the vertex or fragment output. These programs are executed directly on the GPU and achieve performance comparable to that of fixed-function GPUs.

In addition to the programmable functionality of modern GPUs, their support for floating-point output has been improving. GPUs on the market today support up to 32-bit floating-point output. Such precision is usable for many diverse applications other than computer graphics.

Figure 2. A graph of performance increase over time for CPUs and GPUs (axes: date introduced; CPU integer benchmark score and GPU millions of triangles per second). GPU performance has increased at a faster rate than CPUs. (Data courtesy of Anselmo Lastra.)

GPUs have also demonstrated a rapid improvement in performance during the past few years. In Figure 2, we plot the performance increase of both GPUs and commodity central processing units (CPUs). Similar to the number of integer operations per second for CPUs, a typical benchmark to gauge a GPU's performance is the number of triangles it can process every second. We can see that GPUs have maintained a performance improvement rate of approximately 3X/year, which exceeds the performance improvement of CPUs at 1.6X/year. This is because CPUs are designed for low-latency computations, while GPUs are optimized for high throughput of vertices and fragments [14]. Low latency on memory-intensive applications typically requires large caches, which use a large silicon area. Additional transistors are used to greater effect in GPU architectures because they are applied to additional functional units that increase throughput.

3. Applications of GPUs for General-Purpose Computation

With the wide deployment of inexpensive yet powerful GPUs in the last several years, we have seen a surge of experimental research in using GPUs for tasks other than rendering. For example, Yang et al. have experimented with using GPUs to solve computer vision problems [24, 23]; Holzschuch and Alonso to speed up visibility queries [7]; Hoff et al. to compute generalized Voronoi diagrams [8] and proximity information [9]; and Lok to reconstruct an object's visual hull given live video from multiple cameras [15]. Each of these applications obtained significant performance improvements by exploiting the speed and the inherent parallelism of modern graphics hardware.

Within the scope of this paper, we introduce several representative approaches to accelerating linear algebra operations on GPUs. Larsen and McAllister present a technique for large matrix-matrix multiplies using low-cost graphics hardware [12]. The method adapts a technique from parallel computing: the computation is distributed over a logically cube-shaped lattice of processors, and a portion of the computation is performed at each processor. Graphics hardware is specialized in a manner that makes it well suited to this particular problem, giving faster results in some cases than a general-purpose processor. A more complete and up-to-date implementation of dense matrix algebra is presented by Moravánszky [1]. Bolz et al. show two basic, broadly useful computational kernels implemented on GPUs: a sparse matrix conjugate gradient solver and a regular-grid multigrid solver [4]. Performance analysis with realistic applications shows that a GPU-based implementation compares favorably with its CPU counterpart.
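
As a rough illustration of the cube-shaped lattice decomposition mentioned above for matrix-matrix multiplication, the sketch below treats each lattice point (i, j, k) as one partial product and sums along k in a sequence of accumulation passes. This is a CPU-side sketch under stated assumptions, not the implementation of any of the cited papers: the function name `multiply_in_passes` is hypothetical, and mapping each k slice to one "rendering pass" that is additively blended into a framebuffer-like accumulator is an assumed interpretation.

```python
def multiply_in_passes(a, b):
    """Compute C = A * B as a sum of per-slice passes over an (i, j, k) lattice."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    # The accumulator plays the role of the framebuffer.
    c = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):          # one pass per slice k of the lattice
        for i in range(rows):       # within a pass, every (i, j) "pixel" is
            for j in range(cols):   # independent and could run in parallel
                c[i][j] += a[i][k] * b[k][j]  # multiply, then accumulate
    return c


print(multiply_in_passes([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# prints [[19.0, 22.0], [43.0, 50.0]]
```

The two inner loops are written sequentially here, but no (i, j) entry depends on another within a pass, which is the property that lets fragment-level parallelism be exploited.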
A similar framework for implementing linear algebra operators on GPUs is presented by Krüger and Westermann [11], which focuses on sparse and banded matrices. Many other algorithms for scientific computing have been implemented on GPUs, including the FFT [17], level sets [22, 13], and various types of physically-based simulations [6, 10, 21]. Interested readers are referred to [...] for other general-purpose applications on GPUs.

4. GPU Programming Languages

While many non-graphical applications on GPUs have obtained encouraging results by exploiting the GPU's speed and high bandwidth, the development process is not trivial. Many of the existing applications are written in low-level assembly languages that are directly executed on the GPU. Therefore, novice developers face a steep learning curve: they must acquire a thorough understanding of the graphics hardware and its programming interfaces, namely OpenGL [20] and DirectX [16].

Fortunately, this is rapidly changing with several high-level languages now available. The first is Cg, a system for programming graphics hardware in a C-like language [18]. It is, however, still a programming language geared towards rendering tasks and tightly coupled with graphics hardware. There are other high-level languages, such as Brook for GPUs and Sh, which allow programming GPUs with familiar constructs and syntax, without worrying about the details of the hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. Sh is a metaprogramming language that offers the convenient syntax of C++ and takes the burden of register allocation and other low-level issues away from the programmer. While these languages are not yet fully mature, they are the most promising ones for allowing non-graphics researchers and developers to tap into the vast computational power of GPUs.

5. Conclusion

The versatile programmability and improved floating-point precision now available in GPUs make them useful coprocessors for scientific computing. Many non-trivial computational kernels have been successfully implemented on GPUs with significant acceleration. As graphics hardware continues to evolve faster than CPUs and more user-friendly high-level programming languages become available, we believe communities outside computer graphics can also benefit from the fast processing speed and high bandwidth that GPUs offer. We hope this introductory paper will encourage further thinking in this direction.

Acknowledgments

The author would like to thank Hank Dietz for providing some of the materials in this paper.
This work is supported in part by funds from the Office of Research at the University of Kentucky and the Kentucky Science & Engineering Foundation (RDE-005).

References

1. Ádám Moravánszky. Dense Matrix Algebra on the GPU. In ShaderX2: Shader Programming Tips & Tricks with DirectX 9. Wordware.
2. K. Akeley. RealityEngine Graphics. In Proceedings of SIGGRAPH.
3. ATI Technologies Inc. ATI Radeon 9800.
4. Jeff Bolz, Ian Farmer, Eitan Grinspun, and Peter Schröder. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. ACM Transactions on Graphics (SIGGRAPH 2003), 22(3), 2003.
5. M. Harris. Real-Time Cloud Simulation and Rendering. PhD thesis, Department of Computer Science, Univ. of North Carolina at Chapel Hill.
6. M. Harris, W. Baxter, T. Scheuermann, and A. Lastra. Simulation of Cloud Dynamics on Graphics Hardware. In Proceedings of Graphics Hardware.
7. Nicolas Holzschuch and Laurent Alonso. Using Graphics Hardware to Speed-up Visibility Queries. Journal of Graphics Tools, 5(2):33-47.
8. Kenneth E. Hoff III, John Keyser, Ming C. Lin, Dinesh Manocha, and Tim Culver. Fast Computation of Generalized Voronoi Diagrams Using Graphics Hardware. In Proceedings of SIGGRAPH 99, August.
9. Kenneth E. Hoff III, Andrew Zaferakis, Ming C. Lin, and Dinesh Manocha. Fast and Simple 2D Geometric Proximity Queries Using Graphics Hardware. In 2001 ACM Symposium on Interactive 3D Graphics, March.
10. T. Kim and M. Lin. Visual Simulation of Ice Crystal Growth. In Proceedings of ACM SIGGRAPH / Eurographics Symposium on Computer Animation 2003.
11. Jens Krüger and Rüdiger Westermann. Linear Algebra Operators for GPU Implementation of Numerical Algorithms. ACM Transactions on Graphics (SIGGRAPH 2003), 22(3).
12. E. Scott Larsen and David K. McAllister. Fast Matrix Multiplies using Graphics Hardware. In Proceedings of Super Computer 2001, November.
13. A. E. Lefohn, J. Kniss, C. Hansen, and R. T. Whitaker. Interactive Deformation and Visualization of Level Set Surfaces Using Graphics Hardware. In Proceedings of IEEE Visualization.
14. E. Lindholm, M. Kilgard, and H. Moreton. A User Programmable Vertex Engine. In Proceedings of SIGGRAPH.
15. B. Lok. Online Model Reconstruction for Interactive Virtual Environments. In Proceedings 2001 Symposium on Interactive 3D Graphics, pages 69-72, Chapel Hill, North Carolina, March.
16. Microsoft. DirectX.
17. K. Moreland and E. Angel. The FFT on a GPU. In SIGGRAPH/Eurographics Workshop on Graphics Hardware 2003 Proceedings.
18. NVIDIA. Cg: C for Graphics.
19. NVIDIA. GeForce FX, desktop.html.
20. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 1.3).
21. S. Tomov, M. McGuigan, R. Bennett, G. Smith, and J. Spiletic. Benchmarking and Implementation of Probability-Based Simulations on Programmable Graphics Cards. Computers & Graphics.
22. R. Strzodka and M. Rumpf. Level Set Segmentation in Graphics Hardware. In Proceedings of the International Conference on Image Processing.
23. Ruigang Yang and Marc Pollefeys. Multi-Resolution Real-Time Stereo on Commodity Graphics Hardware. In Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR).
24. Ruigang Yang and Greg Welch. Fast Image Segmentation and Smoothing Using Commodity Graphics Hardware. Journal of Graphics Tools, special issue on Hardware-Accelerated Rendering Techniques, 7(4):91-100, 2003.


More information

GPGPU. Peter Laurens 1st-year PhD Student, NSC

GPGPU. Peter Laurens 1st-year PhD Student, NSC GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing

More information

GPUs and GPGPUs. Greg Blanton John T. Lubia

GPUs and GPGPUs. Greg Blanton John T. Lubia GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware

More information

CS452/552; EE465/505. Clipping & Scan Conversion

CS452/552; EE465/505. Clipping & Scan Conversion CS452/552; EE465/505 Clipping & Scan Conversion 3-31 15 Outline! From Geometry to Pixels: Overview Clipping (continued) Scan conversion Read: Angel, Chapter 8, 8.1-8.9 Project#1 due: this week Lab4 due:

More information

Chapter 1 Introduction

Chapter 1 Introduction Graphics & Visualization Chapter 1 Introduction Graphics & Visualization: Principles & Algorithms Brief History Milestones in the history of computer graphics: 2 Brief History (2) CPU Vs GPU 3 Applications

More information

Why Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends

Why Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends Imagine stream processor; Bill Dally, Stanford Connection Machine CM; Thinking Machines Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid Jeffrey Bolz Eitan Grinspun Caltech Ian Farmer

More information

A Data Parallel Approach to Genetic Programming Using Programmable Graphics Hardware

A Data Parallel Approach to Genetic Programming Using Programmable Graphics Hardware A Data Parallel Approach to Genetic Programming Using Programmable Graphics Hardware Darren M. Chitty QinetiQ Malvern Malvern Technology Centre St Andrews Road, Malvern Worcestershire, UK WR14 3PS dmchitty@qinetiq.com

More information

Neural Network Implementation using CUDA and OpenMP

Neural Network Implementation using CUDA and OpenMP Neural Network Implementation using CUDA and OpenMP Honghoon Jang, Anjin Park, Keechul Jung Department of Digital Media, College of Information Science, Soongsil University {rollco82,anjin,kcjung}@ssu.ac.kr

More information

Current Trends in Computer Graphics Hardware

Current Trends in Computer Graphics Hardware Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)

More information

Shaders. Slide credit to Prof. Zwicker

Shaders. Slide credit to Prof. Zwicker Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?

More information

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key

More information

Static Scene Reconstruction

Static Scene Reconstruction GPU supported Real-Time Scene Reconstruction with a Single Camera Jan-Michael Frahm, 3D Computer Vision group, University of North Carolina at Chapel Hill Static Scene Reconstruction 1 Capture on campus

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V: Principles in Computer Architecture Parallelism and Locality Fall 2008 Lecture 10 The Graphics Processing Unit Mattan Erez The University of Texas at Austin Outline What is a GPU? Why should we

More information

Programming Graphics Hardware

Programming Graphics Hardware Tutorial 5 Programming Graphics Hardware Randy Fernando, Mark Harris, Matthias Wloka, Cyril Zeller Overview of the Tutorial: Morning 8:30 9:30 10:15 10:45 Introduction to the Hardware Graphics Pipeline

More information

Computer Graphics (CS 543) Lecture 1 (Part 1): Introduction to Computer Graphics

Computer Graphics (CS 543) Lecture 1 (Part 1): Introduction to Computer Graphics Computer Graphics (CS 543) Lecture 1 (Part 1): Introduction to Computer Graphics Prof Emmanuel Agu Computer Science Dept. Worcester Polytechnic Institute (WPI) What is Computer Graphics (CG)? Computer

More information

LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware

LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware Nico Galoppo Naga K. Govindaraju Michael Henson Dinesh Manocha University of North Carolina at Chapel Hill {nico,naga,henson,dm}@cs.unc.edu

More information

Shadow Mapping for Hemispherical and Omnidirectional Light Sources

Shadow Mapping for Hemispherical and Omnidirectional Light Sources Shadow Mapping for Hemispherical and Omnidirectional Light Sources Abstract Stefan Brabec Thomas Annen Hans-Peter Seidel Computer Graphics Group Max-Planck-Institut für Infomatik Stuhlsatzenhausweg 85,

More information

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008

More information

An Improved Study of Real-Time Fluid Simulation on GPU

An Improved Study of Real-Time Fluid Simulation on GPU An Improved Study of Real-Time Fluid Simulation on GPU Enhua Wu 1, 2, Youquan Liu 1, Xuehui Liu 1 1 Laboratory of Computer Science, Institute of Software Chinese Academy of Sciences, Beijing, China 2 Department

More information

Fast HDR Image-Based Lighting Using Summed-Area Tables

Fast HDR Image-Based Lighting Using Summed-Area Tables Fast HDR Image-Based Lighting Using Summed-Area Tables Justin Hensley 1, Thorsten Scheuermann 2, Montek Singh 1 and Anselmo Lastra 1 1 University of North Carolina, Chapel Hill, NC, USA {hensley, montek,

More information

COMP Preliminaries Jan. 6, 2015

COMP Preliminaries Jan. 6, 2015 Lecture 1 Computer graphics, broadly defined, is a set of methods for using computers to create and manipulate images. There are many applications of computer graphics including entertainment (games, cinema,

More information

Real-Time Video-Based Rendering from Multiple Cameras

Real-Time Video-Based Rendering from Multiple Cameras Real-Time Video-Based Rendering from Multiple Cameras Vincent Nozick Hideo Saito Graduate School of Science and Technology, Keio University, Japan E-mail: {nozick,saito}@ozawa.ics.keio.ac.jp Abstract In

More information

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

GPU Computation Strategies & Tricks. Ian Buck NVIDIA GPU Computation Strategies & Tricks Ian Buck NVIDIA Recent Trends 2 Compute is Cheap parallelism to keep 100s of ALUs per chip busy shading is highly parallel millions of fragments per frame 0.5mm 64-bit

More information

Overview. A real-time shadow approach for an Augmented Reality application using shadow volumes. Augmented Reality.

Overview. A real-time shadow approach for an Augmented Reality application using shadow volumes. Augmented Reality. Overview A real-time shadow approach for an Augmented Reality application using shadow volumes Introduction of Concepts Standard Stenciled Shadow Volumes Method Proposed Approach in AR Application Experimental

More information

A Data-Parallel Genealogy: The GPU Family Tree

A Data-Parallel Genealogy: The GPU Family Tree A Data-Parallel Genealogy: The GPU Family Tree Department of Electrical and Computer Engineering Institute for Data Analysis and Visualization University of California, Davis Outline Moore s Law brings

More information

1 Hardware virtualization for shading languages Group Technical Proposal

1 Hardware virtualization for shading languages Group Technical Proposal 1 Hardware virtualization for shading languages Group Technical Proposal Executive Summary The fast processing speed and large memory bandwidth of the modern graphics processing unit (GPU) will make it

More information

Three-Dimensional Image Warping on Programmable Graphics Hardware

Three-Dimensional Image Warping on Programmable Graphics Hardware Three-Dimensional Image Warping on Programmable Graphics Hardware Zhongding Jiang Tien-Tsin Wong Hujun Bao State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, 310027, China Department of Computer

More information

General Purpose GPU Programming. Advanced Operating Systems Tutorial 9

General Purpose GPU Programming. Advanced Operating Systems Tutorial 9 General Purpose GPU Programming Advanced Operating Systems Tutorial 9 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous

More information

CS451Real-time Rendering Pipeline

CS451Real-time Rendering Pipeline 1 CS451Real-time Rendering Pipeline JYH-MING LIEN DEPARTMENT OF COMPUTER SCIENCE GEORGE MASON UNIVERSITY Based on Tomas Akenine-Möller s lecture note You say that you render a 3D 2 scene, but what does

More information

Parallel Genetic Algorithms on Programmable Graphics Hardware

Parallel Genetic Algorithms on Programmable Graphics Hardware Parallel Genetic Algorithms on Programmable Graphics Hardware Qizhi Yu 1, Chongcheng Chen 2,andZhigengPan 1,2 1 College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China qizhi.yu@gmail.com

More information

Scanline Rendering 2 1/42

Scanline Rendering 2 1/42 Scanline Rendering 2 1/42 Review 1. Set up a Camera the viewing frustum has near and far clipping planes 2. Create some Geometry made out of triangles 3. Place the geometry in the scene using Transforms

More information

Programmable Graphics Hardware

Programmable Graphics Hardware Programmable Graphics Hardware Outline 2/ 49 A brief Introduction into Programmable Graphics Hardware Hardware Graphics Pipeline Shading Languages Tools GPGPU Resources Hardware Graphics Pipeline 3/ 49

More information

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J. Journal of Universal Computer Science, vol. 14, no. 14 (2008), 2416-2427 submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.UCS Tabu Search on GPU Adam Janiak (Institute of Computer Engineering

More information

Lecture 4: Geometry Processing. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 4: Geometry Processing. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 4: Processing Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today Key per-primitive operations (clipping, culling) Various slides credit John Owens, Kurt Akeley,

More information

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware Automatic Tuning Matrix Multiplication Performance on Graphics Hardware Changhao Jiang (cjiang@cs.uiuc.edu) Marc Snir (snir@cs.uiuc.edu) University of Illinois Urbana Champaign GPU becomes more powerful

More information

Screen Space Ambient Occlusion TSBK03: Advanced Game Programming

Screen Space Ambient Occlusion TSBK03: Advanced Game Programming Screen Space Ambient Occlusion TSBK03: Advanced Game Programming August Nam-Ki Ek, Oscar Johnson and Ramin Assadi March 5, 2015 This project report discusses our approach of implementing Screen Space Ambient

More information

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside CS230 : Computer Graphics Lecture 4 Tamar Shinar Computer Science & Engineering UC Riverside Shadows Shadows for each pixel do compute viewing ray if ( ray hits an object with t in [0, inf] ) then compute

More information

Real-Time Graphics Architecture

Real-Time Graphics Architecture Real-Time Graphics Architecture Kurt Akeley Pat Hanrahan http://www.graphics.stanford.edu/courses/cs448a-01-fall Geometry Outline Vertex and primitive operations System examples emphasis on clipping Primitive

More information

Rendering Objects. Need to transform all geometry then

Rendering Objects. Need to transform all geometry then Intro to OpenGL Rendering Objects Object has internal geometry (Model) Object relative to other objects (World) Object relative to camera (View) Object relative to screen (Projection) Need to transform

More information

A Bandwidth Effective Rendering Scheme for 3D Texture-based Volume Visualization on GPU

A Bandwidth Effective Rendering Scheme for 3D Texture-based Volume Visualization on GPU for 3D Texture-based Volume Visualization on GPU Won-Jong Lee, Tack-Don Han Media System Laboratory (http://msl.yonsei.ac.k) Dept. of Computer Science, Yonsei University, Seoul, Korea Contents Background

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

Computer Graphics CS 543 Lecture 1 (Part I) Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Computer Graphics CS 543 Lecture 1 (Part I) Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI) Computer Graphics CS 543 Lecture 1 (Part I) Prof Emmanuel Agu Computer Science Dept. Worcester Polytechnic Institute (WPI) About This Course Computer graphics: algorithms, mathematics, data structures..

More information

Shading Languages. Ari Silvennoinen Apri 12, 2004

Shading Languages. Ari Silvennoinen Apri 12, 2004 Shading Languages Ari Silvennoinen Apri 12, 2004 Introduction The recent trend in graphics hardware has been to replace fixed functionality in vertex and fragment processing with programmability [1], [2],

More information

Grafica Computazionale: Lezione 30. Grafica Computazionale. Hiding complexity... ;) Introduction to OpenGL. lezione30 Introduction to OpenGL

Grafica Computazionale: Lezione 30. Grafica Computazionale. Hiding complexity... ;) Introduction to OpenGL. lezione30 Introduction to OpenGL Grafica Computazionale: Lezione 30 Grafica Computazionale lezione30 Introduction to OpenGL Informatica e Automazione, "Roma Tre" May 20, 2010 OpenGL Shading Language Introduction to OpenGL OpenGL (Open

More information

Rendering Subdivision Surfaces Efficiently on the GPU

Rendering Subdivision Surfaces Efficiently on the GPU Rendering Subdivision Surfaces Efficiently on the GPU Gy. Antal, L. Szirmay-Kalos and L. A. Jeni Department of Algorithms and their Applications, Faculty of Informatics, Eötvös Loránd Science University,

More information