Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1.

2008 Steve Marschner

Image credits: [John C. Stone, UIUC]; NASA; University of Calgary

Image credits: NASA Ames; Army Research Lab; IES; NIST; Autodesk Maya (Wikimedia Aaron1a12); ID Software Quake 4 (2005)

Image credits: Valve Half Life 2 (2006); TimeGate Studios F.E.A.R. Extraction Point (2006); Ubisoft Assassin's Creed (2007); Valve Portal (2007)

How To Draw a Triangle, c. 1985
Transform vertices to screen coordinates
Fill all the pixels with the triangle's color

How To Draw a Triangle, c. 1988
Perform lighting calculations to find vertex colors
Transform vertices to screen coordinates
Fill all unoccluded pixels with the interpolated vertex colors and depth

How To Draw a Triangle, c. 1992
Perform lighting calculations to find vertex colors
Transform vertices to screen coordinates
Look up a texture map value
Fill all unoccluded pixels with a function of the texture and the interpolated vertex colors, as well as the depth
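The first two steps of the c. 1988 recipe (light each vertex, then transform it to screen coordinates) can be sketched in a few lines. This is a minimal illustration, not the method any particular workstation used: the Lambertian diffuse model, the pinhole projection with the camera looking down the negative z axis, and the names `light_vertex` and `to_screen` are all assumptions made for the example.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def light_vertex(normal, light_dir, base_color):
    """Per-vertex lighting (assumed model: Lambertian diffuse).
    The color is scaled by max(0, N . L)."""
    n = normalize(normal)
    l = normalize(light_dir)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(c * diffuse for c in base_color)

def to_screen(position, focal_length, width, height):
    """Project a camera-space point (camera looks down -z) and map it to
    pixel coordinates. The depth is returned too, for later z-buffering."""
    x, y, z = position
    ndc_x = focal_length * x / -z          # perspective divide
    ndc_y = focal_length * y / -z
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height  # pixel rows grow downward
    return (screen_x, screen_y, -z)

# A vertex facing the light, two units in front of the camera.
color = light_vertex((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.5, 0.0))
pixel = to_screen((0.0, 0.0, -2.0), 1.0, 640, 480)
```

The resulting per-vertex colors and depths are what the fill step then interpolates across the triangle.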

How To Draw a Triangle, c. 1999
Perform elaborate lighting calculations to find vertex colors
Transform vertices to screen coordinates
Look up a value from one or more 1D, 2D, or 3D texture maps
Fill all unoccluded pixels with a complicated, adjustable function of the textures and the interpolated vertex colors, as well as the depth

Pixar's RenderMan
Image credit: Pixar Ratatouille (2007)

How To Draw a Triangle in 2008
Execute a vertex program over all the vertices
Execute a fragment program over all the pixels the triangle covers
Fill all unoccluded pixels with the resulting color and depth
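The 2008 recipe can be mimicked in software by passing the vertex and fragment programs in as ordinary functions. The sketch below is only an illustration of that structure: the function name `draw_triangle`, the convention that a vertex program returns `(screen_x, screen_y, depth, varyings)`, the bounding-box rasterizer with barycentric weights, and the "smaller depth wins" test are all assumptions, not a description of how any GPU implements these stages.

```python
def edge(a, b, p):
    """Twice the signed area of the triangle (a, b, p) in screen space."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def draw_triangle(vertices, vertex_program, fragment_program,
                  color_buf, depth_buf):
    # Step 1: execute the vertex program over all the vertices.
    a, b, c = [vertex_program(v) for v in vertices]
    area = edge(a, b, c)
    if area == 0:
        return  # degenerate triangle covers no pixels

    height, width = len(color_buf), len(color_buf[0])
    x_lo = max(0, int(min(a[0], b[0], c[0])))
    x_hi = min(width - 1, int(max(a[0], b[0], c[0])))
    y_lo = max(0, int(min(a[1], b[1], c[1])))
    y_hi = min(height - 1, int(max(a[1], b[1], c[1])))

    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights; dividing by area handles both windings.
            w0 = edge(b, c, p) / area
            w1 = edge(c, a, p) / area
            w2 = edge(a, b, p) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel center is outside the triangle
            depth = w0 * a[2] + w1 * b[2] + w2 * c[2]
            if depth >= depth_buf[y][x]:
                continue  # occluded by something already drawn
            varyings = tuple(w0 * va + w1 * vb + w2 * vc
                             for va, vb, vc in zip(a[3], b[3], c[3]))
            # Steps 2 and 3: run the fragment program, store color and depth.
            color_buf[y][x] = fragment_program(varyings)
            depth_buf[y][x] = depth

# Usage: an 8x8 frame buffer, a pass-through vertex program, and a
# fragment program that returns the interpolated color unchanged.
color_buf = [[None] * 8 for _ in range(8)]
depth_buf = [[float("inf")] * 8 for _ in range(8)]
triangle = [((1.0, 1.0, 1.0), (1.0, 0.0, 0.0)),
            ((7.0, 1.0, 1.0), (0.0, 1.0, 0.0)),
            ((1.0, 7.0, 1.0), (0.0, 0.0, 1.0))]
draw_triangle(triangle,
              lambda v: (v[0][0], v[0][1], v[0][2], v[1]),
              lambda varyings: varyings,
              color_buf, depth_buf)
```

Swapping in a different `fragment_program` changes the shading without touching the rasterization loop, which is the point of the programmable model.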

Development of Hardware Capabilities

Workstation era
'85-'87: transform and render flat-shaded points, lines, polygons (no z buffer)
'88-'91: transform, light, and render smooth shaded polygons
from '92: transform, light, and render texture-mapped, antialiased polygons

PC era
'95-'98: render texture-mapped polygons
'99-'00: transform, light, and render texture-mapped, antialiased polygons
'01-'06: execute vertex and fragment shaders over antialiased polygons
from '07: execute geometry, vertex, and fragment shaders over antialiased polygons

Pipeline stages: App, Vertex, Rasterize, Fragment, Blend, Frame buffer

[International Technology Roadmap for Semiconductors 2007]

SGI RealityEngine Architecture (1992)
Diagram labels: System Bus, Command Processor, Geometry Engines, Triangle Bus, Fragment Generators, Image Engines; geometry board, raster memory boards, display generator board, video
Figure 1. Board-level block diagram of an intermediate configuration with 8 Geometry Engines on the geometry board, 2 raster memory boards, and a display generator board.

SGI InfiniteReality Architecture (1996)

[...] Geometry Engine processors in a MIMD arrangement. Each Raster Memory board comprises a single fragment generator with a single copy of texture memory, 80 image engines, and enough framebuffer memory to allocate 512 bits per pixel to a 1280x1024 framebuffer. The display generator board contains hardware to drive up to eight display output channels, each with its own video timing generator, video resize hardware, gamma correction, and digital-to-analog conversion hardware. Systems can be configured with one, two or four raster memory boards, resulting in one, two, or four fragment generators and 80, 160, or 320 image engines.

Figure 1: Board-level block diagram of the maximum configuration with 4 Geometry Engines, 4 Raster Memory boards, and a Display Generator board with 8 output channels.
Diagram labels: Host System Bus, Geometry Board, Host Interface Processor, Geometry Distributor, Geometry Engines, Geometry-Raster FIFO, Vertex Bus, Fragment Generators, Raster Memory Boards, Image Engines, Display Generator Board, De-Interleaver

2.1 Host Interface

There were significant system constraints that influenced the architectural design of InfiniteReality. Specifically, the graphics system had to be capable of working on two generations of host platforms. The Onyx2 differs significantly from the shared memory multiprocessor Onyx in that it is a distributed shared memory multiprocessor system with cache-coherent non-uniform memory access. The most significant difference in the graphics system design is that the Onyx2 provides twice the host-to-graphics bandwidth (400MB/sec vs. 200MB/sec) as does Onyx. Our challenge was to design a system that would be matched to the host-to-graphics data rate of the Onyx2, but still provide similar performance with the limited I/O capabilities of Onyx.

[...] programmed I/O (PIO). With the InfiniteReality system, display list processing is handled in two ways. First, compiled display list objects are stored in host memory in such a way that leaf display objects can be pulled into the graphics subsystem using DMA transfers set up by the Host Interface Processor (Figure 1). Because DMA transfers are faster and more efficient than PIO, this technique significantly reduces the computational load on the host processor so it can be better utilized for application computations. However, on the original Onyx system, DMA transfers alone were not fast enough to feed the graphics pipe at the rate at which it could consume data. The solution was to incorporate local display list processing into the design.

Attached to the Host Interface Processor is 16MB of synchronous dynamic RAM (SDRAM). Approximately 15MB of this memory is available to cache leaf display list objects. Locally stored display lists are traversed and processed by an embedded RISC core. Based on a priority specified using an OpenGL extension and the size of the display list object, the OpenGL display list manager determines whether or not a display list object should be cached locally on the Geometry board. Locally cached display lists are read at the maximum rate that can be consumed by the remainder of the InfiniteReality pipeline. As a result, the local display list provides a mechanism to mitigate the host to graphics I/O bottleneck of the original Onyx. Note that if the total size of leaf display list objects exceeds the resident 15MB limit, then some number of objects will be pulled from host memory at the reduced rate.

2.2 Geometry Distribution

The Geometry Distributor (Figure 1) passes incoming data and commands from the Host Interface Processor to individual Geometry Engines for further processing. The hardware supports both round-robin and least-busy distribution schemes. Since geometric processing requirements can vary from one vertex to another, a least-busy distribution scheme has a slight performance advantage over round-robin. With each command, an identifier is included which the Geometry-Raster FIFO (Figure 1) uses to recreate the original order of incoming primitives.

2.3 Geometry Engines

When we began the design of the InfiniteReality system, it became apparent that no commercial off-the-shelf floating point processors were being developed which would offer suitable price/performance. As a result, we chose to implement the Geometry Engine Processor as a semicustom application specific integrated circuit (ASIC).

The heart of the Geometry Engine is a single instruction multiple datapath (SIMD) arrangement of three floating point cores, each of which comprises an ALU and a multiplier plus a 32 word register [...]

NVIDIA GeForce 6 Architecture (2005)

Figure 30-3. A Block Diagram of the GeForce 6 Series Architecture

First, commands, textures, and vertex data are received from the host CPU through shared buffers in system memory or local frame-buffer memory. A command stream is written by the CPU, which initializes and modifies state, sends rendering commands, and references the texture and vertex data. Commands are parsed, and a vertex fetch unit is used to read the vertices referenced by the rendering commands. The commands, vertices, and state changes flow downstream, where they are used by subsequent pipeline stages.

The vertex processors (sometimes called "vertex shaders"), shown in Figure 30-4, allow for a program to be applied to each vertex in the object, performing transformations, skinning, and any other per-vertex operation the user specifies. For the first time, a [...]

Chapter 30, The GeForce 6 Series GPU Architecture. Excerpted from GPU Gems 2. Copyright 2005 by NVIDIA Corporation.
[GPU Gems 2]

NVIDIA GeForce 8800 Architecture (2007)

Figure 1. GeForce 8800 GTX block diagram

Don't worry, we'll describe all the gory details of Figure 1 very shortly! Compared to the GeForce 7900 GTX, a single GeForce 8800 GTX GPU delivers 2x the performance on current applications, with up to 11x scaling measured in certain shader operations. As future games become more shader intensive, we expect the GeForce 8800 GTX to surpass DirectX 9-compatible GPU architectures in [...]

NVIDIA GeForce 8800 Architecture Technical Brief
[NVIDIA Corporation]

ATI Radeon HD 2900 Architecture (2007)
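The InfiniteReality excerpt on geometry distribution mentions two scheduling schemes (round-robin and least-busy) and an identifier attached to each command so that the original primitive order can be recreated afterwards. The toy model below illustrates both ideas. It is not the hardware algorithm: the per-command costs are invented, every command is assumed to be available at time zero, "busy" is approximated by the total work already queued on an engine, and the names `distribute` and `restore_order` are hypothetical.

```python
import heapq

def distribute(costs, num_engines, scheme):
    """Assign each command to an engine.
    Returns (list of (finish_time, sequence_id), makespan)."""
    queued_work = [0] * num_engines
    finished = []
    for seq, cost in enumerate(costs):
        if scheme == "round-robin":
            engine = seq % num_engines
        else:  # "least-busy": pick the engine with the least queued work
            engine = min(range(num_engines), key=lambda i: queued_work[i])
        queued_work[engine] += cost
        finished.append((queued_work[engine], seq))
    return finished, max(queued_work)

def restore_order(finished):
    """Release results in their original order using the sequence ids,
    holding back any result whose predecessors have not arrived yet."""
    pending = []      # min-heap of sequence ids that arrived early
    next_seq = 0
    released = []
    for _, seq in sorted(finished):  # arrival order = completion order
        heapq.heappush(pending, seq)
        while pending and pending[0] == next_seq:
            released.append(heapq.heappop(pending))
            next_seq += 1
    return released

# Invented workload: every fourth command is five times more expensive.
costs = [5, 1, 1, 1, 5, 1, 1, 1]
rr_finished, rr_makespan = distribute(costs, 4, "round-robin")
lb_finished, lb_makespan = distribute(costs, 4, "least-busy")
in_order = restore_order(lb_finished)
```

With this particular workload round-robin sends both expensive commands to the same engine, while least-busy spreads them out, which is the kind of imbalance the excerpt alludes to when it says per-vertex processing requirements vary.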

Image credit: The Digital Michelangelo Project, Stanford University