Performance OpenGL Programming (for whatever reason)

Similar documents
Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T

Clipping & Culling. Lecture 11 Spring Trivial Rejection Outcode Clipping Plane-at-a-time Clipping Backface Culling

Spring 2011 Prof. Hyesoon Kim

Sign up for crits! Announcments

Graphics Processing Unit Architecture (GPU Arch)

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

E.Order of Operations

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

Spring 2009 Prof. Hyesoon Kim

CSE 167: Introduction to Computer Graphics Lecture #9: Visibility. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2018

Graphics Hardware. Instructor Stephen J. Guy

VISIBILITY & CULLING. Don t draw what you can t see. Thomas Larsson, Afshin Ameri DVA338, Spring 2018, MDH

Pipeline Operations. CS 4620 Lecture 14

Hardware-driven Visibility Culling Jeong Hyun Kim

CS451Real-time Rendering Pipeline

Could you make the XNA functions yourself?

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Adaptive Point Cloud Rendering


Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Optimisation. CS7GV3 Real-time Rendering

Pipeline Operations. CS 4620 Lecture 10

Triangle Rasterization

Today s Agenda. Basic design of a graphics system. Introduction to OpenGL

CS 498 VR. Lecture 18-4/4/18. go.illinois.edu/vrlect18

Visibility and Occlusion Culling

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

2.11 Particle Systems

LOD and Occlusion Christian Miller CS Fall 2011

Motivation. Culling Don t draw what you can t see! What can t we see? Low-level Culling

EECE 478. Learning Objectives. Learning Objectives. Rasterization & Scenes. Rasterization. Compositing

Shadows in the graphics pipeline

Deferred Rendering Due: Wednesday November 15 at 10pm

CS452/552; EE465/505. Clipping & Scan Conversion

The Application Stage. The Game Loop, Resource Management and Renderer Design

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

Models and Architectures

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

CS 4620 Program 3: Pipeline

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

CS4620/5620: Lecture 14 Pipeline

1.2.3 The Graphics Hardware Pipeline

Texture Mapping. Mike Bailey.

CSE 167: Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

OpenGL: Open Graphics Library. Introduction to OpenGL Part II. How do I render a geometric primitive? What is OpenGL

CS 381 Computer Graphics, Fall 2008 Midterm Exam Solutions. The Midterm Exam was given in class on Thursday, October 23, 2008.

Computer Graphics. Bing-Yu Chen National Taiwan University The University of Tokyo

Sung-Eui Yoon ( 윤성의 )

CSE4030 Introduction to Computer Graphics

Chapter 1 Introduction

3D Rendering Pipeline

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Wed, October 12, 2011

Drawing Fast The Graphics Pipeline

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

Projection Matrix Tricks. Eric Lengyel

CSE 167: Introduction to Computer Graphics Lecture #4: Vertex Transformation

CS535 Fall Department of Computer Science Purdue University

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

TDA362/DIT223 Computer Graphics EXAM (Same exam for both CTH- and GU students)

Lecture 2. Shaders, GLSL and GPGPU

Course Recap + 3D Graphics on Mobile GPUs

CS 381 Computer Graphics, Fall 2012 Midterm Exam Solutions. The Midterm Exam was given in class on Tuesday, October 16, 2012.

The Rendering Pipeline (1)

Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions.

Culling. Computer Graphics CSE 167 Lecture 12

Rendering Objects. Need to transform all geometry then

Graphics Pipeline & APIs

Introduction to Computer Graphics with WebGL

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Deferred Splatting. Gaël GUENNEBAUD Loïc BARTHE Mathias PAULIN IRIT UPS CNRS TOULOUSE FRANCE.

3D Programming. 3D Programming Concepts. Outline. 3D Concepts. 3D Concepts -- Coordinate Systems. 3D Concepts Displaying 3D Models

Rasterization Overview

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

Scanline Rendering 2 1/42

Class of Algorithms. Visible Surface Determination. Back Face Culling Test. Back Face Culling: Object Space v. Back Face Culling: Object Space.

To Do. Computer Graphics (Fall 2008) Course Outline. Course Outline. Methodology for Lecture. Demo: Surreal (HW 3)

CHAPTER 1 Graphics Systems and Models 3

CS 130 Exam I. Fall 2015

Intro to OpenGL III. Don Fussell Computer Science Department The University of Texas at Austin

Drawing Fast The Graphics Pipeline

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

CS 354R: Computer Game Technology

CSE 167: Introduction to Computer Graphics Lecture #5: Projection. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2017

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Computer Graphics. Shadows

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Hidden surface removal. Computer Graphics

Computing Visibility. Backface Culling for General Visibility. One More Trick with Planes. BSP Trees Ray Casting Depth Buffering Quiz

CSE 167: Introduction to Computer Graphics Lecture #11: Visibility Culling

Graphics Pipeline & APIs

Shadow Techniques. Sim Dietrich NVIDIA Corporation

Introduction to Shaders.

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

GeForce4. John Montrym Henry Moreton

Transcription:

Performance OpenGL Programming (for whatever reason) Mike Bailey Oregon State University Performance Bottlenecks In general there are four places a graphics system can become bottlenecked: 1. The computer if the CPU cannot compute the data, read the data, uncompress the data, or call the graphics routines fast enough, then it doesn t matter how fast your graphics card is. 2. The bus a slow bus will choke down transmission of graphics from the CPU to the graphics card. AGP is dead. Long live PCI Express! :-) Type of Board Speed to the Board Speed from the Board PCI 132 Mb/sec 132 Mb/sec AGP 8X 2 Gb/sec 264 Mb/sec PCI Express 4 Gb/sec 4 Gb/sec 3. The vertex processing a graphics scene will bottleneck here if it has lots of small primitives. This is often how CAD-type applications are characterized. mjb -- January 31, 2009 1

4. The rasterizer and fragment processing (i.e., per-pixel operations) a graphics scene will bottleneck here if it has a small number of large primitives. Games and flight simulators generally work this way. Part of the trick is to achieve a balance of the load between these four elements. General Performance Principles glfinish() blocks until the graphics card is finished with what has been sent to it so far. Use it to generate benchmark timing information. Take it out for production. Never, ever, use it in production code. Understand the nature of your scene against the four places a graphics scene can become bottlenecked. Benchmark if you are not sure. Vertex Processing Principles Quickly pre-eliminate as much scenery as possible. (There is a dividing line here. If it takes too much computation time to eliminate a section of the scene, it might be better to display it and let the graphics hardware do it.) Cull based on comparing bounding volumes with the viewing frustum -- try to achieve quick trivial rejections F L R N The bounding volume is out-of-bounds if all 8 points are out-of-bounds for the same reason The Cohen-Sutherland algorithm: Clip Code L R B T N F 32 16 8 4 2 1 mjb -- January 31, 2009 2

If you inclusive-or ( ) all 8 clip codes together and get 0, then the bounding box is completely inside the viewing volume If you and ( & ) all 8 clip codes together and get! 0, then the bounding box is completely outside the viewing volume. Sometimes a bounding sphere or bounding cylinder makes more sense. It depends on what bounding geometry most tightly fits your scene. Break big things into smaller things so that more can be cleanly culled hierarchical culling. Remove part of the scene while making the player think it is part of the game :-) Minimize all state changes. Group similar-attribute primitives (e.g., color) together. Note: this sometimes flies in the face of object-oriented programming. State: current setting of all OpenGL attributes, such as color, lighting, texture, transformation, etc. Pipeline: graphics hardware is a multi-step process. It is good for performance if you can keep all steps busy. Changing state usually causes the pipeline to completely clear out before any new geometry can enter, which kills performance. Group similar primitives in the same glbegin() glend(). mjb -- January 31, 2009 3

Disable lighting, or simulate it in a texture Use directional lights, not positional Use quad strips instead of quads transform less points Use triangle strips instead of triangles, and instead of quad anythings. (The vendors say this, but I am not sure it is true ) Better: use vertex arrays only transform each vertex once Best: use vertex objects vertex data gets stored on the graphics card. Maximize the size of vertex array/object blocks Small batches of geometry can kill performance 100 triangles/batch should be a minimum >= 500 would be better Some game development companies say they try to use a size of 10,000 vertices or more Use multiple levels of detail, depending on how much of the screen a set of primitives occupies. Use occlusion testing to determine if this is necessary. Pre-transform objects which are undergoing constant ( static ) transformations. E.g., your vector cloud. (I.e., don t use a pile of glpushmatrix( ) - glrotatef( ) glcalllist( ) glpopmatrix( ) calls.) Pre-compute as much geometry as you can Use display lists (some drivers implement them in host memory, some in graphics card memory) Let one frame finish drawing while setting up to draw the next one, i.e., don t use glfinish( ). Level of Detail How far away is the object from the viewer? If it s far away, you don t need to use as much geometric detail to display it. Occlusion querying can help. (Occlusion query lets you pretend-render a bounding box, and the graphics processor will tell you how many pixels it would have occupied.) Don t use up your polygon budget if you don t need to Keep several representations for different components of the scene and switch between them. mjb -- January 31, 2009 4

Example #1: Different resolutions of GLUT spheres Example #2: Polygon mesh decimation Level-of-detailing can also be done as a function of the speed with which an object is moving in the scene. Creative Texturing Reduces polygon count without apparent loss of detail Textures can be used in Level of Detail analysis Make plywood sets instead of 3D scene detail E.g., a distant forest or mountains Can bring in full 3D detail as you get closer Build lighting into the texture image, and then display it as a GL_REPLACE texture Render 3D scenes into an image and use it as a texture This is referred to as render-to-texture. E.g., creating the distant forest or mountains Fog Tricks Use fog to cover up a close-in far clipping plane fog and haze can hide the fact that you are far-clipping away a lot of scene detail. And you thought it was just part of the game :-) mjb -- January 31, 2009 5

Real Object color Fog color eye Far clipping plane Distance object is in front of the eye Also, an object in the haze can be displayed with less scene detail Another way to get away with a close-in far clipping plane: bring distant detail in with alpha blending (transparency). Fragment Processing Principles Quickly pre-eliminate as much scenery as possible. (There is a dividing line here. If it takes too much computation to eliminate a section of the scene, it might be better to display it and let the graphics hardware do it.) Use unsigned bytes for pixel formats (not floating point, even though you can do more with floating point pixel formats). Cull backfaces Use Texture Objects (load all textures into the graphics card memory, then query to see how they fit) Assign intelligent priorities to textures Use alpha testing to reject transparent pixels Depth sort primitives front-to-back so maximum number of pixels will get depth-rejected Use texture objects. mjb -- January 31, 2009 6

Benchmarking Run timing tests by using glfinish() in Display()to completely clear out the pipeline. glfinish(); int t0 = glutget( GLUT_ELAPSED_TIME ); << All Graphics Calls except Swapping the Double Buffers>> glfinish(); int t1 = glutget( GLUT_ELAPSED_TIME ); glutswapbuffers(); fprintf( stderr, "One display frame took %d milliseconds\n", t1 t0 ); Definitely remove this for production runs!! Graphics cards today are really fast. Make your test size really big so that the precision of the system clock is not an issue. Be careful about just putting a for( ) loop around a set of code. Some compilers will optimize that down to just one loop. If the time seems to be independent of the number of loops, fool the compiler by translating the scene based on the loop index. Benchmarking is worthwhile to do. You sometimes get exactly what you thought you d get. But, sometimes you get wildly unexpected results, such as this 56 52 Sys 48 AGP 44 40 36 32 28 24 20 16 12 8 4 0 MTri / Sec 1 OpenGL Shaders 8 64 200 400 600 900 2000 Strip Width 4096 7000 9000 16384 30000 40000 55000 65536 Credit: Daniel New Be aware that you can now place your own code in the per-vertex and per-fragment parts of the graphics hardware. This can allow you to move operations to the graphics card that used to need to be on the CPU. mjb -- January 31, 2009 7

Application OpenGL Command Vertex Processing Rasterizing Fragment Processing Display Geometry Processing This can be used, for example, to: Create geometry from an equation. That is, start with a mesh of points and displace each one according to a displacement equation. Add detail to geometry. Start with a simple object and add detail depending on what is needed for how far you are zoomed in. mjb -- January 31, 2009 8

Create apparent geometry from an equation. Perturb the normals so that the lighting makes it look like there is bumpy geometry (= bump mapping ). This has been used for mountains on a globe, ripples in a pond, etc. Create special surface effects, such as clouds, wood grain, marble, a screen, or corrosion.. You can also do interesting visualization things in shaders and take advantage of GPU speed-ups: mjb -- January 31, 2009 9

Etc, etc. We will talk lots more about this in CS 519 next quarter. mjb -- January 31, 2009 10