POWERVR MBX. Technology Overview

Similar documents
PowerVR Series5. Architecture Guide for Developers

PowerVR Hardware. Architecture Overview for Developers

PowerVR Performance Recommendations. The Golden Rules

Hidden surface removal. Computer Graphics

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

PowerVR Performance Recommendations. The Golden Rules

Drawing Fast The Graphics Pipeline

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

PVRTC & Texture Compression. User Guide

Edge Detection. Whitepaper

Hardware-driven visibility culling

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1

PowerVR Performance Recommendations The Golden Rules. October 2015

Drawing Fast The Graphics Pipeline

CS4620/5620: Lecture 14 Pipeline

High Performance Visibility Testing with Screen Segmentation

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Rendering Grass with Instancing in DirectX* 10

Parallax Bumpmapping. Whitepaper

POWERVR MBX & SGX OpenVG Support and Resources

The Rasterization Pipeline

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Drawing Fast The Graphics Pipeline

3D Computer Games Technology and History. Markus Hadwiger VRVis Research Center

Pipeline Operations. CS 4620 Lecture 10

CS427 Multicore Architecture and Parallel Computing

Module 13C: Using The 3D Graphics APIs OpenGL ES

Shadows in the graphics pipeline

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Visibility: Z Buffering

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

The Traditional Graphics Pipeline

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

Resolve your Resolves Jon Story Holger Gruen AMD Graphics Products Group

CS 498 VR. Lecture 18-4/4/18. go.illinois.edu/vrlect18

EECE 478. Learning Objectives. Learning Objectives. Rasterization & Scenes. Rasterization. Compositing

Performance Analysis and Culling Algorithms

E.Order of Operations

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university

FROM VERTICES TO FRAGMENTS. Lecture 5 Comp3080 Computer Graphics HKBU

The Traditional Graphics Pipeline

VISIBILITY & CULLING. Don t draw what you can t see. Thomas Larsson, Afshin Ameri DVA338, Spring 2018, MDH

CS 354R: Computer Game Technology

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Working with Metal Overview

Computer Graphics. Shadows

Rasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Page 1. Area-Subdivision Algorithms z-buffer Algorithm List Priority Algorithms BSP (Binary Space Partitioning Tree) Scan-line Algorithms

The Traditional Graphics Pipeline

Enabling immersive gaming experiences Intro to Ray Tracing

Lecture 6: Texturing Part II: Texture Compression and GPU Latency Hiding Mechanisms. Visual Computing Systems CMU , Fall 2014

CSE 167: Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

Class of Algorithms. Visible Surface Determination. Back Face Culling Test. Back Face Culling: Object Space v. Back Face Culling: Object Space.

Computer Graphics 10 - Shadows

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

RADEON X1000 Memory Controller

3D Rasterization II COS 426

Wed, October 12, 2011

Triangle Rasterization

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

CS452/552; EE465/505. Clipping & Scan Conversion

From Vertices to Fragments: Rasterization. Reading Assignment: Chapter 7. Special memory where pixel colors are stored.

Graphics Processing Unit Architecture (GPU Arch)

Computer Graphics. Lecture 9 Environment mapping, Mirroring

Visible Surface Detection. (Chapt. 15 in FVD, Chapt. 13 in Hearn & Baker)

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Pipeline Operations. CS 4620 Lecture 14

Computer Graphics. Lecture 9 Hidden Surface Removal. Taku Komura

Hidden Surfaces II. Week 9, Mon Mar 15

Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.

Course Recap + 3D Graphics on Mobile GPUs

1.2.3 The Graphics Hardware Pipeline

Programming Graphics Hardware

Optimisation. CS7GV3 Real-time Rendering

8. Hidden Surface Elimination

The Application Stage. The Game Loop, Resource Management and Renderer Design

Case 1:17-cv SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C

This work is about a new method for generating diffusion curve style images. Although this topic is dealing with non-photorealistic rendering, as you

Graphics for VEs. Ruth Aylett

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Real-Time Graphics Architecture

PVRTC Specification and User Guide

Chapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Textures. Texture coordinates. Introduce one more component to geometry

Performance OpenGL Programming (for whatever reason)

COMP30019 Graphics and Interaction Scan Converting Polygons and Lines

Spring 2009 Prof. Hyesoon Kim

Introduction to Computer Graphics with WebGL

Render-To-Texture Caching. D. Sim Dietrich Jr.

Hidden Surface Removal

GeForce4. John Montrym Henry Moreton

8/5/2012. Introduction. Transparency. Anti-Aliasing. Applications. Conclusions. Introduction

Real-Time Shadows. Last Time? Textures can Alias. Schedule. Questions? Quiz 1: Tuesday October 26 th, in class (1 week from today!

Transcription:

POWERVR MBX Technology Overview Copyright 2009, Imagination Technologies Ltd. All Rights Reserved. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind. Imagination Technologies and the Imagination Technologies logo are trademarks or registered trademarks of Imagination Technologies Limited. All other logos, products, trademarks and registered trademarks are the property of their respective owners. Filename : POWERVR MBX.Technology Overview.1.7f.External.doc Version : 1.7f External Issue (Package: POWERVR SDK 2.05.25.0804) Issue Date : 07 Jul 2009 Author : POWERVR POWERVR MBX 1 Revision 1.7f

Contents 1. Introduction...3 1.1. What is POWERVR?...3 2. POWERVR MBX Architecture...4 2.1. Structure...4 2.2. Tile Processing...4 2.3. TA Tile Accelerator...5 2.4. ISP - Image Synthesis Processor...6 2.5. TSP - Texture and Shading Processor...7 3. Internal True Colour...9 4. Full Scene Anti-Aliasing...10 5. Depth Complexity...11 6. Features only for Tile Based Render...13 7. Comparison with brute force...14 7.1. Traditional renderer...14 7.2. POWERVR renderer...14 7.3. Comparison...15 8. Answering public comments...16 8.1. Tile-based render parameters...16 9. Conclusion...17 List of Figures Figure 1: POWERVR chip structure...4 Figure 2: Tile Processing...5 Figure 3: A typical strip...5 Figure 4: Tile buffer...6 Figure 5: Tile data...6 Figure 6: ISP and Texture Grouping module...7 Figure 7: Texture and Shading Processor...8 Figure 8: Escape from Monkey Island Normal and 4x FSAA ed...10 Figure 9: Quake 3 Arena screenshot...11 Figure 10: Same Quake 3 Arena screenshot rendered translucently...12 Figure 11: Traditional renderer's pixel pipeline...14 Revision 1.7f 2 Technology Overview

1. Introduction Most 3D graphic hardware uses the brute force approach to render polygons on screen. Tile-based rendering is an alternative 3D rendering technique that is used in POWERVR graphic chips. This document explains why tile-based rendering and its particular POWERVR implementation allow for great performance, image quality and low power consumption and why tile-based rendering is very well suited for present and future real-time 3D graphic applications in embedded systems. 1.1. What is POWERVR? POWERVR is a 3D technology developed by Imagination Technologies. It is a display list renderer using a tile-based approach. This 3D technology is radically different from existing solutions as it uses a more subtle approach to rendering than traditional renderers. The main differences are in the way pixels are rendered on the screen: while a traditional renderer (or brute force renderer) renders everything on screen and relies on a Z-Buffer to sort the end results, POWERVR determines up-front what is visible or not and only renders what is necessary. This ingenious approach makes great savings in memory bandwidth and thus enables modern games and other graphics rich applications to run at optimal performance in memory and power limited environments. POWERVR MBX 3 Revision 1.7f

2. POWERVR MBX Architecture 2.1. Structure A chip based on POWERVR architecture is mainly composed of three modules. These modules will interact with on-board memory to convert 3D data sent by the CPU to on-screen pixels. These three modules are the Tile Accelerator (TA), the Image Synthesis Processor (ISP) and the Texture and Shading Processor (TSP). CPU TA ISP TSP MEMORY Figure 1: POWERVR chip structure TA The Tile Accelerator has the responsibility of storing the scene data and dividing the screen into tiles. This process consists of distributing the 3D data among the screen tiles. ISP The Image Synthesis Processor is the module that performs Hidden Surface Removal (HSR). It determines which pixels are visible or not. TSP The Texture and Shading Processor has the responsibility of shading and texturing pixels before they are drawn onto the frame buffer. 2.2. Tile Processing Instead of rendering each polygon on the screen until the full image is rendered, POWERVR will render each screen tile one after the other until the full image is rendered. The TA will create 3D parameters for each tile of the screen until the scene has been completely sent to the 3D hardware. The actual 3D rendering starts once all tile data is in memory. Each tile is then processed in turn: first the ISP will determine which are the visible pixels in the tile and then only these visible pixels will be rendered by the TSP. After the current tile has finished rendering the ISP starts working on the next tile until the scene is completely rendered. Revision 1.7f 4 Technology Overview

Local Memory First tile data ISP TSP Render tile Scene data TA Second tile data Third tile data ISP ISP TSP TSP Render tile Render tile Last tile data ISP TSP Render complete Figure 2: Tile Processing 2.3. TA Tile Accelerator Tiling The main advantage of tiling a scene is the reduced external memory bandwidth requirement. Other advantages such as great cache efficiency and parallel processing of localized data are also important factors. The data for each tile is stored in board memory and contains all the polygons which affect this tile The choice of a tile size corresponds to the best compromise considering current application technology, hardware design and overall cost. Storing parameters The TA creates a display list for each tile on the screen. Each tile s display list will eventually include references to all 3D data contained within the tile s boundaries. This process is performed on the fly, as each new primitive call will cause the TA to update the display lists affected by this call. The display lists contain all relevant scene data, including triangles, texture handles, multitexturing arguments, render states, etc. The scene is effectively captured to be rendered later on, hence the term scene-capture renderer. Vertex stripping In order to maximize memory usage efficiency, the TA automatically converts all primitives to strips. Strips are a primitive type requiring less memory than triangle lists. Using strips, n triangles are defined by n+2 vertices. The following figure shows a 6-triangle strip defined by 8 vertices: Figure 3: A typical strip POWERVR MBX 5 Revision 1.7f

Tile Buffer This term refers to the portion of memory that contains pointers to the triangles present in the tile. The tile buffer consists in a linked list of pointers pointing to strip data in local memory. The same strip can cross a tile boundary; in this case a pointer to the strip data will be added to the display list of both tiles. Object data (1,1) (2,1) (x-1,1) (x,1 Strip 4 (1,2) (2,2) (x-1,2) (x,2 Screen of X*Y tiles Strip n-2 Strip n-1 Strip n (1,y) (2,y) (x-1,y) (x,y) Figure 4: Tile buffer 2.4. ISP - Image Synthesis Processor Visible Pixel Determination The ISP has the responsibility of determining which pixels in a tile are visible. Hidden Surface Removal is performed on a tile per tile basis, with each tile s HSR results sent to the TSP for rasterization of visible pixels. Only the triangles affecting a tile will be processed. If a strip is shared between two tiles, say, then only the triangles contained within the current tile will be processed. Tile Buffer Object data Strip 4 Strip n-2 Strip n-1 Strip n Figure 5: Tile data Revision 1.7f 6 Technology Overview

The ISP processes all triangles affecting a tile one by one. Calculating the triangle equation and projecting a ray at each position in the triangle return accurate depth information for all pixels. This depth information is then compared with the values in the tile s depth buffer to determine whether these pixels are visible or not. HSR is done at very high speed, as each line in a tile is processed in one clock (32 pixels). This is an obvious advantage compared to immediate mode renderers, which have to perform per-pixel depth interpolation for each polygon. Another advantage of this technique is that vertical and horizontal clipping (XY clipping) is free, as HSR is only performed on the pixels we re interested in (i.e. onscreen pixels). ISP Output and Texture Grouping After the ISP has finished processing all triangles in a tile, each pixel has a tag attached to it whose role is to identify the polygon properties that should be used to texture this pixel. However, texturing is always better done when dealing with whole polygons rather than individual pixels. Indeed dealing with whole polygons sharing the same texture properties greatly improves cache memory efficiency. Because the ISP outputs spans of HSR result on a per-pixel basis, a texture grouping module has the responsibility of re-ordering this data into whole polygons. All pixels of the same texture handle are grouped together to maximize cache effectiveness. This module also acts as an input FIFO to the TSP so that both ISP and TSP are kept busy at all times. The following diagram illustrates this process on an arbitrary 8x4 tile: ISP Texture Grouping Spans HSR results Tag A Tag B Tag C Figure 6: ISP and Texture Grouping module 2.5. TSP - Texture and Shading Processor Shading and Texturing The shading and texturing module in the POWERVR pipeline behaves much like a traditional shading and texturing engine. The output spans from the Texture Grouping module tell the TSP what are the lighting and texturing parameters, and what pixels they affect in the tile. POWERVR MBX 7 Revision 1.7f

Parameters Local Memory Textures Spans TSP Frame Buffer Figure 7: Texture and Shading Processor A great feature of the TSP is that all shading and texturing takes place with 32 bits of colour precision, independent of the frame buffer colour depth. This feature is called Internal True Colour (more on this later in this document). Revision 1.7f 8 Technology Overview

3. Internal True Colour Internal True Colour is a term that refers to the internal processing of all blending operations in 32 bits colour precision. This feature is only possible on tile-based architectures where all colour processing is done in the 32 bits tile buffer. On immediate mode renderers, image quality is a direct result of the frame buffer colour depth. On a 16 bits colour depth frame buffer, multiple reads and writes will result in a loss of precision for the final colour. This problem usually creates a banding effect on the screen. While dithering is often used to try to alleviate this problem, it can potentially create worse result by giving the image a grainy look. On POWERVR, each pixel is only written once to the frame buffer, hence the final render looks much better. To illustrate this feature, here are two Serious Sam screenshots taken on a 16-bit and a 32-bit internal colour cards. Internal True Colour comparison POWERVR MBX 9 Revision 1.7f

4. Full Scene Anti-Aliasing Super-sampling is the POWERVR method for Full Scene Anti-Aliasing; it is one of the best antialiasing methods available. It produces crisp results and greatly enhances the overall look of the final render. Super-sampling is the process of rendering a frame at a higher resolution and filtering it down to the frame buffer size. POWERVR supports 2x horizontal, 2x vertical and 4x anti-aliasing. Because POWERVR is a tile-based renderer, no extra memory is required for the frame and depth buffers. i.e. they will keep their original size during the whole process. What happens in the hardware is that more tiles are rendered to match the super-sampled size but the scaling-down process is transparent. This is a clear advantage compared to immediate more renderers, which have to allocate precious video memory to cater for the super-sampled frame and depth buffers. The following are two portions of screenshot taken from LucasArts Escape from Monkey Island. The left image is normal while the right one is 4x super-sampled. Both images were scaled up 2x to allow for better comparison: Figure 8: Escape from Monkey Island Normal and 4x FSAA ed Revision 1.7f 10 Technology Overview

5. Depth Complexity What is Depth Complexity? Depth complexity (also sometimes called overdraw ) is the average number of times pixels overwrite each other in a scene. Depth complexity varies from one game to another, depending on the scene and rendering algorithms used. Fill rate Nowadays, fill rate is very often used to compare capabilities of different 3D graphics processors. Most of the time, fill rate is the limiting factor that prevents a scene to be rendered at full speed. In these cases the CPU is idle, waiting for the render to finish before it can send the parameters for the new scene to the 3D processor. A game suffering from this behaviour is called fill-rate limited. POWERVR super fill rate Because POWERVR performs HSR up-front in the pipeline, it will only render what is visible on screen. No hidden pixels will be rendered, thus enabling an effective and intelligent use of fill rate. This architecture enables POWERVR to beat the traditional bandwidth barrier and reach super fill rates. Real game example The following are two screenshots from Quake 3. The first screenshot is a normal screenshot from a scene in Quake 3 while the second renders everything translucently to get an idea of the overdraw of the scene: Figure 9: Quake 3 Arena screenshot POWERVR MBX 11 Revision 1.7f

Figure 10: Same Quake 3 Arena screenshot rendered translucently As you can see there is much more on the screen than one would think there is! Although Quake 3 uses BSP and portal techniques to attempt to reduce overdraw, the scene complexity is such that overdraw is very often quite high. As an example we measured overdraw of the Quake 3 demo001 at 3.39 (i.e. each pixel on the screen is drawn an average of 3.39 times). Some immediate mode renderers are implementing some techniques to try to reduce overdraw. Early Z test (the action of testing and rejecting a pixel very early in the pixel pipeline) is one of these techniques but because one cannot predict the rendering order of polygons on the screen, this method is still quite ineffective. The overdraw value for the same scene using early Z test is 3.14, i.e. only a small improvement compared to the normal value. What about the future? As game complexity increases, overdraw will too. Far clipping planes will become larger and larger and as such more objects will be rendered onto the screen. Real-world type environments like cities or outdoor terrains cannot benefit from efficient occlusion detection algorithms, and as such will generate a great amount of overdraw which will impact the fill-rate of immediate mode renderers. Revision 1.7f 12 Technology Overview

6. Features only for Tile Based Render There are features that naturally scale themselves very well to a tile-based architecture. These optional implementations can be incorporated depending on product design, platform and cost. True Scalability POWERVR is a very scalable architecture. For instance MBX is POWERVR implementation for handheld devices. Sega s Naomi 2 uses two POWERVR chips in parallel and these are used to render alternate tiles in the frame buffer. Super Fast front-end Super-Sampling FSAA A tile-based architecture is ideal to perform very fast FSAA. The fetching of samples would be done in the HSR module for optimal image quality and performance. Because the tile buffer memory is very fast, samples can be fetch quicker than on a traditional architecture. There are other features that scale themselves very well to a tile-based architecture but this document does not cover these. POWERVR MBX 13 Revision 1.7f

7. Comparison with brute force 7.1. Traditional renderer Let s examine the pixel pipeline of a traditional, immediate mode renderer: TEXTURE MEMORY STENCIL / DEPTH BUFFER FRAME BUFFER TEXTURING ALPHA TEST STENCIL / DEPTH TEST BLENDING Current pixel colour Figure 11: Traditional renderer's pixel pipeline The most important thing to notice in this diagram is that Texturing is performed before the depth test, which means that all pixels will be textured even if they turn out to be masked in the depth buffer (i.e. they fail the depth test). Memory accesses will be performed during the depth test and blending with the frame buffer, even if the current pixel doesn t turn out to be the final pixel at this screen location. All these memory accesses are the reason why immediate mode renderers are hitting the bandwidth barrier. 7.2. POWERVR renderer The following is a simplified pixel pipeline for POWERVR: TILE BUFFER TEXTURE MEMORY ACC. BUFFER STENCIL / DEPTH TEST TEXTURING ALPHA TEST BLENDING Final pixel colour Punch-through feedback Figure 12: POWERVR pixel pipeline The main difference resides in the fact that HSR (depth test) is performed up-front. All HSR results are kept in chip (tile) memory. Once visible pixels have been determined, texturing can take place, then alpha testing. The feedback loop to the depth test is there so that pixels succeeding the alpha test are flagged as transparent by the depth test module. Then blending with the current contents of the tile buffer is performed. Finally the pixel colour at the current screen location will be written (once) to the frame buffer. Localising data by tiling the screen enables the use of ultra-fast on-chip memory and thus to break the traditional memory bandwidth barrier. Revision 1.7f 14 Technology Overview

7.3. Comparison Imagination Technologies Copyright The following is a table comparing traditional and tile-based architectures: Immediate mode rendering For each polygon process all the pixels Texel fetch for every pixel in a polygon POWERVR Tile-based rendering For each pixel, process all polygons in a tile Paint by numbers fetch only the visible texels Multiple frame buffer accesses for blending Expensive design (embedded memory, number of pins ) Single frame buffer access to output final colour Cost-effective architecture Table 1: Immediate mode rendering vs. POWERVR Tile-based rendering POWERVR MBX 15 Revision 1.7f

8. Answering public comments In this section we ll be trying to answer common questions that sometimes get raised about tile-based rendering or POWERVR products. 8.1. Tile-based render parameters As complexity increases, tile-based renderers will run out of space to store scene parameters This assumption is usually raised when dealing with tile-based rendering. Because scene-capture renders effectively capture the whole scene parameters, some people fear that they will run out of memory to contain all the data. A key aspect of POWERVR technology is parameter management. Minimising parameter data is an important part of the technology since less memory transfers equates to more performance! Possible techniques to permit efficient storage of parameter data are culling, stripping and indexing. There are different techniques to trade overdraw for parameter space while still retaining some degree of architectural advantage. Revision 1.7f 16 Technology Overview

9. Conclusion POWERVR s tile-based rendering allow for great performance and image quality on any platform. Its intelligent architecture minimises external memory accesses and thus manages to break the traditional memory bandwidth barrier. Great features like multitexturing, internal true colour, bump mapping and texture compression enable current games to reach an unrivalled image quality, even in 16 bits colour depth. This intelligent architecture is affordable thanks to the correct choice of features and a cost-effective design. POWERVR has already solved the memory bandwidth problem when immediate mode renderers are still struggling with it. Future hardware will have to go tile-based if they want to stay competitive, as adding more pipes, memory and even chips will only succeed in dramatically increasing the cost. POWERVR is already proving today that tile-based rendering is a solution for 3D graphics; in the future it will be the only one POWERVR MBX 17 Revision 1.7f