Structure. Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung,

Similar documents
A Bandwidth Effective Rendering Scheme for 3D Texture-based Volume Visualization on GPU

An Effective Pixel Rasterization Pipeline Architecture for 3D Rendering Processors

A New Bandwidth Reduction Method for Distributed Rendering Systems

An Effective Hardware Architecture for Bump Mapping Using Angular Operation

Spring 2009 Prof. Hyesoon Kim

Module 13C: Using The 3D Graphics APIs OpenGL ES

Spring 2011 Prof. Hyesoon Kim

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Hot Chips Bringing Workstation Graphics Performance to a Desktop Near You. S3 Incorporated August 18-20, 1996

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

Graphics Processing Unit Architecture (GPU Arch)

Graphics Hardware. Instructor Stephen J. Guy

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

PowerVR Hardware. Architecture Overview for Developers

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

2D/3D Graphics Accelerator for Mobile Multimedia Applications. Ramchan Woo, Sohn, Seong-Jun Song, Young-Don

Hardware-driven visibility culling

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision. Seok-Hoon Kim MVLSI Lab., KAIST

CIS 581 Interactive Computer Graphics

Lecture 2. Shaders, GLSL and GPGPU

PowerVR Series5. Architecture Guide for Developers

Design and Optimization of Geometry Acceleration for Portable 3D Graphics

Hardware-driven Visibility Culling Jeong Hyun Kim

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

3-D Accelerator on Chip

CS427 Multicore Architecture and Parallel Computing

C P S C 314 S H A D E R S, O P E N G L, & J S RENDERING PIPELINE. Mikhail Bessmeltsev

Rasterization Overview

A Bandwidth Reduction Scheme for 3D Texture-Based Volume Rendering on Commodity Graphics Hardware

POWERVR MBX. Technology Overview

Drawing Fast The Graphics Pipeline

Rasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?

CS451Real-time Rendering Pipeline

High-Quality Surface Splatting on Today s GPUs

Drawing Fast The Graphics Pipeline

The Rasterization Pipeline

Vertex Shader Design I

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Programming Graphics Hardware

Texture Mapping and Special Effects

2.11 Particle Systems

Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo

CS4620/5620: Lecture 14 Pipeline

3D Rendering Pipeline

CIS 581 Interactive Computer Graphics (slides based on Dr. Han-Wei Shen s slides) Requirements. Reference Books. Textbook

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

1.2.3 The Graphics Hardware Pipeline

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Computer Graphics. Shadows

A SXGA 3D Display Processor with Reduced Rendering Data and Enhanced Precision

1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.

Models and Architectures

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems

The Application Stage. The Game Loop, Resource Management and Renderer Design

COMP30019 Graphics and Interaction Rendering pipeline & object modelling

Lecture outline. COMP30019 Graphics and Interaction Rendering pipeline & object modelling. Introduction to modelling

CS130 : Computer Graphics Lecture 2: Graphics Pipeline. Tamar Shinar Computer Science & Engineering UC Riverside

Surface Graphics. 200 polys 1,000 polys 15,000 polys. an empty foot. - a mesh of spline patches:

CS 130 Final. Fall 2015

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside

E.Order of Operations

ISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

GeForce4. John Montrym Henry Moreton

Lecture 25: Board Notes: Threads and GPUs

The F-Buffer: A Rasterization-Order FIFO Buffer for Multi-Pass Rendering. Bill Mark and Kekoa Proudfoot. Stanford University

Textures. Texture coordinates. Introduce one more component to geometry

Graphics and Interaction Rendering pipeline & object modelling

CSE4030 Introduction to Computer Graphics

Evolution of GPUs Chris Seitz

Drawing Fast The Graphics Pipeline

Pipeline Operations. CS 4620 Lecture 10

CHAPTER 1 Graphics Systems and Models 3

CS GAME PROGRAMMING Question bank


Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

A New Bandwidth Reduction Method for Distributed Rendering System

Dave Shreiner, ARM March 2009

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

Real-Time Graphics Architecture

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Triangle Rasterization

Sung-Eui Yoon ( 윤성의 )

Enabling immersive gaming experiences Intro to Ray Tracing

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic.

Mattan Erez. The University of Texas at Austin

Standard Graphics Pipeline

Windowing System on a 3D Pipeline. February 2005

A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Transcription:

A High Performance 3D Graphics Rasterizer with Effective Memory Structure Woo-Chan Park, Kil-Whan Lee, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il-San Kim, Won-Ho Chun, Won-Suk SkKim, Tack-Don Han, Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim Media System Lab. Yonsei University Seoul, Korea E-mail : kiwh@kurene.yonsei.ac.kr kr

-2- Outline Introduction David Simulator High g Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer) Performance Analysis Conclusions

Introduction

NRL Project Yearly Research Plan Title The Design of High Performance 3D Graphics Accelerator for Realistic Image Properties Institution for bringing up an excellent lab. with a core technology A Government-initiated Project Basic Environment & Research Study Simulation Environment 3D GA Simulator development Survey of Related Works Technology Prevalent 3D Graphic Accelerator Propose an Effective Architecture Performance Evaluation IP Co-Development (15m million tex xtured poly ygons) Technolog gy Prevale ent 1st Year 2nd Year Necessities Leadership of an advanced research area Synergy effect through research interchanges High Performance 3D Graphic Accelerator High Performance Architecture Building international core IP Implementation of Prototype System Parallel Rendering Architecture High Pe erformanc ce (25m million textured polygons s) 3rd Year 5th Year

The Design of A High Performance 3D Graphics Accelerator for Realistic Image & Building Core IPs -5- Technology Prevalent 3D Graphics Accelerator High Performance 3D Graphics Accelerator Architecture Research Design Research SW Research Verification /Integration Geometry Processing Unit, Rendering Unit Realization Mapping Unit P-M Architecture, Memory Architecture for 3D GA Cache & Memory Hierarchy Parallel 3D Rendering System Execution Model(VLIW, SIMD, RISC etc.), Control & Interface Appliance to other system library by implementing VHDL API & Rendering Algorithm Geometry Compression / Modeling Construction of Simulation Environment & Verification by Simulation Prototype System

Current Research Work -6- Arithmetic ti Unit Geometry Unit Rendering Unit Realization Unit Memory Unit Major Research High performance floating point adder/subtractor High performance floating point multiplier High performance floating point divider Overlapped light geometry processing(olgp) New method for topological compression Object-oriented rendering using the analytic model Effective reuse buffer for triangle mesh Order-independent transparency Perspective texture mapping Efficient bump mapping Modified anti-aliasing execution model Texture cache simulation for various cache architectures Texture Cache Sharing Memory bandwidth saving scheme for texture data Related Works & Basic Research David Simulator Vertex data(x,y,z,w) Model-view Transform ( Lighting ) Clipping Projection Divide by w Viewport Transform Triangle Setup Scan-conversion Bump map. Perspectivee Texture Mapping Fog / Alpha blending Anti-aliasing Pixel data Z-buffering

David Simulator

David Simulator Simulation Work Flow Model Data -8- Parser Obj format OpenGL format Mesa Library Call Rasterizer Simulator Call Setup pipeline Rasterizer Geometry Simulator Call FPU Instructions Geometry Engine Edge work pipeline Span processing Texture cache Vertex Buffer Inst. & Data transfer & Store to Local Mapping unit (texture, bump, environment, displacement) ZC Compare Cl Color Blend memory In nstruction Fetch Decode Execute #1 Execute #2 Execute #3 Write Back Write data to local memory MC for frame buffer access MC for image map access Performance Result Performance Result

Z-test pipeline Setup Edge Walk Span Processing Mapping pipeline Gradient value of U and V, W, D -9- David Rasterizer Block Diagram X, Y Z R, G, B, α Lambda Lambda Calculation Calculation Address Lambda Calculation Calculation Texel Addresses, BumXel Addresses, EXel Address Cache Addresses Tags Miss Addresses Request FIFO Reorder Buffer ZB Z-Buffer Compare Fragment FIFO Cache Addresses Texture Cache Texels, BumXel, EXel Interpolation Interpolation Address Frame buffer Address Z-Buffer Bump Engine Color Blend Env. Engine Pixel Cache Memory Controller Address 24bits Data 128bits Frame Buffer/ Texture Memory / Bump Map / Environment Map

High Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer)

Architectural features of David rasterizer -11 - Performing g z-test pipeline pp before TBE(Texture, Bump, and Environment) mapping completion Saving memory bandwidth d Solve the incosistency problem with tagging scheme for pixel cache Texture cache sharing with BE(Bump and Environment) mapping Efficient structure

Pixel information Rasterizer model : Neon, S3 Neon S3 Pixel information -12 - Texture read / filter Texture cache Z read Texture blend Z test Z read Z write Z test Alpha test memory Texture read / filter Texture blend Texture cache memory Z write Alpha test Destination read Destination read Alpha blend Alpha blend Destination write Destination write

David Rasterizer -13 - Pixel information Tag test and Z read Pixel cache rate ide sepa Z test Texture read / filter Texture blend Texture cache memory wi Alpha test Tag update and Z write Destination read Alpha blend Pixel cache Destination write

-14 - Architecture Comparison Neon S3 David When is texture mapping performed? before Z test after Z test after Z test OpenGL semantics for perfectly transparent texture Support Not support Support Advantages Support OpenGL semantics No wasting bandwidth No fetching texture data that are obscured Simple scheme No wasting bandwidth No fetching texture data that are obscured Support OpenGL semantics Disadvantages Wasting bandwidth Unable to support OpenGL semantics Wide separation Inconsistency problem Solve it using additional flag bits in a pixel cache

-15 - Texture Cache Sharing Current 3D Architecture David 3D Architecture Texture Mapping #1 Cache DRAM Texture Mapping #2 Texture Mapping #1 Shared H/W Bump Mapping DRAM Bump Mapping Texture Mapping #2 Shared Cache DRAM Environment Mapping DRAM Shared H/W Environment Mapping Mapping Hardware Features Cache Size Current Architecture Independent H/W BE Mapping : DRAM Access Same David Architecture Shared H/W Reduce H/W cost (about 30%) BE Mapping : Cache Access Remove Pipeline Stalls due to DRAM Access Same # of Read Port in Cache 2 2 Throughput 1 Cycle Texture Mapping : 1 Cycle BE Mapping : 2 Cycles (infrequent operations)

Performance Analysis

Environments for Performance Evaluation -17 - Model Data OpenGL GLformat Mesa Library Call Trace Generation Rasterizer Simulator Call Setup pipeline Rasterizer Texture Cache Simulator Edge work pipeline Span processing Z Compare Color Blend MC for frame buffer access Texture cache Mapping unit (texture, bump, environment, displacement) MC for image map access Pixel Cache Simulator Miss Ratio, Performance Z Depth Complexity, # of Z Test Fails

-18 - Model Data SPECviewperf ProCDRS Lightscape Crystal Space (Game engine) Blocks Flarge

Bandwidth Saving in Texture Data(ProCDRS) -19 - Depth Complexity Bandwidth Saving Depth Co omplexity, % Bandwidt th Saving 25 20 15 10 5 0 Frame Number Depth Complexity Bandwidth Saving Average 2.22 21.55%

Bandwidth Saving in Texture Data(Light) -20 - Depth Complexity Bandwidth Saving Depth C omplexity, % Bandwidt th Saving 45 42 39 36 33 30 27 24 21 18 15 12 9 6 3 0 Frame Number Depth Complexity Bandwidth Saving Average 2.31 29.16%

Bandwidth Saving in Texture Data(Blocks) -21 - Depth Complexity Bandwidth Saving Depth Co omplexity, % Bandwidth Saving 9 8 7 6 5 4 3 2 1 0 Frame Number Depth Complexity Bandwidth Saving Average 1.43 4.76%

Bandwidth Saving in Texture Data(Flarge) -22 - Depth Complexity Bandwidth Saving Depth Co omplexity, % Bandwidth Saving 25 20 15 10 5 0 Frame Number Depth Complexity Bandwidth Saving Average 1.31 4.93%

Conclusions

-24 - Conclusions Simulation Environment (David Simulator) Evaluation for 3D graphics accelerator architecture Performance comparison David Architecture Performing z-test before TBE mapping completion 5%~29% bandwidth savings for texture data in Scenes with 1~3 depth complexity As the depth complexity grows, the amount of bandwidth savings become large Recently, a 3D graphic application shows high depth complexity Texture cache sharing with BE mapping Hardware saving from sharing and hardware reduction for BE mappings