Computer Graphics Hardware An Overview


Graphics System
- Monitor
- Input devices
- CPU/Memory
- GPU

Raster Graphics System
- Raster: an array of picture elements
- Based on raster-scan TV technology
- The screen (and a picture) consists of discrete pixels, and each pixel has a small display area
- Components: video controller, frame buffer, monitor

Frame Buffer
- Frame buffer: the memory that holds the pixel properties (color, alpha, depth, stencil mask, etc.)
- Properties of a frame buffer that affect graphics performance:
  - Size: screen resolution
  - Depth: color level
    - 1 bit/pixel: black and white
    - 8 bits/pixel: 256 levels of gray or a color palette index
    - 24 bits/pixel: 16 million colors
  - Speed: refresh speed
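The size and depth figures above can be checked with a little arithmetic. The following is only a sketch: the function names and the 1920x1080 resolution are assumptions chosen for illustration, not values from the slides.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for one buffer storing bits_per_pixel for every pixel."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def color_levels(bits_per_pixel):
    """Number of distinct values a pixel of the given depth can take."""
    return 2 ** bits_per_pixel


if __name__ == "__main__":
    # Assumed resolution, used only as an example.
    for depth in (1, 8, 24):
        size = framebuffer_bytes(1920, 1080, depth)
        print(depth, "bits/pixel:", color_levels(depth), "levels,", size, "bytes")
```

At 24 bits/pixel the number of levels is 2^24 = 16,777,216, which is the "16 million colors" quoted on the slide.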

Graphics Accelerator
- A dedicated processor for graphics processing
- Diagram labels: graphics memory/frame buffer, graphics processor, video controller, CPU, main memory, system bus

Graphics Bus Interface
- PCI-based technology
- Diagram labels: graphics memory/frame buffer, graphics processor, video controller, other peripherals, PCIe (8 GB/s), system bus, CPU, main memory

Graphics Accelerators

What do GPUs do?
- Graphics processing units (GPUs) are massively parallel processors
- They process geometry/pixels and produce images to be displayed on the screen
- They can also be used to perform general-purpose computation (via CUDA/OpenGL)
- They evolved from simple video scan controllers, to special-purpose processors that implement a simple pipeline with fixed graphics functionality, to complex many-core architectures that contain several deep parallel pipelines
- Example: Nvidia's Kepler GK110 contains 15x192 cores and 7.1 billion transistors
- A graphics card can easily have more than 2 GB of video memory


CPU/GPU Performance Gap

Nvidia Kepler GK110 (2012)

Nvidia TITAN X (2018)
- 12 B transistors
- 28 SMXs
- 11 TFlops
- 3 MB L2 cache
- 384-bit GDDR5
- PCI Express Gen3

Nvidia Latest GPUs

SMX or SM (Streaming Multiprocessor)

Why are GPUs so fast?
- The entertainment industry has driven the economy of these chips
- Males age 15-35 buy $10B in video games per year
- Moore's Law ++
- Simplified design (stream processing)
- Single-chip designs

A modern GPU has more ALUs

A Specialized Processor
- Very efficient for:
  - Fast parallel floating-point processing
  - Single Instruction Multiple Data (SIMD) operations
  - High computation per memory access
- Not as efficient for:
  - Double precision
  - Logical operations on integer data
  - Branching-intensive operations
  - Random-access, memory-intensive operations

The Rendering Pipeline
- The process of generating two-dimensional images from given virtual cameras and 3D objects
- The pipeline stages implement various core graphics rendering algorithms
- Why should you know the pipeline?
  - Necessary for programming GPUs
  - Understand various graphics algorithms
  - Analyze performance bottlenecks
- Pipeline stages: host interface, vertex processing, triangle setup, pixel processing, memory interface

The Rendering Pipeline
- The basic construction: three conceptual stages
- Each stage is a pipeline and runs in parallel
- Graphics performance is determined by the slowest stage
- Modern graphics systems (diagram labels: software, hardware): Application, Geometry, Rasterizer, Image
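The statement that the slowest stage determines performance can be written down as a small sketch. The function name and the stage rates below are made up for illustration; each rate is taken to mean the number of frames per second that stage could sustain on its own.

```python
def pipeline_throughput(stage_rates):
    """Return (bottleneck stage, sustained rate) for stages that run in parallel.

    stage_rates maps a stage name to the frames per second that stage
    could sustain in isolation. Because every frame must pass through
    every stage, the pipeline as a whole runs at the minimum rate.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


if __name__ == "__main__":
    # Hypothetical rates, not measurements.
    rates = {"application": 240.0, "geometry": 180.0, "rasterizer": 75.0}
    stage, fps = pipeline_throughput(rates)
    print("bottleneck:", stage, "at", fps, "frames/s")
```

Speeding up any stage other than the bottleneck leaves the overall rate unchanged, which is why identifying the bottleneck comes first in performance analysis.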

Host Interface
- The host interface is the communication bridge between the CPU and the GPU
- It receives commands from the CPU and also pulls geometry information from system memory
- It outputs a stream of vertices in object space with all their associated information (normals, texture coordinates, per-vertex color, etc.)

Vertex Processing
- The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space
- This may be a simple linear transformation, or a complex operation involving morphing effects
- Normals, texture coordinates, etc. are also transformed
- No new vertices are created in this stage, and no vertices are discarded (input/output has a 1:1 mapping)
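A minimal sketch of the simple linear case follows. It assumes a single combined 4x4 matrix (called `mvp` here), normalized device coordinates in [-1, 1], and a screen origin at the top left; none of these conventions come from the slides, and the function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_screen(vertex, mvp, width, height):
    """Transform one object-space vertex (x, y, z) to screen space."""
    x, y, z, w = mat_vec(mvp, [vertex[0], vertex[1], vertex[2], 1.0])
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w   # perspective divide
    sx = (ndc_x + 1.0) * 0.5 * width            # [-1, 1] -> [0, width]
    sy = (1.0 - ndc_y) * 0.5 * height           # flip y: origin at top left
    return (sx, sy, ndc_z)


def process_vertices(vertices, mvp, width, height):
    """One output vertex per input vertex: nothing is created or discarded."""
    return [to_screen(v, mvp, width, height) for v in vertices]


# Identity matrix, used here only to make the example easy to follow.
IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (-1.0, 1.0, 0.0), (1.0, -1.0, 0.5)]
    print(process_vertices(verts, IDENTITY, 640, 480))
```

The list comprehension in `process_vertices` mirrors the 1:1 input/output mapping: the output always has exactly as many vertices as the input.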

Triangle Setup
- In this stage geometry information becomes raster information (screen-space geometry is the input, pixels are the output)
- Prior to rasterization, triangles that are backfacing or are located outside the viewing frustum are rejected
- Some GPUs also do some hidden surface removal at this stage
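The rejection step can be sketched as follows. This is a simplification under stated assumptions: front faces are taken to be counter-clockwise in a y-up screen space, and the frustum test is reduced to a 2D bounding-box check against the screen rectangle (no near/far planes). All function names are hypothetical.

```python
def signed_area2(a, b, c):
    """Twice the signed area of screen-space triangle abc."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_backfacing(a, b, c):
    # Assumption: counter-clockwise winding (positive area) is front-facing.
    # Degenerate triangles with zero area are rejected as well.
    return signed_area2(a, b, c) <= 0


def is_offscreen(tri, width, height):
    """True when the triangle's bounding box misses the screen entirely."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height


def keep_triangle(tri, width, height):
    """A triangle survives only if it is front-facing and at least partly on screen."""
    return not is_backfacing(*tri) and not is_offscreen(tri, width, height)


if __name__ == "__main__":
    front = ((0, 0), (10, 0), (0, 10))
    back = ((0, 0), (0, 10), (10, 0))
    outside = ((-30, 0), (-10, 0), (-30, 10))
    for tri in (front, back, outside):
        print(tri, keep_triangle(tri, 640, 480))
```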

Triangle Setup (cont.)
- A fragment is generated if and only if its center is inside the triangle
- Every fragment generated has its attributes computed as the perspective-correct interpolation of the three vertices that make up the triangle
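The two rules above can be illustrated with an edge-function rasterizer. This is a sketch, not how any particular GPU implements it: the vertex layout `(x, y, w, attr)` is an assumption, pixel centers are taken to be at `(x + 0.5, y + 0.5)`, and centers lying exactly on an edge are counted as inside, with no tie-breaking rule for edges shared between triangles.

```python
def edge(a, b, p):
    """Signed area term: positive when p is to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Generate (x, y, attr) fragments for one triangle.

    Each vertex is (x, y, w, attr): screen position, clip-space w,
    and one scalar attribute to interpolate.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    fragments = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # pixel center
            # Barycentric weights; dividing by area handles either winding.
            b0 = edge(v1, v2, p) / area
            b1 = edge(v2, v0, p) / area
            b2 = edge(v0, v1, p) / area
            if b0 < 0 or b1 < 0 or b2 < 0:
                continue  # center is outside the triangle
            # Perspective-correct interpolation: interpolate attr/w and 1/w
            # linearly in screen space, then divide.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr = (b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2] + b2 * v2[3] / v2[2]) / inv_w
            fragments.append((x, y, attr))
    return fragments


if __name__ == "__main__":
    tri = ((0.0, 0.0, 1.0, 0.0), (4.0, 0.0, 1.0, 0.0), (0.0, 4.0, 1.0, 1.0))
    for frag in rasterize(*tri, 4, 4):
        print(frag)
```

When all three vertices share the same w, the result reduces to plain linear interpolation; the division by the interpolated 1/w only matters when the vertices lie at different depths.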

Fragment Processing
- Each fragment provided by triangle setup is fed into fragment processing as a set of attributes (position, normal, texture coordinates, etc.), which are used to compute the final color for this pixel
- The computations taking place here include texture mapping and math operations
- Typically the bottleneck in modern applications
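As a rough illustration of "texture mapping and math operations", the sketch below combines a nearest-neighbour texture lookup with a diffuse lighting term. The lighting model, the fragment layout (a dict with `normal` and `texcoord`), and the function names are all assumptions made for the example; the slides do not specify them.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbour lookup; texture is a list of rows of (r, g, b), u and v in [0, 1]."""
    rows = len(texture)
    cols = len(texture[0])
    x = min(int(u * cols), cols - 1)  # clamp so u == 1.0 stays in range
    y = min(int(v * rows), rows - 1)
    return texture[y][x]


def shade_fragment(frag, texture, light_dir):
    """Compute a final color from a fragment's interpolated attributes.

    frag["normal"] and light_dir are assumed to be unit vectors.
    """
    n_dot_l = sum(a * b for a, b in zip(frag["normal"], light_dir))
    diffuse = max(0.0, n_dot_l)  # surfaces facing away receive no light
    texel = sample_nearest(texture, *frag["texcoord"])
    return tuple(channel * diffuse for channel in texel)


if __name__ == "__main__":
    # A hypothetical 2x2 texture.
    texture = [[(1, 0, 0), (0, 1, 0)],
               [(0, 0, 1), (1, 1, 1)]]
    frag = {"normal": (0.0, 0.0, 1.0), "texcoord": (0.75, 0.25)}
    print(shade_fragment(frag, texture, (0.0, 0.0, 1.0)))
```

This work is repeated for every fragment on screen, which gives some intuition for why this stage is so often the bottleneck.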

Memory Interface
- Fragment colors provided by the previous stage are written to the framebuffer
- Before the final write occurs, some fragments are rejected by the z-buffer, stencil, and alpha tests
- On modern GPUs, z and color are compressed to reduce framebuffer bandwidth (but not size)
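The z-buffer rejection described above might look like the following sketch. It covers only the depth test (the stencil and alpha tests are omitted) and assumes a "less than" comparison with smaller z meaning closer to the viewer; the function name and fragment layout `(x, y, z, rgb)` are hypothetical.

```python
def write_fragments(fragments, width, height, far=1.0, clear=(0, 0, 0)):
    """Write fragments into color and depth buffers using a z-buffer test.

    Each fragment is (x, y, z, rgb). A fragment is written only if it is
    closer than whatever is already stored at that pixel.
    """
    depth = [[far] * width for _ in range(height)]
    color = [[clear] * width for _ in range(height)]
    for x, y, z, rgb in fragments:
        if z < depth[y][x]:       # depth test: assumed "less than" comparison
            depth[y][x] = z
            color[y][x] = rgb
        # otherwise the fragment is rejected and nothing is written
    return color, depth


if __name__ == "__main__":
    frags = [
        (0, 0, 0.5, (255, 0, 0)),
        (0, 0, 0.8, (0, 255, 0)),      # behind the first fragment: rejected
        (1, 0, 0.9, (0, 0, 255)),
        (0, 0, 0.2, (255, 255, 255)),  # in front: overwrites
    ]
    color, depth = write_fragments(frags, 2, 1)
    print(color, depth)
```

Because the test compares against the stored depth, the final image does not depend on the order in which opaque fragments arrive.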

Programmability in the GPU
- Vertex and fragment processing, and now triangle setup, are programmable
- The programmer can write programs that are executed for every vertex as well as for every fragment
- This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications

The Graphics Pipeline

Diagram of a modern GPU
- Input from CPU
- Host interface
- Vertex processing
- Triangle setup
- Pixel processing
- Memory interface (four 64-bit channels to memory)

The Quest for Realism (courtesy: Nvidia)