A fixed-point 3D graphics library with energy-efficient cache architecture for mobile multimedia system
1 MS Thesis
A fixed-point 3D graphics library with energy-efficient cache architecture for mobile multimedia system
Min-wuk Lee
Semiconductor System Laboratory, Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology (KAIST)
2 Outline
- Introduction
- Motivation
- MobileGL: Mobile 3D graphics library
- Energy-efficient CPU cache
- Energy-efficient texture cache
- Conclusion
3 Introduction (1/2)
- Embedded mobile system
- Mobile 3D graphics system
- Optimized code for speed
- Draw off the best H/W performance
- Low energy consumption
- Good quality, high performance
[Figure: 3D graphics pipeline divided into a software system and a hardware system. Input: Model, Interface. Stage labels: Transformation, Lighting, Perspective Projection, Screen Clipping, Triangle Setup, Gouraud Shading, Texture Mapping, Alpha Blending, Depth Compare. Output: Pixel.]
[Figure: block labels Model, CPU, MobileGL, 3D Rendering Engine, Memory, Pixel.]
Performance-energy co-optimization for mobile 3D graphics
- Software system: high-speed graphics library (MobileGL)
- Hardware system: energy-efficient cache architecture
4 Introduction (2/2)
Target system
- Low-cost target
  - High-speed graphics library
  - Energy-efficient CPU cache system
- High-quality target
  - High-speed and good-quality graphics library
  - Energy-efficient CPU cache and texture cache system
[Figure: low-cost target with an application processor (graphics library, CPU cache) and memory; high-quality target with an application processor (graphics library, CPU cache), a 3D graphics SoC (R.E., texture cache, frame cache, depth cache), a system bus controller, and memory.]
5 Previous work
For low-cost target
- PC, workstation platform graphics library
  - Too huge
  - GL supported by FPU, special graphics engine
- Embedded platform graphics library: fixed-point arithmetic
  - Yoshida's work [1]: limited operation (without texturing)
  - No research on memory bandwidth bottleneck
- Previous work in our group
  - Analysis with memory only, without cache system
For high-quality target
- Texture cache in PC platform
  - Hakura's work [2]: analysis based on miss rate
  - Did not consider energy, execution time, system limitation
[1] K. Yoshida, IEEE Transactions on Consumer Electronics, 1998.
[2] Ziyad S. Hakura, ISCA, 1997.
6 Motivation
Graphics library of this work
- Extended operation: lighting, texturing, alpha blending, face culling, etc.
- Optimization of memory transaction
  - 3D graphics characteristic
  - Analysis with cache system
Texture cache of this work
- Energy-efficient texture cache in embedded system
- With negligible performance degradation
8 MobileGL (1/6)
Mobile 3D graphics library
- Fixed-point arithmetic: 32-bit integer
- Optimized memory transaction: to reduce instruction and data traffic
- Selective pipeline: to reduce branches
- 45 KB total code size
[Figure: MobileGL block diagram, from Applications and Model to Pixel. Lighting enabled: view transformation, lighting, perspective projection, clipping, perspective division, screen mapping, cull face, then a rendering stage for "lighting and texturing" or "lighting only" (includes r,g,b,a calculation). Lighting disabled: view transformation, lighting skipped (X), perspective projection, clipping, perspective division, screen mapping, cull face, then a "texture-only" rendering stage (excludes r,g,b,a calculation).]
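As an illustration of the fixed-point arithmetic on 32-bit integers named on this slide, here is a minimal Python sketch. The Q16.16 split (16 integer bits, 16 fractional bits) and the helper names `to_fixed`, `to_float`, `fx_mul`, and `fx_div` are assumptions for illustration only; the slides do not state MobileGL's actual number format or API.

```python
# Assumed Q16.16 layout; the slides only say "fixed-point arithmetic, 32-bit integer".
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(value: float) -> int:
    """Convert a float to the assumed Q16.16 representation."""
    return int(round(value * ONE))


def to_float(value: int) -> float:
    """Convert a Q16.16 value back to a float."""
    return value / ONE


def fx_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values.

    The raw product carries 2 * FRAC_BITS fractional bits, so it is
    shifted right by FRAC_BITS to return to Q16.16.
    """
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    """Divide two Q16.16 values by pre-scaling the numerator."""
    return (a << FRAC_BITS) // b


print(to_float(fx_mul(to_fixed(1.5), to_fixed(2.25))))
print(to_float(fx_div(to_fixed(3.0), to_fixed(2.0))))
```

Python integers do not overflow, so this sketch does not model 32-bit wraparound; on real hardware the intermediate product needs a wider register or a reduced-precision path.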
9 MobileGL (2/6)
Disabling option for perspective correction of texture address
- Due to small screen size
- Trade-off between correctness and speedup
[Figure: pipeline of view transformation, perspective projection, clipping / perspective division, screen mapping, cull face, triangle setup, horizontal setup, pixel interpolation. With the option on, texture coordinates are carried as u/w, v/w; with it off, as u, v.]
[Chart: execution time per 1K polygons for triangle setup, horizontal setup, and texturing, conventional versus this work, annotated "% reduction"; StrongARM at 200 MHz.]
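The trade-off on this slide can be sketched by comparing the two ways of interpolating one texture coordinate along a span. This is a floating-point illustration under assumed names (`affine_u`, `perspective_u`), not MobileGL code: with correction on, u/w and 1/w are interpolated and divided per pixel; with correction off, u is interpolated directly and the per-pixel division disappears.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def affine_u(u0: float, u1: float, t: float) -> float:
    """Correction off: interpolate u directly in screen space."""
    return lerp(u0, u1, t)


def perspective_u(u0: float, w0: float, u1: float, w1: float, t: float) -> float:
    """Correction on: interpolate u/w and 1/w, then divide per pixel."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w


# Span from (u=0, w=1) to (u=1, w=3), sampled at its screen-space midpoint.
print(affine_u(0.0, 1.0, 0.5))
print(perspective_u(0.0, 1.0, 1.0, 3.0, 0.5))
```

When both endpoints have the same w the two results coincide, which is why the error shrinks for small on-screen triangles with little depth variation.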
10 MobileGL (3/6)
Division reduction in interpolation
- Use shift instead of reciprocal
- High probability of 1, 2 or 4 in denominator value
- If the denominator is 1, 2 or 4, use a shift
[Figure: triangle with labels 1st, 2nd, 3rd, Top, Mid, Bot, Line1 to Line4, Direction_x, Direction_y, start, end.]
[Chart: execution time (ms) per 1000 polygons, labels ROD, triangle setup, horizontal setup, texturing; 27% reduction; 200 MHz.]
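A minimal sketch of the shift-for-division idea, assuming an integer (or fixed-point) numerator and a small integer denominator such as a pixel count. The function name `div_small` and the lookup table are hypothetical; the slide only states that denominators of 1, 2 or 4 are handled with a shift.

```python
# Denominators that can be replaced by a right shift, mapped to the shift amount.
SHIFT_FOR = {1: 0, 2: 1, 4: 2}


def div_small(numerator: int, denominator: int) -> int:
    """Divide by shifting when the denominator is 1, 2 or 4.

    Other denominators fall back to a real division. In Python both
    `>>` and `//` round toward negative infinity, so the two paths agree.
    """
    shift = SHIFT_FOR.get(denominator)
    if shift is not None:
        return numerator >> shift
    return numerator // denominator


print(div_small(1000, 4))
print(div_small(1000, 3))
```

In C, signed division truncates toward zero while an arithmetic shift rounds down, so a port would need to decide how negative numerators should round.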
11 MobileGL (4/6)
Z comparison in advance
- To avoid unnecessary shading and texturing [3]
- Standard OpenGL pipeline: texturing, depth test, blending
- Z comparison in advance: depth test, texturing, blending
[Chart: speed improvement (%) versus Z_fail / Z_access.]
Selective precision of matrix multiplication
- Case #1: A MULL B; the result should be extended to 64 bit
- Case #2: A MUL B; the 64-bit extension is an unnecessary operation for #2
[Figure: operand and result widths labelled 32 bit, 4 bit, and 64 bit; stage annotation 67%.]
[3] Ramchan Woo, ISSCC 2003
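The reordering on this slide can be sketched by counting texture lookups with and without the early depth test. The fragment format, the "less than" depth function, and the function name `shade_fragments` are assumptions for illustration.

```python
def shade_fragments(fragments, depth_buffer, early_z=True):
    """Process (pixel_index, z) fragments and return the number of texture lookups.

    early_z=True  : depth test first, texturing only for visible fragments.
    early_z=False : texturing first, depth test afterwards (standard order).
    A smaller z is assumed to be closer to the viewer.
    """
    texture_lookups = 0
    for pixel, z in fragments:
        if early_z:
            if z >= depth_buffer[pixel]:
                continue  # Z fail: texturing is skipped entirely
            texture_lookups += 1
            depth_buffer[pixel] = z
        else:
            texture_lookups += 1  # texel fetched even if the fragment is hidden
            if z < depth_buffer[pixel]:
                depth_buffer[pixel] = z
    return texture_lookups


frags = [(0, 0.5), (0, 0.8), (1, 0.3), (0, 0.2)]
early_depth = [1.0, 1.0]
late_depth = [1.0, 1.0]
print(shade_fragments(frags, early_depth, early_z=True))
print(shade_fragments(frags, late_depth, early_z=False))
```

Both orders leave the same depth buffer; the saving grows with the share of fragments that fail the depth test, which matches the Z_fail / Z_access axis of the chart.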
12 MobileGL (5/6)
Library performance
- 67K polygons/sec for the texture-only application, due to several optimization steps
- 6.7 times performance improvement
[Chart: polygons per millisecond for original C code and optimized code; axis labels 80 MHz, 200 MHz, 200 MHz; comparison with previous work [1].]
13 MobileGL (6/6)
Implementation result
15 Energy-efficient CPU cache (1/4)
Simulation environment
- From ARM SDK: term (1), memory transaction
- From cache model using memory transaction: terms (2), (3)
[Figure: application programs and the 3D graphics library run on the ARM SDK (ARM processor, memory), which outputs the CPU execution time and a memory transaction file; CACHE_MODEL, configured for the target hardware platform, outputs Memory_access_time; both feed the total execution time.]
$$T_{exe\_total} = T_{exe\_CPU} + \text{memory\_access\_time}$$
T_exe_CPU is expressed in terms of instruction_counts, T_exe_instruction, cache_hit_counts, and CPU_cycle (terms (1) and (2)); memory_access_time is term (3).
16 Energy-efficient CPU cache (2/4)
Cache model: execution time
[Figure: processor core connected to the data cache and instruction cache (hit_time), and the caches connected to memory (memory_access_time).]
$$\text{hit\_time} = \text{clock\_period}_{core}$$
memory_access_time takes one of two values, $t_{RCD} + \text{CAS\_latency} + \text{clock\_period}_{mem}$ or $\text{clock\_period}_{mem}$, depending on whether the access is a burst access or a non-sequential access.
$$\text{miss\_time} = \begin{cases} \text{memory\_access\_time} + \text{hit\_time} & \text{(read)} \\ \text{memory\_access\_time} & \text{(write)} \end{cases}$$
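A small sketch of how the hit and miss times on this slide could be accumulated over a list of cache accesses. The function names and the nanosecond figures are placeholders: 5 ns corresponds to one clock period of a 200 MHz core, and 60 ns is an arbitrary memory access time, not a value from the thesis.

```python
def miss_time(memory_access_time: int, hit_time: int, is_read: bool) -> int:
    """Miss cost as read on the slide: a read miss also pays the hit time."""
    if is_read:
        return memory_access_time + hit_time
    return memory_access_time


def total_access_time(accesses, hit_time: int, memory_access_time: int) -> int:
    """Sum the time of a sequence of (is_hit, is_read) cache accesses."""
    total = 0
    for is_hit, is_read in accesses:
        if is_hit:
            total += hit_time
        else:
            total += miss_time(memory_access_time, hit_time, is_read)
    return total


# Placeholder timings in nanoseconds (assumptions, not thesis data).
HIT_TIME_NS = 5
MEMORY_ACCESS_TIME_NS = 60

trace = [(True, True), (False, True), (False, False)]
print(total_access_time(trace, HIT_TIME_NS, MEMORY_ACCESS_TIME_NS))
```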
17 Energy-efficient CPU cache (3/4)
Energy modeling
- Model based on tools and research documentation
- Cache hit energy: from CACTI 3.0 [4]
- Cache miss energy: from "Power and Energy Characterization of the Itsy Pocket Computer" by Compaq Western Research Laboratory, 4.70 nJ / bus_clock [5], [6]
[4] CACTI 3.0: An integrated cache timing, power, and area model, Compaq Western Research Laboratory
[5] Power and energy characterization of the Itsy pocket computer
[6] A simulation framework for energy-consumption analysis of OS-driven embedded applications, TCAS 2003
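One way to combine the two energy sources on this slide is sketched below. Only the 4.70 nJ per bus clock figure comes from the slide; the per-hit energy, the number of bus clocks per miss, the simple additive form, and the function name `cache_energy_nj` are assumptions for illustration (a real per-hit value would come from a CACTI run for the chosen cache configuration).

```python
# From the slide: off-chip access energy per bus clock.
MISS_ENERGY_PER_BUS_CLOCK_NJ = 4.70


def cache_energy_nj(hits: int, misses: int,
                    hit_energy_nj: float, bus_clocks_per_miss: int) -> float:
    """Total energy in nJ: per-hit cache energy plus per-miss bus energy."""
    hit_part = hits * hit_energy_nj
    miss_part = misses * bus_clocks_per_miss * MISS_ENERGY_PER_BUS_CLOCK_NJ
    return hit_part + miss_part


# Hypothetical inputs: 0.5 nJ per hit and 8 bus clocks per miss are placeholders.
print(cache_energy_nj(hits=1000, misses=10, hit_energy_nj=0.5, bus_clocks_per_miss=8))
```

Because a miss costs far more than a hit in this model, a larger or more associative cache only pays off while its higher per-hit energy is outweighed by the misses it removes.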
18 Energy-efficient CPU cache (4/4)
Simulation results
[Charts: miss rate (%), normalized execution time, and normalized energy consumption versus cache size (2 KB, 4 KB, 8 KB, 16 KB) for a direct-mapped data cache with 8 entries/line (32 B line size).]
[Charts: miss rate (%), normalized execution time, and normalized energy consumption versus associativity (DM, 2-way, 4-way, 8-way) for a 16 KB data cache with 32 B line size; annotations: 1% performance degradation, 13% energy saving.]
- Using a 2-way cache gives 13% energy saving with 1% performance degradation compared with a conventional 4-way cache
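The miss-rate curves on this slide come from sweeping cache size and associativity over a memory trace. A minimal set-associative cache model of that kind is sketched below; the class name, the LRU replacement policy, and the toy trace are assumptions, since the slides do not describe the internals of CACHE_MODEL.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tiny cache model with LRU replacement (policy assumed, not from the slides)."""

    def __init__(self, size_bytes: int, line_bytes: int, ways: int):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Access one byte address; return True on a hit."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        self.misses += 1
        return False

    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


# Two addresses 16 KB apart collide in a 16 KB direct-mapped cache.
trace = [0, 16384, 0, 16384]
for ways in (1, 2):
    cache = SetAssociativeCache(size_bytes=16 * 1024, line_bytes=32, ways=ways)
    for addr in trace:
        cache.access(addr)
    print(ways, cache.miss_rate())
```

With one way the two lines keep evicting each other; with two ways both fit in the same set, which is the kind of conflict-miss reduction that associativity buys.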
20 Energy-efficient texture cache (1/12)
Texture mapping (Introduction)
- F(x,y,z) = (s,t): map from 3D surface to 2D texel domain (image)
- Texture coordinate: look up color in image
- Lookup method
  - Nearest texel
  - Interpolation of surrounding texels
- MIPMAP: image pyramid
[Figure: 2D texture diagram with object axes x, y, z and texture axes s, t.]
[Figure: image pyramid with Level 0 and the d axis.]
21 Energy-efficient texture cache (2/12)
Texture filtering methods (Introduction)
- Point sampling, bilinear filtering, bilinear MIPMAP, trilinear MIPMAP
1. Point sampling
2. Bilinear filtering: bilinear interpolation
3. Bilinear MIPMAP: bilinear interpolation
4. Trilinear MIPMAP: two bilinear interpolations followed by a linear interpolation
[Figure: screen space mapped to texture space for each method, with LOD 0 to LOD 3, an example LOD = 1.XX, and samples labelled 1st, 2nd, 3rd.]
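The four filtering methods differ mainly in how many texels each pixel fetches, which is what drives texture cache traffic. The sketch below assumes textures stored as 2D lists of floats, coordinates in texel units, clamped edges, and hypothetical function names; texel-center offsets are ignored for brevity.

```python
import math

# Texel fetches per filtered sample for each method.
TEXELS_PER_SAMPLE = {
    "point sampling": 1,
    "bilinear filtering": 4,
    "bilinear MIPMAP": 4,
    "trilinear MIPMAP": 8,
}


def clamp_index(i: int, size: int) -> int:
    """Clamp a texel index to the valid range [0, size - 1]."""
    return max(0, min(i, size - 1))


def bilinear(texture, u: float, v: float) -> float:
    """Blend the 2 x 2 texels around (u, v), given in texel units."""
    height, width = len(texture), len(texture[0])
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    xa, xb = clamp_index(x0, width), clamp_index(x0 + 1, width)
    ya, yb = clamp_index(y0, height), clamp_index(y0 + 1, height)
    top = texture[ya][xa] * (1 - fx) + texture[ya][xb] * fx
    bottom = texture[yb][xa] * (1 - fx) + texture[yb][xb] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(fine, coarse, u: float, v: float, lod_fraction: float) -> float:
    """Bilinear sample in two adjacent MIPMAP levels, then a linear blend."""
    a = bilinear(fine, u, v)
    b = bilinear(coarse, u / 2, v / 2)  # the coarser level has half the resolution
    return a * (1 - lod_fraction) + b * lod_fraction


level0 = [[0.0, 10.0], [20.0, 30.0]]
level1 = [[15.0]]
print(bilinear(level0, 0.5, 0.5))
print(trilinear(level0, level1, 0.5, 0.5, 0.5))
```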
22 Energy-efficient texture cache (3/12)
Obstacle of texture mapping
- Requirement of extremely high bandwidth
Texture cache
- To reduce the off-chip memory access bottlenecks
- Image conversion (texture map representation): reduces conflict misses
- Address conversion unit (a few logical operations and two additions)
[Figure: texture cache system with block labels external memory, 3D rendering engine, address conversion, texture cache, image conversion.]
23 Energy-efficient texture cache (4/12)
Simulation models
- Tiny: 6833 polygons; LOD[0:1] 80%, LOD[1:2] 10%
- Stealth: 542 polygons; LOD[0:1] 67%, LOD[1:2] 15%
- Alien: 854 polygons; LOD[0:1] 48%, LOD[1:2] not given
- Filtering: trilinear MIPMAP
24 Energy-efficient texture cache (5/12)
Proposed texture map representation
- Reduce conflict miss at bank change
- Miss rate reduction, energy saving (17.4%), execution time reduction (15.2%)
[Figure: blocked representation versus Recursive Sub Block (RSB) representation.]
25 Energy-efficient texture cache (6/12)
Address conversion unit for RSB2X2
- Use one-to-one correspondence and find rule
- Hardware implementation: only thirteen 2:1 mux in trilinear MIPMAP
[Figure: address conversion unit of this work (RSB 2X2). For each texture size (256 X 256, 128 X 128, 64 X 64, 32 X 32), the bits old11 to old0 of the core request address are rewired to the bits new11 to new0 of the converted address.]
26 Energy-efficient texture cache (7/12)
Texture cache model using bank interleaving
- Point sampling: texture cache (1 bank), address A0, data D0
- Bilinear filtering, bilinear MIPMAP: texture cache (4 bank), addresses A0 to A3, data D0 to D3
- Trilinear MIPMAP: texture cache for even, odd LOD (4 bank), EvenLOD$ and OddLOD$, addresses A0 to A7, data D0 to D7
- Morton order representation: previous work
- Proposed RSB2X2 is also free from bank conflict
27 Energy-efficient texture cache (8/12)
Performance and energy comparison between filtering methods
- Energy consumption, execution time: point sampling < bilinear filtering < bilinear MIPMAP < trilinear MIPMAP
- Trade-off point: image quality (aliasing criterion)
[Chart: normalized energy for P.S., B.F., B.M., T.M. with D.M., 2-way, and 4-way caches; 2 KB, 16 entries/line, Tiny model.]
28 Energy-efficient texture cache (9/12)
Image quality analysis
- Textile model: LOD[0:1] 44%, LOD[1:2] 40% in MIPMAP
[Figure: rendered images for point sampling, bilinear filtering, bilinear MIPMAP, trilinear MIPMAP.]
DCT analysis
- Low-frequency term in top-left
[Figure: DCT results for point sampling, bilinear filtering, bilinear MIPMAP, trilinear MIPMAP.]
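29 Energy-efficient texture cache (10/12)
Image quality metric in terms of aliasing criterion
$$\text{image\_quality\_0.5} = \frac{\sum_{f_x=0}^{\pi/2} \sum_{f_y=0}^{\pi/2} \text{amplitude}}{\sum_{f_x=0}^{\pi} \sum_{f_y=0}^{\pi} \text{amplitude}}$$
$$\text{image\_quality\_0.75} = \frac{\sum_{f_x=0}^{3\pi/4} \sum_{f_y=0}^{3\pi/4} \text{amplitude}}{\sum_{f_x=0}^{\pi} \sum_{f_y=0}^{\pi} \text{amplitude}}$$
[Figure: frequency plane with axes fx and fy, each running from 0 to PI.]
Index Q, Index E
- To find relative value
- Normalize from 0 to 1
$$\text{Index Q} = \frac{Q_{cur} - Q_{min}}{Q_{max} - Q_{min}}, \qquad \text{Index E} = \frac{E_{max} - E_{cur}}{E_{max} - E_{min}}$$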
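The two indices can be sketched as a min-max normalization, reading Index Q as rising with quality and Index E as rising when energy falls, so that both lie between 0 and 1 and a larger value is better. That reading, the function names, and every number in the example are assumptions: the values below are made up to exercise the code and are not thesis measurements.

```python
def index_q(q_cur: float, q_min: float, q_max: float) -> float:
    """Relative quality: 0 for the worst candidate, 1 for the best."""
    return (q_cur - q_min) / (q_max - q_min)


def index_e(e_cur: float, e_min: float, e_max: float) -> float:
    """Relative energy score: 1 for the lowest energy, 0 for the highest."""
    return (e_max - e_cur) / (e_max - e_min)


def combined_scores(candidates):
    """candidates: name -> (quality, energy). Returns name -> Index Q + Index E."""
    qualities = [q for q, _ in candidates.values()]
    energies = [e for _, e in candidates.values()]
    q_min, q_max = min(qualities), max(qualities)
    e_min, e_max = min(energies), max(energies)
    return {
        name: index_q(q, q_min, q_max) + index_e(e, e_min, e_max)
        for name, (q, e) in candidates.items()
    }


# Made-up (quality, energy) pairs for illustration only.
example = {
    "P.S.": (0.2, 1.0),
    "B.F.": (0.6, 2.0),
    "B.M.": (0.9, 2.5),
    "T.M.": (1.0, 4.0),
}
scores = combined_scores(example)
print(max(scores, key=scores.get))
```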
30 Energy-efficient texture cache (11/12)
Index = Index Q + Index E
- Almost the same quality between B.M. and T.M. in QVGA
- Large energy difference between B.M. and T.M.
- Poor image quality in P.S.
- Bilinear MIPMAP gets the largest score
[Charts: Index Q (8E with Q_0.5, 16E with Q_0.75), Index E (8E, 16E), and Index Q + Index E (8E and 16E, each with Q_0.5 and Q_0.75) for P.S., B.F., B.M., T.M.; 2-way set associative, 2 KB texture cache.]
31 Energy-efficient texture cache (12/12)
Simulation results
[Chart: normalized energy for Tiny, Stealth, and Alien versus cache size (1K, 2K, 4K, 8K) with 8E and 16E lines; energy comparison while changing cache; 2-way, using bilinear MIPMAP.]
- 4 KB texture cache with 16 B line size (2 B per texel): energy-efficient, low cost, high quality
33 Conclusion
For performance-energy co-optimization in mobile 3D graphics: MobileGL / cache architecture
- MobileGL: mobile 3D graphics library
  - 67K polygons/sec
  - 66.1% performance improvement on average
- Energy-efficient CPU cache
  - 2-way set associative cache to save energy
- Energy-efficient texture cache
  - Proposed texture map representation
  - Bilinear MIPMAP shows a good quality-to-energy ratio
  - 16 B line size, 4 KB cache is the optimal point
34 Supplemental Materials
35 Graphics pipeline
Geometry stage
- View transform, projection, clipping, 1/w, screen mapping
[Figure: camera position and camera direction, view frustum, and unit cube, drawn on x and z axes.]
Rendering stage
- Triangle setup: for Line1, Line2, Line3 using 1st, 2nd, 3rd
- Horizontal setup: for Line_x using start, end
- Pixel interpolation: each pixel shading, texturing
[Figure: triangle with labels 1st, 2nd, 3rd, Top, Mid, Bot, Line1 to Line3, Line_x, Direction_x, Direction_y, start, end.]
36 Energy portion of cache
- ARM920T and M*CORE: caches consume 50% of total processor system power (Segars 01, Lee et al. 99)
[Figure annotation: >50%]
37 Blocked representation
Conventional texture map representation (16X16 blocked)
- Block: square region in which texels are ordered consecutively
- Assumptions
  1. Bilinear filtering
  2. Cache size = block size
  3. 16 entries / 1 line
- Conflict block change: A path 2, B path 16, C path 16
[Figure: 16X16 conventional block texture map with paths A, B, C crossing block boundaries.]
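The definition of a block on this slide (a square region whose texels are stored consecutively) translates into a simple address calculation. The sketch below assumes a square texture whose width is a multiple of the block size, blocks laid out row by row, and one address unit per texel; the function name `blocked_address` is hypothetical.

```python
def blocked_address(x: int, y: int, width: int, block: int = 16) -> int:
    """Texel address in a blocked layout.

    Texels inside one block are consecutive; blocks themselves are stored
    row by row (an assumption, the slide does not give the block order).
    """
    blocks_per_row = width // block
    block_index = (y // block) * blocks_per_row + (x // block)
    offset_in_block = (y % block) * block + (x % block)
    return block_index * block * block + offset_in_block


print(blocked_address(15, 15, width=256))  # last texel of the first block
print(blocked_address(16, 0, width=256))   # first texel of the next block
```

Stepping across a block boundary jumps the address by a whole block, which is where a cache sized to one block starts to see conflict misses.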
38 Proposed texture map representation
Recursive Sub Block texture map representation (RSB4X4)
- Assumptions
  1. Bilinear filtering
  2. Cache size = block size
  3. [count missing] entries / 1 line
- Conflict change: A path 4, B path 4, C path 4
[Figure: recursive sub-block 4X4 method with paths A, B, C crossing blocks.]
39 Simulation between representation methods
Simulation results between texture representations
- Bilinear filtering, 2-way, 1 KB texture cache
- 27% performance improvement on average
- Low miss rate doesn't mean high performance
- 8 entries/line or 16 entries/line shows good performance
[Charts: miss rate and normalized performance for Tiny, Stealth, and Alien with 4X4, 8X8, 16X16, and RSB2X2 representations at 4E, 8E, 16E, and 32E lines.]
40 Simulation between representation methods
Bilinear filtering
- Low miss rate doesn't mean low energy consumption
- RSB accomplishes 17.4% energy saving compared to the best of conventional
Point sampling
- 25% performance improvement on average
[Chart: normalized energy for Tiny, Stealth, and Alien with 4X4, 8X8, 16X16, and RSB2X2 at 4E, 8E, 16E, 32E; bilinear filtering, 1 KB cache, 2-way; annotation: 17.4% energy saving.]
[Chart: normalized performance for Tiny, Stealth, and Alien with 4X4, 8X8, 16X16, and RSB2X2 at 4E, 8E, 16E, 32E; point sampling, 1 KB cache, 2-way.]
41 Morton order
Multi-ported cache
- To access more than 1 texel in the same cycle
- Interleaving the cache lines across multi-banks
- Morton order
- RSB4X4: not free from bank conflict
- RSB2X2
- Trilinear filtering: not free from bank conflict; cache for even, odd LOD
[Figure: 4X4 block map and Morton order map, banks selected by the LSB 2 bits of the address, data D0 to D7; annotations "Not free from bank conflict" and "Bank conflict free".]
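Why Morton order suits a 4-bank cache can be checked directly: when the bank is chosen by the two least significant address bits, the four texels of any 2 x 2 bilinear footprint land in four different banks. The sketch below uses the standard bit-interleaving definition of Morton order; the function names, the 8-bit coordinate width, and the row-major comparison layout are assumptions for illustration.

```python
def morton(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of x (even positions) and y (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def bank(address: int) -> int:
    """Bank selected by the LSB 2 bits of the address."""
    return address & 0b11


def footprint_banks(x: int, y: int, address_of) -> set:
    """Banks touched by the 2 x 2 bilinear footprint whose corner is (x, y)."""
    return {bank(address_of(x + dx, y + dy)) for dy in (0, 1) for dx in (0, 1)}


def row_major(x: int, y: int, width: int = 16) -> int:
    """Plain row-major layout, used only as a contrast."""
    return y * width + x


print(footprint_banks(5, 9, morton))     # four distinct banks
print(footprint_banks(5, 9, row_major))  # only two banks, so accesses collide
```

The low two Morton bits are just the low bit of x and the low bit of y, and a 2 x 2 footprint always covers both values of each, which is why the result holds for every footprint position.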
42 Proposed texture map representation
RSB2X2 map representation
- Bank conflict free
- Conflict bank change: A, B, C (values not given)
[Figure: RSB 4X4 block compared with the recursive sub-block 2X2 method, with paths A, B, C.]
Hot Chips 1996 Bringing Workstation Graphics Performance to a Desktop Near You S3 Incorporated August 18-20, 1996 Agenda ViRGE/VX Marketing Slide! Overview of ViRGE/VX accelerator features 3D rendering
More informationPoint based Rendering
Point based Rendering CS535 Daniel Aliaga Current Standards Traditionally, graphics has worked with triangles as the rendering primitive Triangles are really just the lowest common denominator for surfaces
More informationMemory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple
Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss
More informationISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2
ISSCC 2001 / SESSION 9 / INTEGRATED MULTIMEDIA PROCESSORS / 9.2 9.2 A 80/20MHz 160mW Multimedia Processor integrated with Embedded DRAM MPEG-4 Accelerator and 3D Rendering Engine for Mobile Applications
More informationPOWERVR MBX. Technology Overview
POWERVR MBX Technology Overview Copyright 2009, Imagination Technologies Ltd. All Rights Reserved. This publication contains proprietary information which is subject to change without notice and is supplied
More informationPipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11
Pipeline Operations CS 4620 Lecture 11 1 Pipeline you are here APPLICATION COMMAND STREAM 3D transformations; shading VERTEX PROCESSING TRANSFORMED GEOMETRY conversion of primitives to pixels RASTERIZATION
More informationRasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?
Where does a picture come from? Rasterization and Graphics Hardware CS559 Course Notes Not for Projection November 2007, Mike Gleicher Result: image (raster) Input 2D/3D model of the world Rendering term
More informationTexture Mapping and Sampling
Texture Mapping and Sampling CPSC 314 Wolfgang Heidrich The Rendering Pipeline Geometry Processing Geometry Database Model/View Transform. Lighting Perspective Transform. Clipping Scan Conversion Depth
More informationModule Contact: Dr Stephen Laycock, CMP Copyright of the University of East Anglia Version 1
UNIVERSITY OF EAST ANGLIA School of Computing Sciences Main Series PG Examination 2013-14 COMPUTER GAMES DEVELOPMENT CMPSME27 Time allowed: 2 hours Answer any THREE questions. (40 marks each) Notes are
More informationOverview. Technology Details. D/AVE NX Preliminary Product Brief
Overview D/AVE NX is the latest and most powerful addition to the D/AVE family of rendering cores. It is the first IP to bring full OpenGL ES 2.0/3.1 rendering to the FPGA and SoC world. Targeted for graphics
More informationCHAPTER 1 Graphics Systems and Models 3
?????? 1 CHAPTER 1 Graphics Systems and Models 3 1.1 Applications of Computer Graphics 4 1.1.1 Display of Information............. 4 1.1.2 Design.................... 5 1.1.3 Simulation and Animation...........
More informationReal-Time Shadows. Last Time? Textures can Alias. Schedule. Questions? Quiz 1: Tuesday October 26 th, in class (1 week from today!
Last Time? Real-Time Shadows Perspective-Correct Interpolation Texture Coordinates Procedural Solid Textures Other Mapping Bump Displacement Environment Lighting Textures can Alias Aliasing is the under-sampling
More information0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV
0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV Rajeev Jayavant Cyrix Corporation A National Semiconductor Company 8/18/98 1 0;L$UFKLWHFWXUDO)HDWXUHV ¾ Next-generation Cayenne Core Dual-issue pipelined
More informationRendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane
Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world
More information2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into
2D rendering takes a photo of the 2D scene with a virtual camera that selects an axis aligned rectangle from the scene. The photograph is placed into the viewport of the current application window. A pixel
More informationFeeding the Beast: How to Satiate Your GoForce While Differentiating Your Game
GDC Europe 2005 Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game Lars M. Bishop NVIDIA Embedded Developer Technology 1 Agenda GoForce 3D capabilities Strengths and weaknesses
More informationOverview. A real-time shadow approach for an Augmented Reality application using shadow volumes. Augmented Reality.
Overview A real-time shadow approach for an Augmented Reality application using shadow volumes Introduction of Concepts Standard Stenciled Shadow Volumes Method Proposed Approach in AR Application Experimental
More informationPipeline Operations. CS 4620 Lecture 14
Pipeline Operations CS 4620 Lecture 14 2014 Steve Marschner 1 Pipeline you are here APPLICATION COMMAND STREAM 3D transformations; shading VERTEX PROCESSING TRANSFORMED GEOMETRY conversion of primitives
More informationThe Application Stage. The Game Loop, Resource Management and Renderer Design
1 The Application Stage The Game Loop, Resource Management and Renderer Design Application Stage Responsibilities 2 Set up the rendering pipeline Resource Management 3D meshes Textures etc. Prepare data
More information3D rendering using FPGAs
3D rendering using FPGAs Péter Sántó, Béla Fehér Department of Measurement and Information Systems Budapest University of Technology and Economics H-7 Budapest, Magyar Tudósok krt. 2. santo@mit.bme.hu,
More informationPowerVR Performance Recommendations. The Golden Rules
PowerVR Performance Recommendations Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind. Redistribution
More informationChapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming
Chapter IV Fragment Processing and Output Merging Fragment Processing The per-fragment attributes may include a normal vector, a set of texture coordinates, a set of color values, a depth, etc. Using these
More informationComputer Graphics Shadow Algorithms
Computer Graphics Shadow Algorithms Computer Graphics Computer Science Department University of Freiburg WS 11 Outline introduction projection shadows shadow maps shadow volumes conclusion Motivation shadows
More informationOverview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size
Overview Videos are everywhere But can take up large amounts of resources Disk space Memory Network bandwidth Exploit redundancy to reduce file size Spatial Temporal General lossless compression Huffman
More informationPerspective Projection and Texture Mapping
Lecture 7: Perspective Projection and Texture Mapping Computer Graphics CMU 15-462/15-662, Spring 2018 Perspective & Texture PREVIOUSLY: - transformation (how to manipulate primitives in space) - rasterization
More informationOptimizing DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most
More informationCS 450: COMPUTER GRAPHICS TEXTURE MAPPING SPRING 2015 DR. MICHAEL J. REALE
CS 450: COMPUTER GRAPHICS TEXTURE MAPPING SPRING 2015 DR. MICHAEL J. REALE INTRODUCTION Texturing = process that takes a surface and modifies its appearance at each location using some image, function,
More informationgraphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1
graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline sequence of operations to generate an image using object-order processing primitives processed one-at-a-time
More informationHardware-driven Visibility Culling Jeong Hyun Kim
Hardware-driven Visibility Culling Jeong Hyun Kim KAIST (Korea Advanced Institute of Science and Technology) Contents Introduction Background Clipping Culling Z-max (Z-min) Filter Programmable culling
More informationgraphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1
graphics pipeline computer graphics graphics pipeline 2009 fabio pellacini 1 graphics pipeline sequence of operations to generate an image using object-order processing primitives processed one-at-a-time
More informationCS4620/5620: Lecture 14 Pipeline
CS4620/5620: Lecture 14 Pipeline 1 Rasterizing triangles Summary 1! evaluation of linear functions on pixel grid 2! functions defined by parameter values at vertices 3! using extra parameters to determine
More informationComputer System Components
Computer System Components CPU Core 1 GHz - 3.2 GHz 4-way Superscaler RISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationCS 498 VR. Lecture 19-4/9/18. go.illinois.edu/vrlect19
CS 498 VR Lecture 19-4/9/18 go.illinois.edu/vrlect19 Review from previous lectures Image-order Rendering and Object-order Rendering Image-order Rendering: - Process: Ray Generation, Ray Intersection, Assign
More informationTEXTURE MAPPING. DVA338 Computer Graphics Thomas Larsson, Afshin Ameri
TEXTURE MAPPING DVA338 Computer Graphics Thomas Larsson, Afshin Ameri OVERVIEW Motivation Texture Mapping Coordinate Mapping (2D, 3D) Perspective Correct Interpolation Texture Filtering Mip-mapping Anisotropic
More informationAS THE MOBILE electronics market matures, third-generation
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 7, JULY 2004 1101 A Low-Power 3-D Rendering Engine With Two Texture Units and 29-Mb Embedded DRAM for 3G Multimedia Terminals Ramchan Woo, Student Member,
More informationMainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation
Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer
More informationPOWERVR MBX & SGX OpenVG Support and Resources
POWERVR MBX & SGX OpenVG Support and Resources Kristof Beets 3 rd Party Relations Manager - Imagination Technologies kristof.beets@imgtec.com Copyright Khronos Group, 2006 - Page 1 Copyright Khronos Group,
More information3D Rasterization II COS 426
3D Rasterization II COS 426 3D Rendering Pipeline (for direct illumination) 3D Primitives Modeling Transformation Lighting Viewing Transformation Projection Transformation Clipping Viewport Transformation
More information