Scaling of 3D Game Engine Workloads on Modern Multi-GPU Systems. Jordi Roca Monfort (Universitat Politècnica Catalunya) Mark Grossman (Microsoft)
|
|
- Steven Glenn
- 5 years ago
- Views:
Transcription
1 Scaling of 3D Game Engine Workloads on Modern Multi-GPU Systems Jordi Roca Monfort (Universitat Politècnica Catalunya) Mark Grossman (Microsoft) 0
2 Outline Introduction on Multi-GPU rendering RTT surface synchronization alternatives Multi-GPU performance models Scaling results Conclusion 1
3 Multi-GPU rendering Main purpose: Several GPUs collaborate to render frames of the same 3D scene. Rendering task is massive parallel. High resolution scenes require a lot of pixel fillrate and memory BW. GPUs partial renders are composed to the final frame sequence. Other different usages: Multi-display rendering: each GPU renders a different viewport/screen. GPUs take different tasks: graphics rendering, physics and AI processing. 2
4 A choice for enthusiast gaming Performance/scaling: Usually high for few GPUs and high resolutions (1600x1200 minimum) Greatly depends on driver maturity and the game workload: Crysis Warhead hits 1.6x and Lost Planet hits 1.9x with 2-GPU systems. Power: Two graphics cards spend more than the equivalent single GPU solution targeting the same performance. Upgrade cost: Double performance for next game generation by acquiring a second graphics card (same GPU family and similar range counterpart). Extra cost of acquiring a high end motherboard with multiple PCIe ports. 3
5 Rendering workload balance Split Frame Rendering Alternate Frame Rendering SuperAA Dynamic split line Fixed tiles (32x32) Geometry scaling problem Decreased interactivity problem Play at High AA modes (x16) 4
6 What do GPUs communicate to each other? Today s 3D engines don t just render the main backbuffer: Draw commands can render to special surfaces used later as textures: reflections, shadow maps, lens flare, post-filtering ops,.. New draw dependency chain with render-to-texture surfaces as synchronization points. GPUs must exchange updated surface contents at this points to ensure data integrity Inter-GPU syncs. 5
7 Render-to-Use sync analysis RTT copies diverge Start sync operation Sync cycles RTT copies are synchronized Frame 0 Swap frame RTT copies diverge Frame 1 SFR GPU0 GPU1 Render[0] Use[0] Repeat Use[0] Use[1] Render[1] RtU intraframe RtU interframe dep AFR GPU0 GPU1 RTT copies diverge Render[0] RtU intraframe dep RtU interframe dep Use[1] Use[0] RTT copies are synchronized Sync cycles Repeat Use[1] Frame 0 Swap frame RTT copies diverge Render[1] Frame 1 Command Buffer events 6
8 Render-to-Use sync analysis Game/Timedemo Engine Release date Screen resolution Frames RTT surfaces % RTT time 3DMark06/Canyon Flight Proprietary 2006/01 16 x 12 (1AA) % FEAR/PerformanceTest LithTech 2005/10 16 x 12 (1AA) % Call Of Duty 2/carentan Proprietary 2005/10 16 x 12 (1AA) % Call Of Duty 2/demo5 Proprietary 2005/10 16 x 12 (1AA) % Company Of Heroes/Intro Essence 2006/09 16 x 12 (1AA) % Half Life 2 Lost Coast/VST Source 2005/10 25 x 16 (8AA) % BattleField 2142/suez canal Refractor2 2006/10 25 x 16 (1AA) % BattleField 2/abl-chini Refractor2 2005/06 16 x 12 (1AA) % 7
9 Render-to-Use sync analysis RTT Surface Id Intraframe (SFR) Interframe (AFR) RtU deps syncs RtU deps syncs 2-GPU Game/Timedemo 306 Engine Release Screen 7405 Frames 0 RTT % RTT0 307 date35722resolution surfaces 0 time 0 3DMark06/Canyon Flight 308Proprietary 2006/ x (1AA) % 0 FEAR/PerformanceTest 309LithTech 2005/ x (1AA) % 0 Call Of Duty 2/carentan 30aProprietary 2005/ x 12 (1AA) % 0 Call Of Duty 2/demo5 30bProprietary 2005/ x 12 (1AA) % 0 Company Of Heroes/Intro 30dEssence 2006/ x (1AA) % 9 Half Life 2 Lost Coast/VST30eSource 2005/ x (8AA) % 0 BattleField 2142/suez canal Refractor2 2006/10 25 x 16 (1AA) % BattleField 2/abl-chini Refractor2 2005/06 16 x 12 (1AA) % Track per-surface RtU dependencies and required syncs. The more intraframe syncs, the worse SFR performs. The more interframe syncs, the worse AFR performs. 8
10 The contributions of this work Which characteristics make 3D Games suitable for multi-gpu systems? Render-to-texture sync analysis. Do they enable any optimization? (See next section) Can we measure multi-gpu scaling based on 3D game workload characteristics? Using a simplified model. Using real 3D workload data. Evaluate SFR, AFR and combined modes (4+ GPUs). 9
11 RTT surface synchronization alternatives
12 Leverage RtU gap: Early Copy Render Render Draw Draw Use GPU0 GPU1 D D D D Determine last render: Use Prediction table, Look ahead command buffer. Early Start of sync operation Swap frame Game/ Timedemo syncs RtU gap wrt frame duration RtU sync Gap % Pixel shading bound 3DM % 52.85% FEAR % 65.08% COD2c % 96.28% COD2d % 94.63% COH % 96.80% HL % 21.95% BF % 87.16% BF % 44.44% Penalization cost = % of time the RTT draw is pixel shading bound Sync cycles (delayed copy) RtU gap cycles 11
13 Pixel Shading bounds: Concurrent Update Pixel Shader R600 Shader: Clock: 750MHz ALUs: 4 x 16 SIMD Pixel Shader Pixel Shader Shader output PCIe 2.0 x16 6GB/sec Shader output Shader output ROP ROP ROP Local Mem Local Mem Local Mem Local GPU Remote GPUs Shading a thousand fragments (35 instructions) in the R600 SPUs takes 1000 x 35 / 64 = 547 cycles. Sending the corresponding color outputs through the PCIe 2.0 bus takes 1000 x 4 bytes x 750 MHz / 6 GB/s = 500 cycles Penalization cost = Remote sent cycles Pixel shading cycles 12
14 Multi-GPU Performance Models
15 EMPATHY analysis tool API state changes Vtx HW counters (N/I) GPUs in SFR mode /(N/I) Game execution in real Hw Downscaled Pixel counters Real app fps (FRAPS) Correlation (for a GPU arch. File describing the same real hardware) Estimated exec time (Single GPU) GPU arch. description file: R600 API state changes ` execution trace Pixel HW counters Collect data Propietary Analytical tool Vtx HW counters Single GPU arch. description file: R600 GPU Interconnection BW I GPU clusters in AFR mode Propietary Analytical tool Min SFR scaled exec time Add Inter-GPU sync cost (SFR) EMPATHY /I Add Inter-GPU sync cost (AFR) N: Total GPUs I: AFR interleaving Multi-GPU exec time 14
16 Multi-GPU interconnection network CPU PCIe x16 Link CPU PCIe x16 Link FSB FSB SouthBridge NorthBridge MemBus DDR SouthBridge NorthBridge MemBus DDR GPU 0 GPU 1 GPU 0 PCIe bridge GPU 1 GPU 2 PCIe bridge GPU 3 GDDR FB Board 0 GDDR Board 1 FB Display Link GDDR FB Board 0 GDDR FB GDDR FB Board 1 GDDR FB Display Link SFR sync: all-to-all GPU simulatenous transfers. The NorthBridge PCIe ports become the bottleneck. Each GPU sees reduced peak BW as function of the number of GPUs: 15
17 Performance scaling models SFR GPU0 GPU1 Penalization cycles Frame 0 Frame 1 Frame 2 Total estimated multigpu cycles AFR GPU0 Frame 0 GPU1 GPU0 ½ frame initialization interval Frame 1 R S sync cycles U Penalization cycles frame 2 Frame 2 Total estimated multigpu cycles 16
18 Cycles Early copy + concurrent cost (SFR) Penalization cycles Penalization cycles comparison BF2 - Surface 31d delayed copy early copy concurrent update 0 Frames Choose per surface the best sync alternative each frame that incurs in the minimum penalization cycles. 17
19 Scaling Results
20 SFR scaling (early copy + concurrent update) % 50% 40% 30% 20% 10% 0% SFR performance gain for early + concurrent (2 GPUs) delayed copy early copy early + concurrent 3DM06 FEAR COD2c COD2d COH HL2 BF2142 BF2 Avg Avg performance speed-up gain normalized to delayed copy early copy early + concurrent 3DM06 FEAR COD2c COD2d COH HL2 BF2142 BF2 Game syncs RtU sync Gap % Pixel shading bound 3DM % 52.85% FEAR % 65.08% COD2c % 96.28% COD2d % 94.63% COH % 96.80% HL % 21.95% BF % 87.16% BF % 44.44% 19
21 % perfect scaling Combined SFR/AFR modes scaling results Avg multi-gpu efficiency wrt Perfect Scaling (100%) 100% 80% 60% 40% 20% 0% 2-GPUs 3-GPUs 4-GPUs 6-GPUs 8-GPUs 12-GPUs 16-GPUs Pure SFR SFR>AFR SFR = AFR SFR < AFR Pure AFR SFR scaling was tested using Early Copy + Concurrent Update optimization. Low interframe syncs benefits mostly AFR configurations. Game Syncs i2 Interframe Syncs i4 3DM FEAR 0 0 COD2c 0 0 COD2d 0 0 Game Syncs i2 Interframe Syncs i4 COH HL BF BF
22 Conclusion
23 Conclusion Inter-GPU synchronization requirements of render-to-texture surfaces in 3D games impact multi-gpu performance/scaling. This work has evaluated the potential benefits of two proposed sync alternatives based on RTT update anticipation for a set of popular DX9 games. Leverage of the RtU gap and the pixel shading cost increases SFR scaling. This work has shown a simple multi-gpu performance analytic model based on real 3D game execution data, that allows to evaluate SFR, AFR and combined rendering modes. Observed low interframe syncs benefit mostly AFR configurations. 22
24 Thank you!
Windowing System on a 3D Pipeline. February 2005
Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April
More informationChallenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008
Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated
More informationRADEON X1000 Memory Controller
RADEON X1000 Memory Controller Ring Bus Memory Controller Supports today s fastest graphics memory devices GDDR3, 48+ GB/sec 512-bit Ring Bus Simplifies layout and enables extreme memory clock scaling
More informationAnatomy of AMD s TeraScale Graphics Engine
Anatomy of AMD s TeraScale Graphics Engine Mike Houston Design Goals Focus on Efficiency f(perf/watt, Perf/$) Scale up processing power and AA performance Target >2x previous generation Enhance stream
More informationA SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization
A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization Jordi Roca Victor Moya Carlos Gonzalez Vicente Escandell Albert Murciego Agustin Fernandez, Computer Architecture
More informationOptimizing DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most
More informationTechnical Report. SLI Best Practices
Technical Report SLI Best Practices Abstract This paper describes techniques that can be used to perform application-side detection of SLI-configured systems, as well as ensure maximum performance scaling
More informationH.264 Decoding. University of Central Florida
1 Optimization Example: H.264 inverse transform Interprediction Intraprediction In-Loop Deblocking Render Interprediction filter data from previously decoded frames Deblocking filter out block edges Today:
More informationLPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)
A practitioner s view of challenges faced with power and performance on mobile GPU Prashant Sharma Samsung R&D Institute UK LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014) SERI
More informationMattan Erez. The University of Texas at Austin
EE382V (17325): Principles in Computer Architecture Parallelism and Locality Fall 2007 Lecture 12 GPU Architecture (NVIDIA G80) Mattan Erez The University of Texas at Austin Outline 3D graphics recap and
More informationThreading Hardware in G80
ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &
More informationPROFESSIONAL VR: AN UPDATE. Robert Menzel, Ingo Esser GTC 2018, March
PROFESSIONAL VR: AN UPDATE Robert Menzel, Ingo Esser GTC 2018, March 26 2018 NVIDIA VRWORKS Comprehensive SDK for VR Developers GRAPHICS HEADSET TOUCH & PHYSICS AUDIO PROFESSIONAL VIDEO 2 NVIDIA VRWORKS
More informationParallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Analyzing a 3D Graphics Workload Where is most of the work done? Memory Vertex
More informationScanline Rendering 2 1/42
Scanline Rendering 2 1/42 Review 1. Set up a Camera the viewing frustum has near and far clipping planes 2. Create some Geometry made out of triangles 3. Place the geometry in the scene using Transforms
More informationThe NVIDIA GeForce 8800 GPU
The NVIDIA GeForce 8800 GPU August 2007 Erik Lindholm / Stuart Oberman Outline GeForce 8800 Architecture Overview Streaming Processor Array Streaming Multiprocessor Texture ROP: Raster Operation Pipeline
More informationReal - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský
Real - Time Rendering Pipeline optimization Michal Červeňanský Juraj Starinský Motivation Resolution 1600x1200, at 60 fps Hw power not enough Acceleration is still necessary 3.3.2010 2 Overview Application
More informationTechnical Report. SLI Best Practices
Technical Report SLI Best Practices Abstract This paper describes techniques that can be used to perform application-side detection of SLI-configured systems, as well as ensure maximum performance scaling
More informationSelecting the right Tesla/GTX GPU from a Drunken Baker's Dozen
Selecting the right Tesla/GTX GPU from a Drunken Baker's Dozen GPU Computing Applications Here's what Nvidia says its Tesla K20(X) card excels at doing - Seismic processing, CFD, CAE, Financial computing,
More informationOptimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing for DirectX Graphics Richard Huddy European Developer Relations Manager Also on today from ATI... Start & End Time: 12:00pm 1:00pm Title: Precomputed Radiance Transfer and Spherical Harmonic
More informationOptimisation. CS7GV3 Real-time Rendering
Optimisation CS7GV3 Real-time Rendering Introduction Talk about lower-level optimization Higher-level optimization is better algorithms Example: not using a spatial data structure vs. using one After that
More informationGeForce4. John Montrym Henry Moreton
GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,
More informationStreaming Massive Environments From Zero to 200MPH
FORZA MOTORSPORT From Zero to 200MPH Chris Tector (Software Architect Turn 10 Studios) Turn 10 Internal studio at Microsoft Game Studios - we make Forza Motorsport Around 70 full time staff 2 Why am I
More informationPowerVR Hardware. Architecture Overview for Developers
Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.
More informationProfiling and Debugging Games on Mobile Platforms
Profiling and Debugging Games on Mobile Platforms Lorenzo Dal Col Senior Software Engineer, Graphics Tools Gamelab 2013, Barcelona 26 th June 2013 Agenda Introduction to Performance Analysis with ARM DS-5
More informationNVIDIA Parallel Nsight. Jeff Kiel
NVIDIA Parallel Nsight Jeff Kiel Agenda: NVIDIA Parallel Nsight Programmable GPU Development Presenting Parallel Nsight Demo Questions/Feedback Programmable GPU Development More programmability = more
More informationASYNCHRONOUS SHADERS WHITE PAPER 0
ASYNCHRONOUS SHADERS WHITE PAPER 0 INTRODUCTION GPU technology is constantly evolving to deliver more performance with lower cost and lower power consumption. Transistor scaling and Moore s Law have helped
More informationTechnical Brief. AGP 8X Evolving the Graphics Interface
Technical Brief AGP 8X Evolving the Graphics Interface Increasing Graphics Bandwidth No one needs to be convinced that the overall PC experience is increasingly dependent on the efficient processing of
More informationReal-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010
1 Real-Time Ray Tracing Using Nvidia Optix Holger Ludvigsen & Anne C. Elster 2010 Presentation by Henrik H. Knutsen for TDT24, fall 2012 Om du ønsker, kan du sette inn navn, tittel på foredraget, o.l.
More informationGraphics Processing Unit Architecture (GPU Arch)
Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics
More informationScheduling the Graphics Pipeline on a GPU
Lecture 20: Scheduling the Graphics Pipeline on a GPU Visual Computing Systems Today Real-time 3D graphics workload metrics Scheduling the graphics pipeline on a modern GPU Quick aside: tessellation Triangle
More informationSqueezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques
Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Jonathan Zarge, Team Lead Performance Tools Richard Huddy, European Developer Relations Manager ATI
More informationGraphics Performance Optimisation. John Spitzer Director of European Developer Technology
Graphics Performance Optimisation John Spitzer Director of European Developer Technology Overview Understand the stages of the graphics pipeline Cherchez la bottleneck Once found, either eliminate or balance
More informationMobile HW and Bandwidth
Your logo on white Mobile HW and Bandwidth Andrew Gruber Qualcomm Technologies, Inc. Agenda and Goals Describe the Power and Bandwidth challenges facing Mobile Graphics Describe some of the Power Saving
More informationSpring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationAccelerating Realism with the (NVIDIA Scene Graph)
Accelerating Realism with the (NVIDIA Scene Graph) Holger Kunz Manager, Workstation Middleware Development Phillip Miller Director, Workstation Middleware Product Management NVIDIA application acceleration
More informationHardware-driven Visibility Culling Jeong Hyun Kim
Hardware-driven Visibility Culling Jeong Hyun Kim KAIST (Korea Advanced Institute of Science and Technology) Contents Introduction Background Clipping Culling Z-max (Z-min) Filter Programmable culling
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationParallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics
More informationEE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin
EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel
More informationECE 571 Advanced Microprocessor-Based Design Lecture 20
ECE 571 Advanced Microprocessor-Based Design Lecture 20 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 12 April 2016 Project/HW Reminder Homework #9 was posted 1 Raspberry Pi
More informationXbox 360 Architecture. Lennard Streat Samuel Echefu
Xbox 360 Architecture Lennard Streat Samuel Echefu Overview Introduction Hardware Overview CPU Architecture GPU Architecture Comparison Against Competing Technologies Implications of Technology Introduction
More informationOptimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research
Optimizing Games for ATI s IMAGEON 2300 Aaftab Munshi 3D Architect ATI Research A A 3D hardware solution enables publishers to extend brands to mobile devices while remaining close to original vision of
More informationNVIDIA nforce IGP TwinBank Memory Architecture
NVIDIA nforce IGP TwinBank Memory Architecture I. Memory Bandwidth and Capacity There s Never Enough With the recent advances in PC technologies, including high-speed processors, large broadband pipelines,
More informationTEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems
International Conference on Supercomputing June 2013 TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems Joan-Manuel Parcerisa Polychronis Xekalakis Computer
More informationReal-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university
Real-Time Buffer Compression Michael Doggett Department of Computer Science Lund university Project 3D graphics project Demo, Game Implement 3D graphics algorithm(s) C++/OpenGL(Lab2)/iOS/android/3D engine
More informationVulkan Multipass mobile deferred done right
Vulkan Multipass mobile deferred done right Hans-Kristian Arntzen Marius Bjørge Khronos 5 / 25 / 2017 Content What is multipass? What multipass allows... A driver to do versus MRT Developers to do Transient
More information4.1 Introduction 4.3 Datapath 4.4 Control 4.5 Pipeline overview 4.6 Pipeline control * 4.7 Data hazard & forwarding * 4.
Chapter 4: CPU 4.1 Introduction 4.3 Datapath 4.4 Control 4.5 Pipeline overview 4.6 Pipeline control * 4.7 Data hazard & forwarding * 4.8 Control hazard 4.14 Concluding Rem marks Hazards Situations that
More informationSpring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationMany rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.
1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationPortland State University ECE 588/688. Graphics Processors
Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly
More informationLecture 25: Board Notes: Threads and GPUs
Lecture 25: Board Notes: Threads and GPUs Announcements: - Reminder: HW 7 due today - Reminder: Submit project idea via (plain text) email by 11/24 Recap: - Slide 4: Lecture 23: Introduction to Parallel
More informationParallel Programming on Larrabee. Tim Foley Intel Corp
Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This
More informationIntroduction to Multicore architecture. Tao Zhang Oct. 21, 2010
Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)
More informationGRAPHICS PROCESSING UNITS
GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011
More informationECE 571 Advanced Microprocessor-Based Design Lecture 18
ECE 571 Advanced Microprocessor-Based Design Lecture 18 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 11 November 2014 Homework #4 comments Project/HW Reminder 1 Stuff from Last
More informationSpring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett
Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens
More informationParallel Computing: Parallel Architectures Jin, Hai
Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer
More informationCENG 477 Introduction to Computer Graphics. Graphics Hardware and OpenGL
CENG 477 Introduction to Computer Graphics Graphics Hardware and OpenGL Introduction Until now, we focused on graphic algorithms rather than hardware and implementation details But graphics, without using
More informationGraphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal
Graphics Hardware, Graphics APIs, and Computation on GPUs Mark Segal Overview Graphics Pipeline Graphics Hardware Graphics APIs ATI s low-level interface for computation on GPUs 2 Graphics Hardware High
More informationCOMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.
COMP 4801 Final Year Project Ray Tracing for Computer Graphics Final Project Report FYP 15014 by Runjing Liu Advised by Dr. L.Y. Wei 1 Abstract The goal of this project was to use ray tracing in a rendering
More informationWorking with Metal Overview
Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission
More informationGraphics Hardware. Instructor Stephen J. Guy
Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationNVIDIA Case Studies:
NVIDIA Case Studies: OptiX & Image Space Photon Mapping David Luebke NVIDIA Research Beyond Programmable Shading 0 How Far Beyond? The continuum Beyond Programmable Shading Just programmable shading: DX,
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationLecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013
Lecture 9: Deferred Shading Visual Computing Systems The course so far The real-time graphics pipeline abstraction Principle graphics abstractions Algorithms and modern high performance implementations
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationECE 571 Advanced Microprocessor-Based Design Lecture 21
ECE 571 Advanced Microprocessor-Based Design Lecture 21 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 9 April 2013 Project/HW Reminder Homework #4 comments Good job finding references,
More informationIntroduction to Parallel Programming For Real-Time Graphics (CPU + GPU)
Introduction to Parallel Programming For Real-Time Graphics (CPU + GPU) Aaron Lefohn, Intel / University of Washington Mike Houston, AMD / Stanford 1 What s In This Talk? Overview of parallel programming
More informationArchitectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1
Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics
More informationShrinath Shanbhag Senior Software Engineer Microsoft Corporation
Accelerating GPU inferencing with DirectML and DirectX 12 Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Machine Learning Machine learning has become immensely popular over the last decade
More informationMali Developer Resources. Kevin Ho ARM Taiwan FAE
Mali Developer Resources Kevin Ho ARM Taiwan FAE ARM Mali Developer Tools Software Development SDKs for OpenGL ES & OpenCL OpenGL ES Emulators Shader Development Studio Shader Library Asset Creation Texture
More informationComparing Reyes and OpenGL on a Stream Architecture
Comparing Reyes and OpenGL on a Stream Architecture John D. Owens Brucek Khailany Brian Towles William J. Dally Computer Systems Laboratory Stanford University Motivation Frame from Quake III Arena id
More informationMulti-Frame Rate Rendering for Standalone Graphics Systems
Multi-Frame Rate Rendering for Standalone Graphics Systems Jan P. Springer Stephan Beck Felix Weiszig Bernd Fröhlich Bauhaus-Universität Weimar Abstract: Multi-frame rate rendering is a technique for improving
More informationGPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS
GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge
More informationFrom Brook to CUDA. GPU Technology Conference
From Brook to CUDA GPU Technology Conference A 50 Second Tutorial on GPU Programming by Ian Buck Adding two vectors in C is pretty easy for (i=0; i
More informationThe Rasterization Pipeline
Lecture 5: The Rasterization Pipeline (and its implementation on GPUs) Computer Graphics CMU 15-462/15-662, Fall 2015 What you know how to do (at this point in the course) y y z x (w, h) z x Position objects
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationMobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools
Mobile Performance Tools and GPU Performance Tuning Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools NVIDIA GoForce5500 Overview World-class 3D HW Geometry pipeline 16/32bpp
More informationOverview of Project's Achievements
PalDMC Parallelised Data Mining Components Final Presentation ESRIN, 12/01/2012 Overview of Project's Achievements page 1 Project Outline Project's objectives design and implement performance optimised,
More informationAntonio R. Miele Marco D. Santambrogio
Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First
More informationSLICING THE WORKLOAD MULTI-GPU OPENGL RENDERING APPROACHES
SLICING THE WORKLOAD MULTI-GPU OPENGL RENDERING APPROACHES INGO ESSER NVIDIA DEVTECH PROVIZ OVERVIEW Motivation Tools of the trade Multi-GPU driver functions Multi-GPU programming functions Multi threaded
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationMoving Mobile Graphics Advanced Real-time Shadowing. Marius Bjørge ARM
Moving Mobile Graphics Advanced Real-time Shadowing Marius Bjørge ARM Shadow algorithms Shadow filters Results Agenda SHADOW ALGORITHMS Shadow Algorithms Shadow mapping Depends on light type Directional,
More informationNext-Generation Graphics on Larrabee. Tim Foley Intel Corp
Next-Generation Graphics on Larrabee Tim Foley Intel Corp Motivation The killer app for GPGPU is graphics We ve seen Abstract models for parallel programming How those models map efficiently to Larrabee
More informationFRUSTUM-TRACED RASTER SHADOWS: REVISITING IRREGULAR Z-BUFFERS
FRUSTUM-TRACED RASTER SHADOWS: REVISITING IRREGULAR Z-BUFFERS Chris Wyman, Rama Hoetzlein, Aaron Lefohn 2015 Symposium on Interactive 3D Graphics & Games CONTRIBUTIONS Full scene, fully dynamic alias-free
More informationDirect3D API Issues: Instancing and Floating-point Specials. Cem Cebenoyan NVIDIA Corporation
Direct3D API Issues: Instancing and Floating-point Specials Cem Cebenoyan NVIDIA Corporation Agenda Really two mini-talks today Instancing API Usage Performance / pitfalls Floating-point specials DirectX
More informationRendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane
Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world
More informationReal-World Applications of Computer Arithmetic
1 Commercial Applications Real-World Applications of Computer Arithmetic Stuart Oberman General purpose microprocessors with high performance FPUs AMD Athlon Intel P4 Intel Itanium Application specific
More informationEECS 487: Interactive Computer Graphics
EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with
More informationEfficient and Scalable Shading for Many Lights
Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL
More informationToday s Agenda. DirectX 9 Features Sim Dietrich, nvidia - Multisample antialising Jason Mitchell, ATI - Shader models and coding tips
Today s Agenda DirectX 9 Features Sim Dietrich, nvidia - Multisample antialising Jason Mitchell, ATI - Shader models and coding tips Optimization for DirectX 9 Graphics Mike Burrows, Microsoft - Performance
More informationECE 574 Cluster Computing Lecture 16
ECE 574 Cluster Computing Lecture 16 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 26 March 2019 Announcements HW#7 posted HW#6 and HW#5 returned Don t forget project topics
More informationOrad s DVG : solutions for scalable graphics clusters
Orad s DVG : solutions for scalable graphics clusters images courtesy MPI, Barco Scalable architecture e.g. : compositing 3 cards to one channel in HP workstation Hot3D presentations 2 1 The DVG architecture
More informationPart IV. Review of hardware-trends for real-time ray tracing
Part IV Review of hardware-trends for real-time ray tracing Hardware Trends For Real-time Ray Tracing Philipp Slusallek Saarland University, Germany Large Model Visualization at Boeing CATIA Model of Boeing
More informationParallel Programming for Graphics
Beyond Programmable Shading Course ACM SIGGRAPH 2010 Parallel Programming for Graphics Aaron Lefohn Advanced Rendering Technology (ART) Intel What s In This Talk? Overview of parallel programming models
More informationLecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)
Lecture 6: Texture Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today: texturing! Texture filtering - Texture access is not just a 2D array lookup ;-) Memory-system implications
More informationGPU A rchitectures Architectures Patrick Neill May
GPU Architectures Patrick Neill May 30, 2014 Outline CPU versus GPU CUDA GPU Why are they different? Terminology Kepler/Maxwell Graphics Tiled deferred rendering Opportunities What skills you should know
More information