Multicore Strategies for Games. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology


Bad multithreading
[Diagram: Thread 1, Thread 2, Thread 3, Thread 4, Thread 5]
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game

Good multithreading
[Diagram: Physics, Game Thread, Main Thread, Rendering Thread, Animation/Skinning, Particle Systems, Networking, File I/O]
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game

Another paradigm: cascades
- Thread 1: Input
- Thread 2: Physics
- Thread 3: AI
- Thread 4: Rendering
- Thread 5: Present
Advantages:
- Synchronization points are few and well-defined
Disadvantages:
- Increases latency (for constant frame rate)
- Needs simple (one-way) data flow
- For balance, each chunk needs to take a similar amount of time
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game

Typical task: file decompression
- Most common CPU-heavy thread on the Xbox 360
- Easy to multithread
- Allows use of aggressive compression to improve load times
- Don't throw a thread at a problem better solved by offline processing
  - Texture compression, file packing, etc.
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game

Typical task: rendering
- Separate update and render threads
- Rendering on multiple threads usually works poorly
  - GPU can have trouble if multiple threads try to talk to it at once (Xbox 360 command buffers are supposed to be OK)
- Special case of cascades paradigm
  - Pass render state from update to render
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

Separate rendering thread
[Diagram: Update Thread, Buffer 0, Buffer 1, Render Thread]
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game
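One way to read the two-buffer diagram, sketched in C++. The slides show no code, so `RenderState`, `DoubleBuffer`, and the handoff scheme below are illustrative assumptions, not the presenters' implementation. The update thread fills one buffer while the render thread reads the other, and the two meet at a single synchronization point per frame.

```cpp
#include <array>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-frame render description; a real game would store draw data here.
struct RenderState {
    int frame = -1;
};

class DoubleBuffer {
public:
    // Update thread: the buffer currently owned by the update thread.
    RenderState& writeBuffer() { return buffers_[writeIndex_]; }

    // Update thread: wait until the previous frame has been rendered,
    // then hand the filled buffer over and switch to the other one.
    void publish() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !frameReady_; });
        readIndex_ = writeIndex_;
        writeIndex_ = 1 - writeIndex_;
        frameReady_ = true;
        lock.unlock();
        cv_.notify_all();
    }

    // Render thread: wait for a published frame and return it.
    const RenderState& beginRead() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return frameReady_; });
        return buffers_[readIndex_];
    }

    // Render thread: signal that the published frame is no longer being read.
    void endRead() {
        {
            std::lock_guard<std::mutex> lock(m_);
            frameReady_ = false;
        }
        cv_.notify_all();
    }

private:
    std::array<RenderState, 2> buffers_;
    int writeIndex_ = 0;
    int readIndex_ = 1;
    bool frameReady_ = false;
    std::mutex m_;
    std::condition_variable cv_;
};

// Runs an update loop on the calling thread and a render loop on a second
// thread; returns the frame numbers in the order the render thread saw them.
std::vector<int> runFrames(int frameCount) {
    DoubleBuffer db;
    std::vector<int> rendered;
    std::thread render([&] {
        for (int i = 0; i < frameCount; ++i) {
            const RenderState& s = db.beginRead();
            rendered.push_back(s.frame);  // stand-in for issuing draw calls
            db.endRead();
        }
    });
    for (int f = 0; f < frameCount; ++f) {
        db.writeBuffer().frame = f;       // stand-in for the game update
        db.publish();
    }
    render.join();
    return rendered;
}
```

In this sketch the render thread always draws the frame before the one being updated, which is the latency cost the cascades slide mentions.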

Typical task: graphics fluff
- Extra graphics that don't affect play
  - Procedurally generated animating cloud textures
  - Cloth simulations
  - Procedurally generated vegetation, etc.
  - Extra particles, better particle physics, etc.
- Can run at lower frame rate
- Easy to synchronize
- One game had one thread manipulating cloth, another thread handling cloth shadows
- On single-core machines, can drop or simplify the fluff without affecting gameplay
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

Typical tasks: physics?
- Could cascade from update to physics to rendering
  - Makes use of three threads
  - May be too much latency
- Could run physics on many threads
  - Uses many threads while doing physics
  - May leave threads mostly idle elsewhere
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game

Careful with simultaneous multi-threading
- Not the same as double the number of cores
- Can give a small performance boost if the first thread is underutilizing execution resources because of dependency stalls
- Can cause a performance drop
  - Two threads may fight over L1 cache
- Can avoid scheduler latency
  - Scheduler latency: a thread is ready to run, but the OS waits for the current scheduling quantum to expire before running it
  - Hardware threads can wake up faster; works well if you have a thread that mostly sleeps but needs to wake quickly on demand
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

How many threads?
- No more than one CPU-intensive software thread per core
  - 3-6 on Xbox 360
  - 1-? on PC (1-4 for now, need to query)
- Too many busy threads adds complexity and lowers performance
  - Context switches are not free
- Can have many non-CPU-intensive threads
  - I/O threads that block, or intermittent tasks
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game
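A minimal sketch of the "need to query" step on PC, assuming standard C++ rather than any console SDK. `workerCountFor` and `defaultWorkerCount` are hypothetical helper names. `std::thread::hardware_concurrency()` reports hardware threads, so on a machine with simultaneous multi-threading it can be larger than the number of physical cores, and it may return 0 when the count is unknown.

```cpp
#include <algorithm>
#include <thread>

// Given the reported hardware thread count, choose how many CPU-heavy worker
// threads to start, leaving `reservedForMain` slots for the main/update thread.
unsigned workerCountFor(unsigned reported, unsigned reservedForMain = 1) {
    unsigned available = std::max(1u, reported);  // treat "unknown" (0) as 1
    return available > reservedForMain ? available - reservedForMain : 1u;
}

// Query the machine the program is running on.
unsigned defaultWorkerCount() {
    return workerCountFor(std::thread::hardware_concurrency());
}
```

I/O threads that mostly block do not need to count against this budget.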

Rare's Kameo
Screenshots from www.rareware.com

Case study: Kameo (1)
- Started out as single threaded
  - Was going to be an original Xbox game, but the developers decided to make it an Xbox 360 launch title
- CPU usage split was 51/49 for update/render, so rendering was put on a separate thread
- Two render-description buffers created to communicate from update to render
  - Linear read/write access for best cache usage
  - Doesn't copy const data
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

Case study: Kameo (2)
- Decompression thread:
  - Saved space on DVD and improved load times
  - Cost was some spare CPU cycles
- Actually two threads for file I/O
  - One for reading and one for decompressing, because some calls can block for ~0.5s doing directory lookups
- Multithreading was added about six months before launch, but it worked!
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

Case study: Kameo (3)

Core        Thread  Software threads
0 (80-99%)  0       Game update
            1       File I/O
1 (50%)     0       Rendering
            1       (none)
2 (80-99%)  0       XAudio
            1       File decompression

Total usage was ~2.2-2.5 cores
Screenshot from www.rareware.com
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

Bizarre Creations Project Gotham Racing 3
See http://media.xbox360.gamespy.com/media/741/741362/vids_1.html for movie clips
Screenshot from projectgothamracing3.com/screenshots

Case study: Project Gotham Racing 3

Core  Thread  Software threads
0     0       Update, physics, rendering, UI
      1       Audio update, networking
1     0       Crowd update, texture decompression
      1       Texture decompression
2     0       XAudio
      1       (none)

Total usage was ~2.0-3.0 cores
Screenshot from projectgothamracing3.com/screenshots
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

Available synchronization objects
- Critical sections (locks)
- Semaphores (alas not in XNA)
- Mutexes
- Don't suspend threads
  - Some games have used this for synchronization
  - Can easily lead to deadlocks
  - Interacts badly with Visual Studio debugger
Slide adapted from Bruce Dawson & Chuck Walbourn, Microsoft Game

Synchronization tips/costs:
- Synchronization is moderately expensive when there is no contention
  - Hundreds to thousands of cycles
- Synchronization can be arbitrarily expensive when there is contention!
- Goals:
  - Synchronize rarely
  - Hold locks briefly
  - Minimize shared data
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game
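A small C++ sketch of the three goals, under assumed names (`SharedResults`, `expensive`, `worker`, and `runBatches` are all hypothetical). Each worker does its heavy work on thread-private data and takes the lock once, briefly, to publish a whole batch.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared result list guarded by one mutex.
struct SharedResults {
    std::mutex m;
    std::vector<long> values;
};

// Stand-in for expensive per-item work: returns 0 + 1 + ... + x.
long expensive(int x) {
    long sum = 0;
    for (int i = 0; i <= x; ++i) sum += i;
    return sum;
}

void worker(SharedResults& out, int begin, int end) {
    std::vector<long> local;  // private to this thread, so no lock is needed
    for (int x = begin; x < end; ++x) local.push_back(expensive(x));

    // Synchronize rarely and briefly: one short critical section per batch.
    std::lock_guard<std::mutex> lock(out.m);
    out.values.insert(out.values.end(), local.begin(), local.end());
}

// Splits the range [0, 100) across two threads and sums the results.
long runBatches() {
    SharedResults out;
    std::thread a(worker, std::ref(out), 0, 50);
    std::thread b(worker, std::ref(out), 50, 100);
    a.join();
    b.join();
    long total = 0;
    for (long v : out.values) total += v;
    return total;
}
```

Taking the lock inside the loop, once per item, would give the same answer with a hundred lock acquisitions instead of two.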

Avoid effective single-threading
- Requiring exclusive access to a popular resource can make multi-threading a complex way of doing single-threading on multiple threads
- Use synchronization primitives to guarantee that multiple threads won't modify resources simultaneously, but design so that they generally won't try to anyway
Notes from Bruce Dawson & Chuck Walbourn, Microsoft Game

Beware hidden synchronization
- Memory allocation (e.g., malloc in C)
  - All sorts of ways to alleviate the problem
- File access
- Using D3DCREATE_MULTITHREADED if developing with unmanaged code
- False sharing: an artifact of cache structure
  - Performance issue, not a correctness issue
  - Bruce Dawson, "Multicore Memory Coherence: The Hidden Perils of Sharing Data," PowerPoint presentation
Information from Bruce Dawson & Chuck Walbourn, Microsoft Game
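To make the false-sharing item concrete, here is a hedged C++ sketch. The 64-byte line size and the names `kCacheLine`, `PaddedCounter`, and `countInParallel` are assumptions for illustration; the real line size depends on the CPU. Each thread updates only its own counter, and the padding keeps neighboring counters from landing on the same cache line.

```cpp
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes
constexpr int kThreads = 4;

// alignas pads each counter out to a full (assumed) cache line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter should occupy exactly one assumed cache line");

// Every thread increments only its own counter; the result is the grand total.
long countInParallel(long itersPerThread) {
    std::array<PaddedCounter, kThreads> counters{};
    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t) {
        pool.emplace_back([&counters, t, itersPerThread] {
            for (long i = 0; i < itersPerThread; ++i) ++counters[t].value;
        });
    }
    for (auto& th : pool) th.join();

    long total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Without the `alignas`, the four counters would sit next to each other in memory and the program would still be correct; it could simply run slower, which is why the slide calls this a performance issue rather than a correctness issue.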

Things to avoid
- Threads terminating other threads
  - Can't do it on Xbox 360, discouraged on Windows
- Mutexes
  - Aren't as fast as critical section locks
Information from Bruce Dawson & Chuck Walbourn, Microsoft Game

Lockless programming
- Spin locks
- Write-release/read-acquire semantics
- Interlocked instructions
- Difficult to get right:
  - Very hard for native C++ Xbox 360 coding
  - .NET makes some of this easier
- Bruce Dawson, "Lockless Programming Considerations for Xbox 360 and Microsoft Windows," msdn2.microsoft.com/en-us/library/bb310595.aspx
Information from Bruce Dawson & Chuck Walbourn, Microsoft Game
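A minimal illustration of write-release/read-acquire, using standard C++ atomics rather than the Xbox 360 or Win32 interlocked APIs the slides have in mind. `Mailbox`, `produce`, `consume`, and `handOff` are hypothetical names. The producer writes plain data and then sets a flag with release semantics; the consumer spins on the flag with acquire semantics, so it is guaranteed to see the data once it sees the flag.

```cpp
#include <atomic>
#include <thread>

struct Mailbox {
    int payload = 0;                 // plain data, written before the flag is set
    std::atomic<bool> ready{false};  // the only variable both threads touch concurrently
};

void produce(Mailbox& box, int value) {
    box.payload = value;
    // Write-release: earlier writes become visible to whoever acquires the flag.
    box.ready.store(true, std::memory_order_release);
}

int consume(Mailbox& box) {
    // Read-acquire: spin until the flag is set, then the payload is safe to read.
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Passes one value from the calling thread to a consumer thread.
int handOff(int value) {
    Mailbox box;
    int received = 0;
    std::thread consumer([&] { received = consume(box); });
    produce(box, value);
    consumer.join();
    return received;
}
```

This handles exactly one handoff. Reusing the mailbox, or adding a second producer, needs more care, which is the "difficult to get right" point.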

What about OpenMP?

    #pragma omp parallel default(none) shared(n,x,y) private(i)
    {
        #pragma omp for
        for (i = 0; i < n; i++)
            x[i] += y[i];
    }

- Industry tends to shy away from OpenMP and similar solutions
  - Prefers more direct control
(Example from somewhere on web; can't remember where)

XNA specific notes (1)
- GraphicsDevice is somewhat thread-safe
  - Cannot render from more than one thread at a time
  - Can create resources and SetData while another thread renders
- ContentManager is not thread-safe
  - OK to have multiple instances, but only one per thread
- Input is not threadable
  - Windows games must read input on the main game thread
- Audio and networking are thread-safe
Slide from Shawn Hargreaves, Understanding XNA Framework Performance

XNA specific notes (2)
- Catalin's suggestion: Keep rendering on main thread (Thread 1 on Xbox 360)
  - Game class does some behind-the-scenes graphics stuff
- Great article: Catalin Zima, "Multi-threading for your XNA Game," http://www.ziggyware.com/readarticle.php?article_id=221

Common mistake
- Creating a new thread on every iteration of the game loop
- Creating and releasing threads has a lot of overhead
  - especially if you are running in Visual Studio (i.e., in the debugger)
  - and especially if you are running on the Xbox 360 from Visual Studio
- Better to create the threads you need at the beginning
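A sketch of the "create the threads you need at the beginning" advice in standard C++. An XNA game would use .NET threads, and `FrameWorker` and `simulateFrames` are hypothetical names. The worker thread is created once, sleeps between jobs, and is handed one job per frame. In this sketch the caller must call `wait()` before the next `kick()`.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class FrameWorker {
public:
    FrameWorker() : thread_([this] { run(); }) {}

    ~FrameWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            quit_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    // Hand the worker one job for this frame.
    void kick(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            job_ = std::move(job);
            busy_ = true;
        }
        cv_.notify_all();
    }

    // Block until the current job has finished.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return busy_ || quit_; });
            if (quit_) return;
            std::function<void()> job = std::move(job_);
            lock.unlock();
            job();  // run the job without holding the lock
            lock.lock();
            busy_ = false;
            cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::function<void()> job_;
    bool busy_ = false;
    bool quit_ = false;
    std::thread thread_;  // declared last so the members above exist before run() starts
};

// A toy game loop: the same worker thread is reused on every frame.
int simulateFrames(int frames) {
    FrameWorker worker;  // created once, before the loop
    int particles = 0;
    for (int f = 0; f < frames; ++f) {
        worker.kick([&particles] { particles += 10; });  // stand-in for per-frame work
        // ... main-thread update work would overlap with the job here ...
        worker.wait();
    }
    return particles;
}
```

The mistake the slide describes would be constructing the `std::thread` inside the `for` loop and joining it every frame.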

Take a step back
- Always ask: should I be doing this on the CPU at all?
- GPU has ridiculous amounts of computing power
- Look for tasks with a high ratio of computation to CPU-GPU communication
- HLSL is HLSL whether you're using managed or unmanaged code on the CPU