SPU Shaders. Mike Acton Engine Director Insomniac Games


1 SPU Shaders Mike Acton Engine Director Insomniac Games

2 State of Affairs Engine Systems on SPUs SPU Optimization Understood Remaining Systems Planned Out Still Have SPU Time to Spare PPU Still Driving The Game Arguing for More Parallelism Want More Customization, Iteration Insomniac Games, 2007

3 Needed Strategy Bridging the Cell Gap Reducing synchronization points Better data flow Encouraging SPU use by more systems and programmers. Keeping SPU code straightforward, fast and optimizable. Get as much on the SPUs as possible. More customization, iteration. Ultimately: Kill the main update loop.

4 What are SPU Shaders? SPU Shaders are: Fragments of code used in a larger system Code is injected at location pre-determined by system. Custom for any particular system. Custom modifications of system data. Feedback to other systems outside the scope of the current system.

5 What are SPU Shaders? Like scripts... Like callbacks... Like messages... Like overlays... Like iterators... BUT! On the SPU, and very simple.

6 What are SPU Shaders? SPU Shaders are NOT: Generic, general purpose system. A system of any kind, actually.

7 What are SPU Shaders? Why is it called a shader? Shares important similarities to GPU shaders. Native code fragments Part of a larger system In-context execution Independently optimizable Most important: Concept is approachable.

8 What are SPU Shaders? SPU Shaders as policy: Simplicity by force. Don't try to solve everyone's problems Solutions that try to solve all problems tend to cause more problems than they solve.

9 Advantages Easy to implement Put the programmer in the right place at the right time. Programmer writes to SPU, not to software layer. Core performance issues still managed by systems programmers. Fragments are optimizable.

10 Advantages Systems don't need to provide functionality for every possible case More power in the hands of the gameplay programmers. More communication between gameplay and engine. Less work-around and more work-within.

11 Advantages Add functionality without modifying the system. Less risk to core. Less risk to other systems. Less risk to other shaders.

12 Advantages And the obvious... It's not on the PPU. It's faster.

13 Costs Forced to think about data layout and synchronization. (But that's a good thing.) Haphazard access isn't going to work. Debugging can be trickier. Need to manage shaders: As code in project, linked-in? As data stored with instance data? How to reference shaders?

14 Easy To Implement Pick stage(s) in system kernel to inject shaders. Define available inputs and outputs. Collect common functions. Compile shaders as data. Sort instance data based on shader type(s) Load shader on-demand based on data select. Call shaders.
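The "sort instance data based on shader type(s)" and "call shaders" steps above can be sketched on a host machine. This is only an illustration under stated assumptions: `instance_t`, `shader_fn`, `run_batches` and the two example shaders are hypothetical names, and the shaders here are ordinary function pointers rather than code fragments loaded from main RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical instance record: which shader to run, plus one value to transform. */
typedef struct {
    uint32_t shader_id;
    float    value;
} instance_t;

/* Hypothetical fixed shader signature: transform `count` contiguous instances. */
typedef void (*shader_fn)(instance_t *instances, size_t count);

static void shader_double(instance_t *v, size_t n)
{
    for (size_t i = 0; i < n; ++i) v[i].value *= 2.0f;
}

static void shader_negate(instance_t *v, size_t n)
{
    for (size_t i = 0; i < n; ++i) v[i].value = -v[i].value;
}

/* qsort comparator: order instances by shader type. */
static int by_shader_id(const void *a, const void *b)
{
    uint32_t x = ((const instance_t *)a)->shader_id;
    uint32_t y = ((const instance_t *)b)->shader_id;
    return (x > y) - (x < y);
}

/* Sort by shader type, then call each shader once per contiguous run of
   instances. Returns the number of shader calls that were made. */
static size_t run_batches(instance_t *inst, size_t count,
                          const shader_fn *shaders, size_t shader_count)
{
    size_t calls = 0;
    qsort(inst, count, sizeof inst[0], by_shader_id);
    for (size_t begin = 0; begin < count; ) {
        size_t end = begin + 1;
        while (end < count && inst[end].shader_id == inst[begin].shader_id) ++end;
        assert(inst[begin].shader_id < shader_count);
        shaders[inst[begin].shader_id](inst + begin, end - begin);
        ++calls;
        begin = end;
    }
    return calls;
}
```

With five instances split across two shader types, `run_batches` makes two calls instead of five, which is the point of handing each shader as many instances as possible.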

15 Easy To Implement What data is being transformed? What are the inputs? What are the outputs? What can be modified?

16 Easy To Implement Collect the common functions... Always loaded by the system e.g. Dma wrapper functions Debugging functions Common transformation functions

17
struct CommonFunctions
{
    PrintfProc*       m_print_proc;
    PrintVectorProc*  m_print_vector;
    PrintIntegerProc* m_print_integer;
    PrintFloatProc*   m_print_float;
    PrintMatrix4Proc* m_print_mtx4;
    PrintMatrix3Proc* m_print_mtx3;
    DmaGetProc*       m_dma_get;
    DmaPutProc*       m_dma_put;
};

18
struct common_t
{
    void (*print_str)(const char *str);
    void (*dma_wait)(uint32_t tag);
    void (*dma_send)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);
    void (*dma_recv)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);
    char*    ls;
    uint32_t ls_size;
    uint32_t data_ea;
    uint32_t data_size;
};
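To make the role of the common-function table concrete, here is a host-side sketch of a shader that only touches memory through a `common_t`. Everything beyond the struct layout is an assumption for illustration: the `mock_*` functions replace real DMA with `memcpy`, an "EA" is just a byte offset into a small array standing in for main RAM, `dma_recv` is taken to mean main RAM to local store and `dma_send` the reverse, and `shader_add_one` and `make_mock_common` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same members as the slide's common_t. */
typedef struct {
    void (*print_str)(const char *str);
    void (*dma_wait)(uint32_t tag);
    void (*dma_send)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);
    void (*dma_recv)(void *ls, uint32_t ea, uint32_t size, uint32_t tag);
    char    *ls;
    uint32_t ls_size;
    uint32_t data_ea;
    uint32_t data_size;
} common_t;

/* Host stand-ins: 256 bytes of "main RAM" and a 64-byte "local store" buffer. */
static uint32_t mock_main_ram[64];
static uint32_t mock_ls[16];

static void mock_print_str(const char *str) { (void)str; }
static void mock_dma_wait(uint32_t tag) { (void)tag; /* memcpy has already completed */ }

/* Assumed direction: local store -> main RAM. */
static void mock_dma_send(void *ls, uint32_t ea, uint32_t size, uint32_t tag)
{
    (void)tag;
    assert((size_t)ea + size <= sizeof mock_main_ram);
    memcpy((unsigned char *)mock_main_ram + ea, ls, size);
}

/* Assumed direction: main RAM -> local store. */
static void mock_dma_recv(void *ls, uint32_t ea, uint32_t size, uint32_t tag)
{
    (void)tag;
    assert((size_t)ea + size <= sizeof mock_main_ram);
    memcpy(ls, (unsigned char *)mock_main_ram + ea, size);
}

/* Build the table the system would hand to every shader. */
static common_t make_mock_common(uint32_t data_ea, uint32_t data_size)
{
    common_t c;
    c.print_str = mock_print_str;
    c.dma_wait  = mock_dma_wait;
    c.dma_send  = mock_dma_send;
    c.dma_recv  = mock_dma_recv;
    c.ls        = (char *)mock_ls;
    c.ls_size   = (uint32_t)sizeof mock_ls;
    c.data_ea   = data_ea;
    c.data_size = data_size;
    return c;
}

/* Hypothetical shader: fetch the data, add one to every word, write it back. */
static void shader_add_one(common_t *c)
{
    const uint32_t tag = 0; /* hypothetical tag handed out by the system */
    uint32_t *words = (uint32_t *)c->ls;
    uint32_t count = c->data_size / (uint32_t)sizeof words[0];

    assert(c->data_size <= c->ls_size);
    c->dma_recv(c->ls, c->data_ea, c->data_size, tag);
    c->dma_wait(tag);
    for (uint32_t i = 0; i < count; ++i) words[i] += 1u;
    c->dma_send(c->ls, c->data_ea, c->data_size, tag);
    c->dma_wait(tag);
}
```

Because the shader never names a DMA or job-manager function directly, the same fragment body can be exercised in a unit test like this one and then run under whatever system supplies the real table.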

19 Easy To Implement System Shader Configuration... System doesn't know what the fragments do. Fragments are in main RAM. Fragments don't need to be fixed. System knows where the fragments are. System knows when to call the fragments.

20
struct config_t
{
    uint32_t max_frag_size; // so we can do double buffering
    uint32_t frag_count;    // number of fragments in the list
    uint32_t frags_ea;      // EA of list of fragments
    uint32_t pad_0;
};

21 Easy To Implement System Shader Configuration. Manage fragment memory: Simplest method: Double buffer, On-demand, Fixed maximum size, By-index from array,...

22
struct fragment_t
{
    uint32_t code_ea;     // fragment's code EA
    uint32_t code_size;
    uint32_t entry_point; // in bytes, relative to code_ea
    uint32_t data_ea;     // data in main RAM the fragment wants to process
    uint32_t data_size;
    uint32_t pad[3];
};
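The "double buffer, on-demand, fixed maximum size, by-index from array" scheme can be sketched as follows. This is a host simulation under stated assumptions, not the original kernel: main RAM is a byte array, an EA is an offset into it, `memcpy` stands in for DMA, and `MAX_FRAG_SIZE`, `frag_slots` and `load_fragment` are hypothetical names. The size checks reflect that the two structs pad out to 16 and 32 bytes, presumably so they transfer cleanly by DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t max_frag_size; /* so we can do double buffering */
    uint32_t frag_count;    /* number of fragments in the list */
    uint32_t frags_ea;      /* EA of list of fragments */
    uint32_t pad_0;
} config_t;

typedef struct {
    uint32_t code_ea;       /* fragment's code EA */
    uint32_t code_size;
    uint32_t entry_point;   /* in bytes, relative to code_ea */
    uint32_t data_ea;
    uint32_t data_size;
    uint32_t pad[3];
} fragment_t;

static_assert(sizeof(config_t) == 16, "config_t should be one 16-byte unit");
static_assert(sizeof(fragment_t) == 32, "fragment_t should be two 16-byte units");

enum { MAX_FRAG_SIZE = 64 }; /* hypothetical compile-time slot size */

/* Two fixed-size code slots: one can be filled while the other is in use. */
static unsigned char frag_slots[2][MAX_FRAG_SIZE];

/* Copy fragment `index` from simulated main RAM into slot (index & 1).
   Returns a pointer to the entry point inside the slot, or NULL if the
   index is out of range or the fragment breaks the fixed size limit. */
static const unsigned char *load_fragment(const unsigned char *main_ram,
                                          size_t main_ram_size,
                                          const config_t *cfg,
                                          uint32_t index)
{
    fragment_t frag;
    size_t desc_off;
    unsigned char *slot;

    if (index >= cfg->frag_count) return NULL;
    if (cfg->max_frag_size > MAX_FRAG_SIZE) return NULL;

    /* Fetch the fragment descriptor from the list in main RAM. */
    desc_off = (size_t)cfg->frags_ea + (size_t)index * sizeof frag;
    if (desc_off + sizeof frag > main_ram_size) return NULL;
    memcpy(&frag, main_ram + desc_off, sizeof frag);

    /* Enforce the fixed maximum size before touching the slot. */
    if (frag.code_size > cfg->max_frag_size) return NULL;
    if (frag.entry_point >= frag.code_size) return NULL;
    if ((size_t)frag.code_ea + frag.code_size > main_ram_size) return NULL;

    slot = frag_slots[index & 1u];
    memcpy(slot, main_ram + frag.code_ea, frag.code_size);
    return slot + frag.entry_point;
}
```

On real hardware the returned address would be cast to the system's shader entry type and called; the sketch stops at the address computation because executing copied bytes is not portable on a host.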

23 Easy To Implement System Shader Configuration. Manage fragment memory: Alternate methods: Allocated, cached fragments (Lots of small shaders) Fixed locations (Offline analysis)... The best solution is specific to the system.

24 Easy To Implement Create the shader code... Code is just data Overlays or additional jobs are too complex and heavyweight. No special distinguishing feature on the SPUs Just want load and execute. No special system needed.

25 Easy To Implement Create the shader code.. Method 1: Shader as PPU header* Compile shader as normal, to obj file. Dump obj file using spu-objdump Convert dump to header using script. * This is what we're using now.

26 Easy To Implement Create the shader code.. Method 2: Shader as data Strip code from obj file and store as data. Put into data pipeline not code pipeline.

27 Easy To Implement Create the shader code.. Method 3: Use obj file Method 4: Use elf file Just pull code from obj file at runtime Requires extra compile step, but probably more debugger friendly. Other methods too, use whatever works for you.

28 Easy To Implement Calling the shader... Nothing could be easier. ShaderEntry* shader = (addr of fragment); shader( data, common );

29 Debugging Shaders Fragments are small Fragments have well defined inputs and outputs. Ideal for unit tests in separate framework. Test on PS3/Linux box.

30 Debugging Shaders Runtime debugging: Currently shaders have no debug info. Step through assembly. Often OK. Shaders are simple anyway. If written with intrinsics, not much difference. Alternatives: Debug on PPU (intrinsics are portable) Temporarily link in shader.

31 SPU Shader Rules Rule 1: Don't Manage Data for Shaders Just give shaders a buffer and fixed size. Shaders should depend on size, so leave room for system changes. Best size depends on system. (Maybe 4K, maybe 32K) Don't read or write from/to shader buffer.

32 SPU Shader Data System-specific Shader-internal ( local ) Multiple list of instances to modify or transform Context data EA passed by system Fixed buffer Shader shared ( global ) EA passed by system Zero'd on system initialization

33 SPU Shader Rules Rule 2: Don't Manage DMA for Shaders Give fixed number of DMA tags to shader Give DMA functions to shaders (Grab them in the entry function and pass down) Avoid: GetDmaTagFromParentSystem() To allow system to run with any job manager, or none Don't use shader tags for other purposes

34 SPU Shader Rules Rule 3: Enforce fixed maximum size for Shader code. System can be maintained. Rule 4: Shaders are always called in a clear, well defined context. i.e. Part of a larger system.

35 SPU Shader Rules Rule 5: Fixed parameter list for shaders, per-system (or sub-system) Don't want to re-compile all shaders. Don't want to manage dynamic parameter lists. Rule 6: Shaders should be given as many instances as possible. More optimizable.

36 SPU Shader Rules Rule 7: Don't break the rules. You'll end up with a new job manager. You'll end up re-inventing SPURS. You'll end up with a big headache.

37 SPU Shader Guidelines Keep instance data contiguous Keep instance data separate from PPU data. Can pack/compress data. Don't fight with synchronization issues Keep shaders simple. Shaders should perform one simple transformation (preferably branch-free) Switch shaders for new function. e.g. One shader per AI state.

38 SPU Shader Guidelines Don't use globals in shaders. Requires code fixup Makes size less predictable Only use buffer provided by the system. Roll frequently used functions back into kernel Give space back to shaders.

39 Integration with schedulers How are fragments scheduled? They aren't. The parent system is. Fragments are loaded and used on-demand. So, how is the parent system scheduled? Enter SPURS rant...

40 Scheduling with SPURS SPURS... Wants to be a general purpose scheduler. Wants to solve all the allocation issues. Wants to be flexible for every possible case. Is overcomplicated. Is unnecessary. (With apologies to the SPURS team.)

41 Scheduling with SPURS How many parent systems are actually being scheduled? Dozens at best. Not thousands. There are always higher-level scheduling needs. Exactly like a general purpose memory allocator. Any scheduling needs, just like any dynamic memory needs, are going to be very simple. * Decide where in the frame the system will run, and for how long. Then stop.

42 Transitioning to Shaders Example from RCF: (FastPathFollower) Started with typical CPP update pattern: On PPU FastPathFollower* was Update instance All Updates called in a (sorted) list. (Sorted for icache hits) * Are classes derived from FastPathFollower

43 Transitioning to Shaders Where there's one, there's more than one Grouped all FastPathFollowers together. Removed from Update list. Made single function to update all together. Update function now ignored.

44 Transitioning to Shaders Know the inputs and outputs Most inputs were read-only Path update was read-write, but only here. Outputs: e.g. State (Exploded, Crashed, etc.) Animation (Change anim, frame, etc.) Collision database (Move, Update) A few more...

45 Transitioning to Shaders Minimize Synchronization Points Collected all outputs to command buffers (Arrays of minimum data to execute) Advantage: Static cost limit Disadvantage: Always used space for worst case (But small, so OK) e.g. Maximum number of exploded states e.g. Non-optional state changes need max instance count x command size. Executed command buffers after update.
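A minimal sketch of the command-buffer idea, assuming a single "exploded" output: the update pass only reads instance data and appends commands to a fixed-size array, and the state is written in one place after the update. The names `explode_buffer_t`, `push_explode`, `update_all`, `execute_explodes` and the limit `MAX_EXPLODE_COMMANDS` are hypothetical, as is the choice to drop requests that exceed the budget so they can be raised again on a later frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_EXPLODE_COMMANDS = 4 }; /* hypothetical static worst-case budget */

/* Minimum data needed to execute the state change later. */
typedef struct {
    uint32_t instance_index;
} explode_command_t;

typedef struct {
    explode_command_t commands[MAX_EXPLODE_COMMANDS];
    size_t            count;
} explode_buffer_t;

/* Record a request during update. Returns false when the budget is used up,
   in which case the request is dropped for this frame. */
static bool push_explode(explode_buffer_t *buf, uint32_t instance_index)
{
    if (buf->count == MAX_EXPLODE_COMMANDS) return false;
    buf->commands[buf->count++].instance_index = instance_index;
    return true;
}

/* Update pass: reads instance data only and emits commands. */
static void update_all(const float *health, size_t n, explode_buffer_t *out)
{
    for (size_t i = 0; i < n; ++i) {
        if (health[i] <= 0.0f) (void)push_explode(out, (uint32_t)i);
    }
}

/* Runs after the update: the only place the exploded state is written.
   Returns the number of commands applied and resets the buffer. */
static size_t execute_explodes(explode_buffer_t *buf, uint8_t *exploded, size_t n)
{
    size_t applied = 0;
    for (size_t i = 0; i < buf->count; ++i) {
        uint32_t idx = buf->commands[i].instance_index;
        assert(idx < n);
        exploded[idx] = 1;
        ++applied;
    }
    buf->count = 0;
    return applied;
}
```

The buffer always occupies its worst-case size, which is the disadvantage the slide names, but the cost of the execute step is bounded no matter what the update pass encounters.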

46 Transitioning to Shaders Minimize Synchronization Points Put FPF Update in separate PPU thread. Overlapped with standard Update hierarchy (Yes, time was completely swallowed but only because Update hierarchy has many dcache misses)

47 Transitioning to Shaders Customization i.e. Proto-Shader Added callback function for customizations Moved all derived update functionality into callbacks. During update, registered callbacks placed into new command buffer for second pass. (Lose some dcache coherency, but overall better due to...)

48 Transitioning to Shaders Limit Customization Static limit on custom callback command buffer size. So - Custom callbacks may not be called every frame. i.e. Selectively derived classes based on performance. (And can potentially select LOD callbacks.)

49 Transitioning to Shaders Proto-Shader to Real Shader Now have async update on PPU with proto-shader (w/ minimal code changes) Used as template for complete re-write: Today: AsyncMobyUpdate (Joe Valenzuela) More generic (Not just one class) Update loop on SPU Lessons from Physics shaders And Joe's own contributions to the concept

50 Use Cases Middleware as SPU Shaders i.e. SPURS is not the only solution to Middleware. Middleware can provide: PPU library (init, state changes, etc.) SPU shader(s) for update User provides: SPU entry (main) SPU LS buffer, dma tags, dma functions, and EA to main RAM data.

51 Use Cases Middleware as SPU Shaders Advantages: User can use any job manager (or none) Middleware only does its job. User in control of scheduling. Disadvantages: User spends less time bitching about SPURS....oh wait.

52 Use Cases System Pipeline Stage Management e.g. IgPhysics (Eric Christensen) Fixed pipeline, each stage as a shader. Shaders loaded in order by the kernel. Certain stages will themselves load shaders to manage specific data types. Result Completely deferred system, 2x speedup overall. (Mostly due to reduction of sync points)

53 Use Cases Others: Precondition data for GPU shaders Special Effects Animation Customizations Most PPU Code :)

54 Contact Info Mike Acton See also:

55 Credit Thanks guys for putting the SPU Shader ideas to the test and for your comments: Eric Christensen (Principal Engine Programmer), Joe Valenzuela (Engine Programmer), André De Leiradella (Consultant for Engine Team)

56 Credit Also thanks for the feedback from the guys on the Beyond3D CellPerformance forums: ebola, minty, patsu, Shifty Geezer (if just to tell me he didn't like the name), and LordOfThePing.


Outline for Today. Euler Tour Trees Revisited. The Key Idea. Dynamic Graphs. Implementation Details. Dynamic connectivity in forests. Dynamic Graphs Outline for Today Euler Tour Trees Revisited Dynamic connectivity in forests. The Key Idea Maintaining dynamic connectivity in general graphs Dynamic Graphs A data structure for dynamic

More information

Concurrent Programming with the Cell Processor. Dietmar Kühl Bloomberg L.P.

Concurrent Programming with the Cell Processor. Dietmar Kühl Bloomberg L.P. Concurrent Programming with the Cell Processor Dietmar Kühl Bloomberg L.P. dietmar.kuehl@gmail.com Copyright Notice 2009 Bloomberg L.P. Permission is granted to copy, distribute, and display this material,

More information

Creating User-Friendly Exploits

Creating User-Friendly Exploits 1 Creating User-Friendly Exploits Skylar Rampersaud skylar@immunityinc.com Security Research 2 What is a User-Friendly Exploit? An exploit that causes no distress to the user of the exploited program i.e.,

More information

C++ for Java Programmers

C++ for Java Programmers Lecture 6 More pointing action Yesterday we considered: Pointer Assignment Dereferencing Pointers to Pointers to Pointers Pointers and Array Pointer Arithmetic 2 Todays Lecture What do we know 3 And now

More information

GPU 101. Mike Bailey. Oregon State University. Oregon State University. Computer Graphics gpu101.pptx. mjb April 23, 2017

GPU 101. Mike Bailey. Oregon State University. Oregon State University. Computer Graphics gpu101.pptx. mjb April 23, 2017 1 GPU 101 Mike Bailey mjb@cs.oregonstate.edu gpu101.pptx Why do we care about GPU Programming? A History of GPU Performance vs. CPU Performance 2 Source: NVIDIA How Can You Gain Access to GPU Power? 3

More information

GPU 101. Mike Bailey. Oregon State University

GPU 101. Mike Bailey. Oregon State University 1 GPU 101 Mike Bailey mjb@cs.oregonstate.edu gpu101.pptx Why do we care about GPU Programming? A History of GPU Performance vs. CPU Performance 2 Source: NVIDIA 1 How Can You Gain Access to GPU Power?

More information

Instruction Case Vba Excel Between Two Dates

Instruction Case Vba Excel Between Two Dates Instruction Case Vba Excel Between Two Dates Countdown Timer Between Two Times (Not Dates) I've had serious issues with how VBA handles times and dates before in this manner, is there something. In some

More information

CS 140 Project 4 File Systems Review Session

CS 140 Project 4 File Systems Review Session CS 140 Project 4 File Systems Review Session Prachetaa Due Friday March, 14 Administrivia Course withdrawal deadline today (Feb 28 th ) 5 pm Project 3 due today (Feb 28 th ) Review section for Finals on

More information

PS2 out today. Lab 2 out today. Lab 1 due today - how was it?

PS2 out today. Lab 2 out today. Lab 1 due today - how was it? 6.830 Lecture 7 9/25/2017 PS2 out today. Lab 2 out today. Lab 1 due today - how was it? Project Teams Due Wednesday Those of you who don't have groups -- send us email, or hand in a sheet with just your

More information

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015 Operating Systems 09. Memory Management Part 1 Paul Krzyzanowski Rutgers University Spring 2015 March 9, 2015 2014-2015 Paul Krzyzanowski 1 CPU Access to Memory The CPU reads instructions and reads/write

More information

Page 1. Review: Address Segmentation " Review: Address Segmentation " Review: Address Segmentation "

Page 1. Review: Address Segmentation  Review: Address Segmentation  Review: Address Segmentation Review Address Segmentation " CS162 Operating Systems and Systems Programming Lecture 10 Caches and TLBs" February 23, 2011! Ion Stoica! http//inst.eecs.berkeley.edu/~cs162! 1111 0000" 1110 000" Seg #"

More information

Memory Management. q Basic memory management q Swapping q Kernel memory allocation q Next Time: Virtual memory

Memory Management. q Basic memory management q Swapping q Kernel memory allocation q Next Time: Virtual memory Memory Management q Basic memory management q Swapping q Kernel memory allocation q Next Time: Virtual memory Memory management Ideal memory for a programmer large, fast, nonvolatile and cheap not an option

More information

A program execution is memory safe so long as memory access errors never occur:

A program execution is memory safe so long as memory access errors never occur: A program execution is memory safe so long as memory access errors never occur: Buffer overflows, null pointer dereference, use after free, use of uninitialized memory, illegal free Memory safety categories

More information

CSE 374 Programming Concepts & Tools. Brandon Myers Winter 2015 Lecture 11 gdb and Debugging (Thanks to Hal Perkins)

CSE 374 Programming Concepts & Tools. Brandon Myers Winter 2015 Lecture 11 gdb and Debugging (Thanks to Hal Perkins) CSE 374 Programming Concepts & Tools Brandon Myers Winter 2015 Lecture 11 gdb and Debugging (Thanks to Hal Perkins) Hacker tool of the week (tags) Problem: I want to find the definition of a function or

More information

Azon Master Class. By Ryan Stevenson Guidebook #5 WordPress Usage

Azon Master Class. By Ryan Stevenson   Guidebook #5 WordPress Usage Azon Master Class By Ryan Stevenson https://ryanstevensonplugins.com/ Guidebook #5 WordPress Usage Table of Contents 1. Widget Setup & Usage 2. WordPress Menu System 3. Categories, Posts & Tags 4. WordPress

More information

Learning to Program with Haiku

Learning to Program with Haiku Learning to Program with Haiku Lesson 21 Written by DarkWyrm All material 2010 DarkWyrm All of the projects that we have been working on have been small ones which didn't take very much time. Depending

More information

Using X-Particles with Team Render

Using X-Particles with Team Render Using X-Particles with Team Render Some users have experienced difficulty in using X-Particles with Team Render, so we have prepared this guide to using them together. Caching Using Team Render to Picture

More information

Contents. What's New. Upcoming new version. Newsletter #43 (Aug 6, 2017) A couple quick reminders:

Contents. What's New. Upcoming new version. Newsletter #43 (Aug 6, 2017) A couple quick reminders: Campground Master Newsletter #43 (Aug 6, 2017) 1 Newsletter #43 (Aug 6, 2017) Contents A couple quick reminders: Make Backups! It's so sad when we hear from someone whose computer has crashed and they

More information

ClearSpeed Visual Profiler

ClearSpeed Visual Profiler ClearSpeed Visual Profiler Copyright 2007 ClearSpeed Technology plc. All rights reserved. 12 November 2007 www.clearspeed.com 1 Profiling Application Code Why use a profiler? Program analysis tools are

More information

A Quick Introduction to IFF

A Quick Introduction to IFF A Quick Introduction to IFF Jerry Morrison, Electronic Arts 10-17-88 IFF is the Amiga-standard "Interchange File Format", designed to work across many machines. Why IFF? Did you ever have this happen to

More information

MITOCW watch?v=flgjisf3l78

MITOCW watch?v=flgjisf3l78 MITOCW watch?v=flgjisf3l78 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To

More information

Lecture 10: Crash Recovery, Logging

Lecture 10: Crash Recovery, Logging 6.828 2011 Lecture 10: Crash Recovery, Logging what is crash recovery? you're writing the file system then the power fails you reboot is your file system still useable? the main problem: crash during multi-step

More information

Complex Lab Operating Systems 2015/16 Winter Term. Sessions & Dynamic Memory

Complex Lab Operating Systems 2015/16 Winter Term. Sessions & Dynamic Memory Faculty of Computer Science Institute for System Architecture, Operating Systems Group Complex Lab Operating Systems 2015/16 Winter Term Sessions & Dynamic Memory 1 st Assignment General Coding Style use

More information

IP subnetting made easy

IP subnetting made easy Version 1.0 June 28, 2006 By George Ou Introduction IP subnetting is a fundamental subject that's critical for any IP network engineer to understand, yet students have traditionally had a difficult time

More information

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Classification with Perceptron Model So, welcome to today

More information

Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop

Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop Center for Scalable Application Development Software (CScADS): Automatic Performance Tuning Workshop http://cscads.rice.edu/ Discussion and Feedback CScADS Autotuning 07 Top Priority Questions for Discussion

More information

Addresses in the source program are generally symbolic. A compiler will typically bind these symbolic addresses to re-locatable addresses.

Addresses in the source program are generally symbolic. A compiler will typically bind these symbolic addresses to re-locatable addresses. 1 Memory Management Address Binding The normal procedures is to select one of the processes in the input queue and to load that process into memory. As the process executed, it accesses instructions and

More information

16 Sharing Main Memory Segmentation and Paging

16 Sharing Main Memory Segmentation and Paging Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

OpenACC 2.6 Proposed Features

OpenACC 2.6 Proposed Features OpenACC 2.6 Proposed Features OpenACC.org June, 2017 1 Introduction This document summarizes features and changes being proposed for the next version of the OpenACC Application Programming Interface, tentatively

More information

Standards for Test Automation

Standards for Test Automation Standards for Test Automation Brian Tervo Windows XP Automation Applications Compatibility Test Lead Microsoft Corporation Overview Over the last five years, I ve had the opportunity to work in a group

More information

IRIX is moving in the n32 direction, and n32 is now the default, but the toolchain still supports o32. When we started supporting native mode o32 was

IRIX is moving in the n32 direction, and n32 is now the default, but the toolchain still supports o32. When we started supporting native mode o32 was Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Fall 2002 Handout 23 Running Under IRIX Thursday, October 3 IRIX sucks. This handout describes what

More information

Enhanced Debugging with Traces

Enhanced Debugging with Traces Enhanced Debugging with Traces An essential technique used in emulator development is a useful addition to any programmer s toolbox. Peter Phillips Creating an emulator to run old programs is a difficult

More information

15 Sharing Main Memory Segmentation and Paging

15 Sharing Main Memory Segmentation and Paging Operating Systems 58 15 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

AEM Code Promotion and Content Synchronization Best Practices

AEM Code Promotion and Content Synchronization Best Practices AEM Code Promotion and Content Synchronization Best Practices Ian Reasor, Technical Architect, Adobe Partner Experience Introduction When considering the movement of content through environments in an

More information

Multicore Strategies for Games. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

Multicore Strategies for Games. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology Multicore Strategies for Games Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology Bad multithreading Thread 1 Thread 2 Thread 3 Thread 4 Thread 5 Slide

More information

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 23 Hierarchical Memory Organization (Contd.) Hello

More information

VMem. By Stewart Lynch.

VMem. By Stewart Lynch. VMem By Stewart Lynch. 1 Contents Introduction... 3 Overview... 4 Getting started... 6 Fragmentation... 7 Virtual Regions... 8 The FSA... 9 Biasing... 10 The Coalesce allocator... 11 Skewing indices...

More information

Streaming Massive Environments From Zero to 200MPH

Streaming Massive Environments From Zero to 200MPH FORZA MOTORSPORT From Zero to 200MPH Chris Tector (Software Architect Turn 10 Studios) Turn 10 Internal studio at Microsoft Game Studios - we make Forza Motorsport Around 70 full time staff 2 Why am I

More information

Last Class: Deadlocks. Where we are in the course

Last Class: Deadlocks. Where we are in the course Last Class: Deadlocks Necessary conditions for deadlock: Mutual exclusion Hold and wait No preemption Circular wait Ways of handling deadlock Deadlock detection and recovery Deadlock prevention Deadlock

More information

The Stack, Free Store, and Global Namespace

The Stack, Free Store, and Global Namespace Pointers This tutorial is my attempt at clarifying pointers for anyone still confused about them. Pointers are notoriously hard to grasp, so I thought I'd take a shot at explaining them. The more information

More information