Using Software Logging To Support Multi-Version Buffering in Thread-Level Speculation

Size: px
Start display at page:

Download "Using Software Logging To Support Multi-Version Buffering in Thread-Level Speculation"

Transcription

1 Using Software Logging To Support Multi-Version Buffering in Thread-Level Spulation María Jesús Garzarán, M. Prvulovic, V. Viñals, J. M. Llabería *, L. Rauchwerger ψ, and J. Torrellas U. of Illinois at Urbana-Champaign U. of Zaragoza, Spain * U.P.C., Spain ψ Texas A&M University

2 Thread-Level Spulation (TLS) Exute potentially-dependent tasks in parallel Assume no cross-task dependence will be violated Track memory accesses; buffer unsafe state Dett any dependence violation Squash offending tasks, repair polluted state, restart tasks Example time for (i=0;i<n;i++){ = A[B[i]] A[C[i]] = } Task J = A[4] A[5] = Task J+1 = A[2] RAW A[2] = Task J+2 = A[5] A[6] = 2

3 State Buffering in TLS Spulative tasks generate spulative memory state Must buffer and manage this spulative state 3

4 Approaches to Buffering [HPCA-03] Archittural Main Memory (AMM): Spulative task state kept in caches Main memory only keeps safe program state Cached state merges with memory when a task commits Future Main Memory (FMM): Spulative task state can merge with memory at any time Main memory keeps the future state (unsafe) Previous state is saved into Undo Log 4

5 Our Contributions Software-only design for undo-log system in FMM Simplify the hardware implementation ti Very cost-efftive approach On average, it only introduces a 10% slow-down vs hardwareonly 5

6 Roadmap Taxonomy of Buffering: AMM vs FMM Hardware Support Software Support Evaluation Conclusions 6

7 Task Exution under TLS Non Sp Non Sp Sp Non Sp Sp Processor time task 1 Commit task 2 task 3 Token task 4 task 5 Commit Token 7

8 Archittural Main Memory [HPCA03] Tasks Caches Non Sp Main memory Archittural state Main memory keeps archittural or safe state Caches keep spulative state 8

9 Future Main Memory [HPCA03] Tasks Log 6 Caches Main memory Future state Main memory keeps future state Logs keep previous state 9

10 Future Main Memory value address Task i writes 2 to 0x400 Task i+j writes 10 to 0x400 Archittural MM Task ID Tag Data i 0x400 2 i+j 0x Task ID Tag Data i 0x400 2 Future MM Cache Cache 10

11 Future Main Memory value address Task i writes 2 to 0x400 Task i+j writes 10 to 0x400 Archittural MM Future MM Producer Task ID Tag Data i 0x400 2 i+j 0x Cache Task ID Tag Data i+j 0x Cache Task ID Overwriting Task ID Tag Data i+j i 0x400 2 Log (in cache or memory) Perf Cost Faster commit but slower version rovery Need log 11

12 Roadmap Taxonomy of Buffering Hardware Support Software Support Evaluation Conclusions 12

13 Hardware Supports Existing TLS protocol Add hooks to support Software Logging 13

14 TLS protocol [Zhang99] Processor Cache RW X Network Local Memory MaxR MaxW X Task IDs 14

15 Hooks to Support Software Logging 1) Make Task ID pages visible to the software Cache Processor RW X Task ID Network Local Memory 2) Cache Task IDs MaxR MaxW X Task IDs Software Log Producer Task ID Data 15

16 Accessing Task IDs in Software Fixed offset between mapping of data pages and corresponding Task ID pages Memory Data pages Task ID pages pg Fixed offset Use two different loads: ld variable ld_tid variable 16

17 Accessing Data Virtual page Physical DataPage TLB ld variable To cache (data access) 17

18 Accessing Task IDs Virtual page Physical DataPage TLB Fixed offset ld_tid variable To cache (task ID access) 18

19 Caching Task IDs Bring Task ID to cache with ld_tid instruction as regular data Task IDs can be reused from the cache Task kids in memory are updated din hardware r by the TLS protocol To keep cached Task IDs up-to-date in software: st_tid instruction 19

20 Roadmap Taxonomy of Buffering Hardware Support Software Support Evaluation Conclusions 20

21 Software Logs A compiler instruments the application: Insert entry in log: before a store operation, add instructions to save previous value, address and Task ID in log Rycle log entries: free up the log created when a task commits Interrupt handler: Rovery : In case of a o-o-o RAW and squash Undo the modifications using data from log 21

22 Software Data Structures Logs are allocated locally before spulation starts Task Pointer Table Log Buffer Valid Task ID End Next Task ID Vaddr i j Value 22

23 Instructions to insert entry in log addu r4, r3, offset ; address of the variable sw r4, 0(r2) ; store in the log ld_tid r4, offset(r3) ; load Task ID Logging sw r4, 4(r2) ; store in the log instr lw r4, offset(r3) ; load value of variable sw r4, 8(r2) ; store in the log addu r2, r2, log_rord_size st_tid ttid r5, offset t( (r3) ; store Task kid sw r5, offset(r3) 23

24 Reducing Overheads Only first-stores in the task need to create a log entry Solution: At run-time, chk if store is first-store Use cached Task ID to filter: If (Cached Task ID == Current Task ID) Skip Logging Else Log entry 24

25 Filtering First Spulative Store Logging instr no_insert: ld_tid r6, offset (r3) ; load Task ID beq r6, r5, no_insert ; first store? addu r4, r3, offset ; insert as usual sw r4, 0(r2)... addu r2, r2, log_rord_size st_tid r5, offset (r3) ; store Task ID sw r5, offset (r3) 25

26 Roadmap Taxonomy of Buffering Hardware Support Software Support Evaluation Conclusions 26

27 Evaluation Environment Exution-driven simulator Multiprocessor with 16 processors 4 issue o-o-o superscalar processor + 2 levels of cache Compare against FMM with Hardware Logging [Zhang99] Advanced AMM system [HPCA03] 27

28 Applications Numerical applications: Apsi (Spfp2000) Dsmc3d and Euler (HPF-2) P3m (NCSA) Tree (Univ. of Hawaii) Track (Perft) The non-analyzable loops account on average for 61% of the serial exution time - Non-analyzable loops are identified with the Polaris parallelizing compiler - Speed-ups shown for the non-analyzable loops only 28

29 Exution Time Comparison Exut tion Time P3m Tree Apsi Track Dsmc3d Euler Average HW SW HW SW HW SW HW SW HW SW HW SW HW SW 27 FMM AMM FMM AMM FMM AMM FMM AMM FMM AMM FMM AMM FMM AMM On average, FMM Sw only introduces a 10% slow-down over FMM Hw On average, FMM Sw is similar to the advanced AMM 29

30 First Store Filtering P3m Tree Apsi Track Dsmc3d Euler FMM.Sw NoFilter FMM.Sw NoFilter FMM.Sw NoFilter 10 FMM.Sw NoFilter NoFilter NoFilter FMM.Sw FMM.Sw Exution Time Significant impact in the exution time in Apsi Since filtering does not hurt, we rommend using it 30

31 Other Results in Paper Studied design tradeoffs: Log accesses bypass/do not bypass L1 cache Log space is / is not rycled Do not afft the performance of FMM.Sw when filtering is used 31

32 Conclusions FMM.Sw is a cost-efftive solution: Simplified design relative to FMM.Hw Introduce low exution overhead (10% over FMM.Hw) Filtering first stores is beneficial 32

33 Using Software Logging To Support Multi-Version Buffering in Thread-Level Spulation María Jesús Garzarán, M. Prvulovic, V. Viñals, J. M. Llabería *, L. Rauchwerger ψ, and J. Torrellas U. of Illinois at Urbana-Champaign U. of Zaragoza, Spain * U.P.C., Spain ψ Texas A&M University

34 FMM.Sw versus Advanced AMM Advanced AMM Needs Version Combining i Logic Collt all the versions that are committed Selt the youngest one and invalidate the others FMM.Sw Needs Task ID in main memory Comparator to pick up the youngest version 34

35 Problem: Address tak IDs in software TLB Non existent Physical addres Cache Data & ld/st bits Cache tagged with Non-existent physical addr Local Memory task ID Add a fixed offset 35

36 Problem: Address tak IDs in software TLB Non existent Physical addres Cache Data & ld/st bits NIC Physical addrs local shared Local Memory task ID Network Shared Data 36

37 Problem: Address time stamp in software OS Allocates a page of task IDs Map the virtual address to a non existent physical page TLB Cache Data & ld/st bits Non existent Physical addres NIC Physical addrs local shared Local Memory taskid Network Shared Data 37

38 Spulative protocol Cache: load and store bits per word in cache Local Memory: task ID per word ISA: new ld/st instructions 38

39 Problem: Address task IDs in Software The task ID is not mapped in virtual space How to make visible the task ID to the sw? Logging inst load r3, addr_task ID? Undo Log Vaddr task ID Value sw r5, offset(r3) 39

40 Problem: Address task IDs in Software The task ID is not mapped in virtual space How to make visible the task ID to the sw? Use spial instruction lh_tid Logging inst lh_tid r3, offset(r3) Undo Log Vaddr task ID Value sw r5, offset(r3) 40

41 Accessing Task IDs in Software Option 2) Map to a non-existent physical address Virtual page Physical Non existent DataPage Fixed offset lh_tid r3, offset(r3) To cache (task ID access) 41

42 Reducing Overheads Only first-stores in the task need to create a log entry Solution: At run-time, chk if store is first-store Use cached Task ID to filter: If Others Instrumented First stores Spulative Non spulative 42

43 Filtering first spulative store Using extended loads load store tag data 0 1 xlw r6, r1, offset (r3) ; Store bit goes to r6 bgtz r6, no_insert ; first store? addu r4, r3, offset ; insert as usual sw r4, 0(r2) Logging lh_ts r4, offset(r3) instr addu r2, r2, log_rord_size sh_tid r5, offset (r3) no_insert: sw r5, offset(r3) 43

44 Software handlers Rovery : Out-of order RAW Undo the modifications i using data from log Retrieval : Some in-order RAWs The exposed load needs dig version from log 44

45 P3m Tree Apsi Track Dsmc3d Euler Exution Time By. RBy.R By.NoR NoBy.R By. R NoBy.NoR By.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR c Filter Filter Filter All Filter Filter Filter Filter 45

46 Accessing Task IDs in Software Naïve solution: Add extra field in TLB Virtual Page Physical Data Page Physical Task ID Page ld var To cache (data access) 46

47 Accessing Task IDs in Software Naïve solution: Add extra field in TLB Virtual Page Physical Data Page Physical Task ID Page ld_tid var To cache (task ID access) 47

48 Accessing Task IDs in Software Fixed offset between mapping of data pages and corresponding Task ID pages No TLB modifications In ld_tid instruction, the hardware subtracts the offset to obtain the Physical Task ID page 48

Software Logging under Speculative Parallelization

Software Logging under Speculative Parallelization Software Logging under Speculative Parallelization María Jesús Garzarán, Milos Prvulovic y,josé María Llabería z, Víctor Viñals, Lawrence Rauchwerger x, and Josep Torrellas y Universidad de Zaragoza, Spain

More information

Tradeoffs in Buffering Speculative Memory State for Thread-Level Speculation in Multiprocessors

Tradeoffs in Buffering Speculative Memory State for Thread-Level Speculation in Multiprocessors Tradeoffs in Buffering Speculative Memory State for Thread-Level Speculation in Multiprocessors MARíA JESÚS GARZARÁN University of Illinois at Urbana-Champaign MILOS PRVULOVIC Georgia Institute of Technology

More information

Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors Λ

Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors Λ Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors Λ María Jesús Garzarán, Milos Prvulovic y,josémaría Llabería z, Víctor Viñals, Lawrence Rauchwerger x, and Josep Torrellas

More information

POSH: A TLS Compiler that Exploits Program Structure

POSH: A TLS Compiler that Exploits Program Structure POSH: A TLS Compiler that Exploits Program Structure Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn, Karin Strauss, Jose Renau and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign

More information

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow

More information

Removing Architectural Bottlenecks to the Scalability of Speculative Parallelization

Removing Architectural Bottlenecks to the Scalability of Speculative Parallelization Appears in the Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA-28), June 200. Removing Architectural Bottlenecks to the Scalability of Speculative Parallelization

More information

Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors

Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors Marcelo Cintra and Josep Torrellas University of Edinburgh http://www.dcs.ed.ac.uk/home/mc

More information

CS533: Speculative Parallelization (Thread-Level Speculation)

CS533: Speculative Parallelization (Thread-Level Speculation) CS533: Speculative Parallelization (Thread-Level Speculation) Josep Torrellas University of Illinois in Urbana-Champaign March 5, 2015 Josep Torrellas (UIUC) CS533: Lecture 14 March 5, 2015 1 / 21 Concepts

More information

Speculative Synchronization

Speculative Synchronization Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization

More information

Virtual Memory Review. Page faults. Paging system summary (so far)

Virtual Memory Review. Page faults. Paging system summary (so far) Lecture 22 (Wed 11/19/2008) Virtual Memory Review Lab #4 Software Simulation Due Fri Nov 21 at 5pm HW #3 Cache Simulator & code optimization Due Mon Nov 24 at 5pm More Virtual Memory 1 2 Paging system

More information

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard

More information

FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes

FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes Rishi Agarwal and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

15 Sharing Main Memory Segmentation and Paging

15 Sharing Main Memory Segmentation and Paging Operating Systems 58 15 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

16 Sharing Main Memory Segmentation and Paging

16 Sharing Main Memory Segmentation and Paging Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

Past: Making physical memory pretty

Past: Making physical memory pretty Past: Making physical memory pretty Physical memory: no protection limited size almost forces contiguous allocation sharing visible to program easy to share data gcc gcc emacs Virtual memory each program

More information

Dynamically Detecting and Tolerating IF-Condition Data Races

Dynamically Detecting and Tolerating IF-Condition Data Races Dynamically Detecting and Tolerating IF-Condition Data Races Shanxiang Qi (Google), Abdullah Muzahid (University of San Antonio), Wonsun Ahn, Josep Torrellas University of Illinois at Urbana-Champaign

More information

The Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas

The Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas The Design Complexity of Program Undo Support in a General Purpose Processor Radu Teodorescu and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Processor with program

More information

DeAliaser: Alias Speculation Using Atomic Region Support

DeAliaser: Alias Speculation Using Atomic Region Support DeAliaser: Alias Speculation Using Atomic Region Support Wonsun Ahn*, Yuelu Duan, Josep Torrellas University of Illinois at Urbana Champaign http://iacoma.cs.illinois.edu Memory Aliasing Prevents Good

More information

Spring 2010 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic

Spring 2010 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic Spring 2010 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic C/C++ program Compiler Assembly Code (binary) Processor 0010101010101011110 Memory MAR MDR INPUT Processing Unit OUTPUT ALU TEMP PC Control

More information

Translation Buffers (TLB s)

Translation Buffers (TLB s) Translation Buffers (TLB s) To perform virtual to physical address translation we need to look-up a page table Since page table is in memory, need to access memory Much too time consuming; 20 cycles or

More information

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra

More information

Virtual Memory. Virtual Memory

Virtual Memory. Virtual Memory Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical

More information

CSE 120 Principles of Operating Systems Spring 2017

CSE 120 Principles of Operating Systems Spring 2017 CSE 120 Principles of Operating Systems Spring 2017 Lecture 12: Paging Lecture Overview Today we ll cover more paging mechanisms: Optimizations Managing page tables (space) Efficient translations (TLBs)

More information

Pipelined processors and Hazards

Pipelined processors and Hazards Pipelined processors and Hazards Two options Processor HLL Compiler ALU LU Output Program Control unit 1. Either the control unit can be smart, i,e. it can delay instruction phases to avoid hazards. Processor

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Instruction Commit The End of the Road (um Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following

More information

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture

Announcements. ! Previous lecture. Caches. Inf3 Computer Architecture Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for

More information

ShortCut: Architectural Support for Fast Object Access in Scripting Languages

ShortCut: Architectural Support for Fast Object Access in Scripting Languages Jiho Choi, Thomas Shull, Maria J. Garzaran, and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu ISCA 2017 Overheads of Scripting Languages

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

DAT (cont d) Assume a page size of 256 bytes. physical addresses. Note: Virtual address (page #) is not stored, but is used as an index into the table

DAT (cont d) Assume a page size of 256 bytes. physical addresses. Note: Virtual address (page #) is not stored, but is used as an index into the table Assume a page size of 256 bytes 5 Page table size (determined by size of program) 1 1 0 1 0 0200 00 420B 00 xxxxxx 1183 00 xxxxxx physical addresses Residency Bit (0 page frame is empty) Note: Virtual

More information

Tradeoffs in Transactional Memory Virtualization

Tradeoffs in Transactional Memory Virtualization Tradeoffs in Transactional Memory Virtualization JaeWoong Chung Chi Cao Minh, Austen McDonald, Travis Skare, Hassan Chafi,, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab Stanford

More information

Virtual Machines and Dynamic Translation: Implementing ISAs in Software

Virtual Machines and Dynamic Translation: Implementing ISAs in Software Virtual Machines and Dynamic Translation: Implementing ISAs in Software Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Software Applications How is a software application

More information

Announcement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory

Announcement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcement Lab1 is released Start early we only have limited computing

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

How to create a process? What does process look like?

How to create a process? What does process look like? How to create a process? On Unix systems, executable read by loader Compile time runtime Ken Birman ld loader Cache Compiler: generates one object file per source file Linker: combines all object files

More information

Operating Systems. Operating Systems Sina Meraji U of T

Operating Systems. Operating Systems Sina Meraji U of T Operating Systems Operating Systems Sina Meraji U of T Recap Last time we looked at memory management techniques Fixed partitioning Dynamic partitioning Paging Example Address Translation Suppose addresses

More information

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1 Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 CPU management Roadmap Process, thread, synchronization, scheduling Memory management Virtual memory Disk

More information

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,

Chapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007, Chapter 3 (CONT II) Instructor: Josep Torrellas CS433 Copyright J. Torrellas 1999,2001,2002,2007, 2013 1 Hardware-Based Speculation (Section 3.6) In multiple issue processors, stalls due to branches would

More information

are Softw Instruction Set Architecture Microarchitecture are rdw

are Softw Instruction Set Architecture Microarchitecture are rdw Program, Application Software Programming Language Compiler/Interpreter Operating System Instruction Set Architecture Hardware Microarchitecture Digital Logic Devices (transistors, etc.) Solid-State Physics

More information

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1 Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L16-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:

More information

Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors

Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors María Jesús Garzarán, Milos Prvulovic y, Ye Zhang y, Alin Jula z, Hao Yu z, Lawrence Rauchwerger z, and Josep Torrellas

More information

Address Translation. Tore Larsen Material developed by: Kai Li, Princeton University

Address Translation. Tore Larsen Material developed by: Kai Li, Princeton University Address Translation Tore Larsen Material developed by: Kai Li, Princeton University Topics Virtual memory Virtualization Protection Address translation Base and bound Segmentation Paging Translation look-ahead

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra ia a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Midterm #2 Solutions April 23, 1997

Midterm #2 Solutions April 23, 1997 CS152 Computer Architecture and Engineering Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Sp97 D.K. Jeong Midterm #2 Solutions

More information

Lecture 15 Pipelining & ILP Instructor: Prof. Falsafi

Lecture 15 Pipelining & ILP Instructor: Prof. Falsafi 18-547 Lecture 15 Pipelining & ILP Instructor: Prof. Falsafi Electrical & Computer Engineering Carnegie Mellon University Slides developed by Profs. Hill, Wood, Sohi, and Smith of University of Wisconsin

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra is a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Multithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others

Multithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others Multithreading Processors and Static Optimization Review Adapted from Bhuyan, Patterson, Eggers, probably others Schedule of things to do By Wednesday the 9 th at 9pm Please send a milestone report (as

More information

The Bulk Multicore Architecture for Programmability

The Bulk Multicore Architecture for Programmability The Bulk Multicore Architecture for Programmability Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Acknowledgments Key contributors: Luis Ceze Calin

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 52 Computer Architecture and Engineering Lecture 26 Mid-Term II Review 26--3 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs52/ CS 52 L26: Mid-Term

More information

Precise Exceptions and Out-of-Order Execution. Samira Khan

Precise Exceptions and Out-of-Order Execution. Samira Khan Precise Exceptions and Out-of-Order Execution Samira Khan Multi-Cycle Execution Not all instructions take the same amount of time for execution Idea: Have multiple different functional units that take

More information

Exploiting Idle Floating-Point Resources for Integer Execution

Exploiting Idle Floating-Point Resources for Integer Execution Exploiting Idle Floating-Point Resources for Integer Execution, Subbarao Palacharla, James E. Smith University of Wisconsin, Madison Motivation Partitioned integer and floating-point resources on current

More information

Multi-level Page Tables & Paging+ segmentation combined

Multi-level Page Tables & Paging+ segmentation combined Multi-level Page Tables & Paging+ segmentation combined Basic idea: use two levels of mapping to make tables manageable o Each segment contains one or more pages Segments correspond to logical units: code,

More information

Pick a time window size w. In time span w, are there, Multiple References, to nearby addresses: Spatial Locality

Pick a time window size w. In time span w, are there, Multiple References, to nearby addresses: Spatial Locality Pick a time window size w. In time span w, are there, Multiple References, to nearby addresses: Spatial Locality Repeated References, to a set of locations: Temporal Locality Take advantage of behavior

More information

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1 Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L20-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

VIRTUAL MEMORY II. Jo, Heeseung

VIRTUAL MEMORY II. Jo, Heeseung VIRTUAL MEMORY II Jo, Heeseung TODAY'S TOPICS How to reduce the size of page tables? How to reduce the time for address translation? 2 PAGE TABLES Space overhead of page tables The size of the page table

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 10: Paging Geoffrey M. Voelker Lecture Overview Today we ll cover more paging mechanisms: Optimizations Managing page tables (space) Efficient

More information

Advanced Caching Techniques (2) Department of Electrical Engineering Stanford University

Advanced Caching Techniques (2) Department of Electrical Engineering Stanford University Lecture 4: Advanced Caching Techniques (2) Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee282 Lecture 4-1 Announcements HW1 is out (handout and online) Due on 10/15

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics

More information

CS 153 Design of Operating Systems Winter 2016

CS 153 Design of Operating Systems Winter 2016 CS 153 Design of Operating Systems Winter 2016 Lecture 17: Paging Lecture Overview Recap: Today: Goal of virtual memory management: map 2^32 byte address space to physical memory Internal fragmentation

More information

Address Translation. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Address Translation. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Address Translation Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics How to reduce the size of page tables? How to reduce the time for

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

CS 318 Principles of Operating Systems

CS 318 Principles of Operating Systems CS 318 Principles of Operating Systems Fall 2018 Lecture 10: Virtual Memory II Ryan Huang Slides adapted from Geoff Voelker s lectures Administrivia Next Tuesday project hacking day No class My office

More information

CPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner

CPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner CPS104 Computer Organization and Programming Lecture 16: Virtual Memory Robert Wagner cps 104 VM.1 RW Fall 2000 Outline of Today s Lecture Virtual Memory. Paged virtual memory. Virtual to Physical translation:

More information

Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I. Prof. Onur Mutlu Carnegie Mellon University 10/10/2012

Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I. Prof. Onur Mutlu Carnegie Mellon University 10/10/2012 18-742 Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I Prof. Onur Mutlu Carnegie Mellon University 10/10/2012 Reminder: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi

More information

15-740/ Computer Architecture Lecture 5: Precise Exceptions. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 5: Precise Exceptions. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 5: Precise Exceptions Prof. Onur Mutlu Carnegie Mellon University Last Time Performance Metrics Amdahl s Law Single-cycle, multi-cycle machines Pipelining Stalls

More information

Basic Pipelining Concepts

Basic Pipelining Concepts Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Seventh Edition By William Stallings Objectives of Chapter To provide a grand tour of the major computer system components:

More information

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names Fast access ===> use map to find object HW == SW ===> map is in HW or SW or combo Extend range ===> longer, hierarchical names How is map embodied: --- L1? --- Memory? The Environment ---- Long Latency

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information

ROB: head/tail. exercise: result of processing rest? 2. rename map (for next rename) log. phys. free list: X11, X3. PC log. reg prev.

ROB: head/tail. exercise: result of processing rest? 2. rename map (for next rename) log. phys. free list: X11, X3. PC log. reg prev. Exam Review 2 1 ROB: head/tail PC log. reg prev. phys. store? except? ready? A R3 X3 no none yes old tail B R1 X1 no none yes tail C R1 X6 no none yes D R4 X4 no none yes E --- --- yes none yes F --- ---

More information

Topic 18: Virtual Memory

Topic 18: Virtual Memory Topic 18: Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level of indirection

More information

Chapter 4 The Processor (Part 4)

Chapter 4 The Processor (Part 4) Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Instruction Commit The End of the Road (um Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following

More information

CS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction

CS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.

More information

CPS104 Computer Organization and Programming Lecture 17: Interrupts and Exceptions. Interrupts Exceptions and Traps. Visualizing an Interrupt

CPS104 Computer Organization and Programming Lecture 17: Interrupts and Exceptions. Interrupts Exceptions and Traps. Visualizing an Interrupt CPS104 Computer Organization and Programming Lecture 17: Interrupts and Exceptions Robert Wagner cps 104 Int.1 RW Fall 2000 Interrupts Exceptions and Traps Interrupts, Exceptions and Traps are asynchronous

More information

CS420: Operating Systems. Paging and Page Tables

CS420: Operating Systems. Paging and Page Tables Paging and Page Tables James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Paging Paging is a memory-management

More information

Improving the Practicality of Transactional Memory

Improving the Practicality of Transactional Memory Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2007 Lecture 14: Virtual Machines 563 L14.1 Fall 2009 Outline Types of Virtual Machine User-level (or Process VMs) System-level Techniques for implementing all

More information

CSE 120. Translation Lookaside Buffer (TLB) Implemented in Hardware. July 18, Day 5 Memory. Instructor: Neil Rhodes. Software TLB Management

CSE 120. Translation Lookaside Buffer (TLB) Implemented in Hardware. July 18, Day 5 Memory. Instructor: Neil Rhodes. Software TLB Management CSE 120 July 18, 2006 Day 5 Memory Instructor: Neil Rhodes Translation Lookaside Buffer (TLB) Implemented in Hardware Cache to map virtual page numbers to page frame Associative memory: HW looks up in

More information

19: I/O. Mark Handley. Direct Memory Access (DMA)

19: I/O. Mark Handley. Direct Memory Access (DMA) 19: I/O Mark Handley Direct Memory Access (DMA) 1 Interrupts Revisited Connections between devices and interrupt controller actually use interrupt lines on the bus rather than dedicated wires. Interrupts

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the

More information

Speculative Locks. Dept. of Computer Science

Speculative Locks. Dept. of Computer Science Speculative Locks José éf. Martínez and djosep Torrellas Dept. of Computer Science University it of Illinois i at Urbana-Champaign Motivation Lock granularity a trade-off: Fine grain greater concurrency

More information

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17 Short Pipelining Review! ! Readings! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 4.5-4.7!! (Over the next 3-4 lectures)! Lecture 17" Short Pipelining Review! 3! Processor components! Multicore processors and programming! Recap: Pipelining

More information

EECS 482 Introduction to Operating Systems

EECS 482 Introduction to Operating Systems EECS 482 Introduction to Operating Systems Fall 2018 Baris Kasikci Slides by: Harsha V. Madhyastha Base and bounds Load each process into contiguous region of physical memory Prevent process from accessing

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2003 Lecture 15: Caching: Demand Paged Virtual Memory 15.0 Main Points: Concept of paging to disk Replacement policies

More information

200 points total. Start early! Update March 27: Problem 2 updated, Problem 8 is now a study problem.

200 points total. Start early! Update March 27: Problem 2 updated, Problem 8 is now a study problem. CS3410 Spring 2014 Problem Set 2, due Saturday, April 19, 11:59 PM NetID: Name: 200 points total. Start early! Update March 27: Problem 2 updated, Problem 8 is now a study problem. Problem 1 Data Hazards

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Operating Systems: Virtual Machines & Exceptions

Operating Systems: Virtual Machines & Exceptions Operating Systems: Machines & Exceptions Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L19-1 6.004 So Far: Single-User Machines Program Hardware ISA (e.g., RISC-V) Processor Memory

More information

Address spaces and memory management

Address spaces and memory management Address spaces and memory management Review of processes Process = one or more threads in an address space Thread = stream of executing instructions Address space = memory space used by threads Address

More information

Memory Hierarchy Requirements. Three Advantages of Virtual Memory

Memory Hierarchy Requirements. Three Advantages of Virtual Memory CS61C L12 Virtual (1) CS61CL : Machine Structures Lecture #12 Virtual 2009-08-03 Jeremy Huddleston Review!! Cache design choices: "! Size of cache: speed v. capacity "! size (i.e., cache aspect ratio)

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

Memory Allocation. Copyright : University of Illinois CS 241 Staff 1

Memory Allocation. Copyright : University of Illinois CS 241 Staff 1 Memory Allocation Copyright : University of Illinois CS 241 Staff 1 Recap: Virtual Addresses A virtual address is a memory address that a process uses to access its own memory Virtual address actual physical

More information

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

CS399 New Beginnings. Jonathan Walpole

CS399 New Beginnings. Jonathan Walpole CS399 New Beginnings Jonathan Walpole Virtual Memory (1) Page Tables When and why do we access a page table? - On every instruction to translate virtual to physical addresses? Page Tables When and why

More information