Using Software Logging To Support Multi-Version Buffering in Thread-Level Speculation
|
|
- Leslie West
- 6 years ago
- Views:
Transcription
1 Using Software Logging To Support Multi-Version Buffering in Thread-Level Spulation María Jesús Garzarán, M. Prvulovic, V. Viñals, J. M. Llabería *, L. Rauchwerger ψ, and J. Torrellas U. of Illinois at Urbana-Champaign U. of Zaragoza, Spain * U.P.C., Spain ψ Texas A&M University
2 Thread-Level Spulation (TLS) Exute potentially-dependent tasks in parallel Assume no cross-task dependence will be violated Track memory accesses; buffer unsafe state Dett any dependence violation Squash offending tasks, repair polluted state, restart tasks Example time for (i=0;i<n;i++){ = A[B[i]] A[C[i]] = } Task J = A[4] A[5] = Task J+1 = A[2] RAW A[2] = Task J+2 = A[5] A[6] = 2
3 State Buffering in TLS Spulative tasks generate spulative memory state Must buffer and manage this spulative state 3
4 Approaches to Buffering [HPCA-03] Archittural Main Memory (AMM): Spulative task state kept in caches Main memory only keeps safe program state Cached state merges with memory when a task commits Future Main Memory (FMM): Spulative task state can merge with memory at any time Main memory keeps the future state (unsafe) Previous state is saved into Undo Log 4
5 Our Contributions Software-only design for undo-log system in FMM Simplify the hardware implementation ti Very cost-efftive approach On average, it only introduces a 10% slow-down vs hardwareonly 5
6 Roadmap Taxonomy of Buffering: AMM vs FMM Hardware Support Software Support Evaluation Conclusions 6
7 Task Exution under TLS Non Sp Non Sp Sp Non Sp Sp Processor time task 1 Commit task 2 task 3 Token task 4 task 5 Commit Token 7
8 Archittural Main Memory [HPCA03] Tasks Caches Non Sp Main memory Archittural state Main memory keeps archittural or safe state Caches keep spulative state 8
9 Future Main Memory [HPCA03] Tasks Log 6 Caches Main memory Future state Main memory keeps future state Logs keep previous state 9
10 Future Main Memory value address Task i writes 2 to 0x400 Task i+j writes 10 to 0x400 Archittural MM Task ID Tag Data i 0x400 2 i+j 0x Task ID Tag Data i 0x400 2 Future MM Cache Cache 10
11 Future Main Memory value address Task i writes 2 to 0x400 Task i+j writes 10 to 0x400 Archittural MM Future MM Producer Task ID Tag Data i 0x400 2 i+j 0x Cache Task ID Tag Data i+j 0x Cache Task ID Overwriting Task ID Tag Data i+j i 0x400 2 Log (in cache or memory) Perf Cost Faster commit but slower version rovery Need log 11
12 Roadmap Taxonomy of Buffering Hardware Support Software Support Evaluation Conclusions 12
13 Hardware Supports Existing TLS protocol Add hooks to support Software Logging 13
14 TLS protocol [Zhang99] Processor Cache RW X Network Local Memory MaxR MaxW X Task IDs 14
15 Hooks to Support Software Logging 1) Make Task ID pages visible to the software Cache Processor RW X Task ID Network Local Memory 2) Cache Task IDs MaxR MaxW X Task IDs Software Log Producer Task ID Data 15
16 Accessing Task IDs in Software Fixed offset between mapping of data pages and corresponding Task ID pages Memory Data pages Task ID pages pg Fixed offset Use two different loads: ld variable ld_tid variable 16
17 Accessing Data Virtual page Physical DataPage TLB ld variable To cache (data access) 17
18 Accessing Task IDs Virtual page Physical DataPage TLB Fixed offset ld_tid variable To cache (task ID access) 18
19 Caching Task IDs Bring Task ID to cache with ld_tid instruction as regular data Task IDs can be reused from the cache Task kids in memory are updated din hardware r by the TLS protocol To keep cached Task IDs up-to-date in software: st_tid instruction 19
20 Roadmap Taxonomy of Buffering Hardware Support Software Support Evaluation Conclusions 20
21 Software Logs A compiler instruments the application: Insert entry in log: before a store operation, add instructions to save previous value, address and Task ID in log Rycle log entries: free up the log created when a task commits Interrupt handler: Rovery : In case of a o-o-o RAW and squash Undo the modifications using data from log 21
22 Software Data Structures Logs are allocated locally before spulation starts Task Pointer Table Log Buffer Valid Task ID End Next Task ID Vaddr i j Value 22
23 Instructions to insert entry in log addu r4, r3, offset ; address of the variable sw r4, 0(r2) ; store in the log ld_tid r4, offset(r3) ; load Task ID Logging sw r4, 4(r2) ; store in the log instr lw r4, offset(r3) ; load value of variable sw r4, 8(r2) ; store in the log addu r2, r2, log_rord_size st_tid ttid r5, offset t( (r3) ; store Task kid sw r5, offset(r3) 23
24 Reducing Overheads Only first-stores in the task need to create a log entry Solution: At run-time, chk if store is first-store Use cached Task ID to filter: If (Cached Task ID == Current Task ID) Skip Logging Else Log entry 24
25 Filtering First Spulative Store Logging instr no_insert: ld_tid r6, offset (r3) ; load Task ID beq r6, r5, no_insert ; first store? addu r4, r3, offset ; insert as usual sw r4, 0(r2)... addu r2, r2, log_rord_size st_tid r5, offset (r3) ; store Task ID sw r5, offset (r3) 25
26 Roadmap Taxonomy of Buffering Hardware Support Software Support Evaluation Conclusions 26
27 Evaluation Environment Exution-driven simulator Multiprocessor with 16 processors 4 issue o-o-o superscalar processor + 2 levels of cache Compare against FMM with Hardware Logging [Zhang99] Advanced AMM system [HPCA03] 27
28 Applications Numerical applications: Apsi (Spfp2000) Dsmc3d and Euler (HPF-2) P3m (NCSA) Tree (Univ. of Hawaii) Track (Perft) The non-analyzable loops account on average for 61% of the serial exution time - Non-analyzable loops are identified with the Polaris parallelizing compiler - Speed-ups shown for the non-analyzable loops only 28
29 Exution Time Comparison Exut tion Time P3m Tree Apsi Track Dsmc3d Euler Average HW SW HW SW HW SW HW SW HW SW HW SW HW SW 27 FMM AMM FMM AMM FMM AMM FMM AMM FMM AMM FMM AMM FMM AMM On average, FMM Sw only introduces a 10% slow-down over FMM Hw On average, FMM Sw is similar to the advanced AMM 29
30 First Store Filtering P3m Tree Apsi Track Dsmc3d Euler FMM.Sw NoFilter FMM.Sw NoFilter FMM.Sw NoFilter 10 FMM.Sw NoFilter NoFilter NoFilter FMM.Sw FMM.Sw Exution Time Significant impact in the exution time in Apsi Since filtering does not hurt, we rommend using it 30
31 Other Results in Paper Studied design tradeoffs: Log accesses bypass/do not bypass L1 cache Log space is / is not rycled Do not afft the performance of FMM.Sw when filtering is used 31
32 Conclusions FMM.Sw is a cost-efftive solution: Simplified design relative to FMM.Hw Introduce low exution overhead (10% over FMM.Hw) Filtering first stores is beneficial 32
33 Using Software Logging To Support Multi-Version Buffering in Thread-Level Spulation María Jesús Garzarán, M. Prvulovic, V. Viñals, J. M. Llabería *, L. Rauchwerger ψ, and J. Torrellas U. of Illinois at Urbana-Champaign U. of Zaragoza, Spain * U.P.C., Spain ψ Texas A&M University
34 FMM.Sw versus Advanced AMM Advanced AMM Needs Version Combining i Logic Collt all the versions that are committed Selt the youngest one and invalidate the others FMM.Sw Needs Task ID in main memory Comparator to pick up the youngest version 34
35 Problem: Address tak IDs in software TLB Non existent Physical addres Cache Data & ld/st bits Cache tagged with Non-existent physical addr Local Memory task ID Add a fixed offset 35
36 Problem: Address tak IDs in software TLB Non existent Physical addres Cache Data & ld/st bits NIC Physical addrs local shared Local Memory task ID Network Shared Data 36
37 Problem: Address time stamp in software OS Allocates a page of task IDs Map the virtual address to a non existent physical page TLB Cache Data & ld/st bits Non existent Physical addres NIC Physical addrs local shared Local Memory taskid Network Shared Data 37
38 Spulative protocol Cache: load and store bits per word in cache Local Memory: task ID per word ISA: new ld/st instructions 38
39 Problem: Address task IDs in Software The task ID is not mapped in virtual space How to make visible the task ID to the sw? Logging inst load r3, addr_task ID? Undo Log Vaddr task ID Value sw r5, offset(r3) 39
40 Problem: Address task IDs in Software The task ID is not mapped in virtual space How to make visible the task ID to the sw? Use spial instruction lh_tid Logging inst lh_tid r3, offset(r3) Undo Log Vaddr task ID Value sw r5, offset(r3) 40
41 Accessing Task IDs in Software Option 2) Map to a non-existent physical address Virtual page Physical Non existent DataPage Fixed offset lh_tid r3, offset(r3) To cache (task ID access) 41
42 Reducing Overheads Only first-stores in the task need to create a log entry Solution: At run-time, chk if store is first-store Use cached Task ID to filter: If Others Instrumented First stores Spulative Non spulative 42
43 Filtering first spulative store Using extended loads load store tag data 0 1 xlw r6, r1, offset (r3) ; Store bit goes to r6 bgtz r6, no_insert ; first store? addu r4, r3, offset ; insert as usual sw r4, 0(r2) Logging lh_ts r4, offset(r3) instr addu r2, r2, log_rord_size sh_tid r5, offset (r3) no_insert: sw r5, offset(r3) 43
44 Software handlers Rovery : Out-of order RAW Undo the modifications i using data from log Retrieval : Some in-order RAWs The exposed load needs dig version from log 44
45 P3m Tree Apsi Track Dsmc3d Euler Exution Time By. RBy.R By.NoR NoBy.R By. R NoBy.NoR By.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR By.R By.NoR NoBy.R NoBy.NoR c Filter Filter Filter All Filter Filter Filter Filter 45
46 Accessing Task IDs in Software Naïve solution: Add extra field in TLB Virtual Page Physical Data Page Physical Task ID Page ld var To cache (data access) 46
47 Accessing Task IDs in Software Naïve solution: Add extra field in TLB Virtual Page Physical Data Page Physical Task ID Page ld_tid var To cache (task ID access) 47
48 Accessing Task IDs in Software Fixed offset between mapping of data pages and corresponding Task ID pages No TLB modifications In ld_tid instruction, the hardware subtracts the offset to obtain the Physical Task ID page 48
Software Logging under Speculative Parallelization
Software Logging under Speculative Parallelization María Jesús Garzarán, Milos Prvulovic y,josé María Llabería z, Víctor Viñals, Lawrence Rauchwerger x, and Josep Torrellas y Universidad de Zaragoza, Spain
More informationTradeoffs in Buffering Speculative Memory State for Thread-Level Speculation in Multiprocessors
Tradeoffs in Buffering Speculative Memory State for Thread-Level Speculation in Multiprocessors MARíA JESÚS GARZARÁN University of Illinois at Urbana-Champaign MILOS PRVULOVIC Georgia Institute of Technology
More informationTradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors Λ
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors Λ María Jesús Garzarán, Milos Prvulovic y,josémaría Llabería z, Víctor Viñals, Lawrence Rauchwerger x, and Josep Torrellas
More informationPOSH: A TLS Compiler that Exploits Program Structure
POSH: A TLS Compiler that Exploits Program Structure Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn, Karin Strauss, Jose Renau and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign
More informationSpeculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois
Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow
More informationRemoving Architectural Bottlenecks to the Scalability of Speculative Parallelization
Appears in the Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA-28), June 200. Removing Architectural Bottlenecks to the Scalability of Speculative Parallelization
More informationEliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors Marcelo Cintra and Josep Torrellas University of Edinburgh http://www.dcs.ed.ac.uk/home/mc
More informationCS533: Speculative Parallelization (Thread-Level Speculation)
CS533: Speculative Parallelization (Thread-Level Speculation) Josep Torrellas University of Illinois in Urbana-Champaign March 5, 2015 Josep Torrellas (UIUC) CS533: Lecture 14 March 5, 2015 1 / 21 Concepts
More informationSpeculative Synchronization
Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization
More informationVirtual Memory Review. Page faults. Paging system summary (so far)
Lecture 22 (Wed 11/19/2008) Virtual Memory Review Lab #4 Software Simulation Due Fri Nov 21 at 5pm HW #3 Cache Simulator & code optimization Due Mon Nov 24 at 5pm More Virtual Memory 1 2 Paging system
More informationReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors
ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard
More informationFlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes
FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes Rishi Agarwal and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More information15 Sharing Main Memory Segmentation and Paging
Operating Systems 58 15 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per
More information16 Sharing Main Memory Segmentation and Paging
Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per
More informationPast: Making physical memory pretty
Past: Making physical memory pretty Physical memory: no protection limited size almost forces contiguous allocation sharing visible to program easy to share data gcc gcc emacs Virtual memory each program
More informationDynamically Detecting and Tolerating IF-Condition Data Races
Dynamically Detecting and Tolerating IF-Condition Data Races Shanxiang Qi (Google), Abdullah Muzahid (University of San Antonio), Wonsun Ahn, Josep Torrellas University of Illinois at Urbana-Champaign
More informationThe Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas
The Design Complexity of Program Undo Support in a General Purpose Processor Radu Teodorescu and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Processor with program
More informationDeAliaser: Alias Speculation Using Atomic Region Support
DeAliaser: Alias Speculation Using Atomic Region Support Wonsun Ahn*, Yuelu Duan, Josep Torrellas University of Illinois at Urbana Champaign http://iacoma.cs.illinois.edu Memory Aliasing Prevents Good
More informationSpring 2010 Prof. Hyesoon Kim. Thanks to Prof. Loh & Prof. Prvulovic
Spring 2010 Prof. Hyesoon Kim Thanks to Prof. Loh & Prof. Prvulovic C/C++ program Compiler Assembly Code (binary) Processor 0010101010101011110 Memory MAR MDR INPUT Processing Unit OUTPUT ALU TEMP PC Control
More informationTranslation Buffers (TLB s)
Translation Buffers (TLB s) To perform virtual to physical address translation we need to look-up a page table Since page table is in memory, need to access memory Much too time consuming; 20 cycles or
More informationOutline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA
CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra
More informationVirtual Memory. Virtual Memory
Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical
More informationCSE 120 Principles of Operating Systems Spring 2017
CSE 120 Principles of Operating Systems Spring 2017 Lecture 12: Paging Lecture Overview Today we ll cover more paging mechanisms: Optimizations Managing page tables (space) Efficient translations (TLBs)
More informationPipelined processors and Hazards
Pipelined processors and Hazards Two options Processor HLL Compiler ALU LU Output Program Control unit 1. Either the control unit can be smart, i,e. it can delay instruction phases to avoid hazards. Processor
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Instruction Commit The End of the Road (um Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following
More informationAnnouncements. ! Previous lecture. Caches. Inf3 Computer Architecture
Announcements! Previous lecture Caches Inf3 Computer Architecture - 2016-2017 1 Recap: Memory Hierarchy Issues! Block size: smallest unit that is managed at each level E.g., 64B for cache lines, 4KB for
More informationShortCut: Architectural Support for Fast Object Access in Scripting Languages
Jiho Choi, Thomas Shull, Maria J. Garzaran, and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu ISCA 2017 Overheads of Scripting Languages
More informationAdvanced issues in pipelining
Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one
More informationDAT (cont d) Assume a page size of 256 bytes. physical addresses. Note: Virtual address (page #) is not stored, but is used as an index into the table
Assume a page size of 256 bytes 5 Page table size (determined by size of program) 1 1 0 1 0 0200 00 420B 00 xxxxxx 1183 00 xxxxxx physical addresses Residency Bit (0 page frame is empty) Note: Virtual
More informationTradeoffs in Transactional Memory Virtualization
Tradeoffs in Transactional Memory Virtualization JaeWoong Chung Chi Cao Minh, Austen McDonald, Travis Skare, Hassan Chafi,, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab Stanford
More informationVirtual Machines and Dynamic Translation: Implementing ISAs in Software
Virtual Machines and Dynamic Translation: Implementing ISAs in Software Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Software Applications How is a software application
More informationAnnouncement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory
ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcement Lab1 is released Start early we only have limited computing
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationSoftware-Controlled Multithreading Using Informing Memory Operations
Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University
More informationCache Performance (H&P 5.3; 5.5; 5.6)
Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st
More informationHow to create a process? What does process look like?
How to create a process? On Unix systems, executable read by loader Compile time runtime Ken Birman ld loader Cache Compiler: generates one object file per source file Linker: combines all object files
More informationOperating Systems. Operating Systems Sina Meraji U of T
Operating Systems Operating Systems Sina Meraji U of T Recap Last time we looked at memory management techniques Fixed partitioning Dynamic partitioning Paging Example Address Translation Suppose addresses
More informationMemory Management. Disclaimer: some slides are adopted from book authors slides with permission 1
Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 CPU management Roadmap Process, thread, synchronization, scheduling Memory management Virtual memory Disk
More informationChapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,
Chapter 3 (CONT II) Instructor: Josep Torrellas CS433 Copyright J. Torrellas 1999,2001,2002,2007, 2013 1 Hardware-Based Speculation (Section 3.6) In multiple issue processors, stalls due to branches would
More informationare Softw Instruction Set Architecture Microarchitecture are rdw
Program, Application Software Programming Language Compiler/Interpreter Operating System Instruction Set Architecture Hardware Microarchitecture Digital Logic Devices (transistors, etc.) Solid-State Physics
More informationVirtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1
Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L16-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:
More informationArchitectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors
Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors María Jesús Garzarán, Milos Prvulovic y, Ye Zhang y, Alin Jula z, Hao Yu z, Lawrence Rauchwerger z, and Josep Torrellas
More informationAddress Translation. Tore Larsen Material developed by: Kai Li, Princeton University
Address Translation Tore Larsen Material developed by: Kai Li, Princeton University Topics Virtual memory Virtualization Protection Address translation Base and bound Segmentation Paging Translation look-ahead
More informationData/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)
Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra ia a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software
More informationMidterm #2 Solutions April 23, 1997
CS152 Computer Architecture and Engineering Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Sp97 D.K. Jeong Midterm #2 Solutions
More informationLecture 15 Pipelining & ILP Instructor: Prof. Falsafi
18-547 Lecture 15 Pipelining & ILP Instructor: Prof. Falsafi Electrical & Computer Engineering Carnegie Mellon University Slides developed by Profs. Hill, Wood, Sohi, and Smith of University of Wisconsin
More informationData/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)
Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra is a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software
More informationMultithreading Processors and Static Optimization Review. Adapted from Bhuyan, Patterson, Eggers, probably others
Multithreading Processors and Static Optimization Review Adapted from Bhuyan, Patterson, Eggers, probably others Schedule of things to do By Wednesday the 9 th at 9pm Please send a milestone report (as
More informationThe Bulk Multicore Architecture for Programmability
The Bulk Multicore Architecture for Programmability Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Acknowledgments Key contributors: Luis Ceze Calin
More informationCS 152 Computer Architecture and Engineering
CS 52 Computer Architecture and Engineering Lecture 26 Mid-Term II Review 26--3 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs52/ CS 52 L26: Mid-Term
More informationPrecise Exceptions and Out-of-Order Execution. Samira Khan
Precise Exceptions and Out-of-Order Execution Samira Khan Multi-Cycle Execution Not all instructions take the same amount of time for execution Idea: Have multiple different functional units that take
More informationExploiting Idle Floating-Point Resources for Integer Execution
Exploiting Idle Floating-Point Resources for Integer Execution, Subbarao Palacharla, James E. Smith University of Wisconsin, Madison Motivation Partitioned integer and floating-point resources on current
More informationMulti-level Page Tables & Paging+ segmentation combined
Multi-level Page Tables & Paging+ segmentation combined Basic idea: use two levels of mapping to make tables manageable o Each segment contains one or more pages Segments correspond to logical units: code,
More informationPick a time window size w. In time span w, are there, Multiple References, to nearby addresses: Spatial Locality
Pick a time window size w. In time span w, are there, Multiple References, to nearby addresses: Spatial Locality Repeated References, to a set of locations: Temporal Locality Take advantage of behavior
More informationVirtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1
Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L20-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationVIRTUAL MEMORY II. Jo, Heeseung
VIRTUAL MEMORY II Jo, Heeseung TODAY'S TOPICS How to reduce the size of page tables? How to reduce the time for address translation? 2 PAGE TABLES Space overhead of page tables The size of the page table
More informationCSE 120 Principles of Operating Systems
CSE 120 Principles of Operating Systems Spring 2018 Lecture 10: Paging Geoffrey M. Voelker Lecture Overview Today we ll cover more paging mechanisms: Optimizations Managing page tables (space) Efficient
More informationAdvanced Caching Techniques (2) Department of Electrical Engineering Stanford University
Lecture 4: Advanced Caching Techniques (2) Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee282 Lecture 4-1 Announcements HW1 is out (handout and online) Due on 10/15
More informationSome material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science Cases that affect instruction execution semantics
More informationCS 153 Design of Operating Systems Winter 2016
CS 153 Design of Operating Systems Winter 2016 Lecture 17: Paging Lecture Overview Recap: Today: Goal of virtual memory management: map 2^32 byte address space to physical memory Internal fragmentation
More informationAddress Translation. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Address Translation Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics How to reduce the size of page tables? How to reduce the time for
More informationLecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"
Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3
More informationCS 318 Principles of Operating Systems
CS 318 Principles of Operating Systems Fall 2018 Lecture 10: Virtual Memory II Ryan Huang Slides adapted from Geoff Voelker s lectures Administrivia Next Tuesday project hacking day No class My office
More informationCPS104 Computer Organization and Programming Lecture 16: Virtual Memory. Robert Wagner
CPS104 Computer Organization and Programming Lecture 16: Virtual Memory Robert Wagner cps 104 VM.1 RW Fall 2000 Outline of Today s Lecture Virtual Memory. Paged virtual memory. Virtual to Physical translation:
More informationFall 2012 Parallel Computer Architecture Lecture 15: Speculation I. Prof. Onur Mutlu Carnegie Mellon University 10/10/2012
18-742 Fall 2012 Parallel Computer Architecture Lecture 15: Speculation I Prof. Onur Mutlu Carnegie Mellon University 10/10/2012 Reminder: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi
More information15-740/ Computer Architecture Lecture 5: Precise Exceptions. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 5: Precise Exceptions Prof. Onur Mutlu Carnegie Mellon University Last Time Performance Metrics Amdahl s Law Single-cycle, multi-cycle machines Pipelining Stalls
More informationBasic Pipelining Concepts
Basic ipelining oncepts Appendix A (recommended reading, not everything will be covered today) Basic pipelining ipeline hazards Data hazards ontrol hazards Structural hazards Multicycle operations Execution
More informationChapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Seventh Edition By William Stallings Objectives of Chapter To provide a grand tour of the major computer system components:
More informationFast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names
Fast access ===> use map to find object HW == SW ===> map is in HW or SW or combo Extend range ===> longer, hierarchical names How is map embodied: --- L1? --- Memory? The Environment ---- Long Latency
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationELE 655 Microprocessor System Design
ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half
More informationROB: head/tail. exercise: result of processing rest? 2. rename map (for next rename) log. phys. free list: X11, X3. PC log. reg prev.
Exam Review 2 1 ROB: head/tail PC log. reg prev. phys. store? except? ready? A R3 X3 no none yes old tail B R1 X1 no none yes tail C R1 X6 no none yes D R4 X4 no none yes E --- --- yes none yes F --- ---
More informationTopic 18: Virtual Memory
Topic 18: Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level of indirection
More informationChapter 4 The Processor (Part 4)
Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Instruction Commit The End of the Road (um Pipe) Commit is typically the last stage of the pipeline Anything an insn. does at this point is irrevocable Only actions following
More informationCS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction
CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.
More informationCPS104 Computer Organization and Programming Lecture 17: Interrupts and Exceptions. Interrupts Exceptions and Traps. Visualizing an Interrupt
CPS104 Computer Organization and Programming Lecture 17: Interrupts and Exceptions Robert Wagner cps 104 Int.1 RW Fall 2000 Interrupts Exceptions and Traps Interrupts, Exceptions and Traps are asynchronous
More informationCS420: Operating Systems. Paging and Page Tables
Paging and Page Tables James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Paging Paging is a memory-management
More informationImproving the Practicality of Transactional Memory
Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2007 Lecture 14: Virtual Machines 563 L14.1 Fall 2009 Outline Types of Virtual Machine User-level (or Process VMs) System-level Techniques for implementing all
More informationCSE 120. Translation Lookaside Buffer (TLB) Implemented in Hardware. July 18, Day 5 Memory. Instructor: Neil Rhodes. Software TLB Management
CSE 120 July 18, 2006 Day 5 Memory Instructor: Neil Rhodes Translation Lookaside Buffer (TLB) Implemented in Hardware Cache to map virtual page numbers to page frame Associative memory: HW looks up in
More information19: I/O. Mark Handley. Direct Memory Access (DMA)
19: I/O Mark Handley Direct Memory Access (DMA) 1 Interrupts Revisited Connections between devices and interrupt controller actually use interrupt lines on the bus rather than dedicated wires. Interrupts
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the
More informationSpeculative Locks. Dept. of Computer Science
Speculative Locks José éf. Martínez and djosep Torrellas Dept. of Computer Science University it of Illinois i at Urbana-Champaign Motivation Lock granularity a trade-off: Fine grain greater concurrency
More informationSuggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 4.5-4.7!! (Over the next 3-4 lectures)! Lecture 17" Short Pipelining Review! 3! Processor components! Multicore processors and programming! Recap: Pipelining
More informationEECS 482 Introduction to Operating Systems
EECS 482 Introduction to Operating Systems Fall 2018 Baris Kasikci Slides by: Harsha V. Madhyastha Base and bounds Load each process into contiguous region of physical memory Prevent process from accessing
More informationCS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory
CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2003 Lecture 15: Caching: Demand Paged Virtual Memory 15.0 Main Points: Concept of paging to disk Replacement policies
More information200 points total. Start early! Update March 27: Problem 2 updated, Problem 8 is now a study problem.
CS3410 Spring 2014 Problem Set 2, due Saturday, April 19, 11:59 PM NetID: Name: 200 points total. Start early! Update March 27: Problem 2 updated, Problem 8 is now a study problem. Problem 1 Data Hazards
More informationCPU issues address (and data for write) Memory returns data (or acknowledgment for write)
The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives
More informationOperating Systems: Virtual Machines & Exceptions
Operating Systems: Machines & Exceptions Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L19-1 6.004 So Far: Single-User Machines Program Hardware ISA (e.g., RISC-V) Processor Memory
More informationAddress spaces and memory management
Address spaces and memory management Review of processes Process = one or more threads in an address space Thread = stream of executing instructions Address space = memory space used by threads Address
More informationMemory Hierarchy Requirements. Three Advantages of Virtual Memory
CS61C L12 Virtual (1) CS61CL : Machine Structures Lecture #12 Virtual 2009-08-03 Jeremy Huddleston Review!! Cache design choices: "! Size of cache: speed v. capacity "! size (i.e., cache aspect ratio)
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationMemory Allocation. Copyright : University of Illinois CS 241 Staff 1
Memory Allocation Copyright : University of Illinois CS 241 Staff 1 Recap: Virtual Addresses A virtual address is a memory address that a process uses to access its own memory Virtual address actual physical
More informationIn-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution
In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationCS161 Design and Architecture of Computer Systems. Cache $$$$$
CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationCS399 New Beginnings. Jonathan Walpole
CS399 New Beginnings Jonathan Walpole Virtual Memory (1) Page Tables When and why do we access a page table? - On every instruction to translate virtual to physical addresses? Page Tables When and why
More information