The Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas

Size: px
Start display at page:

Download "The Design Complexity of Program Undo Support in a General Purpose Processor. Radu Teodorescu and Josep Torrellas"

Transcription

1 The Design Complexity of Program Undo Support in a General Purpose Processor Radu Teodorescu and Josep Torrellas University of Illinois at Urbana-Champaign

2 Processor with program undo support Safe Speculative Safe Code Speculative execution over large code sections Begin Spec Rollback/re-play of program execution Speculative code Some Code Very low overhead End Spec Speculation exposed to the software 2 Safe Code

3 Applications of program undo Safety net for speculating over: correctness of aggressive optimizations: thread level speculation, value prediction, speculative synchronization system reliability: software - primitive for software debugging 3

4 How complex is program undo support? Determined the hardware needed Implemented it in a simple processor Prototyped it using FPGA technology Estimated complexity using three metrics: Hardware overhead, development time, VHDL code size 4

5 Implementation of program undo support Save/restore processor state, buffer speculative data, control transitions: RF Safe Speculative CKPT CPU 1. Register checkpointing and restoration 2. Data cache that buffers speculative data Data Cache 3. Instructions enable/disable speculation on-the-fly Memory 5

6 Register checkpointing Beginning of the speculative section RF saved into a Shadow RF PC, state registers are saved End of speculative section Commit: discard checkpoint IDLE CHECKPOINT ROLLBACK RESTORE Rollback: restore RF & PC 6

7 Data cache extensions Holds both speculative and nonspeculative data Safe Speculative Speculative lines will not be evicted to memory Data Cache Cache walk state machine: Memory Commit: merge lines IDLE Rollback: invalidate speculative lines 7 WALK RESTORE

8 Software control Give the compiler control over speculative execution Added control instructions: Begin speculation End speculation (commit or rollback) We use SPARC s special access load LDA [r0] code, r1 8

9 Hardware prototype LEON2 - SPARC V8 compliant processor In-order, single issue, 5-stage pipeline Windowed register file L1 instruction and data caches Synthesizable, open source VHDL code Fully functional, runs Linux embedded 9

10 System deployment Processor Image J T A G C O M PCI I/O Terminal Binaries 10 Control App.

11 Evaluating design complexity Hardware overhead, development time, VHDL code size Major components: register checkpointing speculative cache software control Comparison: write-back cache controller 11

12 Hardware overhead Avg. 4.5% overhead 7000 software control speculative cache register checkpointing write back extensions baseline processor CLBs KB 8KB 16KB 32KB 64KB Data cache size 12

13 Development time Time (man-hours) Testing Implementation 300 Design WB Cache Controller Speculative Cache Register Checkpointing 13 Software Control

14 Lines of code % Lines of VHDL code Software control Speculative cache Register checkpointing Write back extensions Baseline unit % Data Cache Controller Pipeline 14

15 Conclusions Program undo support is reasonably easy to implement Complexity similar to adding write-back support to a write-through cache controller Qualifying factor: Relatively simple processor 15

16 Thank you! Short demo and questions... 16

Prototyping Architectural Support for Program Rollback Using FPGAs

Prototyping Architectural Support for Program Rollback Using FPGAs Prototyping Architectural Support for Program Rollback Using FPGAs Radu Teodorescu and Josep Torrellas http://iacoma.cs.uiuc.edu University of Illinois at Urbana-Champaign Motivation Problem: Software

More information

The Design Complexity of Program Undo Support in a General-Purpose Processor

The Design Complexity of Program Undo Support in a General-Purpose Processor The Design Complexity of Program Undo Support in a General-Purpose Processor Radu Teodorescu and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

Empowering Software Debugging Through Architectural Support for Program Rollback

Empowering Software Debugging Through Architectural Support for Program Rollback Empowering Software Debugging Through Architectural Support for Program Rollback Radu Teodorescu and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

SWICH: A PROTOTYPE FOR EFFICIENT CACHE-LEVEL CHECKPOINTING AND ROLLBACK

SWICH: A PROTOTYPE FOR EFFICIENT CACHE-LEVEL CHECKPOINTING AND ROLLBACK SWICH: A PROTOTYPE FOR EFFICIENT CACHE-LEVEL CHECKPOINTING AND ROLLBACK EXISTING CACHE-LEVEL CHECKPOINTING SCHEMES DO NOT CONTINUOUSLY SUPPORT A LARGE ROLLBACK WINDOW. IMMEDIATELY AFTER A CHECKPOINT, THE

More information

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow

More information

SWICH: A Prototype for Efficient Cache-Level Checkpointing and Rollback

SWICH: A Prototype for Efficient Cache-Level Checkpointing and Rollback SWICH: A Prototype for Efficient Cache-Level Checkpointing and Rollback Radu Teodorescu, Jun Nakano, and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Abstract Low-overhead

More information

DeAliaser: Alias Speculation Using Atomic Region Support

DeAliaser: Alias Speculation Using Atomic Region Support DeAliaser: Alias Speculation Using Atomic Region Support Wonsun Ahn*, Yuelu Duan, Josep Torrellas University of Illinois at Urbana Champaign http://iacoma.cs.illinois.edu Memory Aliasing Prevents Good

More information

FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes

FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes Rishi Agarwal and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard

More information

POSH: A TLS Compiler that Exploits Program Structure

POSH: A TLS Compiler that Exploits Program Structure POSH: A TLS Compiler that Exploits Program Structure Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn, Karin Strauss, Jose Renau and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign

More information

Speculative Synchronization

Speculative Synchronization Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization

More information

CS533: Speculative Parallelization (Thread-Level Speculation)

CS533: Speculative Parallelization (Thread-Level Speculation) CS533: Speculative Parallelization (Thread-Level Speculation) Josep Torrellas University of Illinois in Urbana-Champaign March 5, 2015 Josep Torrellas (UIUC) CS533: Lecture 14 March 5, 2015 1 / 21 Concepts

More information

RAMP-White / FAST-MP

RAMP-White / FAST-MP RAMP-White / FAST-MP Hari Angepat and Derek Chiou Electrical and Computer Engineering University of Texas at Austin Supported in part by DOE, NSF, SRC,Bluespec, Intel, Xilinx, IBM, and Freescale RAMP-White

More information

Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors

Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization li for Multiprocessors Marcelo Cintra and Josep Torrellas University of Edinburgh http://www.dcs.ed.ac.uk/home/mc

More information

DeLorean: Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently

DeLorean: Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently : Recording and Deterministically Replaying Shared Memory Multiprocessor Execution Efficiently, Luis Ceze* and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign

More information

PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection

PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection Shan Lu, Pin Zhou, Wei Liu, Yuanyuan Zhou and Josep Torrellas Department of Computer Science, University of

More information

Speculative Locks. Dept. of Computer Science

Speculative Locks. Dept. of Computer Science Speculative Locks José éf. Martínez and djosep Torrellas Dept. of Computer Science University it of Illinois i at Urbana-Champaign Motivation Lock granularity a trade-off: Fine grain greater concurrency

More information

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA CS 258 Parallel Computer Architecture Data Speculation Support for a Chip Multiprocessor (Hydra CMP) Lance Hammond, Mark Willey and Kunle Olukotun Presented: May 7 th, 2008 Ankit Jain Outline The Hydra

More information

The Stanford Hydra CMP. Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Mark Willey, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun

The Stanford Hydra CMP. Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Mark Willey, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun The Stanford Hydra CMP Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Mark Willey, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun Computer Systems Laboratory Stanford University http://www-hydra.stanford.edu

More information

The Stanford Hydra CMP. Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun

The Stanford Hydra CMP. Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun The Stanford Hydra CMP Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Michael Chen, Maciek Kozyrczak*, and Kunle Olukotun Computer Systems Laboratory Stanford University http://www-hydra.stanford.edu

More information

Code Reordering and Speculation Support for Dynamic Optimization Systems

Code Reordering and Speculation Support for Dynamic Optimization Systems Code Reordering and Speculation Support for Dynamic Optimization Systems Erik M. Nystrom, Ronald D. Barnes, Matthew C. Merten, Wen-mei W. Hwu Center for Reliable and High-Performance Computing University

More information

Dynamically Detecting and Tolerating IF-Condition Data Races

Dynamically Detecting and Tolerating IF-Condition Data Races Dynamically Detecting and Tolerating IF-Condition Data Races Shanxiang Qi (Google), Abdullah Muzahid (University of San Antonio), Wonsun Ahn, Josep Torrellas University of Illinois at Urbana-Champaign

More information

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Understanding Hardware Transactional Memory

Understanding Hardware Transactional Memory Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence

More information

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More

More information

Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks

Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks : Defending Against Cache-Based Side Channel Attacks Mengjia Yan, Bhargava Gopireddy, Thomas Shull, Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Presented by Mengjia

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra ia a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

PageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge: A Near-Memory Content- Aware Page-Merging Architecture PageForge: A Near-Memory Content- Aware Page-Merging Architecture Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas University of Illinois at Urbana-Champaign MICRO-50 @ Boston Motivation: Server

More information

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability Topics COS 318: Operating Systems File Performance and Reliability File buffer cache Disk failure and recovery tools Consistent updates Transactions and logging 2 File Buffer Cache for Performance What

More information

Lecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC)

Lecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC) Lecture 8: Transactional Memory TCC Topics: lazy implementation (TCC) 1 Other Issues Nesting: when one transaction calls another flat nesting: collapse all nested transactions into one large transaction

More information

Lecture 16: Checkpointed Processors. Department of Electrical Engineering Stanford University

Lecture 16: Checkpointed Processors. Department of Electrical Engineering Stanford University Lecture 16: Checkpointed Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 18-1 Announcements Reading for today: class notes Your main focus:

More information

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding

More information

Thread-Level Speculation on a CMP Can Be Energy Efficient

Thread-Level Speculation on a CMP Can Be Energy Efficient Thread-Level Speculation on a CMP Can Be Energy Efficient Jose Renau Karin Strauss Luis Ceze Wei Liu Smruti Sarangi James Tuck Josep Torrellas ABSTRACT Dept. of Computer Engineering, University of California

More information

Phoenix: Detecting and Recovering from Permanent Processor Design Bugs with Programmable Hardware

Phoenix: Detecting and Recovering from Permanent Processor Design Bugs with Programmable Hardware Phoenix: Detecting and Recovering from Permanent Processor Design Bugs with Programmable Hardware Smruti R. Sarangi Abhishek Tiwari Josep Torrellas University of Illinois at Urbana-Champaign Can a Processor

More information

Improving Cloud Application Performance with Simulation-Guided CPU State Management

Improving Cloud Application Performance with Simulation-Guided CPU State Management Improving Cloud Application Performance with Simulation-Guided CPU State Management Mathias Gottschlag, Frank Bellosa April 23, 2017 KARLSRUHE INSTITUTE OF TECHNOLOGY (KIT) - OPERATING SYSTEMS GROUP KIT

More information

Operating System Support for Shared-ISA Asymmetric Multi-core Architectures

Operating System Support for Shared-ISA Asymmetric Multi-core Architectures Operating System Support for Shared-ISA Asymmetric Multi-core Architectures Tong Li, Paul Brett, Barbara Hohlt, Rob Knauerhase, Sean McElderry, Scott Hahn Intel Corporation Contact: tong.n.li@intel.com

More information

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor 1 Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Presenter: Yue Zheng Yulin Shi Outline Motivation & Background Hardware DIFT

More information

LEON4: Fourth Generation of the LEON Processor

LEON4: Fourth Generation of the LEON Processor LEON4: Fourth Generation of the LEON Processor Magnus Själander, Sandi Habinc, and Jiri Gaisler Aeroflex Gaisler, Kungsgatan 12, SE-411 19 Göteborg, Sweden Tel +46 31 775 8650, Email: {magnus, sandi, jiri}@gaisler.com

More information

Defining a High-Level Programming Model for Emerging NVRAM Technologies

Defining a High-Level Programming Model for Emerging NVRAM Technologies Defining a High-Level Programming Model for Emerging NVRAM Technologies Thomas Shull, Jian Huang, Josep Torrellas University of Illinois at Urbana-Champaign September 13, 2018 Shull et al. Defining a High-Level

More information

Multithreaded Value Prediction

Multithreaded Value Prediction Multithreaded Value Prediction N. Tuck and D.M. Tullesn HPCA-11 2005 CMPE 382/510 Review Presentation Peter Giese 30 November 2005 Outline Motivation Multithreaded & Value Prediction Architectures Single

More information

SPECULATIVE SYNCHRONIZATION: PROGRAMMABILITY AND PERFORMANCE FOR PARALLEL CODES

SPECULATIVE SYNCHRONIZATION: PROGRAMMABILITY AND PERFORMANCE FOR PARALLEL CODES SPECULATIVE SYNCHRONIZATION: PROGRAMMABILITY AND PERFORMANCE FOR PARALLEL CODES PROPER SYNCHRONIZATION IS VITAL TO ENSURING THAT PARALLEL APPLICATIONS EXECUTE CORRECTLY. A COMMON PRACTICE IS TO PLACE SYNCHRONIZATION

More information

Rebootless Kernel Updates

Rebootless Kernel Updates Rebootless Kernel Updates Srivatsa S. Bhat VMware srivatsa@csail.mit.edu University of Washington 3 Dec 2018 Why are reboots undesirable? Why are reboots undesirable? Remember this? J Why are reboots undesirable?

More information

Thread-Level Speculation on a CMP Can Be Energy Efficient

Thread-Level Speculation on a CMP Can Be Energy Efficient Thread-Level Speculation on a CMP Can Be Energy Efficient Jose Renau Karin Strauss Luis Ceze Wei Liu Smruti Sarangi James Tuck Josep Torrellas ABSTRACT Dept. of Computer Engineering, University of California

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra is a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Huiyang Zhou School of Computer Science University of Central Florida New Challenges in Billion-Transistor Processor Era

More information

Data Speculation Support for a Chip Multiprocessor Lance Hammond, Mark Willey, and Kunle Olukotun

Data Speculation Support for a Chip Multiprocessor Lance Hammond, Mark Willey, and Kunle Olukotun Data Speculation Support for a Chip Multiprocessor Lance Hammond, Mark Willey, and Kunle Olukotun Computer Systems Laboratory Stanford University http://www-hydra.stanford.edu A Chip Multiprocessor Implementation

More information

CMP Support for Large and Dependent Speculative Threads

CMP Support for Large and Dependent Speculative Threads TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1 CMP Support for Large and Dependent Speculative Threads Christopher B. Colohan, Anastassia Ailamaki, J. Gregory Steffan, Member, IEEE, and Todd C. Mowry

More information

Spectre and Meltdown. Clifford Wolf q/talk

Spectre and Meltdown. Clifford Wolf q/talk Spectre and Meltdown Clifford Wolf q/talk 2018-01-30 Spectre and Meltdown Spectre (CVE-2017-5753 and CVE-2017-5715) Is an architectural security bug that effects most modern processors with speculative

More information

DeAliaser: Alias Speculation Using Atomic Region Support

DeAliaser: Alias Speculation Using Atomic Region Support Deliaser: lias Speculation Using tomic Region Support Wonsun hn Yuelu Duan Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu bstract lias analysis is a critical component

More information

A Chip-Multiprocessor Architecture with Speculative Multithreading

A Chip-Multiprocessor Architecture with Speculative Multithreading 866 IEEE TRANSACTIONS ON COMPUTERS, VOL. 48, NO. 9, SEPTEMBER 1999 A Chip-Multiprocessor Architecture with Speculative Multithreading Venkata Krishnan, Member, IEEE, and Josep Torrellas AbstractÐMuch emphasis

More information

Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors Λ

Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors Λ Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors Λ Marcelo Cintra, José F. Martínez, and Josep Torrellas Department of Computer Science University of Illinois

More information

Universal Parallel Computing Research Center at Illinois

Universal Parallel Computing Research Center at Illinois Universal Parallel Computing Research Center at Illinois Making parallel programming synonymous with programming Marc Snir 08-09 The UPCRC@ Illinois Team BACKGROUND 3 Moore s Law Pre 2004 Number of transistors

More information

The Bulk Multicore Architecture for Programmability

The Bulk Multicore Architecture for Programmability The Bulk Multicore Architecture for Programmability Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu Acknowledgments Key contributors: Luis Ceze Calin

More information

CAVA: Using Checkpoint-Assisted Value Prediction to Hide L2 Misses

CAVA: Using Checkpoint-Assisted Value Prediction to Hide L2 Misses CAVA: Using Checkpoint-Assisted Value Prediction to Hide L2 Misses Luis Ceze, Karin Strauss, James Tuck, Jose Renau and Josep Torrellas University of Illinois at Urbana-Champaign {luisceze, kstrauss, jtuck,

More information

Next Generation Multi-Purpose Microprocessor

Next Generation Multi-Purpose Microprocessor Next Generation Multi-Purpose Microprocessor Presentation at MPSA, 4 th of November 2009 www.aeroflex.com/gaisler OUTLINE NGMP key requirements Development schedule Architectural Overview LEON4FT features

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

Bulk Disambiguation of Speculative Threads in Multiprocessors

Bulk Disambiguation of Speculative Threads in Multiprocessors Bulk Disambiguation of Speculative Threads in Multiprocessors Luis Ceze, James Tuck, Călin Caşcaval and Josep Torrellas University of Illinois at Urbana-Champaign {luisceze, jtuck, torrellas}@cs.uiuc.edu

More information

Capriccio : Scalable Threads for Internet Services

Capriccio : Scalable Threads for Internet Services Capriccio : Scalable Threads for Internet Services - Ron von Behren &et al - University of California, Berkeley. Presented By: Rajesh Subbiah Background Each incoming request is dispatched to a separate

More information

ReEnact: Using Thread-Level Speculation Mechanisms to Debug Data Races in Multithreaded Codes

ReEnact: Using Thread-Level Speculation Mechanisms to Debug Data Races in Multithreaded Codes Appears in the Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA-30), June 2003. ReEnact: Using Thread-Level Speculation Mechanisms to Debug Data Races in Multithreaded

More information

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor

Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Computer Systems Laboratory Stanford University Motivation Dynamic analysis help

More information

Actors in the Small. Making Actors more Useful. Bill La

Actors in the Small. Making Actors more Useful. Bill La Actors in the Small Making Actors more Useful Bill La Forge laforge49@gmail.com @laforge49 Actors in the Small I. Introduction II. Making Actors Fast III.Making Actors Easier to Program IV. Tutorial I.

More information

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives

More information

Ultra Low-Cost Defect Protection for Microprocessor Pipelines

Ultra Low-Cost Defect Protection for Microprocessor Pipelines Ultra Low-Cost Defect Protection for Microprocessor Pipelines Smitha Shyam Kypros Constantinides Sujay Phadke Valeria Bertacco Todd Austin Advanced Computer Architecture Lab University of Michigan Key

More information

RelaxReplay: Record and Replay for Relaxed-Consistency Multiprocessors

RelaxReplay: Record and Replay for Relaxed-Consistency Multiprocessors RelaxReplay: Record and Replay for Relaxed-Consistency Multiprocessors Nima Honarmand and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/ 1 RnR: Record and Deterministic

More information

CS533 Concepts of Operating Systems. Jonathan Walpole

CS533 Concepts of Operating Systems. Jonathan Walpole CS533 Concepts of Operating Systems Jonathan Walpole Shared Memory Consistency Models: A Tutorial Outline Concurrent programming on a uniprocessor The effect of optimizations on a uniprocessor The effect

More information

IA-64, P4 HT and Crusoe Architectures Ch 15

IA-64, P4 HT and Crusoe Architectures Ch 15 IA-64, P4 HT and Crusoe Architectures Ch 15 IA-64 General Organization Predication, Speculation Software Pipelining Example: Itanium Pentium 4 HT Crusoe General Architecture Emulated Precise Exceptions

More information

Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor

Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor Shailender Chaudhry, Robert Cypher, Magnus Ekman, Martin Karlsson, Anders Landin, Sherman Yip, Haakan

More information

Supporting Speculative Multithreading on Simultaneous Multithreaded Processors

Supporting Speculative Multithreading on Simultaneous Multithreaded Processors Supporting Speculative Multithreading on Simultaneous Multithreaded Processors Venkatesan Packirisamy, Shengyue Wang, Antonia Zhai, Wei-Chung Hsu, and Pen-Chung Yew Department of Computer Science, University

More information

EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu

EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu Midterm Exam ECE 741 Advanced Computer Architecture, Spring 2009 Instructor: Onur Mutlu TAs: Michael Papamichael, Theodoros Strigkos, Evangelos Vlachos February 25, 2009 EXAM 1 SOLUTIONS Problem Points

More information

15-740/ Computer Architecture Lecture 5: Precise Exceptions. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 5: Precise Exceptions. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 5: Precise Exceptions Prof. Onur Mutlu Carnegie Mellon University Last Time Performance Metrics Amdahl s Law Single-cycle, multi-cycle machines Pipelining Stalls

More information

Shared Memory Consistency Models: A Tutorial

Shared Memory Consistency Models: A Tutorial Shared Memory Consistency Models: A Tutorial By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Outline Concurrent programming on a uniprocessor The effect of optimizations on a uniprocessor The

More information

Stochastic Processors (or processors that do not always compute correctly by design)

Stochastic Processors (or processors that do not always compute correctly by design) Stochastic Processors (or processors that do not always compute correctly by design) Rakesh Kumar Department of Electrical and Computer Engineering University of Illinois, Urbana-Champaign Insisting on

More information

Advanced Memory Management

Advanced Memory Management Advanced Memory Management Main Points Applications of memory management What can we do with ability to trap on memory references to individual pages? File systems and persistent storage Goals Abstractions

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs

More information

NOW that the microprocessor industry has shifted its

NOW that the microprocessor industry has shifted its IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 18, NO. 8, AUGUST 2007 1 CMP Support for Large and Dependent Speculative Threads Christopher B. Colohan, Anastasia Ailamaki, Member, IEEE Computer

More information

Light64: Ligh support for data ra. Darko Marinov, Josep Torrellas. a.cs.uiuc.edu

Light64: Ligh support for data ra. Darko Marinov, Josep Torrellas.   a.cs.uiuc.edu : Ligh htweight hardware support for data ra ce detection ec during systematic testing Adrian Nistor, Darko Marinov, Josep Torrellas University of Illinois, Urbana Champaign http://iacoma a.cs.uiuc.edu

More information

Introducing the Cray XMT. Petr Konecny May 4 th 2007

Introducing the Cray XMT. Petr Konecny May 4 th 2007 Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any a performance

More information

I/O Buffering and Streaming

I/O Buffering and Streaming I/O Buffering and Streaming I/O Buffering and Caching I/O accesses are reads or writes (e.g., to files) Application access is arbitary (offset, len) Convert accesses to read/write of fixed-size blocks

More information

Care and Feeding of Oracle Rdb Hot Standby

Care and Feeding of Oracle Rdb Hot Standby Care and Feeding of Oracle Rdb Hot Standby Paul Mead / Norman Lastovica Oracle New England Development Center Copyright 2001, 2003 Oracle Corporation 2 Overview What Hot Standby provides Basic architecture

More information

Thread-Level Speculation on Off-the-Shelf Hardware Transactional Memory

Thread-Level Speculation on Off-the-Shelf Hardware Transactional Memory Thread-Level Speculation on Off-the-Shelf Hardware Transactional Memory Rei Odaira Takuya Nakaike IBM Research Tokyo Thread-Level Speculation (TLS) [Franklin et al., 92] or Speculative Multithreading (SpMT)

More information

Percona Live September 21-23, 2015 Mövenpick Hotel Amsterdam

Percona Live September 21-23, 2015 Mövenpick Hotel Amsterdam Percona Live 2015 September 21-23, 2015 Mövenpick Hotel Amsterdam TokuDB internals Percona team, Vlad Lesin, Sveta Smirnova Slides plan Introduction in Fractal Trees and TokuDB Files Block files Fractal

More information

The DragonBeam Framework: Hardware-Protected Security Modules for In-Place Intrusion Detection

The DragonBeam Framework: Hardware-Protected Security Modules for In-Place Intrusion Detection : Hardware-Protected Security Modules for In-Place Intrusion Detection Man-Ki Yoon, Mihai Christodorescu, Lui Sha, Sibin Mohan University of Illinois at Urbana-Champaign Qualcomm Research Silicon Valley

More information

Deterministic Shared Memory Multiprocessing

Deterministic Shared Memory Multiprocessing Deterministic Shared Memory Multiprocessing Luis Ceze, University of Washington joint work with Owen Anderson, Tom Bergan, Joe Devietti, Brandon Lucia, Karin Strauss, Dan Grossman, Mark Oskin. Safe MultiProcessing

More information

Capo: A Software-Hardware Interface for Practical Deterministic Multiprocessor Replay

Capo: A Software-Hardware Interface for Practical Deterministic Multiprocessor Replay Capo: A Software-Hardware Interface for Practical Deterministic Multiprocessor Replay Pablo Montesinos Matthew Hicks Samuel T. King Josep Torrellas University of Illinois at Urbana-Champaign {pmontesi,

More information

ECE 505 Computer Architecture

ECE 505 Computer Architecture ECE 505 Computer Architecture Pipelining 2 Berk Sunar and Thomas Eisenbarth Review 5 stages of RISC IF ID EX MEM WB Ideal speedup of pipelining = Pipeline depth (N) Practically Implementation problems

More information

Predictable Programming on a Precision Timed Architecture

Predictable Programming on a Precision Timed Architecture Predictable Programming on a Precision Timed Architecture Ben Lickly, Isaac Liu, Hiren Patel, Edward Lee, University of California, Berkeley Sungjun Kim, Stephen Edwards, Columbia University, New York

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

An Empirical Study of High Availability in Stream Processing Systems

An Empirical Study of High Availability in Stream Processing Systems An Empirical Study of High Availability in Stream Processing Systems Yu Gu, Zhe Zhang, Fan Ye, Hao Yang, Minkyong Kim, Hui Lei, Zhen Liu Stream Processing Model software operators (PEs) Ω Unexpected machine

More information

OS Extensibility: SPIN and Exokernels. Robert Grimm New York University

OS Extensibility: SPIN and Exokernels. Robert Grimm New York University OS Extensibility: SPIN and Exokernels Robert Grimm New York University The Three Questions What is the problem? What is new or different? What are the contributions and limitations? OS Abstraction Barrier

More information

Lessons Learned During the Development of the CapoOne Deterministic Multiprocessor Replay System

Lessons Learned During the Development of the CapoOne Deterministic Multiprocessor Replay System Lessons Learned During the Development of the CapoOne Deterministic Multiprocessor System Pablo Montesinos, Matthew Hicks, Wonsun Ahn, Samuel T. King and Josep Torrellas Department of Computer Science

More information

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea Programming With Locks s Tricky Multicore processors are the way of the foreseeable future thread-level parallelism anointed as parallelism model of choice Just one problem Writing lock-based multi-threaded

More information

Superscalar Machines. Characteristics of superscalar processors

Superscalar Machines. Characteristics of superscalar processors Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

Secure Virtual Architecture. John Criswell University of Illinois at Urbana- Champaign

Secure Virtual Architecture. John Criswell University of Illinois at Urbana- Champaign Secure Virtual Architecture John Criswell University of Illinois at Urbana- Champaign Secure Virtual Machine Software LLVA VM Hardware Virtual ISA Native ISA What is it? A compiler-based virtual machine

More information

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer

More information

ShortCut: Architectural Support for Fast Object Access in Scripting Languages

ShortCut: Architectural Support for Fast Object Access in Scripting Languages Jiho Choi, Thomas Shull, Maria J. Garzaran, and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu ISCA 2017 Overheads of Scripting Languages

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200

More information