EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION

Size: px
Start display at page:

Download "EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION"

Transcription

1 EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION GUOWEI ZHANG, VIRGINIA CHIU, DANIEL SANCHEZ MICRO 2016

2 Executive summary 2 Exploiting commutativity benefits update-heavy apps Software techniques that exploit commutativity incur high runtime overheads (STM is 2-6x slower than HTM) Prior hardware exploits only single-instruction commutative operations (e.g., addition) CommTM exploits multi-instruction commutativity Extends coherence protocol to perform commutative operations locally and concurrently Leverages HTM to support multi-instruction updates Benefits speculative execution by reducing conflicts Accelerates full applications by up to 3.4x at 128 cores

3 Commutativity Commutative operations produce equivalent results when reordered No true data dependence No need for communication Software exploits commutativity but incurs high run-time overheads Single-instruction commutativity ADD MIN OR Coup [Zhang et al, MICRO 2015] Multi-instruction commutativity Ordered put Set insertion Top-K insertion 3

4 Commutativity 4 Commutative operations produce equivalent results when reordered No true data dependence No need for communication Software exploits commutativity but incurs high run-time overheads Multi-instruction example: set (linked-list) insertion head null insert(a); insert(b); head b a null insert(b); insert(a); head a b null Different but semantically equivalent states

5 Commutativity Commutative operations produce equivalent results when reordered No true data dependence No need for communication Software exploits commutativity but incurs high run-time overheads Single-instruction commutativity ADD MIN OR Coup [Zhang et al, MICRO 2015] Multi-instruction commutativity Ordered put Set insertion Top-K insertion CommTM 5

6 Example: addition in conventional HTM 6 read A: 20 read add(a, 1): Txn 0 load A add(a, 1): Txn 1 load A void add (int* counter, int delta) { int v = load(counter); store(counter, nv);

7 Example: addition in conventional HTM 6 add(a, 1): Txn 0 load A store A A: 20 read write read Conflict! load A add(a, 1): Txn 1 void add (int* counter, int delta) { int v = load(counter); store(counter, nv);

8 Example: addition in conventional HTM 6 read write add(a, 1): Txn 0 load A store A A: 20 load A add(a, 1): Txn 1 void add (int* counter, int delta) { int v = load(counter); store(counter, nv);

9 Example: addition in conventional HTM 6 add(a, 1): Txn 0 load A add(a, 1): Txn 2 restart store A load A A: 24 load A load A restart store A load A store A add(a, 1): Txn 1 load A add(a, 1): Txn 3 store A load A restart void add (int* counter, int delta) { int v = load(counter); store(counter, nv);

10 Example: addition in conventional HTM 6 add(a, 1): Txn 0 load A add(a, 1): Txn 2 restart store A load A A: 24 load A load A restart store A load A store A add(a, 1): Txn 1 load A add(a, 1): Txn 3 store A load A restart void add (int* counter, int delta) { int v = load(counter); store(counter, nv); Traffic Serialization Wasted transactional work

11 Example: addition in CommTM 7 A: 20 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv);

12 Example: addition in CommTM 7 ADD A: 20 ADD A: 0 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv);

13 Example: addition in CommTM 7 ADD add(a, 1): Txn0 A: 20 ADD A: 0 read read load[add] A load[add] A add(a, 1): Txn1 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv);

14 Example: addition in CommTM 7 ADD add(a, 1): Txn0 A: 20 read write load[add] A ADD A: 0 read write load[add] A add(a, 1): Txn1 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv);

15 Example: addition in CommTM 7 ADD A: 21 add(a, 1): Txn0 load[add] A ADD A: 1 load[add] A add(a, 1): Txn1 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv);

16 Example: addition in CommTM 7 ADD A: 22 add(a, 1): Txn0 load[add] A add(a, 1): Txn2 load[add] A ADD A: 2 load[add] A load[add] A add(a, 1): Txn1 add(a, 1): Txn3 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv);

17 Example: addition in CommTM 7 ADD add(a, 1): Txn0 A: 22 load[add] A add(a, 1): Txn2 load[add] A ADD A: 2 load[add] A load[add] A add(a, 1): Txn1 add(a, 1): Txn3 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv); load A

18 Example: addition in CommTM 7 add(a, 1): Txn0 A: 24 reduction load[add] A add(a, 1): Txn2 load[add] A load[add] A load[add] A add(a, 1): Txn1 add(a, 1): Txn3 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv); load A User-defined reduction

19 Example: addition in CommTM 7 add(a, 1): Txn0 A: 24 reduction load[add] A add(a, 1): Txn2 load[add] A load[add] A load[add] A add(a, 1): Txn1 add(a, 1): Txn3 void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv); load A User-defined reduction Less traffic Concurrent updates Less wasted transactional work Less run-time/memory overheads than STM

20 CommTM

21 Programming interface 9 Transactional update void add (int* counter, int delta) { int v = load(counter); store(counter, nv);

22 Programming interface 9 Transactional update void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv); Labeled loads/stores

23 Programming interface 9 Transactional update void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv); Labeled loads/stores Non-transactional reduction handler void reduce[add] (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv);

24 Programming interface 9 Transactional update void add (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv); Non-transactional reduction handler void reduce[add] (int* counter, int delta) { int v = load[add](counter); store[add](counter, nv); Labeled loads/stores counter delta reduce[add] counter 36

25 Handling arbitrary object sizes 10 For objects smaller than a cache line, assume lines are full of aligned elements and reduce all of them void reduce[add] (int* counterline, int[] deltas) { for (int i = 0; I < intspercacheline; i++) { int v = load[add](counterline[i]); int nv = v + deltas[i]; store[add](counterline[i], nv); counterline deltas reduce[add] counterline For objects larger than a cache line, use a level of indirection

26 Example: set (linked-list) insertion void insert (SetDesc* s, Node* n) { Node* head = load[insert](s); n->next = head; store[insert](s, n); if (head == nullptr) store[insert](s+sizeof(node*), n); Struct SetDesc { Node* head; Node* tail; ; Struct Node { Node* next; ; 11 void reduce[insert] (SetDesc* s0, SetDesc* s1) { if (s1->head == nullptr) return; Node* head0 = load[insert](s0); if (head0 == nullptr) { store[insert](s0, s1->head); else { Node* tail0 = load[insert](s0+sizeof(node*)); tail0->next = s1->head; store[insert](s0+sizeof(node*), s1->tail);

27 Implementation 12 Baseline HTM MESI coherence protocol Eager conflict detection Timestamp-based conflict resolution Lazy version management (buffer speculative data in L1s) Shared L3 / directory L1I L2 L1D L1I L2 L1D Core Core CommTM can be applied to other HTMs and hardware speculation techniques

28 Coherence protocol 13 M MSI M CommTM-MSI W R W R L W, R W R S W W W, R S R W, L L W, R L U label W I I Legend Initiated by own core (gain permissions) Transitions Initiated by others (lose permissions) States Modified Shared Invalid User-defined (read-only) reducible Requests Read Write Labeled load/store

29 Reducible-state transitions 14 Entering U state triggered by labeled load/store Leaving U state with nontransactional reductions load[add](a) L2 L1 Shared cache A: 0 A: 0 L2 L1 A: 20 Hardware shadow thread/interrupt Cannot access other reducible data load(a) L2 L1 Shared cache A: -- 2 A: 24 2 L2 L1 A: 22 Core A: 241 Core 2 add_reduce Legend Modified A: 20 User-defined reducible A: 3

30 Transactional execution 15 Speculative value management for U state is analogous to M state L2 L1 A: 3 A: A: Core tx_begin() ld[add](a) st[add](a) L2 A: 3 L1 A: 4 11 TX (ts:0) Core tx_end() L2 L1 A: 3 A: 4 00 Core tx_begin() ld[add](a) st[add](a) L2 A: 4 L1 A: 5 11 TX (ts:1) Core speculatively-read speculatively-written Invalidation to speculatively accessed data in U state triggers a conflict

31 Gather requests allows more concurrency 16 Motivation Conditional commutativity: Operations commute only when reducible data meets some conditions Frequent reductions triggered by condition checks limit concurrency Gather requests allow partial updates to move across caches without leaving the reducible state Achieves higher concurrency (e.g., for reference counting)

32 Evaluation

33 Evaluation on microbenchmarks 17 Baseline CommTM counter set insertion ordered put top-k insertion Speedup Threads Threads Threads Threads Up to 128x speedup over baseline TM

34 Evaluation on full applications 18 Speedup Baseline CommTM boruvka kmeans ssca2 genome vacation x Threads Threads Threads Speedup Breakdown of core cycles at 128 threads (lower is better) Threads Threads Non-transactional Transactional, ted Transactional, aborted Normalized core cycles Normalized core cycles Baseline CommTM 0.0 Baseline CommTM 0.0 Baseline CommTM 0.0 Baseline CommTM 0.0 Baseline CommTM

35 Conclusions 19 Leverages HTM to support multi-instruction operations Extends coherence protocol to allow local and concurrent updates Bridges the gap between software and hardware speculation Reduces conflicts and serialized transactions significantly Accelerates challenging workloads by up to 3.4x at 128 cores

36 THANKS FOR YOUR ATTENTION! QUESTIONS ARE WELCOME!

Exploiting Semantic Commutativity in Hardware Speculation

Exploiting Semantic Commutativity in Hardware Speculation Appears in the Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 16 Exploiting Semantic Commutativity in Hardware Speculation Guowei Zhang Virginia Chiu Daniel

More information

Chí Cao Minh 28 May 2008

Chí Cao Minh 28 May 2008 Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor

More information

EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics

EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics Pervasive Parallelism Laboratory, Stanford University Sungpack Hong Tayo Oguntebi Jared Casper Nathan Bronson Christos Kozyrakis

More information

Potential violations of Serializability: Example 1

Potential violations of Serializability: Example 1 CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same

More information

Data-Centric Execution of Speculative Parallel Programs MARK JEFFREY, SUVINAY SUBRAMANIAN, MALEEN ABEYDEERA, JOEL EMER, DANIEL SANCHEZ MICRO 2016

Data-Centric Execution of Speculative Parallel Programs MARK JEFFREY, SUVINAY SUBRAMANIAN, MALEEN ABEYDEERA, JOEL EMER, DANIEL SANCHEZ MICRO 2016 Data-entric Execution of Speculative Parallel Programs MARK JEFFREY, SUVINAY SUBRAMANIAN, MALEEN ABEYDEERA, JOEL EMER, DANIEL SANHEZ MIRO 26 Executive summary Many-cores must exploit cache locality to

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

Performance Improvement via Always-Abort HTM

Performance Improvement via Always-Abort HTM 1 Performance Improvement via Always-Abort HTM Joseph Izraelevitz* Lingxiang Xiang Michael L. Scott* *Department of Computer Science University of Rochester {jhi1,scott}@cs.rochester.edu Parallel Computing

More information

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea Programming With Locks s Tricky Multicore processors are the way of the foreseeable future thread-level parallelism anointed as parallelism model of choice Just one problem Writing lock-based multi-threaded

More information

Performance Evaluation of Adaptivity in STM. Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich

Performance Evaluation of Adaptivity in STM. Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich Performance Evaluation of Adaptivity in STM Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich Motivation STM systems rely on many assumptions Often contradicting for different

More information

Improving the Practicality of Transactional Memory

Improving the Practicality of Transactional Memory Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter

More information

Implementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University

Implementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University Implementing and Evaluating Nested Parallel Transactions in STM Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University Introduction // Parallelize the outer loop for(i=0;i

More information

HTM in the wild. Konrad Lai June 2015

HTM in the wild. Konrad Lai June 2015 HTM in the wild Konrad Lai June 2015 Industrial Considerations for HTM Provide a clear benefit to customers Improve performance & scalability Ease programmability going forward Improve something common

More information

DBT Tool. DBT Framework

DBT Tool. DBT Framework Thread-Safe Dynamic Binary Translation using Transactional Memory JaeWoong Chung,, Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University http://csl.stanford.edu

More information

A Scalable Architecture for Ordered Parallelism

A Scalable Architecture for Ordered Parallelism Scalable rchitecture for Ordered Parallelism Mark Jeffrey, Suvinay Subramanian, Cong Yan, Joel mer, aniel Sanchez MICRO 2015 Multicores Target asy Parallelism 2 Multicores Target asy Parallelism 2 Regular:

More information

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,

More information

Scheduling Transactions in Replicated Distributed Transactional Memory

Scheduling Transactions in Replicated Distributed Transactional Memory Scheduling Transactions in Replicated Distributed Transactional Memory Junwhan Kim and Binoy Ravindran Virginia Tech USA {junwhan,binoy}@vt.edu CCGrid 2013 Concurrency control on chip multiprocessors significantly

More information

Cache Coherence and Atomic Operations in Hardware

Cache Coherence and Atomic Operations in Hardware Cache Coherence and Atomic Operations in Hardware Previously, we introduced multi-core parallelism. Today we ll look at 2 things: 1. Cache coherence 2. Instruction support for synchronization. And some

More information

Proactive Transaction Scheduling for Contention Management

Proactive Transaction Scheduling for Contention Management Proactive Transaction for Contention Management Geoffrey Blake University of Michigan Ann Arbor, MI blakeg@umich.edu Ronald G. Dreslinski University of Michigan Ann Arbor, MI rdreslin@umich.edu Trevor

More information

EazyHTM: Eager-Lazy Hardware Transactional Memory

EazyHTM: Eager-Lazy Hardware Transactional Memory EazyHTM: Eager-Lazy Hardware Transactional Memory Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero Barcelona Supercomputing Center,

More information

FlexTM. Flexible Decoupled Transactional Memory Support. Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science

FlexTM. Flexible Decoupled Transactional Memory Support. Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science FlexTM Flexible Decoupled Transactional Memory Support Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science 1 Transactions: Our Goal Lazy Txs (i.e., optimistic conflict

More information

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University) Cost of Concurrency in Hybrid Transactional Memory Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University) 1 Transactional Memory: a history Hardware TM Software TM Hybrid TM 1993 1995-today

More information

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs

More information

LogTM: Log-Based Transactional Memory

LogTM: Log-Based Transactional Memory LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood 12th International Symposium on High Performance Computer Architecture () 26 Mulitfacet

More information

LogSI-HTM: Log Based Snapshot Isolation in Hardware Transactional Memory

LogSI-HTM: Log Based Snapshot Isolation in Hardware Transactional Memory LogSI-HTM: Log Based Snapshot Isolation in Hardware Transactional Memory Lois Orosa and Rodolfo zevedo Institute of Computing, University of Campinas (UNICMP) {lois.orosa,rodolfo}@ic.unicamp.br bstract

More information

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end

More information

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat Relaxing Concurrency Control in Transactional Memory by Utku Aydonat A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of The Edward S. Rogers

More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

Remote Transaction Commit: Centralizing Software Transactional Memory Commits

Remote Transaction Commit: Centralizing Software Transactional Memory Commits IEEE TRANSACTIONS ON COMPUTERS 1 Remote Transaction Commit: Centralizing Software Transactional Memory Commits Ahmed Hassan, Roberto Palmieri, and Binoy Ravindran Abstract Software Transactional Memory

More information

AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM

AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM FRACTAL AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM SUVINAY SUBRAMANIAN, MARK C. JEFFREY, MALEEN ABEYDEERA, HYUN RYONG LEE, VICTOR A. YING, JOEL EMER, DANIEL SANCHEZ ISCA 017 Current

More information

Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala

Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors By: Anvesh Polepalli Raj Muchhala Introduction Integrating CPU and GPU into a single chip for performance

More information

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 18-742 Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 Past Due: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi

More information

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems Concurrency unlocked Programming Bingsheng Wang TM Operating Systems 1 Outline Background Motivation Database Transaction Transactional Memory History Transactional Memory Example Mechanisms Software Transactional

More information

An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees

An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees Chi Cao Minh, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle

More information

Scalable Software Transactional Memory for Chapel High-Productivity Language

Scalable Software Transactional Memory for Chapel High-Productivity Language Scalable Software Transactional Memory for Chapel High-Productivity Language Srinivas Sridharan and Peter Kogge, U. Notre Dame Brad Chamberlain, Cray Inc Jeffrey Vetter, Future Technologies Group, ORNL

More information

Work Report: Lessons learned on RTM

Work Report: Lessons learned on RTM Work Report: Lessons learned on RTM Sylvain Genevès IPADS September 5, 2013 Sylvain Genevès Transactionnal Memory in commodity hardware 1 / 25 Topic Context Intel launches Restricted Transactional Memory

More information

Summary: Open Questions:

Summary: Open Questions: Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization

More information

Performance Improvement via Always-Abort HTM

Performance Improvement via Always-Abort HTM 1 Performance Improvement via Always-Abort HTM Joseph Izraelevitz* Lingxiang Xiang Michael L. Scott* *Department of Computer Science University of Rochester {jhi1,scott}@cs.rochester.edu Parallel Computing

More information

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of lazy TM 1 Transactions Access to shared variables is encapsulated within transactions the system gives the illusion

More information

Lecture 12 Transactional Memory

Lecture 12 Transactional Memory CSCI-UA.0480-010 Special Topics: Multicore Programming Lecture 12 Transactional Memory Christopher Mitchell, Ph.D. cmitchell@cs.nyu.edu http://z80.me Database Background Databases have successfully exploited

More information

Rethinking the Memory Hierarchy for Modern Languages. Po-An Tsai, Yee Ling Gan, and Daniel Sanchez

Rethinking the Memory Hierarchy for Modern Languages. Po-An Tsai, Yee Ling Gan, and Daniel Sanchez Rethinking the Memory Hierarchy for Modern Languages Po-An Tsai, Yee Ling Gan, and Daniel Sanchez Memory systems expose an inexpressive interface 2 Memory systems expose an inexpressive interface Flat

More information

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based? Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based

More information

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in

More information

ByteSTM: Java Software Transactional Memory at the Virtual Machine Level

ByteSTM: Java Software Transactional Memory at the Virtual Machine Level ByteSTM: Java Software Transactional Memory at the Virtual Machine Level Mohamed Mohamedin Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment

More information

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow

More information

Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications

Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications Fernando Rui, Márcio Castro, Dalvan Griebler, Luiz Gustavo Fernandes Email: fernando.rui@acad.pucrs.br,

More information

6 Transactional Memory. Robert Mullins

6 Transactional Memory. Robert Mullins 6 Transactional Memory ( MPhil Chip Multiprocessors (ACS Robert Mullins Overview Limitations of lock-based programming Transactional memory Programming with TM ( STM ) Software TM ( HTM ) Hardware TM 2

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

Dependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin

Dependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin Dependence-Aware Transactional Memory for Increased Concurrency Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin Concurrency Conundrum Challenge: CMP ubiquity Parallel

More information

DESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM

DESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM DESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT

More information

Lecture: Consistency Models, TM

Lecture: Consistency Models, TM Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency

More information

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28 Transactional Memory or How to do multiple things at once Benjamin Engel Transactional Memory 1 / 28 Transactional Memory: Architectural Support for Lock-Free Data Structures M. Herlihy, J. Eliot, and

More information

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Thesis Defense Master of Science Sean Moore Advisor: Binoy Ravindran Systems Software Research Group Virginia Tech Multiprocessing

More information

Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing

Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing Richard Yoo, Christopher Hughes: Intel Labs Konrad Lai, Ravi Rajwar: Intel Architecture Group Agenda

More information

Managing Resource Limitation of Best-Effort HTM

Managing Resource Limitation of Best-Effort HTM Managing Resource Limitation of Best-Effort HTM Mohamed Mohamedin, Roberto Palmieri, Ahmed Hassan, Binoy Ravindran Abstract The first release of hardware transactional memory (HTM) as commodity processor

More information

Transactional Memory. Yaohua Li and Siming Chen. Yaohua Li and Siming Chen Transactional Memory 1 / 41

Transactional Memory. Yaohua Li and Siming Chen. Yaohua Li and Siming Chen Transactional Memory 1 / 41 Transactional Memory Yaohua Li and Siming Chen Yaohua Li and Siming Chen Transactional Memory 1 / 41 Background Processor hits physical limit on transistor density Cannot simply put more transistors to

More information

Log-Based Transactional Memory

Log-Based Transactional Memory Log-Based Transactional Memory Kevin E. Moore University of Wisconsin-Madison Motivation Chip-multiprocessors/Multi-core/Many-core are here Intel has 1 projects in the works that contain four or more computing

More information

Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory

Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev MIT amatveev@csail.mit.edu Nir Shavit MIT shanir@csail.mit.edu Abstract Because of hardware TM limitations, software

More information

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center April

More information

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the

More information

Exploiting object structure in hardware transactional memory

Exploiting object structure in hardware transactional memory Comput Syst Sci & Eng (2009) 5: 303 35 2009 CRL Publishing Ltd International Journal of Computer Systems Science & Engineering Exploiting object structure in hardware transactional memory Behram Khan,

More information

Hardware Transactional Memory on Haswell

Hardware Transactional Memory on Haswell Hardware Transactional Memory on Haswell Viktor Leis Technische Universität München 1 / 15 Introduction transactional memory is a very elegant programming model transaction { transaction { a = a 10; c

More information

Low Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden"

Low Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden Low Overhead Concurrency Control for Partitioned Main Memory Databases Evan P. C. Jones Daniel J. Abadi Samuel Madden" Banks" Payment Processing" Airline Reservations" E-Commerce" Web 2.0" Problem:" Millions

More information

JIGSAW: SCALABLE SOFTWARE-DEFINED CACHES

JIGSAW: SCALABLE SOFTWARE-DEFINED CACHES JIGSAW: SCALABLE SOFTWARE-DEFINED CACHES NATHAN BECKMANN AND DANIEL SANCHEZ MIT CSAIL PACT 13 - EDINBURGH, SCOTLAND SEP 11, 2013 Summary NUCA is giving us more capacity, but further away 40 Applications

More information

SELF-TUNING HTM. Paolo Romano

SELF-TUNING HTM. Paolo Romano SELF-TUNING HTM Paolo Romano 2 Based on ICAC 14 paper N. Diegues and Paolo Romano Self-Tuning Intel Transactional Synchronization Extensions 11 th USENIX International Conference on Autonomic Computing

More information

Transactional Memory. Lecture 19: Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Transactional Memory. Lecture 19: Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 19: Transactional Memory Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of the slides in today s talk are borrowed from Professor Christos Kozyrakis

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS. Xiangyao Yu, Srini Devadas CSAIL, MIT

TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS. Xiangyao Yu, Srini Devadas CSAIL, MIT TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS Xiangyao Yu, Srini Devadas CSAIL, MIT FOR FIFTY YEARS, WE HAVE RIDDEN MOORE S LAW Moore s Law and the scaling of clock frequency = printing press for the currency

More information

RETCON: Transactional Repair Without Replay

RETCON: Transactional Repair Without Replay University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science 11-2-2009 RETCON: Transactional Repair Without Replay Colin lundell University of Pennsylvania,

More information

What Is STM and How Well It Performs? Aleksandar Dragojević

What Is STM and How Well It Performs? Aleksandar Dragojević What Is STM and How Well It Performs? Aleksandar Dragojević 1 Hardware Trends 2 Hardware Trends CPU 2 Hardware Trends CPU 2 Hardware Trends CPU 2 Hardware Trends CPU 2 Hardware Trends CPU CPU 2 Hardware

More information

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II CS 252 Graduate Computer Architecture Lecture 11: Multiprocessors-II Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs252

More information

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding

More information

RETCON: Transactional Repair without Replay

RETCON: Transactional Repair without Replay RETCON: Transactional Repair without Replay Colin lundell University of Pennsylvania blundell@cis.upenn.edu run Raghavan University of Pennsylvania arraghav@cis.upenn.edu Milo M. K. Martin University of

More information

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache

More information

Improvements in Hardware Transactional Memory for GPU Architectures

Improvements in Hardware Transactional Memory for GPU Architectures Improvements in Hardware Transactional Memory for GPU Architectures Alejandro Villegas, Angeles Navarro, Rafael Asenjo, and Oscar Plata Dept. Computer Architecture University of Malaga, Andalucia Tech,

More information

Transactional Memory. Lecture 18: Parallel Computer Architecture and Programming CMU /15-618, Spring 2017

Transactional Memory. Lecture 18: Parallel Computer Architecture and Programming CMU /15-618, Spring 2017 Lecture 18: Transactional Memory Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Credit: many slides in today s talk are borrowed from Professor Christos Kozyrakis (Stanford

More information

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 20: Transactional Memory Parallel Computer Architecture and Programming Slide credit Many of the slides in today s talk are borrowed from Professor Christos Kozyrakis (Stanford University) Raising

More information

Optimizing Hybrid Transactional Memory: The Importance of Nonspeculative Operations

Optimizing Hybrid Transactional Memory: The Importance of Nonspeculative Operations Optimizing Hybrid Transactional Memory: The Importance of Nonspeculative Operations Torvald Riegel Technische Universität Dresden, Germany torvald.riegel@tudresden.de Patrick Marlier Université de Neuchâtel,

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

Rethinking Serializable Multi-version Concurrency Control. Jose Faleiro and Daniel Abadi Yale University

Rethinking Serializable Multi-version Concurrency Control. Jose Faleiro and Daniel Abadi Yale University Rethinking Serializable Multi-version Concurrency Control Jose Faleiro and Daniel Abadi Yale University Theory: Single- vs Multi-version Systems Single-version system T r Read X X 0 T w Write X Multi-version

More information

Comprehensive Final Exam for Capacity Planning (CIS 4930/6930) Fall 2001 >>> SOLUTIONS <<<

Comprehensive Final Exam for Capacity Planning (CIS 4930/6930) Fall 2001 >>> SOLUTIONS <<< Comprehensive Final Exam for Capacity Planning (CIS 4930/6930) Fall 001 >>> SOLUTIONS

More information

Shared Symmetric Memory Systems

Shared Symmetric Memory Systems Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra ia a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

OS Support for Virtualizing Hardware Transactional Memory

OS Support for Virtualizing Hardware Transactional Memory OS Support for Virtualizing Hardware Transactional Memory Michael M. Swift, Haris Volos, Luke Yen, Neelam Goyal, Mark D. Hill and David A. Wood University of Wisconsin Madison The Virtualization Problem

More information

Dependence-Aware Transactional Memory for Increased Concurrency

Dependence-Aware Transactional Memory for Increased Concurrency Dependence-Aware Transactional Memory for Increased Concurrency Hany E. Ramadan, Christopher J. Rossbach, and Emmett Witchel Department of Computer Sciences University of Texas at Austin Austin, TX, USA

More information

Lecture 25: Synchronization. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 25: Synchronization. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 25: Synchronization James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L25 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Your goal today Housekeeping be introduced to synchronization

More information

Speculative Synchronization

Speculative Synchronization Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization

More information

Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris

Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris Early Experiences on Accelerating Dijkstra s Algorithm Using Transactional Memory Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris Computing Systems Laboratory School of Electrical

More information

ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices

ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Devices Jiacheng Zhang, Jiwu Shu, Youyou Lu Tsinghua University 1 Outline Background and Motivation ParaFS Design Evaluation

More information

Tradeoffs in Transactional Memory Virtualization

Tradeoffs in Transactional Memory Virtualization Tradeoffs in Transactional Memory Virtualization JaeWoong Chung Chi Cao Minh, Austen McDonald, Travis Skare, Hassan Chafi,, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab Stanford

More information

Stretching Transactional Memory

Stretching Transactional Memory Stretching Transactional Memory Aleksandar Dragojević Rachid Guerraoui Michał Kapałka Ecole Polytechnique Fédérale de Lausanne, School of Computer and Communication Sciences, I&C, Switzerland {aleksandar.dragojevic,

More information

Can Mainstream Processors Support Hardware Transactional Memory?

Can Mainstream Processors Support Hardware Transactional Memory? Can Mainstream Processors Support Hardware Transactional Memory? Brown University CS IPP Symposium April 30, 2009 Dave Christie AMD Research & Advanced Development Lab Agenda The Disconnect Implementation

More information

Transactional Prefetching: Narrowing the Window of Contention in Hardware Transactional Memory

Transactional Prefetching: Narrowing the Window of Contention in Hardware Transactional Memory Transactional Prefetching: Narrowing the Window of Contention in Hardware Transactional Memory Adrià Armejach Anurag Negi Adrián Cristal Osman S. Unsal Per Stenstrom Barcelona Supercomputing Center Universitat

More information

Analyzing Memory Access Patterns and Optimizing Through Spatial Memory Streaming. Ogün HEPER CmpE 511 Computer Architecture December 24th, 2009

Analyzing Memory Access Patterns and Optimizing Through Spatial Memory Streaming. Ogün HEPER CmpE 511 Computer Architecture December 24th, 2009 Analyzing Memory Access Patterns and Optimizing Through Spatial Memory Streaming Ogün HEPER CmpE 511 Computer Architecture December 24th, 2009 Agenda Introduction Memory Hierarchy Design CPU Speed vs.

More information

Module 10: "Design of Shared Memory Multiprocessors" Lecture 20: "Performance of Coherence Protocols" MOESI protocol.

Module 10: Design of Shared Memory Multiprocessors Lecture 20: Performance of Coherence Protocols MOESI protocol. MOESI protocol Dragon protocol State transition Dragon example Design issues General issues Evaluating protocols Protocol optimizations Cache size Cache line size Impact on bus traffic Large cache line

More information

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power

More information

Reducing Memory Buffering Overhead in Software Thread-Level Speculation

Reducing Memory Buffering Overhead in Software Thread-Level Speculation Reducing Memory Buffering Overhead in Software Thread-Level Speculation Zhen Cao Clark Verbrugge School of Computer Science, McGill University, Montréal, Québec, Canada HA E9 zhen.cao@mail.mcgill.ca, clump@cs.mcgill.ca

More information

Speculative Locks. Dept. of Computer Science

Speculative Locks. Dept. of Computer Science Speculative Locks José éf. Martínez and djosep Torrellas Dept. of Computer Science University it of Illinois i at Urbana-Champaign Motivation Lock granularity a trade-off: Fine grain greater concurrency

More information

Logical Leases: Scalable Hardware and Software Systems through Time Traveling. Xiangyao Yu

Logical Leases: Scalable Hardware and Software Systems through Time Traveling. Xiangyao Yu Logical Leases: Scalable Hardware and Software Systems through Time Traveling by Xiangyao Yu B.S. in Electronic Engineering, Tsinghua University (2012) S.M. in Electrical Engineering and Computer Science,

More information

Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh

Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh Transparent Offloading and Mapping () Enabling Programmer-Transparent Near-Data Processing in GPU Systems Kevin Hsieh Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O Connor, Nandita Vijaykumar,

More information

ABORTING CONFLICTING TRANSACTIONS IN AN STM

ABORTING CONFLICTING TRANSACTIONS IN AN STM Committing ABORTING CONFLICTING TRANSACTIONS IN AN STM PPOPP 09 2/17/2009 Hany Ramadan, Indrajit Roy, Emmett Witchel University of Texas at Austin Maurice Herlihy Brown University TM AND ITS DISCONTENTS

More information