EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION
1 EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION GUOWEI ZHANG, VIRGINIA CHIU, DANIEL SANCHEZ MICRO 2016
2 Executive summary
- Exploiting commutativity benefits update-heavy applications.
- Software techniques that exploit commutativity incur high run-time overheads (STM is 2-6x slower than HTM).
- Prior hardware exploits only single-instruction commutative operations (e.g., addition).
- CommTM exploits multi-instruction commutativity:
  - Extends the coherence protocol to perform commutative operations locally and concurrently.
  - Leverages HTM to support multi-instruction updates.
  - Benefits speculative execution by reducing conflicts.
- Accelerates full applications by up to 3.4x at 128 cores.
3 Commutativity
- Commutative operations produce equivalent results when reordered.
  - No true data dependence.
  - No need for communication.
- Software exploits commutativity but incurs high run-time overheads.
- Single-instruction commutativity (ADD, MIN, OR): Coup [Zhang et al., MICRO 2015].
- Multi-instruction commutativity (ordered put, set insertion, top-K insertion): CommTM.
4 Commutativity: multi-instruction example, set (linked-list) insertion
- Initial state: head -> null.
- insert(a); insert(b); yields head -> b -> a -> null.
- insert(b); insert(a); yields head -> a -> b -> null.
- The two final states are different but semantically equivalent.
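The set-insertion example can be checked with a small C++ sketch. It is only an illustration: `insertAll` and `sameSet` are hypothetical helpers that do not appear in the slides, and a `std::forward_list` stands in for the slide's linked list.

```cpp
#include <cassert>
#include <forward_list>
#include <initializer_list>
#include <set>
#include <string>

using List = std::forward_list<std::string>;

// Hypothetical helper: insert each key at the head, in the order given.
List insertAll(std::initializer_list<std::string> keys) {
    List list;
    for (const auto& k : keys) list.push_front(k);
    return list;
}

// Two lists are semantically equivalent as sets if they hold the same elements.
bool sameSet(const List& x, const List& y) {
    return std::set<std::string>(x.begin(), x.end()) ==
           std::set<std::string>(y.begin(), y.end());
}

// Usage: the two insertion orders give different lists but the same set.
void demo() {
    List ab = insertAll({"a", "b"});  // head -> b -> a
    List ba = insertAll({"b", "a"});  // head -> a -> b
    assert(ab != ba);
    assert(sameSet(ab, ba));
}
```

The concrete layout depends on insertion order, but any query that treats the list as a set cannot tell the two states apart.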
6-10 Example: addition in conventional HTM
    void add(int* counter, int delta) {
        int v = load(counter);
        int nv = v + delta;
        store(counter, nv);
    }
- Initially A = 20. Txn 0 and Txn 1 each run add(A, 1) and load A, so both hold A in their read sets.
- Txn 0 then stores A. The write hits data that Txn 1 has read: conflict! The conflict forces a transaction to restart.
- With four transactions (Txn 0 to Txn 3) running add(A, 1), A reaches 24 only after repeated restarts.
- Costs: traffic, serialization, wasted transactional work.
11-19 Example: addition in CommTM
    void add(int* counter, int delta) {
        int v = load[add](counter);
        int nv = v + delta;
        store[add](counter, nv);
    }
- Initially one cache holds A = 20. With labeled accesses, the line is tagged ADD and a second cache holds its own copy, A = 0.
- Txn 0 and Txn 1 each run add(A, 1) through load[add]/store[add] on their local copies, without conflicting: the copies become 21 and 1.
- Txn 2 and Txn 3 do the same: the copies become 22 and 2.
- An unlabeled load A triggers a user-defined reduction that merges the copies: A = 24.
- Benefits: less traffic, concurrent updates, less wasted transactional work, and lower run-time and memory overheads than STM.
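The walkthrough above can be mimicked in software with one partial value per cache. This is a minimal sketch, not the hardware mechanism: `ReducibleCounter` and its methods are hypothetical names, and it assumes every copy other than the first starts at 0, as the slide's second copy does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software model of one reducible line: one partial value per cache.
class ReducibleCounter {
public:
    ReducibleCounter(int initial, std::size_t caches) : partial_(caches, 0) {
        assert(caches > 0);
        partial_[0] = initial;  // cache 0 starts with the full value, the others with 0
    }

    // Models add(): a labeled load/store pair that touches only the local copy.
    void add(std::size_t cache, int delta) { partial_[cache] += delta; }

    // Models an unlabeled load: merge every partial value into cache 0, then read it.
    int load() {
        for (std::size_t i = 1; i < partial_.size(); ++i) {
            partial_[0] += partial_[i];  // plays the role of reduce[add]
            partial_[i] = 0;
        }
        return partial_[0];
    }

    int localValue(std::size_t cache) const { return partial_[cache]; }

private:
    std::vector<int> partial_;
};

// Usage: replay the slide sequence with two caches and four add(A, 1) calls.
void demo() {
    ReducibleCounter a(20, 2);
    a.add(0, 1);  // Txn 0
    a.add(1, 1);  // Txn 1
    a.add(0, 1);  // Txn 2
    a.add(1, 1);  // Txn 3
    assert(a.load() == 24);
}
```

No add touches another cache's copy, which is why the four updates never conflict; communication happens only at the final load.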
20 CommTM
21-24 Programming interface
- Transactional update, written with plain loads and stores:
    void add(int* counter, int delta) {
        int v = load(counter);
        int nv = v + delta;
        store(counter, nv);
    }
- The same transactional update with labeled loads/stores:
    void add(int* counter, int delta) {
        int v = load[add](counter);
        int nv = v + delta;
        store[add](counter, nv);
    }
- Non-transactional reduction handler:
    void reduce[add](int* counter, int delta) {
        int v = load[add](counter);
        int nv = v + delta;
        store[add](counter, nv);
    }
- [Diagram: reduce[add] merges the local counter and an incoming delta into the updated counter.]
25 Handling arbitrary object sizes
- For objects smaller than a cache line, assume lines are full of aligned elements and reduce all of them:
    void reduce[add](int* counterline, int[] deltas) {
        for (int i = 0; i < intspercacheline; i++) {
            int v = load[add](&counterline[i]);
            int nv = v + deltas[i];
            store[add](&counterline[i], nv);
        }
    }
- [Diagram: reduce[add] merges the counterline and deltas lines element by element into the updated counterline.]
- For objects larger than a cache line, use a level of indirection.
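A plain C++ version of the line-wise reduction might look like the sketch below. The 64-byte line size, the `Line` alias, and `reduceAddLine` are assumptions made for illustration; the slide only names `intspercacheline`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines (16 elements when int is 4 bytes).
constexpr std::size_t kIntsPerCacheLine = 64 / sizeof(int);
using Line = std::array<int, kIntsPerCacheLine>;

// Reduce every aligned element of the line, whether or not it was updated.
// Elements that were never touched carry a delta of 0, so adding them is harmless.
void reduceAddLine(Line& counterLine, const Line& deltas) {
    for (std::size_t i = 0; i < kIntsPerCacheLine; ++i) {
        counterLine[i] += deltas[i];
    }
}

// Usage: only two counters in the line received updates.
void demo() {
    Line counters{};  // all zeros
    Line deltas{};
    counters[0] = 5;
    counters[3] = 7;
    deltas[0] = 1;
    deltas[kIntsPerCacheLine - 1] = 3;
    reduceAddLine(counters, deltas);
    assert(counters[0] == 6);
    assert(counters[3] == 7);
    assert(counters[kIntsPerCacheLine - 1] == 3);
}
```

Reducing the whole line means the handler does not need to know which elements were modified.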
26 Example: set (linked-list) insertion
    struct Node { Node* next; };
    struct SetDesc { Node* head; Node* tail; };

    void insert(SetDesc* s, Node* n) {
        Node* head = load[insert](&s->head);
        n->next = head;
        store[insert](&s->head, n);
        if (head == nullptr) store[insert](&s->tail, n);
    }

    void reduce[insert](SetDesc* s0, SetDesc* s1) {
        if (s1->head == nullptr) return;
        Node* head0 = load[insert](&s0->head);
        if (head0 == nullptr) {
            store[insert](&s0->head, s1->head);
        } else {
            Node* tail0 = load[insert](&s0->tail);
            tail0->next = s1->head;
        }
        store[insert](&s0->tail, s1->tail);
    }
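The insertion and reduction logic can be exercised in ordinary C++ by replacing the labeled loads and stores with plain pointer accesses. This is a sketch under assumptions: the `key` field, `size`, and `contains` are hypothetical additions for the demo, and the tail update is applied on both branches of the reduction so the merged descriptor stays consistent.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    int key;               // hypothetical payload, not on the slide
    Node* next = nullptr;
    explicit Node(int k) : key(k) {}
};

struct SetDesc {
    Node* head = nullptr;
    Node* tail = nullptr;
};

// Insert at the head; remember the tail when the list was empty.
void insert(SetDesc& s, Node& n) {
    n.next = s.head;
    if (s.head == nullptr) s.tail = &n;
    s.head = &n;
}

// Append the partial list s1 to s0, leaving s1 empty.
void reduceInsert(SetDesc& s0, SetDesc& s1) {
    if (s1.head == nullptr) return;
    if (s0.head == nullptr) {
        s0.head = s1.head;
    } else {
        s0.tail->next = s1.head;
    }
    s0.tail = s1.tail;
    s1 = SetDesc{};
}

std::size_t size(const SetDesc& s) {
    std::size_t n = 0;
    for (const Node* p = s.head; p != nullptr; p = p->next) ++n;
    return n;
}

bool contains(const SetDesc& s, int key) {
    for (const Node* p = s.head; p != nullptr; p = p->next) {
        if (p->key == key) return true;
    }
    return false;
}

// Usage: two caches build partial lists, then one reduction merges them.
void demo() {
    Node a(1), b(2), c(3);
    SetDesc s0, s1;
    insert(s0, a);
    insert(s0, b);  // s0: b -> a
    insert(s1, c);  // s1: c
    reduceInsert(s0, s1);
    assert(size(s0) == 3);
    assert(contains(s0, 1) && contains(s0, 2) && contains(s0, 3));
}
```

Keeping a tail pointer is what lets the reduction splice two partial lists together in constant time.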
27 Implementation
- Baseline HTM:
  - MESI coherence protocol
  - Eager conflict detection
  - Timestamp-based conflict resolution
  - Lazy version management (buffer speculative data in L1s)
- [Diagram: cores, each with private L1I, L1D, and L2 caches, connected to a shared L3 / directory.]
- CommTM can be applied to other HTMs and hardware speculation techniques.
28 Coherence protocol
[Figure: state-transition diagrams for MSI and CommTM-MSI. CommTM-MSI adds a U state that carries a label.]
Legend:
- States: M (Modified), S (Shared, read-only), I (Invalid), U (User-defined reducible)
- Requests: R (Read), W (Write), L (Labeled load/store)
- Transitions: initiated by own core (gain permissions) or initiated by others (lose permissions)
29 Reducible-state transitions
- Entering the U state is triggered by a labeled load/store.
- Leaving the U state requires non-transactional reductions, run by a hardware shadow thread or interrupt. A reduction cannot access other reducible data.
- [Diagram: load[add](a) brings A into the requesting core's caches in the U state with value 0, while another core holds A = 20. A later load(a) triggers add_reduce, which merges the reducible copies into a single Modified copy. Legend: Modified, User-defined reducible.]
30 Transactional execution
- Speculative value management for the U state is analogous to the M state.
- [Diagram: a core runs tx_begin(); ld[add](a); st[add](a); tx_end() twice. In the first transaction (ts:0), L2 keeps the non-speculative A = 3 while L1 holds the speculative A = 4 with the speculatively-read and speculatively-written bits set. At tx_end() the bits are cleared. In the second transaction (ts:1), L2 holds A = 4 and L1 holds the speculative A = 5.]
- An invalidation to speculatively accessed data in the U state triggers a conflict.
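The value management shown on this slide can be sketched in C++ as follows. `SpecLine` and its methods are hypothetical names, and the model is deliberately simplified: it tracks a single line in one core's private caches and ignores what happens to the line after the invalidation is served.

```cpp
#include <cassert>

// Hypothetical model of one reducible line in one core's private caches.
struct SpecLine {
    int l2;                    // non-speculative copy
    int l1;                    // most recent copy, possibly speculative
    bool specRead = false;     // speculatively-read bit
    bool specWritten = false;  // speculatively-written bit
    bool inTx = false;

    explicit SpecLine(int v) : l2(v), l1(v) {}

    void txBegin() { inTx = true; }

    // ld[add] followed by st[add] inside a transaction.
    void txAdd(int delta) {
        assert(inTx);
        specRead = true;
        if (!specWritten) {
            l2 = l1;           // save the last committed value before the first speculative write
            specWritten = true;
        }
        l1 += delta;
    }

    // Commit: the L1 value becomes non-speculative and the bits are cleared.
    void txEnd() {
        specRead = false;
        specWritten = false;
        inTx = false;
    }

    // An invalidation conflicts if the line was speculatively accessed.
    // On a conflict the transaction aborts and the speculative value is discarded.
    bool invalidate() {
        bool conflict = inTx && (specRead || specWritten);
        if (conflict) {
            l1 = l2;
            txEnd();
        }
        return conflict;
    }
};

// Usage: replay the slide's two transactions, then abort the second one.
void demo() {
    SpecLine a(3);
    a.txBegin();
    a.txAdd(1);  // L2 = 3, L1 = 4
    a.txEnd();
    a.txBegin();
    a.txAdd(1);  // L2 = 4, L1 = 5
    assert(a.invalidate());
    assert(a.l1 == 4);
}
```

Because the committed value stays in L2 until the next speculative write, an abort only needs to drop the L1 copy.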
31 Gather requests allow more concurrency
- Motivation: conditional commutativity. Operations commute only when the reducible data meets some condition.
- Frequent reductions triggered by condition checks limit concurrency.
- Gather requests allow partial updates to move across caches without leaving the reducible state.
- This achieves higher concurrency (e.g., for reference counting).
32 Evaluation
33 Evaluation on microbenchmarks
[Figure: speedup vs. number of threads for Baseline and CommTM on four microbenchmarks: counter, set insertion, ordered put, and top-K insertion.]
Up to 128x speedup over baseline TM.
34 Evaluation on full applications
[Figure: speedup vs. number of threads for Baseline and CommTM on boruvka, kmeans, ssca2, genome, and vacation.]
[Figure: breakdown of normalized core cycles at 128 threads (lower is better) for Baseline and CommTM, split into non-transactional, transactional committed, and transactional aborted cycles.]
35 Conclusions
CommTM:
- Leverages HTM to support multi-instruction operations.
- Extends the coherence protocol to allow local and concurrent updates.
- Bridges the gap between software and hardware speculation.
- Reduces conflicts and serialized transactions significantly.
- Accelerates challenging workloads by up to 3.4x at 128 cores.
36 THANKS FOR YOUR ATTENTION! QUESTIONS ARE WELCOME!
Exploiting Semantic Commutativity in Hardware Speculation
Appears in the Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.
More informationChí Cao Minh 28 May 2008
Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor
More informationEigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics
EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics Pervasive Parallelism Laboratory, Stanford University Sungpack Hong Tayo Oguntebi Jared Casper Nathan Bronson Christos Kozyrakis
More informationPotential violations of Serializability: Example 1
CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same
More informationData-Centric Execution of Speculative Parallel Programs MARK JEFFREY, SUVINAY SUBRAMANIAN, MALEEN ABEYDEERA, JOEL EMER, DANIEL SANCHEZ MICRO 2016
Data-entric Execution of Speculative Parallel Programs MARK JEFFREY, SUVINAY SUBRAMANIAN, MALEEN ABEYDEERA, JOEL EMER, DANIEL SANHEZ MIRO 26 Executive summary Many-cores must exploit cache locality to
More informationLecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory
Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section
More informationPerformance Improvement via Always-Abort HTM
1 Performance Improvement via Always-Abort HTM Joseph Izraelevitz* Lingxiang Xiang Michael L. Scott* *Department of Computer Science University of Rochester {jhi1,scott}@cs.rochester.edu Parallel Computing
More informationGoldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea
Programming With Locks s Tricky Multicore processors are the way of the foreseeable future thread-level parallelism anointed as parallelism model of choice Just one problem Writing lock-based multi-threaded
More informationPerformance Evaluation of Adaptivity in STM. Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich
Performance Evaluation of Adaptivity in STM Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich Motivation STM systems rely on many assumptions Often contradicting for different
More informationImproving the Practicality of Transactional Memory
Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter
More informationImplementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University
Implementing and Evaluating Nested Parallel Transactions in STM Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University Introduction // Parallelize the outer loop for(i=0;i
More informationHTM in the wild. Konrad Lai June 2015
HTM in the wild Konrad Lai June 2015 Industrial Considerations for HTM Provide a clear benefit to customers Improve performance & scalability Ease programmability going forward Improve something common
More informationDBT Tool. DBT Framework
Thread-Safe Dynamic Binary Translation using Transactional Memory JaeWoong Chung,, Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University http://csl.stanford.edu
More informationA Scalable Architecture for Ordered Parallelism
Scalable rchitecture for Ordered Parallelism Mark Jeffrey, Suvinay Subramanian, Cong Yan, Joel mer, aniel Sanchez MICRO 2015 Multicores Target asy Parallelism 2 Multicores Target asy Parallelism 2 Regular:
More informationLecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations
Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,
More informationScheduling Transactions in Replicated Distributed Transactional Memory
Scheduling Transactions in Replicated Distributed Transactional Memory Junwhan Kim and Binoy Ravindran Virginia Tech USA {junwhan,binoy}@vt.edu CCGrid 2013 Concurrency control on chip multiprocessors significantly
More informationCache Coherence and Atomic Operations in Hardware
Cache Coherence and Atomic Operations in Hardware Previously, we introduced multi-core parallelism. Today we ll look at 2 things: 1. Cache coherence 2. Instruction support for synchronization. And some
More informationProactive Transaction Scheduling for Contention Management
Proactive Transaction for Contention Management Geoffrey Blake University of Michigan Ann Arbor, MI blakeg@umich.edu Ronald G. Dreslinski University of Michigan Ann Arbor, MI rdreslin@umich.edu Trevor
More informationEazyHTM: Eager-Lazy Hardware Transactional Memory
EazyHTM: Eager-Lazy Hardware Transactional Memory Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero Barcelona Supercomputing Center,
More informationFlexTM. Flexible Decoupled Transactional Memory Support. Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science
FlexTM Flexible Decoupled Transactional Memory Support Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science 1 Transactions: Our Goal Lazy Txs (i.e., optimistic conflict
More informationCost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)
Cost of Concurrency in Hybrid Transactional Memory Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University) 1 Transactional Memory: a history Hardware TM Software TM Hybrid TM 1993 1995-today
More informationMcRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime
McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs
More informationLogTM: Log-Based Transactional Memory
LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood 12th International Symposium on High Performance Computer Architecture () 26 Mulitfacet
More informationLogSI-HTM: Log Based Snapshot Isolation in Hardware Transactional Memory
LogSI-HTM: Log Based Snapshot Isolation in Hardware Transactional Memory Lois Orosa and Rodolfo zevedo Institute of Computing, University of Campinas (UNICMP) {lois.orosa,rodolfo}@ic.unicamp.br bstract
More informationLecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation
Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end
More informationRelaxing Concurrency Control in Transactional Memory. Utku Aydonat
Relaxing Concurrency Control in Transactional Memory by Utku Aydonat A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of The Edward S. Rogers
More informationLecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,
More informationRemote Transaction Commit: Centralizing Software Transactional Memory Commits
IEEE TRANSACTIONS ON COMPUTERS 1 Remote Transaction Commit: Centralizing Software Transactional Memory Commits Ahmed Hassan, Roberto Palmieri, and Binoy Ravindran Abstract Software Transactional Memory
More informationAN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM
FRACTAL AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM SUVINAY SUBRAMANIAN, MARK C. JEFFREY, MALEEN ABEYDEERA, HYUN RYONG LEE, VICTOR A. YING, JOEL EMER, DANIEL SANCHEZ ISCA 017 Current
More informationReview on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala
Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors By: Anvesh Polepalli Raj Muchhala Introduction Integrating CPU and GPU into a single chip for performance
More informationFall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012
18-742 Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 Past Due: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi
More informationTransactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems
Concurrency unlocked Programming Bingsheng Wang TM Operating Systems 1 Outline Background Motivation Database Transaction Transactional Memory History Transactional Memory Example Mechanisms Software Transactional
More informationAn Effective Hybrid Transactional Memory System with Strong Isolation Guarantees
An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees Chi Cao Minh, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle
More informationScalable Software Transactional Memory for Chapel High-Productivity Language
Scalable Software Transactional Memory for Chapel High-Productivity Language Srinivas Sridharan and Peter Kogge, U. Notre Dame Brad Chamberlain, Cray Inc Jeffrey Vetter, Future Technologies Group, ORNL
More informationWork Report: Lessons learned on RTM
Work Report: Lessons learned on RTM Sylvain Genevès IPADS September 5, 2013 Sylvain Genevès Transactionnal Memory in commodity hardware 1 / 25 Topic Context Intel launches Restricted Transactional Memory
More informationSummary: Open Questions:
Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization
More informationPerformance Improvement via Always-Abort HTM
1 Performance Improvement via Always-Abort HTM Joseph Izraelevitz* Lingxiang Xiang Michael L. Scott* *Department of Computer Science University of Rochester {jhi1,scott}@cs.rochester.edu Parallel Computing
More informationLecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM
Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of lazy TM 1 Transactions Access to shared variables is encapsulated within transactions the system gives the illusion
More informationLecture 12 Transactional Memory
CSCI-UA.0480-010 Special Topics: Multicore Programming Lecture 12 Transactional Memory Christopher Mitchell, Ph.D. cmitchell@cs.nyu.edu http://z80.me Database Background Databases have successfully exploited
More informationRethinking the Memory Hierarchy for Modern Languages. Po-An Tsai, Yee Ling Gan, and Daniel Sanchez
Rethinking the Memory Hierarchy for Modern Languages Po-An Tsai, Yee Ling Gan, and Daniel Sanchez Memory systems expose an inexpressive interface 2 Memory systems expose an inexpressive interface Flat
More informationAgenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?
Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based
More informationLecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM
Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in
More informationByteSTM: Java Software Transactional Memory at the Virtual Machine Level
ByteSTM: Java Software Transactional Memory at the Virtual Machine Level Mohamed Mohamedin Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment
More informationSpeculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois
Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow
More informationEvaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications
Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications Fernando Rui, Márcio Castro, Dalvan Griebler, Luiz Gustavo Fernandes Email: fernando.rui@acad.pucrs.br,
More information6 Transactional Memory. Robert Mullins
6 Transactional Memory ( MPhil Chip Multiprocessors (ACS Robert Mullins Overview Limitations of lock-based programming Transactional memory Programming with TM ( STM ) Software TM ( HTM ) Hardware TM 2
More informationSoftware-Controlled Multithreading Using Informing Memory Operations
Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University
More informationDependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin
Dependence-Aware Transactional Memory for Increased Concurrency Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin Concurrency Conundrum Challenge: CMP ubiquity Parallel
More informationDESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM
DESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT
More informationLecture: Consistency Models, TM
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency
More informationTransactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28
Transactional Memory or How to do multiple things at once Benjamin Engel Transactional Memory 1 / 28 Transactional Memory: Architectural Support for Lock-Free Data Structures M. Herlihy, J. Eliot, and
More informationMutex Locking versus Hardware Transactional Memory: An Experimental Evaluation
Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Thesis Defense Master of Science Sean Moore Advisor: Binoy Ravindran Systems Software Research Group Virginia Tech Multiprocessing
More informationPerformance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing
Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing Richard Yoo, Christopher Hughes: Intel Labs Konrad Lai, Ravi Rajwar: Intel Architecture Group Agenda
More informationManaging Resource Limitation of Best-Effort HTM
Managing Resource Limitation of Best-Effort HTM Mohamed Mohamedin, Roberto Palmieri, Ahmed Hassan, Binoy Ravindran Abstract The first release of hardware transactional memory (HTM) as commodity processor
More informationTransactional Memory. Yaohua Li and Siming Chen. Yaohua Li and Siming Chen Transactional Memory 1 / 41
Transactional Memory Yaohua Li and Siming Chen Yaohua Li and Siming Chen Transactional Memory 1 / 41 Background Processor hits physical limit on transistor density Cannot simply put more transistors to
More informationLog-Based Transactional Memory
Log-Based Transactional Memory Kevin E. Moore University of Wisconsin-Madison Motivation Chip-multiprocessors/Multi-core/Many-core are here Intel has 1 projects in the works that contain four or more computing
More informationReduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev MIT amatveev@csail.mit.edu Nir Shavit MIT shanir@csail.mit.edu Abstract Because of hardware TM limitations, software
More informationA Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler
A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center April
More informationLecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the
More informationExploiting object structure in hardware transactional memory
Comput Syst Sci & Eng (2009) 5: 303 35 2009 CRL Publishing Ltd International Journal of Computer Systems Science & Engineering Exploiting object structure in hardware transactional memory Behram Khan,
More informationHardware Transactional Memory on Haswell
Hardware Transactional Memory on Haswell Viktor Leis Technische Universität München 1 / 15 Introduction transactional memory is a very elegant programming model transaction { transaction { a = a 10; c
More informationLow Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden"
Low Overhead Concurrency Control for Partitioned Main Memory Databases Evan P. C. Jones Daniel J. Abadi Samuel Madden" Banks" Payment Processing" Airline Reservations" E-Commerce" Web 2.0" Problem:" Millions
More informationJIGSAW: SCALABLE SOFTWARE-DEFINED CACHES
JIGSAW: SCALABLE SOFTWARE-DEFINED CACHES NATHAN BECKMANN AND DANIEL SANCHEZ MIT CSAIL PACT 13 - EDINBURGH, SCOTLAND SEP 11, 2013 Summary NUCA is giving us more capacity, but further away 40 Applications
More informationSELF-TUNING HTM. Paolo Romano
SELF-TUNING HTM Paolo Romano 2 Based on ICAC 14 paper N. Diegues and Paolo Romano Self-Tuning Intel Transactional Synchronization Extensions 11 th USENIX International Conference on Autonomic Computing
More informationTransactional Memory. Lecture 19: Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 19: Transactional Memory Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of the slides in today s talk are borrowed from Professor Christos Kozyrakis
More informationLecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks
Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock
More informationTIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS. Xiangyao Yu, Srini Devadas CSAIL, MIT
TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS Xiangyao Yu, Srini Devadas CSAIL, MIT FOR FIFTY YEARS, WE HAVE RIDDEN MOORE S LAW Moore s Law and the scaling of clock frequency = printing press for the currency
More informationRETCON: Transactional Repair Without Replay
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science 11-2-2009 RETCON: Transactional Repair Without Replay Colin lundell University of Pennsylvania,